Experience – Rui Zhao

Research Projects

2026 – now

Replaced the traditional Gaussian action noise used for exploration during the denoising process with a bidirectional per-token encoder, giving fine-grained control over the action space.
Designed a new likelihood-based reinforcement learning algorithm that removes the bias introduced by the noise.
Implemented isolated post-training on a pi0-like VLA model and applied a foundation world model to improve efficiency.

2025 – now

Aimed to remove noise and preserve task-relevant or reward-relevant information in the world model by making its output more compressed.
Designed a composite reward signal combining JPEG compressibility, action prediction consistency (via a custom ResNet inverse action model), and value preservation to guide fine-tuning of a diffusion world model.
Focused on efficient and stable RL post-training of the world model to improve long-horizon accuracy and downstream policy performance.

2025

Used concepts from human cognitive psychology to design the reasoning process.
Constructed a cognitive graph/tree to represent the reasoning process and find better trajectories.
Used groups of samples with relative scores generated by DeepSeek R1 to train a smaller reward model.
Fine-tuned Qwen with RL to enhance general cognitive reasoning ability, not only on specific tasks or datasets, and achieved better performance.

2024 – 2025

Proposed Tree-KG, an expandable framework for knowledge graph (KG) construction, comprising initial construction and iterative expansion.
Achieved state-of-the-art performance compared with existing methods such as GraphRAG, especially when using textbooks.
Paper accepted at the ACL 2025 main conference; patent pending.

Tsinghua University | 2024 – 2025

Developed the core Meta Agent for the university’s AI education system.

Meta Agent Architecture: Building on OpenManus, designed a high-level agent that formulates reusable, generalizable workflows for diverse teaching tasks (e.g., grading, course design) from a single instruction.
Automated Tool & Workflow Construction: Implemented a “Tool Maker” to synthesize Python scripts as new tools, and a “Workflow Maker” to organize operations into executable graphs, enabling the system to self-expand its capabilities.
Knowledge Extraction: Extracted knowledge from textbooks and other materials to build a knowledge graph for the course.

AI Cosmos is now used in the university’s daily academic activities, including course learning and research assistance.