Rui Zhao

Research Projects

Action-space Model-based Reinforcement Learning for VLA

2026 – now

  • Replaced the traditional exploration scheme, which perturbs actions with Gaussian noise during the denoising process, with a bidirectional per-token encoder, giving fine-grained control over the action space.
  • Designed a new likelihood-based reinforcement learning algorithm that removes the bias introduced by the perturbation.
  • Implemented isolated post-training on a pi0-like VLA and applied a foundation world model to improve efficiency.

Compressing Generative World Models via Reinforcement Learning

2025 – now

  • Aimed to remove noise while preserving task-relevant or reward-relevant information in the world model by making its output more compressed.
  • Designed a composite reward signal combining JPEG compressibility, action prediction consistency (via a custom ResNet inverse action model), and value preservation to guide the fine-tuning of a diffusion world model.
  • Focused on efficient and stable RL post-training of the world model to improve long-horizon accuracy and downstream policy performance.
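
As a rough illustration of how such a composite reward could be assembled, here is a minimal sketch. It is not the project's implementation: `zlib` stands in for JPEG so the example needs only the standard library, the action-consistency and value-preservation terms are passed in as precomputed scores, and the function names and equal weights are hypothetical.

```python
import random
import zlib


def compressibility_reward(frame_bytes: bytes) -> float:
    """Return a score in [0, 1]; higher means the frame compresses better.

    Assumption: zlib is used here as a stand-in for a JPEG encoder.
    """
    if not frame_bytes:
        return 0.0
    ratio = len(zlib.compress(frame_bytes)) / len(frame_bytes)
    return max(0.0, 1.0 - ratio)


def composite_reward(frame_bytes, action_consistency, value_preservation,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms (weights are illustrative only)."""
    w_c, w_a, w_v = weights
    return (w_c * compressibility_reward(frame_bytes)
            + w_a * action_consistency
            + w_v * value_preservation)


# A constant frame compresses far better than a noisy one.
flat_frame = bytes(1024)
rng = random.Random(0)
noisy_frame = bytes(rng.getrandbits(8) for _ in range(1024))

flat_score = composite_reward(flat_frame, 0.5, 0.5)
noisy_score = composite_reward(noisy_frame, 0.5, 0.5)
```

With the other two terms held equal, the flat frame receives the higher reward, which is the pressure toward compressed outputs described above.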

RL for Reasoning

2025

  • Used concepts from human cognitive psychology to design the reasoning process.
  • Constructed a cognitive graph/tree to represent the reasoning process and find better trajectories.
  • Used groups of samples with relative scores generated by DeepSeek R1 to train a smaller reward model.
  • Fine-tuned Qwen with RL to enhance general cognitive reasoning ability, not only performance on specific tasks or datasets, and achieved better performance.
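
One common way to turn groups of relatively scored samples into a reward-model training signal is to derive pairwise preferences and apply a Bradley-Terry style loss. The sketch below illustrates that idea only; whether the project used a pairwise objective is an assumption, and `preference_pairs` and `bradley_terry_loss` are hypothetical names.

```python
import math
from itertools import combinations


def preference_pairs(group):
    """Turn one scored group into (winner_id, loser_id) pairs.

    `group` is a list of (sample_id, score); ties produce no pair.
    """
    pairs = []
    for (a, score_a), (b, score_b) in combinations(group, 2):
        if score_a > score_b:
            pairs.append((a, b))
        elif score_b > score_a:
            pairs.append((b, a))
    return pairs


def bradley_terry_loss(reward_winner: float, reward_loser: float) -> float:
    """Negative log-likelihood that the winner outranks the loser."""
    return math.log1p(math.exp(-(reward_winner - reward_loser)))


# Hypothetical group of three sampled answers with teacher-assigned scores.
group = [("a", 0.9), ("b", 0.4), ("c", 0.4)]
pairs = preference_pairs(group)
```

The loss shrinks as the reward model ranks the preferred sample further above the rejected one, so only the relative ordering inside each group matters, not the absolute scores.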

Automated Construction of Knowledge Graph in Knowledge-intensive Domain

2024 – 2025

  • Proposed Tree-KG, an expandable framework for KG construction that consists of initial construction and iterative expansion.
  • Achieved SOTA performance compared with existing methods such as GraphRAG, especially when using textbooks.
  • Paper accepted at the ACL 2025 main conference; patent pending.

Industrial Experience

AI Teaching Assistant (Developer of AI Cosmos)

Tsinghua University | 2024 – 2025

Developed the core Meta Agent for the university’s AI education system.

  • Meta Agent Architecture: Based on OpenManus, designed a high-level agent capable of formulating reusable, generalizable workflows for diverse teaching tasks (e.g., grading, course design) from single instructions.
  • Automated Tool & Workflow Construction: Implemented a “Tool Maker” to synthesize Python scripts as new tools, and a “Workflow Maker” to organize operations into executable graphs, enabling the system to self-expand its capabilities.
  • Knowledge Extraction: Extracted knowledge from textbooks and other materials to build a knowledge graph for the course.
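
To give a sense of what organizing operations into an executable graph can look like, here is a minimal sketch that runs steps in dependency order using the standard library's `graphlib`. It is not the AI Cosmos code: `run_workflow`, the step names, and the toy grading task are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, deps, initial=None):
    """Run each step after its prerequisites and collect the results.

    steps: name -> callable(context) returning that step's output
    deps:  name -> set of prerequisite step names
    """
    context = dict(initial or {})
    for name in TopologicalSorter(deps).static_order():
        context[name] = steps[name](context)
    return context


# Hypothetical three-step grading workflow.
steps = {
    "load": lambda ctx: [3, 4, 5],
    "grade": lambda ctx: [score >= 4 for score in ctx["load"]],
    "report": lambda ctx: sum(ctx["grade"]),
}
deps = {"load": set(), "grade": {"load"}, "report": {"grade"}}

result = run_workflow(steps, deps)
```

Keeping the steps and their dependencies as plain data means a new tool can be added to the graph without changing the runner.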

AI Cosmos is now used in the university’s daily academic activities, including course learning and research assistance.