Research Projects
Action-space Model-based Reinforcement Learning for VLA
2026 – now
- Replaced the traditional Gaussian action noise used for exploration during the denoising process with a bidirectional per-token encoder, giving fine-grained control over the action space.
- Designed a new likelihood-based reinforcement learning algorithm that removes the bias introduced by this exploration noise.
- Implemented isolated post-training on a pi0-like VLA and applied a foundation world model to improve efficiency.
Compressing Generative World Models via Reinforcement Learning
2025 – now
- Aimed to remove noise while preserving task-relevant and reward-relevant information in the world model by making its output more compressed.
- Designed a composite reward signal combining JPEG compressibility, action prediction consistency (via a custom ResNet inverse action model), and value preservation to guide fine-tuning of a diffusion world model.
- Focused on efficient and stable RL post-training of the world model to improve long-horizon accuracy and downstream policy performance.
RL for Reasoning
2025
- Used concepts from human cognitive psychology to design the reasoning process.
- Constructed a cognitive graph/tree to represent the reasoning process and find better trajectories.
- Used groups of samples with relative scores generated by DeepSeek R1 to train a smaller reward model.
- Fine-tuned Qwen with RL to improve general cognitive reasoning ability, not only on specific tasks or datasets, and achieved better performance.
Automated Construction of Knowledge Graph in Knowledge-intensive Domain
2024 – 2025
- Proposed Tree-KG, an expandable framework for KG construction comprising initial construction and iterative expansion.
- Achieved SOTA performance, outperforming existing methods such as GraphRAG, especially when using textbooks as the source material.
- Paper accepted at the ACL 2025 main conference; patent pending.
Industrial Experience
AI Teaching Assistant (Developer of AI Cosmos)
Tsinghua University | 2024 – 2025
Developed the core Meta Agent for the university’s AI education system.
- Meta Agent Architecture: Building on OpenManus, designed a high-level agent that formulates reusable, generalizable workflows for diverse teaching tasks (e.g., grading, course design) from a single instruction.
- Automated Tool & Workflow Construction: Implemented a “Tool Maker” to synthesize Python scripts as new tools, and a “Workflow Maker” to organize operations into executable graphs, enabling the system to self-expand its capabilities.
- Knowledge Extraction: Extracted knowledge from textbooks and other materials to build a knowledge graph for the course.
AI Cosmos is now used in the university’s daily academic activities, including course learning and research assistance.