Research Projects
Action-space Model-based Reinforcement Learning for VLA
2026 – now
- Used a bidirectional encoder to replace exploration through the denoising process with traditional Gaussian action perturbation, ensuring fine-grained control over the action space.
- Designed the algorithm to perturb actions only within recognized key chunks to improve exploration efficiency and reduce redundant rollouts.
- Introduced a world model into the training loop to provide sufficient data and accurate guidance.
Compressing Generative World Models via Reinforcement Learning
2025 – now
- Aimed to remove noise and preserve task-relevant and reward-relevant information in the world model by making its output more compressed.
- Designed a composite reward signal combining JPEG compressibility, action prediction consistency (via a custom ResNet inverse action model), and value preservation to guide the fine-tuning of a diffusion world model.
- Implemented PPO, DDPO, and AWM, adapting and unifying their codebases into a custom pipeline for diffusion world model fine-tuning, and conducted extensive experiments to iteratively refine the algorithms based on empirical results.
- Focused on efficient and stable RL post-training of the world model to improve long-horizon accuracy and downstream policy performance.
RL for Reasoning
2025
- Used concepts from human cognitive psychology to design the reasoning process.
- Constructed a cognitive graph/tree to represent the reasoning process and find better reasoning trajectories.
- Used groups of samples with relative scores generated by DeepSeek R1 to train a smaller reward model.
- Fine-tuned Qwen with RL to enhance general cognitive reasoning ability, not only on specific tasks or datasets, achieving better performance.
Automated Construction of Knowledge Graph in Knowledge-intensive Domain
2024 – 2025
- Proposed Tree-KG, an expandable framework for KG construction consisting of initial construction and iterative expansion.
- Achieved SOTA performance compared with existing methods such as GraphRAG, especially when using textbooks.
- Paper accepted at the ACL 2025 main conference; patent number 202610098472.8.
Industrial Experience
Team Member of Embodied AI Business Unit
Noetix Robotics, Beijing | 2026 – now
Conducting research and development on embodied AI, including:
- World Action Model Pretraining and Post-training.
- VLA Post-training: Designing and reproducing memory systems and hierarchical inference mechanisms; implementing supervised fine-tuning and reinforcement learning to train robots to perform real-world long-horizon tasks.
- Real-robot Deployment.
AI Teaching Assistant (Developer of AI Cosmos)
Tsinghua University | 2024 – 2025
Developed the core Meta Agent for the university’s AI education system.
- Meta Agent Architecture: Based on OpenManus, designed a high-level agent capable of formulating reusable, generalizable workflows for diverse teaching tasks (e.g., grading, course design) from single instructions.
- Automated Tool & Workflow Construction: Implemented a “Tool Maker” to synthesize Python scripts as new tools, and a “Workflow Maker” to organize operations into executable graphs, enabling the system to self-expand its capabilities.
- Knowledge Extraction: Extracted knowledge from textbooks and other materials to build a knowledge graph for the course.
AI Cosmos is now used in the university’s daily academic activities, including course learning and research assistance.