Ψ₀ has just been open-sourced! A foundation model toward general-purpose humanoid robots
机器之心· 2026-03-25 05:00
Core Insights
- The article discusses recent advances in humanoid robotics, focusing on Ψ₀, an open-source foundation model developed by a team at the University of Southern California. According to the article, Ψ₀ outperforms NVIDIA's latest model, GR00T N1.6, by more than 40% in task success rate [2][4][33].

Group 1: Introduction and Background
- Humanoid loco-manipulation, which combines locomotion and manipulation, is a challenging problem in embodied intelligence. Recent models generalize better, but they depend heavily on large-scale teleoperation data, which is expensive to collect [4].
- The article highlights egocentric (first-person) human videos as a scalable alternative source of training data for humanoid robots, since such videos are information-rich and far easier to obtain [5].

Group 2: Model Development
- Ψ₀ needs only 80 episodes of real-robot teleoperation data to master long-horizon loco-manipulation tasks, making it highly data-efficient [5][29].
- The model uses a three-stage training paradigm: pre-training on human videos, post-training on real-robot data for precise control, and fine-tuning on a small amount of task-specific data [9][10][13][16].

Group 3: Model Architecture
- Ψ₀ uses a decoupled architecture that splits visual understanding, action generation, and low-level motion control across three cooperating modules: a vision-language model (VLM), an action expert, and a reinforcement learning (RL) controller [20][21][24].
- The VLM acts as the system's "brain", while the action expert predicts whole-body action sequences conditioned on vision-language features [22].

Group 4: Model Deployment and Performance
- Ψ₀ uses a real-time action chunking mechanism to keep task execution smooth, addressing the inference latency that commonly affects large models [25][27].
- In evaluations across eight long-horizon tasks, Ψ₀ achieved an average success rate substantially higher than the baseline models, performing especially well on tasks that require fine-grained manipulation [28][33].

Group 5: Conclusion and Future Directions
- The results suggest that effective scaling is not simply a matter of accumulating more data but of using the right data in the right way: combining high-quality human manipulation data with domain-specific real-robot trajectories [40].
- The article expects future advances in robot memory, dexterous hands, and multimodal perception to improve robots' ability to understand, learn, and adapt, paving the way for more capable humanoid robots in everyday life [40].
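At a high level, the three-stage paradigm in Group 2 (pre-train on human video, post-train on real-robot data, fine-tune on a small task set) can be read as warm-started sequential training: each stage begins from the weights the previous stage produced. The Python sketch below illustrates only that idea, using a toy linear model trained by plain SGD on synthetic data. The model, the datasets, the target weights, the sample counts, and the helpers `make_data`, `sgd`, `Stage`, and `run_stages` are all hypothetical and say nothing about how Ψ₀ is actually trained.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[List[float], float]


def make_data(true_w: List[float], n: int, rng: random.Random) -> List[Sample]:
    """Noise-free synthetic linear data; a hypothetical stand-in for one training corpus."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        data.append((x, sum(w * xi for w, xi in zip(true_w, x))))
    return data


def sgd(w: List[float], data: List[Sample], lr: float, epochs: int = 1) -> List[float]:
    """Least-squares SGD on a linear model, starting from the weights it is given."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


@dataclass
class Stage:
    name: str
    data: List[Sample]
    lr: float
    epochs: int = 1


def run_stages(stages: List[Stage], dim: int) -> List[float]:
    w = [0.0] * dim
    for stage in stages:
        w = sgd(w, stage.data, stage.lr, stage.epochs)  # warm start from the previous stage
    return w


def dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


rng = random.Random(0)
# Hypothetical "ground-truth" weights: broad prior, robot-specific, and task-specific.
w_human, w_robot, w_task = [1.0, 2.0, -1.0, 0.5], [1.1, 1.9, -1.0, 0.6], [1.2, 1.9, -0.9, 0.6]
stages = [
    Stage("pre-train: human video", make_data(w_human, 2000, rng), lr=0.05),
    Stage("post-train: real-robot data", make_data(w_robot, 500, rng), lr=0.05),
    Stage("fine-tune: target task", make_data(w_task, 40, rng), lr=0.05),
]
staged = run_stages(stages, dim=4)
scratch = sgd([0.0] * 4, stages[-1].data, lr=0.05)  # same small task set, no earlier stages
print(round(dist(staged, w_task), 3), round(dist(scratch, w_task), 3))
```

The final line compares the staged model with one trained from scratch on the same 40-sample fine-tuning set. With this small budget, the warm-started weights end up much closer to the task target, which is the general intuition behind pre-training on a large, cheap corpus before fine-tuning on scarce data.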
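The decoupled design in Group 3 can be pictured as three components with narrow interfaces: observation to features, features to an action chunk, and action target to low-level command. The following Python sketch wires up toy stand-ins for each module so the data flow is explicit. Every class and method name here (`Observation`, `ToyVLM`, `ToyActionExpert`, `ToyTrackingController`, `run_episode`) is hypothetical, the "features" are arithmetic placeholders rather than a neural network, and the low-level controller is a simple proportional tracker standing in for the learned RL controller the article describes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    pixels: List[float]   # hypothetical flattened camera frame, values in [0, 1]
    instruction: str      # natural-language task command


class ToyVLM:
    """Stand-in for the vision-language 'brain': observation -> feature vector."""

    def __init__(self, feat_dim: int = 4) -> None:
        self.feat_dim = feat_dim

    def encode(self, obs: Observation) -> List[float]:
        mean_px = sum(obs.pixels) / len(obs.pixels)
        # Deterministic toy text signal (a real VLM would embed the instruction).
        text_sig = (sum(map(ord, obs.instruction)) % 100) / 100.0
        return [mean_px * (i + 1) + text_sig for i in range(self.feat_dim)]


class ToyActionExpert:
    """Stand-in for the action expert: features -> chunk of whole-body targets."""

    def __init__(self, horizon: int = 5, action_dim: int = 6) -> None:
        self.horizon = horizon        # number of future steps per chunk
        self.action_dim = action_dim  # hypothetical number of whole-body joints

    def predict_chunk(self, feats: List[float]) -> List[List[float]]:
        scale = sum(feats) / len(feats)
        # Toy ramp: step k targets 0.1 * (k + 1) * scale on every joint.
        return [[0.1 * (k + 1) * scale] * self.action_dim for k in range(self.horizon)]


class ToyTrackingController:
    """Stand-in for the low-level controller: a proportional tracker, NOT a learned RL policy."""

    def __init__(self, gain: float = 0.5) -> None:
        self.gain = gain

    def step(self, state: List[float], target: List[float]) -> List[float]:
        # Move each joint a fraction `gain` of the way toward its target (ideal actuation assumed).
        return [s + self.gain * (t - s) for s, t in zip(state, target)]


def run_episode(obs: Observation, vlm: ToyVLM, expert: ToyActionExpert,
                controller: ToyTrackingController,
                state: List[float]) -> Tuple[List[List[float]], List[List[float]]]:
    feats = vlm.encode(obs)               # perception and reasoning, once per chunk
    chunk = expert.predict_chunk(feats)   # whole-body action sequence
    trajectory = []
    for target in chunk:                  # low-level control, one step per target
        state = controller.step(state, target)
        trajectory.append(state)
    return chunk, trajectory


obs = Observation(pixels=[0.2, 0.4, 0.6], instruction="pick up the cup")
chunk, traj = run_episode(obs, ToyVLM(), ToyActionExpert(), ToyTrackingController(), [0.0] * 6)
print(len(chunk), len(chunk[0]), len(traj))  # 5 6 5
```

Keeping the interfaces this narrow is what allows each module to be swapped or retrained independently; in the sketch, replacing `ToyTrackingController` with a learned policy would not touch the other two classes.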
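To see why action chunking interacts with inference latency, the sketch below simulates, step by step, a naive loop that halts the robot while the model computes the next chunk, and an asynchronous loop that requests the next chunk early and keeps executing the current one in the meantime. It is a generic, simplified illustration of asynchronous chunk execution, not Ψ₀'s published mechanism; `plan_chunk`, the integer "actions" (each tagged with the timestep it was planned for), and the step-based latency model are assumptions made for clarity.

```python
from typing import List, Optional

Action = Optional[int]  # None means the robot idles for that control step


def plan_chunk(obs_time: int, chunk_len: int) -> List[int]:
    """Hypothetical policy: each action is tagged with the timestep it targets."""
    return [obs_time + k for k in range(chunk_len)]


def run_synchronous(total_steps: int, chunk_len: int, delay: int) -> List[Action]:
    """Naive loop: stop and wait `delay` steps for inference, then play the chunk."""
    executed: List[Action] = []
    while len(executed) < total_steps:
        obs_time = len(executed)
        executed.extend([None] * delay)                    # robot stands still while the model runs
        executed.extend(plan_chunk(obs_time, chunk_len))   # actions are `delay` steps stale
    return executed[:total_steps]


def run_asynchronous(total_steps: int, chunk_len: int, delay: int) -> List[Action]:
    """Request the next chunk early so it arrives just as the current one runs out."""
    if delay <= 0 or chunk_len < 2 * delay:
        raise ValueError("this single-request scheme needs 0 < delay and chunk_len >= 2 * delay")
    executed: List[Action] = []
    current: List[int] = []        # remaining actions of the active chunk
    pending: Optional[int] = 0     # start time of the in-flight inference (first one at t=0)
    for t in range(total_steps):
        # Deliver the in-flight chunk once its latency has elapsed.
        if pending is not None and t == pending + delay:
            chunk = plan_chunk(pending, chunk_len)
            current = chunk[t - pending:]   # drop actions for timesteps already past
            pending = None
        # Launch the next inference while `delay` actions remain, so it lands on time.
        if pending is None and len(current) <= delay:
            pending = t
        executed.append(current.pop(0) if current else None)
    return executed


sync_run = run_synchronous(10, chunk_len=4, delay=2)
async_run = run_asynchronous(10, chunk_len=4, delay=2)
print(sync_run)   # [None, None, 0, 1, 2, 3, None, None, 6, 7]
print(async_run)  # [None, None, 2, 3, 4, 5, 6, 7, 8, 9]
```

In the asynchronous loop, the first `delay` actions of each new chunk are discarded because those timesteps have already passed, so every executed action runs at the timestep it was planned for and the robot never idles after start-up. Practical schemes usually also smooth the hand-off between consecutive chunks, for example by blending overlapping actions; this sketch leaves that out.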