Overtaking on the Curve? Chinese Embodied AI Sparks Emergent Intelligence with a Thousand Hours of Human Data
机器之心· 2026-03-05 04:15
Core Insights
- The article describes an emerging paradigm in robotics built on human first-person perspective (egocentric) data, which the featured company credits for outperforming major players such as NVIDIA by over 20% on several benchmarks [1][4][6].

Group 1: Human First-Person Perspective Data
- The concept of human first-person perspective data is gaining traction in Silicon Valley, with companies such as Tesla and Generalist AI investing heavily in this data type to enhance robotic capabilities [3][7].
- NVIDIA's recent EgoScale framework shows that scaling up human demonstration data significantly improves robotic dexterity, underscoring the value of human data over machine-generated data [4][6].
- Deep Intelligence, founded in 2025, is presented as a pioneer in using first-person human data to decode physical common sense, which it regards as crucial to advancing embodied intelligence [11][8].

Group 2: Understanding Physical Common Sense
- The article stresses the critical role of physical common sense in achieving true robotic intelligence; Generalist AI calls it the "dark matter" of robotics [8][14].
- Current domestic (Chinese) discussions of embodied intelligence often overlook physical common sense, focusing instead on fitting trajectories from real or simulated data [17][18].
- Deep Intelligence's approach is "understand first, act second": equip robots with a deep model of how the physical world works before executing tasks [20][21].

Group 3: Technological Innovations
- Deep Intelligence has built a full technology stack spanning data, architecture, and algorithms to learn efficiently from first-person human data [24][22].
- The company created a translation pipeline, Egocentric2Embodiment, that converts human-perspective videos into structured learning signals for robots, ensuring they learn the underlying physical interactions [25][34].
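The article gives no implementation details for Egocentric2Embodiment, so the following is a minimal, hypothetical Python sketch of what "translating egocentric video into structured learning signals" could involve: reducing each first-person frame to a tracked hand pose, then retargeting that pose into the robot's workspace as an end-effector waypoint. All class names, fields, and the uniform-scale retargeting rule are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch: the real Egocentric2Embodiment pipeline is not
# publicly documented; names and structure here are invented.

@dataclass
class EgoFrame:
    """One frame of first-person video, reduced to a 3D wrist position."""
    timestamp: float
    wrist_xyz: Tuple[float, float, float]   # hand pose from an upstream tracker
    gripper_closed: bool                    # inferred grasp state

@dataclass
class RobotWaypoint:
    """A structured learning signal: target end-effector pose plus gripper."""
    timestamp: float
    ee_xyz: Tuple[float, float, float]
    gripper: float                          # 0.0 = open, 1.0 = closed

def retarget(frames: List[EgoFrame], scale: float = 0.9) -> List[RobotWaypoint]:
    """Map human wrist trajectories into the robot's workspace.

    A uniform scale stands in for real kinematic retargeting, which would
    account for embodiment differences (arm length, joint limits, reach).
    """
    return [
        RobotWaypoint(
            timestamp=f.timestamp,
            ee_xyz=tuple(scale * c for c in f.wrist_xyz),
            gripper=1.0 if f.gripper_closed else 0.0,
        )
        for f in frames
    ]

demo = [
    EgoFrame(0.0, (0.10, 0.20, 0.30), False),
    EgoFrame(0.5, (0.12, 0.22, 0.28), True),
]
waypoints = retarget(demo)
print(waypoints[1].gripper)   # grasp state carried over into the robot signal
```

A production pipeline would sit behind a perception stack (hand tracking, depth estimation, object segmentation); the sketch only shows the shape of the output signal, not how it is perceived.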
- The PhysBrain model, trained on first-person human data, achieved a 67.4% task success rate, outperforming competitors that relied on far larger machine-trajectory datasets [27][29].

Group 4: Advanced Model Architectures
- The TwinBrainVLA architecture trains a "left brain" for general understanding and a "right brain" for specific robotic actions simultaneously, preventing knowledge loss during optimization [31][32].
- Integrating these innovations produced PhysBrain 1.0, which reached a 79.8% success rate in testing, surpassing industry benchmarks [37][38].
- The model generalizes across tasks and platforms, indicating a significant advance in robotic intelligence and its potential for real-world applications [39][40].

Group 5: Future Directions
- Deep Intelligence plans to scale first-person human data collection to one million hours by mid-2026, aiming to fully reveal the scaling laws of physical common sense [43][46].
- The company's emphasis on data efficiency and systematic modeling of physical common sense is expected to create a competitive edge that is difficult for others to replicate [42][46].
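The TwinBrainVLA dual-brain idea described in Group 4 (a general-understanding branch that is not overwritten while the action branch is optimized) can be sketched in a few lines. The real mechanism is not publicly documented; one common way to prevent knowledge loss is to stop action-loss gradients from reaching the understanding branch, and this sketch uses full freezing as the simplest stand-in. All names, shapes, and the SGD rule are invented for illustration.

```python
# Hypothetical sketch of the dual-brain training idea; the actual
# TwinBrainVLA recipe is not described in the article.

class Brain:
    def __init__(self, weights, frozen=False):
        self.weights = list(weights)
        self.frozen = frozen

    def apply_gradients(self, grads, lr=0.1):
        """Plain SGD step; a frozen brain ignores gradients entirely."""
        if self.frozen:
            return
        self.weights = [w - lr * g for w, g in zip(self.weights, grads)]

# "Left brain": general physical understanding, shielded from the action
# loss so fine-tuning cannot overwrite it (no catastrophic forgetting).
left = Brain([0.5, -0.2], frozen=True)
# "Right brain": action policy, the only part updated on robot data.
right = Brain([0.1, 0.3])

for _ in range(3):                 # a few fine-tuning steps
    grads = [0.2, -0.1]            # stand-in gradients from an action loss
    left.apply_gradients(grads)    # no effect: left brain is protected
    right.apply_gradients(grads)   # right brain specializes to actions

print(left.weights)                # unchanged: [0.5, -0.2]
```

In a real system the two branches would also share representations and be co-trained on separate losses rather than hard-frozen; the point of the sketch is only the gradient isolation that keeps general knowledge intact.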