DM0
A Hard-Core Teardown of Yuanli Lingji's Embodied Large Model DM0: How Physical AI Enters Its Own "Native" Era
AI Tech Base Camp (AI科技大本营) · 2026-02-28 03:27
Core Insights
- The article discusses the limitations of current large language models (LLMs) and vision-language models (VLMs) in physical robotics and argues for an approach that builds in physical grounding from the outset [1][2]
- DM0, developed by Yuanli Lingji, is introduced as an embodied-native vision-language-action (VLA) model that combines several data sources to improve physical interaction capabilities [3][5]

Model Architecture and Training
- DM0 uses multi-source mixed training and an embodied spatial scaffolding architecture to harmonize heterogeneous data, including internet corpora, autonomous driving logs, and robotic operation trajectories [5][8]
- The model has two main components: a VLM backbone for multimodal perception and a flow-matching-based action expert for continuous control [12][13]
- The training pipeline has three stages: pre-training on 1.13 trillion tokens, mid-training on 200 million samples, and post-training on 50 million samples, focusing on aligning the model with specific robotic platforms [16][17][18][19]

Performance Evaluation
- On the RoboChallenge benchmark, DM0 achieved a 62.00% average success rate in single-task evaluations, outperforming larger models such as Spirit-v1.5 and GigaBrain-0.1 [24]
- In multi-task evaluations, DM0 achieved a 37.3% average success rate and a task score of 49.08, significantly surpassing the previous best model, pi0.5 [27]

Future Directions
- The authors suggest possible next steps for DM0, including scaling the model to 7B or 30B parameters, integrating multimodal sensory feedback, and strengthening long-term reasoning [32]
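
The summary above only states that DM0's action expert is based on flow matching. As a rough illustration of how such a module produces a continuous action at inference time, the sketch below starts from Gaussian noise and integrates a velocity field from t = 0 to t = 1 with Euler steps. Everything named here is an assumption for illustration: `toy_velocity` stands in for the learned network (which in a real VLA model would be conditioned on VLM features rather than given the target), and `sample_action`, the step count, and the 3-dimensional action are hypothetical.

```python
import random


def toy_velocity(action, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path a_t = (1 - t) * a_0 + t * a_1, the velocity
    that carries the current point to the endpoint a_1 is
    (a_1 - a_t) / (1 - t). A trained model would predict this from
    observations instead of being handed the target.
    """
    return [(goal - a) / (1.0 - t) for a, goal in zip(action, target)]


def sample_action(target, steps=10, seed=0):
    """Integrate da/dt = v(a, t) from t = 0 to t = 1 with Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise, one value per action dimension.
    action = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        velocity = toy_velocity(action, t, target)
        action = [a + dt * v for a, v in zip(action, velocity)]
    return action


if __name__ == "__main__":
    # Hypothetical 3-dimensional action (e.g. a small end-effector offset).
    print(sample_action([0.5, -0.2, 0.1]))
```

With this toy velocity field the final Euler step lands on the target, which makes the integration loop easy to check; a learned field would only approximate this behavior.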
How Will Embodied Intelligence Reach Its "ChatGPT Moment"? BAAI's President, a Tsinghua Professor, and Three Founders Discuss
36Kr · 2026-02-13 10:50
Core Insights
- The industry is awaiting a "ChatGPT moment" for embodied intelligence, but there is no consensus on its definition [1][10]
- The forum discussion highlighted how much harder zero-shot generalization is for embodied AI than for language models [2][10]
- A more achievable goal is to solve specific scenarios first and gather real machine data to improve models and systems [3][12]

Group 1: Challenges and Development Directions
- Embodied intelligence faces significant commercialization challenges because of its longer supply chain and its need for real machine data [2][11]
- Current embodied models are still in development, with a notable gap between existing capabilities and large-scale applications [5][11]
- The focus should be on solving specific tasks in controlled environments to create a data feedback loop for model improvement [3][6]

Group 2: Industry Perspectives and Comparisons
- China is seen as investing heavily in embodied intelligence and as potentially outpacing the U.S. in certain respects thanks to its complete industrial chain [6][8]
- Collaboration between academia and industry is increasing, which may speed up advances in embodied intelligence [8][9]
- The U.S. invested early in models and data, but China is catching up in practical applications [6][8]

Group 3: Future Expectations and Predictions
- 2026 is expected to be a transformative year for embodied intelligence, with significant advances anticipated in applications and supply chains [12][24]
- Participants want unified standards for hardware, data, and model outputs to support industry growth [23][24]
- Reliable, useful embodied intelligence that can operate in specific scenarios is seen as a critical milestone [12][25]
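A Conversation with Yuanli Lingji's Zhou Erjin: A 2.4B Model Is Enough, the Key Is "Embodied Native"; Closing the Loop Is the Most Efficient Method
QbitAI (量子位) · 2026-02-13 05:42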
Core Viewpoint
- The company has introduced DM0, a lightweight embodied model with 2.4 billion parameters, claiming that this size is sufficient for real-time processing and that the model can keep evolving through reinforcement learning [1][5][4]

Group 1: Model Specifications
- DM0 is designed to process three camera views of 728x728 images with an inference latency of only 60 milliseconds [4]
- The company calls it the first "embodied-native large model" because it is trained from scratch, unlike the industry norm [7][18]
- Training consists of three phases, VLM Train, VLA Pre-Train, and VLA Post-Train, with a focus on multi-source and multi-task training [26][29][30]

Group 2: Technical Framework
- Alongside DM0, the company released the open-source framework Dexbotic 2.0 and the production workflow DFOL, both aimed at embodied applications [8][97]
- Dexbotic 2.0 is designed to unify embodied manipulation and navigation under a modular architecture [98][100]
- DFOL aims to bridge the gap between traditional automation and human-like flexibility, focusing on efficiency and adaptability [101]

Group 3: Data Collection and Training Philosophy
- The company emphasizes a "from zero" training approach, arguing that early exposure to physical-world interaction is crucial for model understanding [40][42]
- Data collection is comprehensive, covering internet data, intelligent driving data, and embodied data, with a focus on high-resolution inputs for precise actions [62][64][66]
- The data collection strategy is dynamic and is adjusted according to experimental results to keep model training effective [68][70]

Group 4: Application and Market Strategy
- The company is initially focusing on logistics as a practical application of embodied intelligence, aiming to refine capabilities in a controlled environment [125][146]
- Logistics was chosen for its scalability and replicability, which allow rapid data feedback loops to improve model performance [149][150]
- Future plans include expanding from logistics to more complex environments and ultimately targeting consumer applications [155][156]

Group 5: Long-term Vision
- The ultimate goal is to develop robots with broad social identities that can carry out transactions and interactions independently in various environments [168][171]
- The company believes this vision requires a phased approach that secures reliable hardware and model capabilities before expanding to more complex tasks [169][172]
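
To put the reported input size and latency in perspective, the following back-of-the-envelope calculation derives the number of pixels per inference and an upper bound on the inference rate. It assumes, which the summary does not state, that inferences run back to back and that the 60 ms figure covers one full pass over all three views; the function names are hypothetical.

```python
def pixels_per_inference(views: int, width: int, height: int) -> int:
    """Total input pixels consumed by one forward pass."""
    return views * width * height


def max_inference_rate_hz(latency_ms: float) -> float:
    """Upper bound on inferences per second if passes run back to back."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Figures from the summary: three 728x728 views, 60 ms latency.
    print(pixels_per_inference(3, 728, 728))
    print(round(max_inference_rate_hz(60), 1))
```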
Lei Jun Announces End of Production for First-Generation Xiaomi SU7; Baidu Reportedly Launches Secret "O Plan"
Group 1: Company Developments
- Xiaomi founder Lei Jun announced that the first-generation Xiaomi SU7 has been discontinued, with nearly 370,000 units delivered [2]
- Baidu has reportedly started a secret project called "O Plan," which is tied to the Baidu APP and aims to strengthen its AI capabilities; the app's monthly active users have surpassed 200 million [3]
- Zhipu's stock surged nearly 200% after the announcement of a new model, speculated to be GLM-5, which has drawn significant interest in the developer community [4]

Group 2: New Product Launches
- Alibaba's Qianwen launched a new image generation model, Qwen-Image-2.0, with API access available to developers [7]
- ByteDance introduced the Seedream 5.0 Preview model, now available for testing in several applications, including video editing [8]
- Tencent released a small model, HY-1.8B-2Bit, that occupies only about 600MB of storage, described as a breakthrough for edge deployment [16]

Group 3: Financial Updates
- Honda reported third-quarter operating profit of 153.36 billion yen, exceeding expectations, and has drawn up a plan to prevent future chip supply shortages [11]
- "Qingche Intelligent" completed a Series A financing round worth several hundred million yuan, focusing on the development of large models for robotics [13]
- "Daxiao Robotics" recently completed an angel round led by Ant Group, with several other investment firms participating [14]

Group 4: Industry Trends
- AI model development continues to accelerate, with multiple companies launching new models and upgrading existing ones to meet market demand [4][7][8][16]
- Competition in AI is intensifying as companies such as Baidu and ByteDance focus on integrating AI capabilities into their existing platforms [3][5]
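
As a rough sanity check on the storage figure reported for HY-1.8B-2Bit, the sketch below computes the raw weight size implied by the name, assuming it means about 1.8 billion parameters stored at 2 bits each (an interpretation, not something the summary confirms). The result is a lower bound; the gap to the reported figure of about 600MB could plausibly come from quantization metadata or from layers kept at higher precision, but that is a guess.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Raw storage for the weights alone, ignoring any metadata or overhead."""
    return num_params * bits_per_param // 8


if __name__ == "__main__":
    raw = weight_bytes(1_800_000_000, 2)
    # Decimal megabytes (1 MB = 1,000,000 bytes).
    print(raw / 1_000_000, "MB")
```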
Year One of "Embodied Native"! An Interview with Yuanli Lingji's Wang Tiancai on Embodied Intelligence's "PyTorch Moment"
Synced (机器之心) · 2026-02-10 08:52
Core Viewpoint
- The article discusses significant advances in embodied intelligence, particularly the launch of the Dexbotic 2.0 framework and its collaboration with RLinf, which it presents as a pivotal step toward an "embodied-native" era of AI [3][5][9]

Group 1: Framework and Collaboration
- Dexbotic 2.0 aims to standardize the infrastructure for embodied intelligence, much as PyTorch did for deep learning [5][16]
- The collaboration with Tsinghua University and RLinf focuses on strengthening embodied AI through a unified framework that integrates perception, decision-making, and execution [3][5][19]
- The introduction of the DM0 model and the DFOL workflow signals a comprehensive approach to developing and deploying embodied applications [6][51]

Group 2: Embodied Native Concept
- "Embodied Native" is defined as a closed loop of perception, decision-making, and execution that lets AI interact effectively with the physical world [15][13]
- The framework promotes real-world data and multi-modal training to improve the model's understanding of and interaction with its environment [17][41]
- The shift from a "big model brain + mechanical limbs" approach to a fully integrated embodied system is highlighted as a key evolution in the field [12][13]

Group 3: Technical Innovations
- Dexbotic 2.0 has a modular design that stays flexible while keeping processing end to end, so perception, cognition, and control modules can be upgraded independently [21][33]
- The framework integrates several models and capabilities, including vision-language-action (VLA) and navigation, to achieve comprehensive task execution [37][38]
- A standardized data format (Dexdata) and a unified training pipeline address fragmentation in embodied intelligence development [45][46]

Group 4: Performance and Evaluation
- The DM0 model, with 2.4 billion parameters, has achieved high performance in real-world evaluations in both single-task and multi-task scenarios [57][58]
- The RoboChallenge benchmark was established to evaluate embodied models fairly, so that performance metrics reflect true capabilities rather than optimized scores [46][57]
- The DFOL workflow enables continuous improvement of robotic systems through real-time data feedback, raising their operational efficiency [62][65]

Group 5: Future Insights
- The article stresses the importance of integrating multi-modal sensory inputs, such as touch and hearing, to improve modeling of the physical world [74]
- Embodied intelligence is evolving rapidly, with significant advances expected in the near future at a pace comparable to that of large model development [73][75]
- The company advocates an open-source approach to foster collaboration and innovation in the embodied intelligence community and to lower barriers for developers [68][71]
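
The modular, closed-loop design described above (separate perception, cognition, and control modules that can be upgraded independently) can be illustrated with a toy sketch. This is not Dexbotic 2.0's API: the `Agent` class, its fields, and the one-dimensional "world" are all hypothetical and chosen only to show how one stage can be swapped without touching the others.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass
class Agent:
    perceive: Callable[[float], float]        # raw sensor reading -> state estimate
    decide: Callable[[float, float], float]   # (state, goal) -> desired change
    act: Callable[[float], float]             # desired change -> bounded command


def run_closed_loop(agent: Agent, position: float, goal: float, steps: int) -> float:
    """Repeat perceive -> decide -> act, feeding each result back into the world."""
    for _ in range(steps):
        state = agent.perceive(position)
        delta = agent.decide(state, goal)
        command = agent.act(delta)
        position += command  # toy physics: the world moves by the command
    return position


agent = Agent(
    perceive=lambda reading: reading,               # noise-free sensor
    decide=lambda state, goal: goal - state,        # head straight for the goal
    act=lambda delta: max(-0.5, min(0.5, delta)),   # clamp each step to 0.5
)

# Upgrade only the control module; perception and decision stay untouched.
faster_agent = replace(agent, act=lambda delta: max(-1.0, min(1.0, delta)))

if __name__ == "__main__":
    print(run_closed_loop(agent, 0.0, 2.0, steps=10))
    print(run_closed_loop(faster_agent, 0.0, 2.0, steps=2))
```

Because each stage is just a callable with a fixed signature, replacing one of them leaves the loop and the other stages unchanged, which is the property the modular design is meant to provide.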
DM0, the World's First Embodied-Native Large Model, Released; Trained Jointly with StepFun
Sina Finance · 2026-02-10 06:44
Core Insights
- The article covers the launch of DM0, described as the world's first embodied-native large model. Developed by Yuanli Lingji, it is trained from scratch and combines multimodal internet information with embodied scenario data such as driving behavior, robotic operations, and navigation [1]

Group 1
- DM0 is trained from the ground up and incorporates data from several modalities, including driving behavior and robotic operations [1]
- Pre-training mixed system navigation, full-body control, and core task execution tasks, covering eight significantly different machine types [1]
- DM0 took first place in both single-task and multi-task evaluations in the RoboChallenge real-machine assessment and currently ranks as the top model globally [1]