Embodied-Native Models
Physical AI's "Native" Moment: 原力灵机 (Dexmal) Releases Embodied Large Model DM0
机器之心· 2026-03-11 03:51
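At present, the success of large language models (LLMs) and vision-language models (VLMs) in the semantic domain has not transferred directly to physical robots, and the root cause is their internet-native origins. The mainstream "pretrain-then-adapt" paradigm relies on static internet data, so models are born without physical grounding. In deployment they tend to gain one thing at the cost of another: either the manipulation and navigation modules become disconnected, or catastrophic forgetting sets in and core general reasoning ability is lost in the pursuit of control precision.
To break through this limitation, 原力灵机 (Dexmal) and 阶跃星辰 (StepFun) propose DM0, an embodied-native VLA model. The core of the work is starting "from 0": from the very first stage of training it takes a unified view, treating embodied sensor and motion data as first-class citizens on par with language and vision data.
As an end-to-end model, DM0 seamlessly unifies fine-grained robot manipulation and mobile navigation. On Table 30, the real-world benchmark from RoboChallenge, DM0 leads existing SOTA models by a significant margin in both the single-task (Specialist) and multi-task (Generalist) settings, demonstrating strong generalization and execution in the physical world. ...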
A Hard-Core Teardown of 原力灵机 (Dexmal)'s Embodied Large Model DM0: How Physical AI Enters Its Own "Native" Era
AI科技大本营· 2026-02-28 03:27
Core Insights
- The article discusses why current large language models (LLMs) and vision-language models (VLMs) fall short in physical robotics, and argues for a new approach that integrates physical grounding from the outset [1][2]
- DM0, developed by 原力灵机 (Dexmal) together with 阶跃星辰 (StepFun), is introduced as an embodied-native vision-language-action (VLA) model that combines multiple data sources to strengthen physical interaction capabilities [3][5]

Model Architecture and Training
- DM0 uses a multi-source mixed training approach and an embodied spatial scaffolding architecture to reconcile heterogeneous data, including internet corpora, autonomous driving logs, and robotic manipulation trajectories [5][8]
- The model has two main components: a VLM backbone for multimodal perception and a flow-matching-based action expert for continuous control [12][13]
- The training pipeline has three stages: pre-training on 1.13 trillion tokens, mid-training on 200 million samples, and post-training on 50 million samples, focusing on aligning the model with specific robotic platforms [16][17][18][19]

Performance Evaluation
- On the RoboChallenge benchmark, DM0 achieved a 62.00% average success rate in single-task evaluations, outperforming larger models such as Spirit-v1.5 and GigaBrain-0.1 [24]
- In multi-task evaluations, DM0 achieved a 37.3% average success rate and a task score of 49.08, significantly surpassing the previous best model, pi0.5 [27]

Future Directions
- The authors suggest possible future developments for DM0, including scaling the model to 7B or 30B parameters, integrating multimodal sensory feedback, and enhancing long-term reasoning capabilities [32]
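
To make the "flow-matching-based action expert" mentioned under Model Architecture and Training more concrete, here is a minimal, generic sketch of how a flow-matching sampler turns noise into a continuous action by integrating a velocity field from t=0 to t=1. It is not DM0's implementation: `toy_velocity`, `sample_action`, the three-dimensional action, and the step count are all hypothetical, and the closed-form velocity stands in for a learned network that would normally be conditioned on the VLM backbone's features.

```python
import random


def toy_velocity(action, t, target):
    """Closed-form velocity field for a single target action.

    Under the linear path a_t = (1 - t) * noise + t * target, the velocity
    that carries any point a_t to the target is (target - a_t) / (1 - t).
    This is a hypothetical stand-in for a learned action-expert network.
    """
    return [(g - a) / (1.0 - t) for a, g in zip(action, target)]


def sample_action(target, num_steps=10, seed=0):
    """Integrate da/dt = v(a, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise, one value per action dimension.
    action = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps  # largest t is (num_steps - 1) / num_steps < 1
        velocity = toy_velocity(action, t, target)
        action = [a + dt * v for a, v in zip(action, velocity)]
    return action


# Hypothetical 3-D continuous action (for example, an end-effector delta).
target = [0.25, -0.5, 1.0]
result = sample_action(target)
```

In a real system the velocity would come from a trained network rather than a known target, so the final sample would approximate the distribution of demonstrated actions instead of hitting one fixed point.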