Embodied-Native Models (具身原生模型)

机器之心 (Jiqizhixin) · 2026-03-11 03:51

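The success of large language models (LLMs) and vision-language models (VLMs) in the semantic domain has so far not transferred directly to physical robots, and the root cause lies in their internet-native origins. The mainstream "pretrain-then-adapt" paradigm relies on static internet data, so models lack physical grounding from the start and tend to trade one capability for another at deployment: either manipulation and navigation end up as separate modules, or catastrophic forgetting sets in and the model loses its core general reasoning ability in the pursuit of control precision. To break through this limitation, 原力灵机 (Dexmal) and 阶跃星辰 (StepFun) propose DM0, an embodied-native VLA model. The core of the work is starting "from zero": from the earliest stage of training, it takes a unified view that treats embodied sensor and motion data as first-class citizens, on an equal footing with language and vision data. As an end-to-end model, DM0 unifies fine-grained robot manipulation and mobile navigation. On Table 30, the real-world benchmark from RoboChallenge, DM0 leads existing SOTA models by a clear margin in both the single-task (Specialist) and multi-task (Generalist) settings, showing strong generalization and execution in the physical world. ...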

A Hard-Core Teardown of DM0, the Embodied Large Model from 原力灵机 (Dexmal): How Physical AI Is Entering Its Own "Native" Era

AI科技大本营· 2026-02-28 03:27

Core Insights
- The article discusses the limitations of current large language models (LLMs) and vision-language models (VLMs) in physical robotics, and argues for an approach that builds in physical grounding from the outset [1][2]
- The DM0 model, developed by 原力灵机 (Dexmal) and 阶跃星辰 (StepFun), is introduced as an embodied-native vision-language-action model that combines multiple data sources to improve physical interaction capabilities [3][5]

Model Architecture and Training
- DM0 uses multi-source mixed training and an embodied spatial scaffolding architecture to harmonize heterogeneous data, including internet corpora, autonomous driving logs, and robot manipulation trajectories [5][8]
- The model consists of two main components: a VLM backbone for multimodal perception and a flow-matching-based action expert for continuous control [12][13]
- The training pipeline has three stages: pre-training on 1.13 trillion tokens, mid-training on 200 million samples, and post-training on 50 million samples, with a focus on aligning the model with specific robot platforms [16][17][18][19]

Performance Evaluation
- On the RoboChallenge benchmark, DM0 achieved a 62.00% average success rate in single-task evaluations, outperforming larger models such as Spirit-v1.5 and GigaBrain-0.1 [24]
- In multi-task evaluations, DM0 achieved a 37.3% average success rate and a task score of 49.08, significantly surpassing the previous best model, pi0.5 [27]

Future Directions
- The authors suggest possible future developments for DM0, including scaling the model to 7B or 30B parameters, integrating multimodal sensory feedback, and improving long-horizon reasoning [32]
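
The summary describes DM0's action expert only as "flow-matching-based" and gives no further detail, so the following is a generic illustration of how flow-matching sampling works, not DM0's implementation. It is a minimal sketch that assumes a straight-line noise-to-action path and replaces the learned velocity network with a hypothetical closed-form `oracle_velocity` that already knows the target action. The function names, the step count, and the 3-dimensional action are all assumptions made for illustration.

```python
import random


def oracle_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path from noise (t=0) to a single target action
    (t=1), the velocity at point x and time t is (target - x) / (1 - t).
    A real action expert would predict this from observations instead.
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action(target, steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    rng = random.Random(seed)
    # Start from Gaussian noise with the same dimension as the action.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so 1 - t is never zero
        v = oracle_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy 3-dimensional "action"; the values are arbitrary.
    print(sample_action([0.2, -0.5, 0.1]))
```

In a trained system the oracle would be replaced by a network conditioned on the VLM backbone's features, and the number of Euler steps would trade inference latency against how accurately the generated continuous action follows the learned flow.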