Vision-Language-Action (VLA) Models
Is replicating pi0.6 hard? SRPO: VLA-RL can set a new SOTA without fine-tuning a value model
具身智能之心· 2025-12-05 00:02
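Author: Senyu Fei et al. | Editor: 具身智能之心

1 Preface

In embodied intelligence, reinforcement learning (RL) is emerging as the key step after supervised fine-tuning (SFT) for improving vision-language-action (VLA) models. Physical Intelligence's recently released π*0.6 used the RECAP framework to demonstrate the potential of this path. However, building a high-quality reward or value model is usually expensive.

Figure 1: π*0.6 and SRPO value-function curves. The three scenes are taken from the π*0.6 official homepage. The white curve is π*0.6's value function; the yellow curve is the value function SRPO produces directly, without task-specific fine-tuning. In π*0.6, the value function predicts the negative number of steps needed to complete the task: the prediction rises when the robot makes progress and stays flat when it makes little progress. SRPO instead predicts task progress directly.

The OpenMOSS and SiiRL teams have now jointly released their latest work, SRPO (Self-Referential Policy ...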
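The Figure 1 caption contrasts two value-function conventions. The Python sketch below is only a toy illustration under our own assumptions, not code from π*0.6 or SRPO. It takes a hypothetical per-step estimate `remaining` of the steps still needed to finish a task and derives (a) a steps-to-go value, the negative of `remaining`, and (b) a normalized progress score in [0, 1]. Both rise when the robot advances and stay flat when it stalls.

```python
from typing import List


def steps_to_go_value(remaining: List[int]) -> List[float]:
    """Steps-to-go convention: value = -(steps still needed to finish)."""
    return [float(-r) for r in remaining]


def progress_value(remaining: List[int]) -> List[float]:
    """Progress convention: fraction of the initial distance already covered, in [0, 1]."""
    start = remaining[0]
    if start == 0:
        # Task already complete at the first step: full progress throughout.
        return [1.0 for _ in remaining]
    return [(start - r) / start for r in remaining]


# Hypothetical episode: the robot stalls at 8 remaining steps before finishing.
remaining = [10, 8, 8, 8, 5, 2, 0]
print(steps_to_go_value(remaining))  # [-10.0, -8.0, -8.0, -8.0, -5.0, -2.0, 0.0]
print(progress_value(remaining))     # [0.0, 0.2, 0.2, 0.2, 0.5, 0.8, 1.0]
```

In this toy, both curves have the same shape. The progress form is bounded to [0, 1] regardless of episode length, while the steps-to-go form depends on how long the task is.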
HKUST (Guangzhou) and Tsinghua jointly propose Spatial Forcing: implicit spatial alignment that outperforms mainstream 2D/3D VLA models
具身智能之心· 2025-10-18 16:03
Core Insights
- Current Vision-Language-Action (VLA) models rely primarily on 2D visual input and lack a deep understanding of real 3D space, which limits their ability to perform tasks in the physical world [2][4].
- The proposed method, Spatial Forcing (SF), lets a VLA model develop spatial understanding without explicit 3D input by aligning its visual features with strong 3D geometric representations produced by an external model [2][10].
Methodology
- SF uses an implicit spatial alignment strategy: the model acquires spatial understanding on its own during training, with no additional 3D sensors [2][13].
- A depth-probing experiment tested whether the original VLA's visual features encode 3D information. It showed that without 3D input, the model does not form accurate spatial perception [11][13].
- During training, the VLA's visual tokens are aligned with pixel-level spatial representations extracted from a pretrained 3D model, and the spatial alignment loss and the action generation loss are optimized jointly [16].
Performance Results
- SF significantly outperforms existing 2D and 3D VLA models across a range of tasks, with up to 3.8× higher training efficiency and up to 5.9× higher data efficiency [14].
- In experiments, the Spatial Forcing model reached success rates of 99.4% on spatial tasks, 99.6% on object tasks, and 98.8% on goal tasks, ahead of the compared models [18].
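The joint objective in the Methodology notes can be sketched in NumPy as follows. The summary only states that visual tokens are aligned with pixel-level features from a pretrained 3D model and that an alignment loss and an action loss are optimized together. Everything else here is a hypothetical choice for illustration: the linear projector `proj`, the cosine-similarity form of the alignment loss, the MSE action loss, and the weight `lam`.

```python
import numpy as np


def spatial_alignment_loss(vis_tokens: np.ndarray, geo_feats: np.ndarray, proj: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between projected VLA visual tokens and 3D features.

    vis_tokens: (N, d_vla) visual tokens from the VLA backbone
    geo_feats:  (N, d_3d) per-token spatial features from a frozen 3D model (assumed)
    proj:       (d_vla, d_3d) hypothetical learnable linear projector
    """
    z = vis_tokens @ proj
    z = z / (np.linalg.norm(z, axis=-1, keepdims=True) + 1e-8)
    g = geo_feats / (np.linalg.norm(geo_feats, axis=-1, keepdims=True) + 1e-8)
    cos = np.sum(z * g, axis=-1)  # per-token cosine similarity
    return float(np.mean(1.0 - cos))  # 0 when aligned, 2 when opposite


def action_loss(pred_actions: np.ndarray, target_actions: np.ndarray) -> float:
    """Assumed action-generation loss: mean squared error on predicted actions."""
    return float(np.mean((pred_actions - target_actions) ** 2))


def total_loss(vis_tokens, geo_feats, proj, pred_actions, target_actions, lam=0.5):
    """Action loss plus a lam-weighted alignment term (lam is a hypothetical weight)."""
    align = spatial_alignment_loss(vis_tokens, geo_feats, proj)
    return action_loss(pred_actions, target_actions) + lam * align


# Usage with random stand-in data.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
proj = np.eye(8)
print(spatial_alignment_loss(tokens, tokens, proj))   # near 0: features aligned
print(spatial_alignment_loss(tokens, -tokens, proj))  # near 2: features opposite
```

In this sketch, the 3D model only supplies targets for the alignment term. At inference, the policy would still consume 2D images alone, which matches the summary's point that no extra 3D sensors are needed.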