π₀

How Are VLA Models Built on Large VLMs Advancing Robotic Manipulation, Step by Step?
具身智能之心· 2025-08-26 00:03
When a robot can both "read" an instruction and "do the work on its own": how are large VLMs rewriting the rules of robotic manipulation?

Imagine this scene: you tell a robot, "Fold the shirt that dried on the balcony and put it on the third shelf of the wardrobe," and it can see where the clothes are, understand the action logic behind "fold" and "put away," and even work around the clutter inside the wardrobe to finish the job. A few years ago this sounded more like a scene from a science-fiction film: traditional robots were either trapped in a cage of predefined tasks, unable to recognize so much as a new cup, or left helpless by ambiguous natural-language instructions, let alone capable of adjusting their movements flexibly in cluttered real-world environments.

Now a transformation driven by vision-language-action (VLA) models is breaking through these limits, and the core force behind it is the large vision-language model (VLM) that we have all come to know.

In the past, research on robotic manipulation kept circling inside a "modularity trap": visual recognition, language parsing, and motion control each formed their own camp, like disconnected gears that struggled to turn together. That changed once large VLMs ...
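The passage above contrasts the old modular pipeline with the unified end-to-end interface that VLA models expose. Purely as an illustration (none of these names or shapes come from any specific system), here is a minimal Python sketch of that interface: a single policy object consumes a camera image plus a free-form instruction and returns a short chunk of continuous robot actions.

```python
# Minimal, hypothetical sketch of an end-to-end VLA interface: one model maps
# (image, instruction) directly to a chunk of continuous actions, replacing the
# separate perception / language-parsing / control modules of the "modular" era.
# All names here are illustrative, not any specific system's API.
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    image: np.ndarray      # (H, W, 3) camera frame
    instruction: str       # e.g. "fold the dry shirt and put it in the closet"

class ToyVLAPolicy:
    """Stand-in for a VLM backbone with an action head; returns dummy actions."""
    def __init__(self, action_dim: int = 7, horizon: int = 8, seed: int = 0):
        self.action_dim = action_dim
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def predict_actions(self, obs: Observation) -> np.ndarray:
        # A real VLA would tokenize the image and instruction, run the VLM, and
        # decode an action chunk; here we just return a placeholder chunk.
        return self.rng.normal(size=(self.horizon, self.action_dim))

if __name__ == "__main__":
    obs = Observation(image=np.zeros((224, 224, 3), dtype=np.uint8),
                      instruction="fold the dry shirt and put it in the closet")
    chunk = ToyVLAPolicy().predict_actions(obs)
    print(chunk.shape)  # (8, 7): 8 timesteps of 7-DoF actions
```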
Physical Intelligence Core Technical Team: How Can "Vibe Coding" Be Realized in the Physical World?
海外独角兽· 2025-08-23 12:04
Compiled by: shiling, haozhen; Editor: Siqi

General-purpose robots are a key path for AGI to move from the digital world into the physical world, and under the theme of AI robotics, Physical Intelligence (PI) is without question one of the teams with the greatest technical depth and research influence. In April this year, PI released π₀.₅, a new VLA model built on π₀ that generalizes to the open world; PI reports that in some previously unseen environments, π₀.₅ still performs close to how it does in its original training environments.

So, from a technical perspective, what exactly is the relationship between VLA, LLM, and VLM? To build a general-purpose robot brain, how did PI construct its data pipeline from scratch? And how does PI's newly proposed "Knowledge Insulation" mechanism actually work (a minimal sketch follows the list below)?

This article is the Physical Intelligence core technical team's reading of the past and present technical paths of robotics, along with PI's frontier explorations in data collection, algorithm design, and multi-robot generalist models:

• A VLM extends an LLM with visual perception, and a VLA is the further application of a VLM to robotics;
• The PI team built essentially the entire data engine from scratch, and through experiments PI showed that increasing ...
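The excerpt names Knowledge Insulation only at a high level, so the following is a hedged sketch of the general gradient-insulation pattern it suggests: a continuous action expert reads the VLM backbone's features, but its loss is detached so it cannot back-propagate into, and thereby erode, the backbone's pretrained knowledge. All module names, sizes, and losses below are illustrative assumptions, not PI's code.

```python
# Minimal sketch (in PyTorch) of gradient insulation: the backbone is updated
# only by the token-level objective, while the action expert trains on detached
# backbone features, so its gradients never reach the backbone.
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):        # stand-in for the VLM backbone
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Linear(32, dim)
        self.token_head = nn.Linear(dim, 100)   # e.g. text / discretized-action tokens

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        return h, self.token_head(h)

class ActionExpert(nn.Module):        # stand-in for the continuous action expert
    def __init__(self, dim=64, action_dim=7):
        super().__init__()
        self.head = nn.Linear(dim, action_dim)

    def forward(self, h):
        return self.head(h)

backbone, expert = TinyBackbone(), ActionExpert()
x = torch.randn(4, 32)                         # fake multimodal features
token_target = torch.randint(0, 100, (4,))
action_target = torch.randn(4, 7)

h, token_logits = backbone(x)
token_loss = nn.functional.cross_entropy(token_logits, token_target)
# Insulation: the expert sees backbone features, but gradients stop at detach().
action_loss = nn.functional.mse_loss(expert(h.detach()), action_target)
(token_loss + action_loss).backward()          # backbone receives only token gradients
```

The same pattern generalizes: any auxiliary head whose objective risks interfering with the backbone's representations can be trained on detached features.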
The VLA Explosion! From America's RT-2 to China's FiS-VLA, the Ultimate Evolution of Robots
具身智能之心· 2025-07-09 14:38
Core Viewpoint
- The article emphasizes the rapid evolution and significance of Vision-Language-Action (VLA) models in the field of embodied intelligence, highlighting their potential to revolutionize human-robot interaction and the robotics industry as a whole [4][6][17].

Group 1: VLA Model Development
- VLA models are becoming the core driving force in embodied intelligence, gaining traction among researchers and companies globally [7][8].
- Google recently released the first offline VLA model, enabling robots to perform tasks without internet connectivity [9].
- The emergence of the Fast-in-Slow (FiS-VLA) model in China represents a significant advancement, integrating fast and slow systems to enhance robotic control efficiency and reasoning capabilities (see the sketch after this list) [10][12].

Group 2: Academic and Industry Trends
- There has been explosive growth in academic papers related to VLA, with 1,390 papers published this year alone, accounting for nearly half of all related research [14].
- VLA technology is facilitating the transition of robots from laboratory settings to real-world applications, indicating its vast potential [16][17].

Group 3: Key Innovations and Breakthroughs
- The RT-2 model from Google marked a pivotal moment in VLA development, introducing a unified model architecture that integrates visual, language, and action modalities [38][40].
- The RoboMamba model, developed in China, significantly improved efficiency and reasoning capabilities in VLA models, achieving a threefold increase in inference speed compared to mainstream models [52][48].
- OpenVLA, another significant model, demonstrated superior performance in various tasks while being more efficient than previous models, achieving a 16.5% higher success rate than RT-2 [57][58].

Group 4: Future Directions and Implications
- The introduction of the π series models aims to enhance VLA's generalization capabilities, allowing robots to perform complex tasks with minimal training [62][70].
- The FiS-VLA model represents a breakthrough in real-time control, achieving an 11% improvement in success rates in real environments compared to existing methods [114].
- The advancements in VLA technology are paving the way for robots to operate effectively in diverse environments, marking a significant step towards achieving Artificial General Intelligence (AGI) [127][123].
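The FiS-VLA bullets above describe a fast/slow dual system: a heavyweight reasoning module refreshed at low frequency inside a lightweight controller running at control rate. Below is a minimal, hypothetical Python sketch of that scheduling pattern only; the frequencies, toy models, and shapes are assumptions and do not reflect the paper's actual architecture or reported numbers.

```python
# Minimal, hypothetical sketch of a fast-in-slow control loop: a slow VLM-scale
# module updates a latent plan every few steps, while a lightweight fast policy
# reuses the latest plan to emit an action at every control step.
import numpy as np

class SlowReasoner:
    """Stand-in for the VLM: expensive, refreshed every `slow_period` steps."""
    def __init__(self, latent_dim=16, seed=0):
        self.rng = np.random.default_rng(seed)
        self.latent_dim = latent_dim

    def plan(self, image: np.ndarray, instruction: str) -> np.ndarray:
        return self.rng.normal(size=self.latent_dim)   # latent task plan

class FastController:
    """Stand-in for the high-frequency head conditioned on the latest plan."""
    def act(self, proprio: np.ndarray, plan: np.ndarray) -> np.ndarray:
        return 0.1 * plan[:7] + 0.01 * proprio[:7]     # toy 7-DoF action

def control_loop(steps=30, slow_period=10):
    slow, fast = SlowReasoner(), FastController()
    plan = None
    for t in range(steps):
        image, proprio = np.zeros((224, 224, 3)), np.zeros(7)
        if t % slow_period == 0:                       # slow system: low rate
            plan = slow.plan(image, instruction="stack the cups")
        action = fast.act(proprio, plan)               # fast system: every step
        # a real system would send `action` to the robot here
    return action

if __name__ == "__main__":
    print(control_loop().shape)   # (7,)
```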