具身智能之心
A Deep Dive into PI π*0.6's Iterative Reinforcement Learning Approach: VLA + Online RL for Self-Evolution
具身智能之心· 2025-12-07 03:03
Core Insights
- The article examines advances in embodied intelligence, focusing on the VLA (Vision-Language-Action) model and its integration with reinforcement learning (RL) to enhance robotic capabilities [2][3][4]

Group 1: Importance of VLA and RL
- VLA models are central to embodied AI because they apply powerful vision-language models to robot control, but imitation learning alone is insufficient for robust performance in novel situations [6][9]
- Online RL lets robots discover better solutions through trial and error, overcoming the limits of offline RL, which is constrained by the quality of demonstration data [9][10]

Group 2: Challenges in Applying RL to VLA
- Applying RL to VLA faces three main challenges: environmental differences, model instability, and computational demands [22]
- Directly applying RL to large VLA models can cause catastrophic forgetting and training collapse, making performance hard to maintain [22][23]

Group 3: iRe-VLA Model and Its Innovations
- iRe-VLA introduces a two-phase iterative learning process that alternates exploration with consolidation of learned behaviors [18][25]
- Phase one runs online RL: the robot explores new tasks with the VLM parameters frozen, training only a lightweight action head [30][32]
- Phase two uses supervised learning to internalize the successful trajectories discovered during exploration, letting the model leverage its full capacity [40][43]

Group 4: Experimental Results and Effectiveness
- Experiments in simulation and on real robots show that iRe-VLA significantly improves task success rates over traditional methods [45][49]
- Success rates rise from 43% to 83% on benchmark tasks and from 35% to 80% in real-world object manipulation [49][56]

Group 5: Conclusion and Future Directions
- The iRe-VLA framework addresses the challenges of deploying large models for robotic control, paving the way for research on efficient exploration and stable RL algorithms [61][63]
- The approach balances computational cost by running lightweight workloads on local robots while reserving heavy computation for cloud servers, easing practical deployment [65]
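The two-phase loop summarized above (frozen-VLM exploration with a lightweight action head, then supervised consolidation on the discovered successes) can be sketched as a toy loop. The environment, the rollout function, and both update rules below are illustrative stand-ins, not the iRe-VLA implementation:

```python
import random

random.seed(0)

def rl_phase(policy_head, rollout, n_episodes=50):
    """Phase 1: online RL with the VLM frozen; only the lightweight
    action head is updated, and successful trajectories are collected."""
    successes = []
    for _ in range(n_episodes):
        traj, reward = rollout(policy_head)
        policy_head["skill"] += 0.01 * reward      # toy policy-gradient step
        if reward > 0:
            successes.append(traj)
    return successes

def sft_phase(full_model, demo_buffer, new_successes):
    """Phase 2: supervised learning on the demo buffer plus newly
    discovered successes, now updating the full model."""
    demo_buffer.extend(new_successes)
    full_model["competence"] += 0.05 * len(new_successes)  # toy SFT step
    return full_model

def toy_rollout(policy_head):
    """Stand-in environment: success probability grows with head skill."""
    ok = random.random() < min(0.9, 0.2 + policy_head["skill"])
    return "trajectory", (1.0 if ok else 0.0)

policy, model, buffer = {"skill": 0.0}, {"competence": 0.0}, []
for _ in range(3):                 # iterate: explore, then consolidate
    model = sft_phase(model, buffer, rl_phase(policy, toy_rollout))
```

The point of the structure is that exploration never destabilizes the large model (only the head moves online), while consolidation lets the full model absorb what exploration found.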
The "Whampoa Academy" of Embodied Intelligence: What's Inside?
具身智能之心· 2025-12-07 03:03
Core Insights
- The article surveys development and research in embodied intelligence, highlighting key modules such as industry content, embodiment forms, algorithms, and deployment solutions [1]

Industry Overview
- Companies developing embodied brains and embodiments have been identified, along with active laboratories in the field [1]
- A variety of industry research reports are provided to assess the development and cycles of embodied intelligence [1]

Product Recommendations
- Recommended research platforms include the SO-100, Openarm, and XLerobot series, with the SO-100 capable of running various VA and VLA algorithms [2][4]
- Openarm is a dual-arm task framework suited to basic tasks like folding clothes and pick-and-place operations, though it lacks mobility [4]
- XLerobot has limited mobility and suits entry-level research and personal development tasks [6]

Algorithm Development
- Algorithmic directions include VLA (training methods, reinforcement learning, lightweight deployment) and VLN (time language, target navigation) [8]
- Control strategies such as reinforcement learning, Model Predictive Control (MPC), and simulation techniques are also discussed [8]

Deployment Strategies
- Most deployments currently rely on cloud-based inference, with some edge solutions based on the Sol framework [8]
- Companies like Xiaopeng have deployed VLM/VLA on self-developed chips [8]

Community Engagement
- The community runs continuous live sharing sessions and roundtable forums on various aspects of the embodied intelligence industry [10][9]
- A comprehensive technical roadmap for beginners has been created, including learning paths and resources [16]

Job Opportunities and Networking
- The community has established a job referral mechanism with several embodied intelligence companies, connecting job seekers and employers [20]
- Members can engage with industry experts and receive guidance on career-related questions [20]

Resource Compilation
- Compiled resources include over 40 open-source projects, 60 embodied intelligence datasets, and various technical learning routes [22]
- A summary of notable laboratories and companies in the field is provided for academic and professional reference [24][26]
Seven Universities Are Already Quietly Establishing Embodied Intelligence Majors...
具身智能之心· 2025-12-06 03:11
Click the card below to follow the "红岸" official account. >> Click to join → the 具身智能之心 technical discussion group. For more resources, join China's first full-stack embodied intelligence learning community: the 具身智能之心 Knowledge Planet, which contains everything you need.

A few days ago we covered the embodied intelligence programs at Tsinghua's embodied research institute and Shanghai Jiao Tong University. Beyond these two, another six Double First-Class universities are applying to add an undergraduate major in "Embodied Intelligence". Below is the list published by the Ministry of Education.

| School | Major | Degree Category | Application Type | Application Form |
| --- | --- | --- | --- | --- |
| Beihang University | Embodied Intelligence | Engineering | New major not yet in the catalog | Download |
| Beijing Institute of Technology | Embodied Intelligence | Engineering | New major not yet in the catalog | Download |
| Beijing University of Posts and Telecommunications | Embodied Intelligence | Engineering | New major not yet in the catalog | Download |
| Northeastern University | Embodied Intelligence | Engineering | New major not yet in the catalog | Download |
| Shanghai Jiao Tong University | Embodied Intelligence | Engineering | New major not yet in the catalog | Download |
| Zhejiang University | Embodied Intelligence | Engineering | New major not yet in the catalog | Download |
| Xi'an Jiaotong University | Embodied Intelligence | Engineering | New major not yet ... |
A Former ByteDance Tech Lead Teams Up with Tsinghua Yao Class Alumni to Found a Startup!
具身智能之心· 2025-12-05 16:02
Core Insights
- The article traces the evolution of AI programming from "Vibe Coding" to a more structured "Engineering Era," exemplified by the InfCode coding agent developed by a startup team from Tsinghua University [9][11]

Group 1: Vibe Coding and Its Limitations
- Vibe Coding lets developers generate runnable code from simple prompts, creating a magical programming experience [3][5]
- However, it struggles with complex enterprise-level projects due to limited context windows, shallow reasoning depth, and the absence of an agentic model, making bugs hard to locate in large codebases [5][11]

Group 2: InfCode's Breakthrough
- InfCode, developed by the startup "Ciyuan Wuxian," achieved top scores on two authoritative AI coding benchmarks: SWE-Bench Verified and Multi-SWE-bench-CPP [6][14]
- InfCode scored 79.4% on SWE-Bench Verified and 25.58% on the C++ subset of Multi-SWE-bench, significantly outperforming competitors such as Claude 3.7 Sonnet and DeepSeek V3 [7][15]

Group 3: Technical Innovations of InfCode
- InfCode incorporates a multi-agent system designed for enterprise scenarios, marking a shift from individual efficiency to organizational evolution [8][11]
- A "Code Intent Analysis" mechanism lets it understand the functional intent behind natural-language descriptions, improving its ability to locate issues in large codebases [21][20]
- An AST-based structured retrieval engine improves code-search accuracy, overcoming the limitations of traditional text search tools [25][22]

Group 4: Dual-Agent Architecture
- InfCode employs a novel dual-agent architecture that iteratively generates and tests code patches, enhancing robustness and completeness [30][31]
- This enables continuous refinement of patches, making them suitable for integration into production environments [31][32]

Group 5: Team and Vision
- The team behind InfCode is described as a "startup dream team," combining technical depth with productization and commercialization capability [42][44]
- The vision is to move AI coding from mere tool efficiency to a comprehensive reconstruction of the software engineering lifecycle, building a "digital employee" platform [44]
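To give a flavor of what AST-based structured retrieval buys over plain text search, here is a minimal sketch using Python's built-in `ast` module: it indexes functions by name, signature, and outgoing calls, so a query can target structure rather than substrings. The index layout and field names are illustrative assumptions, not InfCode's design:

```python
import ast

def index_functions(source: str) -> dict:
    """Build a structural index of function definitions (name, line,
    arguments, outgoing calls) instead of matching raw text — a toy
    stand-in for an AST-based retrieval engine."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index[node.name] = {
                "lineno": node.lineno,
                "args": [a.arg for a in node.args.args],
                "calls": sorted({
                    c.func.id for c in ast.walk(node)
                    if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
                }),
            }
    return index

sample = '''
def fetch(url, timeout):
    return parse(download(url, timeout))

def parse(payload):
    return payload.strip()
'''
idx = index_functions(sample)
```

With such an index, "which functions call `download`?" becomes a structural query instead of a grep that would also match comments, strings, and unrelated identifiers.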
In Conversation with Industry Leaders: How Is Real-Robot Deployment of VLA and RL Approaches Going?
具身智能之心· 2025-12-05 16:02
Core Viewpoint
- The article discusses the implementation challenges and progress of VLA (Vision-Language-Action) algorithms and reinforcement learning (RL) in robotics, focusing on practical applications and future development in embodied intelligence [3][13]

Group 1: Guest Speakers
- Wei Sui, Vice President of Diguo Robotics, has extensive experience developing 2.5D and 3D vision algorithms for robotics and autonomous driving; he led a team that built a comprehensive 4D labeling system, with millions of chips shipped [5]
- Zhang Qiang, Chief Researcher and Academic Committee Director at Beijing Humanoid Robotics, specializes in humanoid robot motion control and multimodal perception, contributing to the development of core RL algorithms for humanoid robots [6][8]
- Wang Tiancai, Partner at Yuanli Lingji, has published over 30 papers at top international conferences and is a core author of notable algorithms in end-to-end autonomous driving [9][10]
- Yu Chao, Assistant Professor at Tsinghua Shenzhen Research Institute, focuses on decision intelligence driven by reinforcement learning, with over 50 published papers and significant academic recognition [11][12]

Group 2: Key Topics Discussed
- Pain points in VLA architectures and models, and how to enhance the overall motion control of robots [16]
- Integrating VLA with RL for better real-world application, including considerations for hardware selection and lightweight implementations [16]
New Hierarchical VLA Model: Even Failed Demonstration Data Can Optimize VLA Models!
具身智能之心· 2025-12-05 16:02
Author: Jeonguen Park et al. Editor: 具身智能之心. This article is shared for academic purposes only; if there is any infringement, contact us for removal.

Research Background and Core Problem

Limitations of existing VLA models: Vision-Language-Action (VLA) models are a core technology for robotic manipulation. Conventional models are trained on successful demonstrations collected by human teleoperation, while the many failed attempts that naturally occur during data collection (unstable grasps, collisions, and the like) are discarded as noise. Yet these failures carry key information about where the policy is brittle: they reveal which action sequences are infeasible and in which scenarios errors are likely. Models trained only on successes struggle with the uncertainty of complex environments, and their robustness drops sharply in unseen scenes.

Core challenge and research goal: The core challenge is how to effectively integrate failure signals from offline data. In imitation learning (IL), directly penalizing failure-prone actions tends to distort the policy; reinforcement learning (RL) can handle failure data naturally through reward signals, but it needs a suitable framework to carry them. The research goal is a hierarchical VLA model that turns failure experience into structured learning signals and achieves "failure-aware reasoning" through an explicit planning mechanism, without changing the robot's core ...
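The paper's framework is hierarchical, but the core idea — letting failures contribute a graded signal instead of being dropped — can be illustrated with an advantage-weighted-regression-style update. This is a generic sketch under that substitution, not the paper's method; the action names and reward values are made up:

```python
import math

def advantage_weighted_update(policy, trajectories, beta=1.0, lr=0.1):
    """Weight each trajectory by exp(advantage / beta): successes pull the
    policy strongly toward their actions, while failures keep a small but
    informative weight instead of being thrown away (AWR-style sketch)."""
    baseline = sum(r for _, r in trajectories) / len(trajectories)
    for actions, reward in trajectories:
        weight = math.exp((reward - baseline) / beta)
        for a in actions:
            policy[a] = policy.get(a, 0.0) + lr * weight
    total = sum(policy.values())
    return {a: v / total for a, v in policy.items()}  # normalize to a distribution

trajs = [
    (["reach", "grasp", "lift"], 1.0),   # success
    (["reach", "slip"], 0.0),            # failure: down-weighted, not discarded
    (["reach", "grasp", "lift"], 1.0),   # success
]
policy = advantage_weighted_update({}, trajs)
```

Note that "slip" still receives nonzero mass: the failure informs the policy's support rather than being hard-penalized, which is the distortion risk the paper attributes to naive IL.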
SpatialActor: Decoupling Semantics from Geometry to Inject Robust Spatial "Genes" into Embodied Intelligence
具身智能之心· 2025-12-05 16:02
Core Insights
- The article presents SpatialActor, a robust spatial-representation framework for robotic manipulation that addresses challenges in precise spatial understanding, sensor noise, and effective interaction [21][24]
- SpatialActor separates semantic information from geometric information, enhancing the robot's ability to understand tasks and accurately perceive its environment [21][6]

Methodology and Architecture
- SpatialActor employs a "dual-stream disentanglement and fusion" architecture, integrating semantic understanding from vision-language models (VLM) and precise geometric control from 3D representations [6][21]
- The architecture includes independent visual and depth encoders, with a Semantic-Guided Geometry Module (SGM) that adaptively fuses robust geometric priors with fine-grained depth features [9][10]
- A Spatial Transformer (SPT) establishes accurate 2D-to-3D mappings and integrates multi-modal features, which is crucial for generating precise actions [12][9]

Performance Evaluation
- In simulation, SpatialActor achieved an average success rate of 87.4%, outperforming the previous state-of-the-art model RVT-2 by 6.0% [13][19]
- The model demonstrated significant robustness to noise, improving on RVT-2 by 13.9% to 19.4% across different noise levels [14][19]
- Real-world experiments showed SpatialActor consistently outperforming RVT-2 by approximately 20% across various tasks, confirming its effectiveness in complex environments [19][18]

Conclusion
- The results highlight the importance of disentangled spatial representations for building more robust and generalizable robotic systems, with SpatialActor showing superior performance across diverse conditions [21][20]
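A minimal sketch of the semantic-guided fusion idea: per dimension, a gate computed from semantic features decides how much to trust fine-grained (but noise-prone) depth versus a robust coarse geometric prior. The shapes, the sigmoid gate, and all numbers are illustrative assumptions, not SpatialActor's actual SGM:

```python
import math

def semantic_guided_fusion(semantic, coarse_geom, fine_depth):
    """Toy semantic-guided geometry gate: sigmoid(semantic) interpolates
    between the fine depth feature and the coarse geometric prior,
    dimension by dimension."""
    gate = [1.0 / (1.0 + math.exp(-s)) for s in semantic]
    return [g * f + (1.0 - g) * c
            for g, f, c in zip(gate, fine_depth, coarse_geom)]

# dim 0: confident semantics -> lean on fine depth
# dim 1: uncertain semantics -> fall back to the coarse prior
fused = semantic_guided_fusion(
    semantic=[4.0, -4.0],
    coarse_geom=[0.0, 0.0],
    fine_depth=[1.0, 1.0],
)
```

The design intuition matches the article: when depth is noisy, the gate can shut the fine stream off rather than letting sensor noise corrupt the action prediction.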
Renmin University and Collaborators Propose a Mixture of Horizons Strategy to Solve VLA's "Short-Sighted vs. Far-Sighted" Problem
具身智能之心· 2025-12-05 04:00
This article is a systematic walkthrough of the latest VLA work, "Mixture of Horizons in Action Chunking", a collaboration between Renmin University of China, the University of North Carolina, and the Chinese University of Hong Kong. The paper analyzes the widely adopted Action Chunking strategy and proposes a plug-and-play Mixture of Horizons strategy to mitigate its trade-off. It further introduces a Dynamic Inference strategy based on cross-horizon consistency to improve the inference efficiency of VLA models. The work sets a new SOTA of 99% average accuracy on the LIBERO Benchmark, and the code and models are open-sourced!

Editor: 具身智能之心. This article is shared for academic purposes only; if there is any infringement, contact us for removal.

Figure 1: the effect of action chunk length on ...

Through experiments, the paper finds that a single action chunk length faces a clear trade-off between long-horizon trajectory planning and short-term action precision.

Paper link: https://arxiv. ...
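The mixture idea described above can be sketched as several chunk heads with different horizons whose overlapping predictions are blended by a weighted average. The heads, horizons, and weights below are toy assumptions, not the paper's architecture:

```python
def mixture_of_horizons(heads, horizons, weights):
    """Blend action chunks of different lengths: each head predicts a
    chunk for its own horizon, and overlapping time steps are combined
    by a weighted average over the heads that cover them."""
    max_h = max(horizons)
    total = [0.0] * max_h
    norm = [0.0] * max_h
    for head, h, w in zip(heads, horizons, weights):
        chunk = head(h)                    # list of h scalar "actions"
        for t in range(h):
            total[t] += w * chunk[t]
            norm[t] += w
    return [a / n for a, n in zip(total, norm)]

short_head = lambda h: [1.0] * h   # stands in for precise short-term predictions
long_head = lambda h: [0.0] * h    # stands in for a coherent long-term plan
actions = mixture_of_horizons([short_head, long_head], [4, 16], [0.7, 0.3])
```

Early steps are dominated by the short-horizon head (precision), while later steps fall back to the long-horizon head alone (planning coherence), which is one way to soften the single-horizon trade-off.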
RoCo Challenge @ AAAI 2026: An International Embodied-Intelligence Competition for Robotic Assembly
具身智能之心· 2025-12-05 04:00
Editor: 具身智能之心. This article is shared for academic purposes only; if there is any infringement, contact us for removal.

We cordially invite you to the RoCo Challenge, a frontier robot-collaboration competition held during AAAI 2026. The challenge is co-organized by the Perception and Embodied Intelligence Lab (PINE Lab) at Nanyang Technological University (NTU), A*STAR, Carnegie Mellon University (CMU), and other institutions. It focuses on core topics in embodied intelligence and human-robot collaboration, aiming to advance research on, and deployment of, autonomous decision-making, collaborative planning, and safe interaction for robots in complex production and manipulation environments. The competition features multiple tracks, ranging from human-robot collaborative task planning in virtual simulation to multimodal manipulation on real robot platforms. Teams will face open-ended challenges drawn from real industrial and service scenarios and demonstrate, across multi-stage tasks, their agents' combined abilities in understanding, communication, and action. To encourage innovation and cross-disciplinary collaboration, the challenge provides a USD 2,000 prize and award certificates, and winning teams will present their solutions at AAAI 20 ...
Hard to Reproduce pi0.6? SRPO: Refreshing the VLA-RL SOTA Without Fine-Tuning a Value Model
具身智能之心· 2025-12-05 00:02
Author: Senyu Fei et al. Editor: 具身智能之心. This article is shared for academic purposes only; if there is any infringement, contact us for removal.

1. Foreword

In embodied intelligence, reinforcement learning (RL) is emerging as the key to improving vision-language-action (VLA) models beyond supervised fine-tuning (SFT). The recent release from Physical Intelligence, using the RECAP framework, demonstrated the potential of this path. However, building high-quality reward or value models is usually expensive.

Figure 1: value function curves of ... vs. SRPO. The three scenes are taken from the ... official page; the white curve is the value function of ..., while the yellow curve is the value function obtained directly by SRPO without task-specific fine-tuning. In ..., the value function predicts the negative number of steps needed to complete the task: the prediction rises when the robot makes progress and stays flat when little progress is made; in SRPO, it directly predicts task progress.

Recently, the OpenMOSS team and the SiiRL team jointly released their latest work, SRPO (Self-Referential Policy ...
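A "self-referential" progress signal that needs no fine-tuned value model can be caricatured in a few lines: score every state of any trajectory by the normalized position of its nearest neighbor along the successful trajectories in the same batch. States are 1-D numbers standing in for latent embeddings; this is an illustrative simplification, not the SRPO algorithm:

```python
def self_referential_progress(batch):
    """For each trajectory in the batch, score each state by where its
    nearest-neighbor state sits (0..1) along the batch's own successful
    trajectories — a progress signal derived from the data itself."""
    successes = [traj for traj, ok in batch if ok]
    scored = []
    for traj, _ in batch:
        scores = []
        for state in traj:
            # nearest reference state across all successful trajectories
            _, progress = min(
                (abs(state - s), i / (len(ref) - 1))
                for ref in successes
                for i, s in enumerate(ref)
            )
            scores.append(progress)
        scored.append(scores)
    return scored

batch = [
    ([0.0, 0.5, 1.0], True),   # success: progress runs 0 -> 1
    ([0.0, 0.4], False),       # failure that got roughly halfway
]
values = self_referential_progress(batch)
```

The failed trajectory still receives a meaningful value curve (it "got about halfway"), which is the kind of dense signal a policy-improvement step can exploit without training a task-specific value network.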