具身智能之心
The first reasoning embodied model, built by Google DeepMind! Breaking "one robot, one training" with zero-shot transfer
具身智能之心· 2025-09-28 01:05
Core Viewpoint
- Google DeepMind has launched the Gemini Robotics 1.5 series, marking a significant advancement in robotics with the introduction of embodied reasoning capabilities, enabling robots to think before acting and perform complex tasks [3][5][10].

Group 1: Model Composition
- The Gemini Robotics 1.5 series consists of two models: GR 1.5 for action execution and GR-ER 1.5 for enhanced reasoning capabilities [6][4].
- GR 1.5 is designed for executing multimodal tasks, while GR-ER 1.5 focuses on planning and understanding [6][23].

Group 2: Task Execution and Adaptability
- The combination of GR 1.5 and GR-ER 1.5 allows robots to perform multi-step tasks, such as sorting clothes or packing luggage based on weather conditions [7][8][20].
- The models enable zero-shot cross-platform capability, allowing skills learned on one robot to be transferred to another without additional training [9][19].

Group 3: Reasoning and Planning
- GR-ER 1.5 generates an internal dialogue to break complex tasks into smaller steps before execution, enhancing robustness and interpretability [25][50].
- The models can self-correct during task execution, improving efficiency and safety in human environments [31][32].

Group 4: Motion Transfer Mechanism
- The Motion Transfer mechanism allows different robot platforms to share skills by mapping their motion trajectories to a unified action semantic space [46][52].
- This mechanism enhances task generalization and cross-robot transfer, allowing robots to adapt to new environments effectively [47][53].

Group 5: Performance and Safety
- In benchmark tests, GR 1.5 outperformed previous models in instruction generalization, action generalization, and task completion rates, achieving nearly 80% on long-sequence tasks [58][59].
- The models maintain high safety standards, demonstrating superior risk recognition and intervention capabilities for safe operation in human environments [61][62].
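The planner/executor split described above (GR-ER 1.5 decomposes the task, GR 1.5 executes each step, with self-correction on failure) can be sketched as a simple orchestration loop. All names below are illustrative stand-ins, not Google's API; this is a minimal sketch assuming the reasoning model returns a list of step instructions and the action model reports success or failure per step.

```python
from dataclasses import dataclass

@dataclass
class Step:
    instruction: str
    done: bool = False

class Orchestrator:
    """Toy planner/executor loop: a reasoning model decomposes the task,
    an action model executes each step, and a failed step is retried
    (a stand-in for the article's 'self-correction')."""

    def __init__(self, plan_fn, act_fn):
        self.plan_fn = plan_fn  # reasoning model: task -> list of step strings
        self.act_fn = act_fn    # action model: instruction -> bool (success)

    def run(self, task: str, max_retries: int = 2) -> list:
        steps = [Step(s) for s in self.plan_fn(task)]
        for step in steps:
            for _ in range(max_retries + 1):
                if self.act_fn(step.instruction):
                    step.done = True
                    break
        return steps

# Toy stand-ins for the two models.
plan = lambda task: [f"{task}: step {i}" for i in (1, 2, 3)]
act = lambda instruction: True
result = Orchestrator(plan, act).run("sort laundry")
print(all(s.done for s in result))  # True
```

In the real system the plan would also be revised mid-task; this sketch only captures the decompose-execute-retry skeleton.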
SOTA even with scarce data? Tsinghua & Shanghai AI Lab crack two major bottlenecks in robot RL
具身智能之心· 2025-09-27 01:33
Core Insights
- The article discusses SimpleVLA-RL, a new framework designed to enhance the training and generalization of Vision-Language-Action (VLA) models in robotics, addressing key limitations of existing training paradigms [4][14].

Group 1: Key Contributions of SimpleVLA-RL
- SimpleVLA-RL addresses three major bottlenecks in VLA model training: high data collection costs, insufficient generalization, and the need for large-scale demonstration data [6][11].
- The framework achieves state-of-the-art (SoTA) performance on standard benchmarks such as LIBERO and RoboTwin, with significant success-rate improvements even with limited data [6][21].
- With a single demonstration, the average success rate of OpenVLA-OFT on LIBERO increased from 48.9% to 96.9%, and on long-sequence tasks from 17.3% to 91.7% [6][21].

Group 2: Training Mechanism and Innovations
- The training mechanism combines interactive trajectory sampling, result-reward modeling, and exploration enhancement, which collectively improve data efficiency and model performance [15][16][17].
- The result-reward model reduces the reward to a binary outcome (success or failure), keeping training focused on the objective and avoiding the complexities of process rewards [16][21].
- The exploration-enhancement strategy encourages diverse exploration during training, preventing the model from collapsing onto narrow solutions [17][19].

Group 3: Performance Metrics and Benchmark Results
- SimpleVLA-RL achieved an average success rate of 99.1% on the LIBERO benchmark, with long-sequence task success rates up 12.0 percentage points [23].
- On RoboTwin1.0, the average success rate improved from 39.8% to 70.4%, with notable gains on specific tasks such as "Blocks Stack", which saw a 33.1 percentage point increase [25].
- On RoboTwin2.0, average success rates rose from 38.3% to 68.8%, surpassing previous models [27].

Group 4: Real-World Application and Generalization
- A model trained solely on simulation data showed improved adaptability to real-world tasks, with average success rates increasing from 17.5% to 38.5% in practical applications [30].
- The emergence of the "Pushcut" phenomenon indicates that the model can autonomously discover strategies beyond human demonstrations, showcasing its potential for adaptive learning [32][34].
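The training mechanism above (interactive trajectory sampling plus a binary result reward) can be sketched as follows. This is a toy illustration under assumed interfaces, not SimpleVLA-RL's actual code; the advantage estimator shown is a generic batch-mean baseline, and the framework's real estimator may differ.

```python
import random

def rollout(policy, env_step, horizon=50):
    """Sample one interactive trajectory; the reward is binary: 1.0 if the
    episode ends in success, else 0.0 (the 'result reward' described above)."""
    traj, success = [], False
    state = "start"
    for _ in range(horizon):
        action = policy(state)
        state, success, done = env_step(state, action)
        traj.append((state, action))
        if done:
            break
    return traj, 1.0 if success else 0.0

def batch_advantages(returns):
    """Center binary returns within a batch so successful trajectories are
    reinforced and failed ones suppressed (a simple baseline, assumed here)."""
    mean = sum(returns) / len(returns)
    return [r - mean for r in returns]

# Toy environment: succeeds whenever the policy happens to emit "grasp".
random.seed(0)
policy = lambda s: random.choice(["move", "grasp"])
env = lambda s, a: ("next", a == "grasp", a == "grasp")
returns = [rollout(policy, env)[1] for _ in range(8)]
adv = batch_advantages(returns)
print(abs(sum(adv)) < 1e-9)  # advantages are zero-mean within the batch: True
```

The point of the binary reward is that no hand-designed process reward is needed; the policy gradient only has to distinguish trajectories that succeeded from those that did not.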
具身智能之心 National Day & Mid-Autumn double-holiday deals are here~
具身智能之心· 2025-09-27 01:33
Group 1
- The article promotes a series of discounts and offers on embodied intelligence courses and services, running from September 24 to October 12 [1][4][6].
- New members joining the knowledge community can enjoy a 30% discount, while existing members can renew at a 50% discount [1][4].
- Various courses, including VLA, VLN, and Diffusion Policy, are available at a 20% discount [2].

Group 2
- The Super Discount Card offers a 30% discount on all courses for one year [4][7].
- One-on-one paper tutoring can provide a maximum discount of 5,000 yuan for a 1,000 yuan fee, while group tutoring (1v6) offers a 1,000 yuan reduction [4][7].
- The article highlights various research hardware available, including reinforcement learning platforms and robotic arms [4][7].
ImaginationPolicy: Toward general, precise, and reliable end-to-end policies for robot manipulation
具身智能之心· 2025-09-27 01:33
Author: Wei Gao et al. Editor: 具身智能之心

1. Core background and problem statement

End-to-end robot manipulation policies hold great potential for embodied agents that understand and interact with the world. Unlike traditional modular pipelines, end-to-end learning mitigates key limitations such as information loss between modules and feature misalignment caused by isolated optimization objectives. Yet existing end-to-end networks, including methods built on large vision-language-action (VLA) models, still fall short in large-scale real-world deployment, especially in reliability and precision, where they can even trail mature, well-engineered modular pipelines; their generalization gaps are even more pronounced when facing unseen objects or different robot platforms.

To close the gap between "generalization potential" and "practical performance requirements", this work proposes an affordance-centric end-to-end manipulation approach: affordance is defined as "task-relevant, semantically explicit local regions of an object", made concrete through "task-specific oriented keypoints", ultimately forming a Chain of Moving Oriented Keypoints ...
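A "task-specific oriented keypoint", as defined above, is a point on a task-relevant object region together with an orientation, i.e. a full 6-DoF target. A minimal sketch of such a representation follows; the class and field names are illustrative, not the paper's API.

```python
class OrientedKeypoint:
    """Hypothetical representation of a task-specific oriented keypoint:
    a 3D position on a task-relevant object part plus an orientation
    (3x3 rotation matrix), i.e. a full 6-DoF target for the gripper."""

    def __init__(self, position, rotation):
        self.position = list(position)               # [x, y, z]
        self.rotation = [list(r) for r in rotation]  # 3x3, row-major

    def as_pose(self):
        """Return a 4x4 homogeneous transform for downstream motion planning."""
        T = [row + [p] for row, p in zip(self.rotation, self.position)]
        T.append([0.0, 0.0, 0.0, 1.0])
        return T

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A "chain of moving oriented keypoints" is then an ordered sequence of such
# poses for the end-effector to visit.
chain = [
    OrientedKeypoint([0.3, 0.0, 0.10], IDENTITY),  # approach above the part
    OrientedKeypoint([0.3, 0.0, 0.25], IDENTITY),  # lift
]
print(chain[1].as_pose()[2][3])  # 0.25
```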
This "Whampoa Military Academy" of embodied intelligence is working on these things...
具身智能之心· 2025-09-26 10:42
Core Viewpoint
- The article emphasizes the development and enhancement of a community focused on embodied intelligence, aiming to provide resources, support, and networking opportunities for people interested in this field.

Group 1: Community Development
- The community is working on improving hardware solutions and addressing user feedback on product quality and pricing [2].
- There is an ongoing effort to streamline community resources and reduce information gaps, with improved content planned after the holiday [2].
- The community has established a job-referral mechanism, connecting members with potential employers in the embodied intelligence sector [5][13].

Group 2: Educational Resources
- The community has compiled over 30 technical routes for members, giving easier access to benchmarks, reviews, and learning paths [6].
- Forums and live sessions are organized to discuss advances in the embodied intelligence industry, covering topics such as robot simulation and data collection [6].
- A comprehensive list of open-source projects and datasets related to embodied intelligence is available, aiding members in their research and development [29][35].

Group 3: Networking and Collaboration
- The community includes members from prestigious universities and leading companies in the field, fostering collaboration and knowledge sharing [13].
- Members are encouraged to engage with industry leaders and peers to discuss job opportunities and academic advances [17][80].
- The community aims to cultivate future industry leaders by providing a platform for technical exchange and professional growth [12].
Easy to use, great value! A lightweight robotic arm built for embodied-AI research
具身智能之心· 2025-09-26 02:24
Still struggling with hardware for embodied AI? Expensive hardware is out of reach, and cheap arms are frustrating to use. Is there a low-price, high-quality option?

Enter the Imeta-y1: at low cost it supports paper validation and research development in the embodied field, meeting the needs of most practitioners and researchers.

This is a lightweight robotic arm designed for education, research, and light-industry scenarios. It combines high-precision motion control, low-power design, and an open hardware/software architecture; supports seamless joint debugging from simulation to the real machine; and ships with a fully open-source SDK and toolchain, helping users move quickly through algorithm validation, data collection, model training, and deployment. Its compact structure and modular interfaces make it especially suitable for developing embedded AI and robot-learning platforms.

| Weight | 631g |
| --- | --- |
| Dimensions | see appendix |
| Stroke | 0-80mm |
| Positioning accuracy | ±0.5mm |
| External interface | Power + CAN, XT30 2+2 |

| Weight | 671g |
| --- | --- |
| Dimensions | see appendix |
| Stroke | 0-80mm |
| Positioning accuracy | ±0.5mm |
| External interface | Power + CAN, XT30 2+2 |

| Body weight | 4.2KG | Rated ...
RoboDexVLM: General dexterous robot manipulation with a hierarchical VLM architecture
具身智能之心· 2025-09-26 00:04
Core Insights
- RoboDexVLM is an innovative robot task-planning and grasp-detection framework designed for collaborative robotic arms equipped with dexterous hands, targeting complex long-sequence tasks and diverse object manipulation [2][6].

Group 1: Framework Overview
- The framework uses a robust task planner with a task-level recovery mechanism, leveraging vision-language models to interpret and execute open-vocabulary instructions for long-sequence tasks [2][6].
- It introduces a language-guided dexterous grasp-perception algorithm, specifically designed for zero-shot dexterous manipulation of diverse objects and instructions [2][6].
- Comprehensive experimental results validate RoboDexVLM's effectiveness, adaptability, and robustness in handling long-sequence scenarios and executing dexterous grasping tasks [2][6].

Group 2: Key Features
- The framework allows robots to understand natural-language commands, enabling seamless human-robot interaction [7].
- It supports zero-shot grasping of varied objects, showcasing the dexterous hand's ability to manipulate items of different shapes and sizes [7].
- The vision-language model acts as the "brain" for long-horizon task planning, ensuring the robot does not lose track of its objectives [7].

Group 3: Practical Applications
- RoboDexVLM is presented as the first general-purpose dexterous manipulation framework integrating vision-language models, moving beyond the limitations of traditional and end-to-end methods [6][7].
- The framework's real-world performance demonstrates its potential in embodied intelligence and human-robot collaboration [6][7].
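The task-level recovery mechanism described above, where a failed sub-task triggers replanning by the VLM rather than blind retries of the same primitive, might look like this in outline. All function names and the toy failure scenario are hypothetical, not RoboDexVLM's actual interfaces.

```python
def execute_with_recovery(plan, execute, replan, max_replans=3):
    """Hypothetical task-level recovery loop: run each sub-task in order;
    on failure, ask the planner for a revised remainder of the plan that
    accounts for the failure history."""
    steps = list(plan)
    replans, i, history = 0, 0, []
    while i < len(steps):
        ok = execute(steps[i])
        history.append((steps[i], ok))
        if ok:
            i += 1
        elif replans < max_replans:
            steps = replan(steps[i:], history)  # revised remainder of the plan
            i, replans = 0, replans + 1
        else:
            return False, history
    return True, history

# Toy scenario: "open drawer" fails once; the replanner prepends a regrasp.
attempts = {"open drawer": 0}
def execute(step):
    if step == "open drawer" and attempts["open drawer"] == 0:
        attempts["open drawer"] += 1
        return False
    return True
replan = lambda remaining, hist: ["regrasp handle"] + remaining
ok, hist = execute_with_recovery(["grasp handle", "open drawer"], execute, replan)
print(ok, len(hist))  # True 4
```

The design point is that recovery happens at the plan level: the planner sees what failed and can insert new sub-tasks, instead of the executor hammering on a primitive that cannot succeed from the current state.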
The paper output in the VLA direction is truly enormous...
具身智能之心· 2025-09-26 00:04
Imagine how wonderful it would be to issue an instruction in language and have any action you want executed smoothly! Being able to complete long, continuous sequences of actions would be extremely convenient. Here is an introduction to what VLA actually is.

VLA breaks the single-task limitation of traditional methods, enabling robots to make autonomous decisions across diverse scenarios and flexibly handle unseen environments, with wide applications in manufacturing, logistics, and home services. VLA models have also become a research hotspot, driving cutting-edge projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA and fostering collaboration between academia and industry. Their adaptability shows in their applicability to robotic arms, quadrupeds, humanoids, and other platforms, offering broad potential and practical value for intelligent robots of all kinds and making VLA a key driving force in the field.

Judging by this year's robotics and AI conferences, VLA and its derivative directions account for nearly half of embodied-AI output, especially long-horizon manipulation, generalization, few-shot learning, VLA+RL, and humanoid-related work.

From an industry perspective, embodied intelligence is booming at home and abroad: teams such as Unitree, 智元, 星海图, 银河通用, and 逐际动力 are moving from the lab to commercialization, tech giants such as Huawei, JD, and Tencent are actively entering the space, and together with overseas companies such as Tesla and Figure AI they are pushing the field forward.

Many readers have left messages asking ...
RoboSeek cracks the core challenge of long-horizon tasks: when manipulation meets "embodied cognition"
具身智能之心· 2025-09-26 00:04
It is against this technical predicament that the RoboSeek framework offers a breakthrough. Its core innovation is a deep grounding of "embodied cognition theory", which rejects the traditional view that cognition is isolated from interaction and holds instead that an agent's cognitive abilities arise from dynamic interaction with objects and the environment. Building on this idea, RoboSeek constructs a new architecture of "interaction-driven joint optimization of perception and action": a dynamically evolving "attention space" captures task-critical information, a reinforcement-learning-driven embodied executor delivers precise action control, the cross-entropy method iteratively optimizes key objectives, and a real-to-sim-to-real (real2sim2real) transfer pipeline tackles the disconnect between simulation and reality.

The value of this design is clear: across 8 long-horizon tasks and 2 different robot platforms, RoboSeek achieved an average success rate of 79%, far above traditional baselines (all below 50%). It provides a stable, reliable solution for long-horizon robot manipulation, fills the gap between "embodied cognition theory" and actual robot operation, and opens the way for applying general-purpose robots in real environments ...
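The cross-entropy method (CEM) that RoboSeek is described as using to iteratively optimize key objectives is a standard sampling-based optimizer. A minimal generic implementation (not RoboSeek's code; the objective below is a toy quadratic) looks like this: sample candidates from a Gaussian, keep the highest-scoring elites, refit the Gaussian to the elites, and repeat.

```python
import random
import statistics

def cem_optimize(score, dim, iters=20, pop=50, elite=10, seed=0):
    """Minimal cross-entropy method: iteratively refit a diagonal Gaussian
    to the elite fraction of sampled candidates, maximizing `score`."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [1.0] * dim
    for _ in range(iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
                   for _ in range(pop)]
        samples.sort(key=score, reverse=True)      # best candidates first
        elites = samples[:elite]
        mu = [statistics.mean(e[d] for e in elites) for d in range(dim)]
        sigma = [statistics.stdev(e[d] for e in elites) + 1e-6
                 for d in range(dim)]              # small floor avoids collapse
    return mu

# Toy objective: maximize -(x-2)^2 - (y+1)^2, optimum at (2, -1).
best = cem_optimize(lambda v: -(v[0] - 2) ** 2 - (v[1] + 1) ** 2, dim=2)
print(best)  # converges near [2.0, -1.0]
```

CEM needs only rollout scores, no gradients, which is why it pairs naturally with simulation-in-the-loop pipelines like real2sim2real.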
Meta open-sources the first code world model, igniting the AI community and teaching agents "real reasoning"
具身智能之心· 2025-09-26 00:04
Core Viewpoint
- The article discusses the introduction of the Code World Model (CWM) by Meta, which represents a significant evolution in AI models aimed at improving code generation through world-modeling techniques [2][5][31].

Group 1: Model Overview
- CWM is a 32-billion-parameter open-weight large language model (LLM) designed to advance code-generation research based on world models [7][12].
- It features a dense, decoder-only architecture with a context length of up to 131k tokens, demonstrating strong performance on general programming and mathematical tasks [8][9].

Group 2: Performance Metrics
- CWM achieved notable scores on various benchmarks: SWE-bench Verified (pass@1 65.8%), LiveCodeBench (68.6%), Math-500 (96.6%), and AIME 2024 (76.0%) [8][23].
- Compared with other models, CWM's performance is competitive, particularly in the 30B-parameter range [9][30].

Group 3: Training Methodology
- The model was trained on extensive observation-action trajectories from a Python interpreter and agentic Docker environments, focusing on improving code understanding beyond static code training [12][22].
- Meta has released checkpoints from the mid-training, SFT, and reinforcement learning phases to support further research [13].

Group 4: Research Implications
- CWM serves as a robust testing platform to explore the potential of world modeling for enhancing reasoning and planning in code generation [15][31].
- The research indicates that world models can benefit agentic coding by allowing stepwise simulation of Python code execution and reasoning over such simulations [16][31].

Group 5: Future Directions
- Meta envisions the code world model bridging the gap between linguistic reasoning and executable semantics, with ongoing research needed to fully leverage its advantages across tasks [31].
- The model aims to improve reinforcement learning by enabling agents familiar with environment dynamics to focus on learning actions that yield rewards [31].
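The "stepwise simulation of Python code execution" that CWM is trained on can be illustrated with Python's own tracing hook: recording a (line, local-state) trajectory as a function runs. This is a toy stand-in for the kind of observation-action data described above, not Meta's actual data pipeline.

```python
import sys

def trace_execution(fn, *args):
    """Record a line-by-line (relative line number, local variables)
    trajectory of a Python function as it executes."""
    trajectory = []

    def tracer(frame, event, arg):
        # Only record line events inside the target function's code object.
        if event == "line" and frame.f_code is fn.__code__:
            trajectory.append((frame.f_lineno - fn.__code__.co_firstlineno,
                               dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, trajectory

def running_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

result, traj = trace_execution(running_sum, [1, 2, 3])
print(result, len(traj) > 0)  # 6 True
```

Each trajectory entry pairs a program location with the interpreter state that produced it, which is exactly the kind of execution semantics a world model can learn to predict instead of only memorizing static source text.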