具身智能之心
The "Wall" That Vision-Only VLA Can't See Has Been Found
具身智能之心· 2026-01-27 07:24
Imagine a sunny afternoon: a robot is tidying a room. On a table by the window, a transparent glass cup needs to be returned to its designated spot. The robot walks over and, facing direct glare and a transparent object, can only repeat meaningless grasping motions, as if confronting a "ghost." This is not science fiction but the current predicament of the embodied-AI field: in everyday scenes with transparency, reflections, or extreme lighting, 3D spatial perception breaks down, and embodied robots stop being intelligent.

I. Vision-Only Approaches: Willing but Unable
The embodied-AI field is gradually moving past the "storytelling" stage; how to turn it into productivity is what every company is now weighing. But in the real physical world, vision-only systems infer spatial relations from the texture and color of RGB images, and a great many real-world scenes stop this kind of "spatial perception" in its tracks.

1. Transparent objects: the "ghost" of vision-only VLA
Transparent materials (glass, acrylic, clear containers) are a nightmare for vision-only perception. In robotic grasping tasks, vision alone may fail even to locate a transparent storage box, let alone grasp it precisely. Some methods attempt to tackle this, but their results remain limited, mainly because transparent objects have no fixed texture of their own; their surface appearance depends entirely on environmental reflection and refraction.

2. Reflective surfaces and extreme lighting: "perceptual blindness" in textureless scenes
Likewise, reflective surfaces (metal utensils, mirrors, glossy car paint) and extreme lighting (strong ...
Hierarchical RL-MPC Framework: A New Paradigm for Dexterous Manipulation That Makes Robots "Understand Geometry and Handle Contact"
具身智能之心· 2026-01-27 03:00
Core Insights
- The article discusses the challenges robots face in dexterous manipulation: high data requirements, difficult sim-to-real transfer, and weak generalization [2]
- A new hierarchical RL-MPC framework inspired by human operation logic is proposed, achieving nearly 100% task success, a 10x improvement in data efficiency, and zero-shot sim-to-real transfer [2][4]

Challenges in Traditional Dexterous Manipulation
- Traditional approaches struggle to balance learning efficiency, robustness, and generalization, with three main issues:
  1. End-to-end vision methods require massive data to learn non-smooth contact dynamics, leading to low efficiency on long-horizon tasks [3]
  2. Motion strategies show large performance gaps across different object geometries and scenes [3]
  3. Traditional model-based control lacks flexibility and adaptability in open environments with diverse object shapes [3]

Innovations in the Hierarchical RL-MPC Framework
- The framework's core innovation is the "contact intention," an interface connecting high-level decision-making and low-level execution, structured as three layers and two modules [4][6]
- High-level RL predicts contact intentions from scene observations, while low-level MPC specializes in executing contact dynamics [4][12]

High-Level RL Strategy
- The high-level RL policy uses a three-component observation space covering geometry, target, and collision information, enhancing the policy's environmental awareness [7]
- Sub-goals are defined by indirectly predicting MPC weights, improving learning efficiency by allowing flexible switching between sub-goals [8]
- A dual-branch network architecture balances local detail and global context, optimizing feature extraction for both [9]

Low-Level MPC Execution
- The low-level MPC uses complementarity-free model predictive control (ComFree-MPC) to ensure stability and adaptability in contact actions, running at a high frequency of 100 Hz [12][16]
- The optimization objectives are designed to adhere strictly to high-level intentions, ensuring quick responses to disturbances [17]

Experimental Validation
- The framework performed strongly on two non-prehensile tasks, achieving a 97.34% success rate on unseen objects in a pushing task and 100% on 3D reorientation tasks [20][24]
- Its data efficiency far outperformed end-to-end strategies, requiring only 15,000 RL decision steps to reach 100% success versus 600,000 steps for traditional methods [26]

Robustness and Sim-to-Real Transfer
- The framework remained highly robust under various disturbances, maintaining performance where traditional methods failed [25][29]
- The policy was deployed on real robots without any fine-tuning, achieving high success rates across a variety of objects [30]

Limitations and Future Directions
- The framework currently relies on accurate pose estimation, which can cause failures in real-world scenarios, pointing to a need for integrated perception-planning-control designs [36]
- Scalability to multiple end-effectors remains challenging, suggesting future work on optimizing the contact-intention representation [36]

Conclusion
- The hierarchical RL-MPC framework is a significant advance in dexterous manipulation, combining decision-making flexibility with execution stability and paving the way for broader robotics applications [37]
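The two-rate hierarchy described above can be sketched minimally: a high-level policy replans a "contact intention" at a low rate, while a low-level controller tracks it at 100 Hz (the rate the summary cites for the MPC layer). In this sketch the RL policy and ComFree-MPC are stood in for by trivial placeholders; the 10 Hz decision rate, gain values, and 2D toy state are all assumptions for illustration, not details from the paper.

```python
import math

HIGH_LEVEL_HZ = 10                     # assumed RL decision rate
LOW_LEVEL_HZ = 100                     # MPC rate cited in the summary
STEPS_PER_INTENT = LOW_LEVEL_HZ // HIGH_LEVEL_HZ

def high_level_policy(object_pos):
    """Placeholder for the RL policy: emit a contact intention, here a
    target contact point offset from the object plus a tracking weight
    (the summary has RL predict MPC weights to define sub-goals)."""
    target = (object_pos[0] + 0.05, object_pos[1])
    return {"target": target, "weight": 5.0}

def low_level_step(ee, intention, dt=1.0 / LOW_LEVEL_HZ):
    """Placeholder for the MPC layer: one proportional step toward the
    intended contact point, scaled by the high-level weight."""
    tx, ty = intention["target"]
    return (ee[0] + dt * intention["weight"] * (tx - ee[0]),
            ee[1] + dt * intention["weight"] * (ty - ee[1]))

# Toy rollout: the end-effector converges to the intended contact point
# while the intention is refreshed ten times per simulated second.
ee = (0.0, 0.0)
object_pos = (0.3, 0.2)
for step in range(LOW_LEVEL_HZ):           # one second of control
    if step % STEPS_PER_INTENT == 0:       # replan intention at 10 Hz
        intention = high_level_policy(object_pos)
    ee = low_level_step(ee, intention)

err = math.hypot(ee[0] - intention["target"][0],
                 ee[1] - intention["target"][1])
print(err < 1e-2)  # → True
```

The point of the interface is visible even in this toy: the fast loop never sees raw scene observations, only the intention, which is what lets the two layers run at different rates and be trained or designed separately.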
AAAI 2026 Outstanding Paper Award | ReconVLA: A First for the Embodied-AI Field
具身智能之心· 2026-01-27 03:00
Author: 机器之心 | Editor: 具身智能之心

In the long-standing map of AI research, embodied intelligence, though crucial to robotic manipulation, automation systems, and real-world applications, has often been seen as a "systems-engineering-driven" direction, rarely credited with decisive influence on AI's core modeling paradigms. ReconVLA winning an AAAI Outstanding Paper Award sends a clear and important signal: enabling agents to "see, think, and act" in the real world has become one of the core problems of AI research.

At 19:30 on Friday, January 30, we are honored to host Song Wenxuan, first author of the AAAI 2026 outstanding paper ReconVLA, in the "具身智能之心" livestream studio.
Paper title: ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
Paper: https://arxiv.org/abs/2508.10333
Code: https://github.com/Chowzy069/Reconvla
The livestream will focus on one core topic: setting aside ...
A Domestic First: A Multimodal Tactile Sensor Fused with a Language Model
具身智能之心· 2026-01-26 03:42
Editor: 机器之心

The paper's first authors are Li Shoujie (李寿杰), a Tsinghua PhD and NTU postdoc, Tsinghua PhD student Wu Tong (吴同), and AI master's student Xu Jianle (徐建乐). Corresponding authors include Ding Wenbo (丁文伯), associate professor at Tsinghua Shenzhen International Graduate School; Xie Zhaoqian (解兆谦), professor at Dalian University of Technology; Wu Changsheng (吴昌盛), assistant professor at the National University of Singapore; and Yu Xinge (于欣格), professor at City University of Hong Kong.

As robotics leaps from "executing preset programs" to "embodied intelligent interaction," tactile sensing, the core modality for understanding object properties and enabling fine manipulation, grows ever more important. Yet current systems still fall far short of humans in sensing dimensionality, resolution, and signal interpretation, leaving robots in a state of "feeling without understanding."

Against this backdrop, Ding Wenbo's team at Tsinghua Shenzhen International Graduate School, together with Xspark AI (无界智航) and several research institutions at home and abroad, drew inspiration from pigeons' remarkable multispectral vision and non-imaging sensing mechanisms to develop SuperTac, a bio-inspired multimodal tactile sensor. The system fuses multispectral imaging, triboelectric sensing, and inertial measurement, and through DOVE, an 8.5B-parameter tactile language model, takes tactile signals from low-level perception to ...
A Conversation with Luo Jianlan (罗剑岚), Chief Scientist of AgiBot (智元机器人): What Challenges Will Robots Face in Large-Scale Real-World Deployment?
具身智能之心· 2026-01-26 03:42
As more and more robots move from controlled lab environments into open, complex real-world settings such as factories and homes, can they remain as "dexterous and capable" as in the lab, steadily carrying out all kinds of jobs?

Embodied-AI unicorn Generalist AI released Gen-0 in November 2025, shaking the industry. It made full use of a data factory, collecting 270,000 hours of data, and according to the official release it can now keep collecting at 10,000 hours per week. The release gave the industry a valuable insight: large-scale real-robot data plus continual training can push embodied models toward greater generality. But it also exposed a bottleneck: the demand for data far exceeds what we imagined. That poses an even harder problem for real-world deployment, constrained by higher task-specialization requirements and the diminishing marginal returns of offline data collection.

So, is there a strategy that lets fleets of robots "learn as they go" and adjust at any time? That is, in real deployment environments, organizing data backflow, model post-training, and policy updates into a long-running engineering system, so that more robots in the real world don't fail simply because they've "never seen it, never practiced it" ...
Stop Trying to Get By on "Demos": NVIDIA and Lightwheel Intelligence Officially Launch the Era of Evaluation-Driven Embodied AI
具身智能之心· 2026-01-26 01:04
Core Insights
- The rapid development of models like VLA has spawned various testing benchmarks, but model capabilities have outgrown them, exposing a key problem in the embodied-intelligence field: the lack of a standardized measurement system for assessing true model capability [2]
- Relying on experience and intuition for R&D decisions has become a systemic risk as embodied intelligence moves from research to engineering [2]

Group 1: Challenges in the Embodied Intelligence Field
- The field is transitioning from storytelling to productivity, with advances such as medical robots and mobile-manipulation robots, yet there is underlying industry consensus about models' limits in generalizing across tasks and environments [3][4]
- Comprehensive generalization is essential: robots must perform well in varied scenarios without over-specialization, which remains a challenge for many companies in the industry [5][6]

Group 2: Testing and Evaluation Issues
- The current testing landscape lacks standardized, scalable evaluation methods, leaving teams reliant on limited test scenarios that do not adequately measure model capability [10][12]
- The industry consensus is that real-world testing cannot scale effectively, making simulation the only viable path for evaluation [13][21]

Group 3: The Need for Industrial-Grade Evaluation Systems
- There is a pressing need for unified, scalable, deterministic evaluation infrastructure that can support industrial-level decision-making in embodied intelligence [21][22]
- The NVIDIA-Lightwheel Intelligence collaboration on the Isaac Lab-Arena is a significant step toward a scalable evaluation framework for the field [23][24]

Group 4: Features of the Isaac Lab-Arena
- The Arena enables flexible task creation and evaluation, moving from rigid scripts to a modular approach that adapts to various tasks and environments [26][28]
- It supports a diverse range of tasks and environments, enabling systematic measurement of model capabilities rather than isolated demonstrations [66][70]

Group 5: RoboFinals as an Industrial Benchmark
- Lightwheel Intelligence has built RoboFinals, an industrial-grade evaluation platform with over 250 tasks that systematically expose model failure modes and capability boundaries [63][71]
- RoboFinals is integrated into the workflows of leading model teams, providing continuous evaluation signals rather than just a leaderboard [71][73]

Group 6: The Importance of Collaboration
- The NVIDIA-Lightwheel partnership is notable for its depth, combining strengths in simulation technology and real-world application experience into a comprehensive evaluation system [42][56]
- The collaboration aims to ensure the evaluation infrastructure is not only technically sound but also aligned with the practical needs of model teams and robotics companies [54][56]
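The "modular tasks instead of rigid scripts" idea can be sketched generically as a task registry: each evaluation task declares its scene and success metric, and the harness runs every registered task against a policy. None of the names below come from Isaac Lab-Arena or RoboFinals; this is a hypothetical, generic pattern for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class EvalTask:
    name: str
    scene: str                               # which simulated environment to load
    success_check: Callable[[dict], bool]    # metric computed on the final state
    tags: List[str] = field(default_factory=list)

REGISTRY: Dict[str, EvalTask] = {}

def register(task: EvalTask) -> None:
    """Add a task to the shared registry instead of hard-coding a script."""
    REGISTRY[task.name] = task

def evaluate(policy: Callable[[str], dict], tags: List[str]) -> float:
    """Run every registered task matching any tag; return the success rate."""
    tasks = [t for t in REGISTRY.values() if set(tags) & set(t.tags)]
    passed = sum(t.success_check(policy(t.scene)) for t in tasks)
    return passed / len(tasks) if tasks else 0.0

# Two toy tasks sharing one success-metric convention.
register(EvalTask("pick_mug", "kitchen",
                  lambda s: s.get("grasped", False), ["tabletop"]))
register(EvalTask("open_drawer", "kitchen",
                  lambda s: s.get("drawer_open", False), ["articulated"]))

# A dummy policy that succeeds on one of the two tasks.
dummy_policy = lambda scene: {"grasped": True, "drawer_open": False}
print(evaluate(dummy_policy, ["tabletop", "articulated"]))  # → 0.5
```

Because tasks are data rather than scripts, adding the 251st task means registering one more entry, and the same harness immediately reports aggregate success rates per tag, which is the property that makes systematic capability measurement possible.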
Come Watch Robots at Work! The RoCo Challenge @ AAAI 2026 On-Site Finals Livestream Begins!
具身智能之心· 2026-01-25 04:26
RoCo Challenge @ AAAI 2026 (Robotic Collaborative Assembling for Human-Centered Manufacturing) is a research challenge organized primarily by Nanyang Technological University (NTU) and Singapore's Agency for Science, Technology and Research (A*STAR). It focuses on robotic assembly tasks in industrial manufacturing, requiring robots not only to complete part manipulation with high quality, but also to understand the assembly progress humans have already made and to recover from realistic, common human errors.

The challenge comprises three settings:
- Assembly from Scratch: complete the full assembly process starting from an empty workspace;
- Resume from Partial State: with assembly partially complete, correctly understand the current state and continue assembling;
- Error Detection and Recovery: identify and repair human-like errors, then continue the assembly task.

Together, these settings assess the key capabilities robots need in real production environments, including adaptive collaboration, state understanding, and error-aware autonomy, an important foundation for highly robust robotic systems. After intense competition in the online track, six teams stood out and advanced to the physical-competition stage. ...
Humanoid Robot Costs Differ by Nearly 3x: China's Supply Chain Is Outpacing Overseas Rivals
具身智能之心· 2026-01-25 03:00
★ A Nearly 3x Cost Gap: China's Supply Chain Lowers the Global Commercialization Threshold

A bill-of-materials (BoM) comparison shows that China's supply-chain cost advantage is already striking:
- Current stage (2025): with a Chinese supply chain, the BoM cost of a single humanoid robot is about $46,000; with an entirely non-Chinese supply chain, it soars to $131,000, a gap of nearly 3x.
- Projected economies of scale: Morgan Stanley expects that by 2034, when global annual sales exceed one million units, Chinese supply-chain costs will fall further to $16,000, and the cost-performance advantage will keep widening.

Breaking down the cost structure, the gaps on core components are especially stark:

| Component | Chinese supply-chain cost (2025) | Non-Chinese supply-chain cost (2025) |
| --- | --- | --- |
| Actuator | $22,000 | $58,000 |
| ... | | |

A group of technically capable domestic companies has also emerged, including UBTECH (优必选), Unitree (宇树科技), Galbot (银河通用机器人), XPeng Robotics (小鹏机器人), and Leju Robotics (乐聚机器人), all at the global forefront in technology deployment and supply-chain integration.
The Cost of VLA Tasks Keeps Falling
具身智能之心· 2026-01-24 01:05
Core Viewpoint
- The cost of robotic arms has dropped significantly, with prices now below 5,000 yuan, making them far more accessible for various VLA tasks [1][2]

Group 1: Cost Trends
- Two years ago, a single robotic arm for VLA tasks cost over 30,000 yuan; last year the price dropped to around 15,000 yuan, and it is now below 5,000 yuan [2]
- This price reduction makes it much easier to implement various VLA tasks such as pi0 and pi0.5 [2]

Group 2: Challenges for Beginners
- Many beginners struggle to reproduce VLA tasks due to high costs and a lack of effective data-collection methods [3][4]
- Beginners waste significant time troubleshooting obstacles in data collection and model training [4]

Group 3: Educational Initiatives
- The company has developed a comprehensive course addressing the challenges beginners face in the VLA field, covering hardware, data collection, algorithms, and hands-on experiments [9][14]
- The course includes a free SO-100 robotic arm for participants, enhancing hands-on learning [19]

Group 4: Target Audience and Requirements
- The course targets individuals seeking practical VLA experience, including students and professionals transitioning from traditional fields [26]
- Participants are expected to have foundational knowledge of Python and PyTorch, as well as experience with real machines and data collection [26]
Sunday's ACT-1: A VLA Trained Without Any Robot-Embodiment Data, Solving Ultra-Long-Horizon Tasks
具身智能之心· 2026-01-24 01:05
Core Viewpoint
- The article covers advances in embodied intelligence, focusing on the company Sunday and its robotics developments, emphasizing the importance of data collection and innovative approaches to overcoming existing limitations in the field [1][6][29]

Group 1: Technological Advancements
- Sunday has demonstrated significant progress on ultra-long-horizon home tasks with its ACT-1 robot, showcasing mobile-manipulation capabilities without relying on teleoperation data [5][20]
- The company has developed a "Skill Capture Glove" that aligns the geometric structure and sensor layout of human hands with robotic hands, enabling effective data transfer and training [11][12]
- The ACT-1 model can perform complex tasks such as folding socks and operating a home espresso machine, highlighting advances in dexterity and manipulation [26][27]

Group 2: Data Collection and Challenges
- The robotics industry faces a critical data bottleneck, lacking a real-world operational data corpus comparable to what large language models enjoy [6][7]
- Sunday aims to bridge the "embodiment mismatch" by letting robots learn from human data, leveraging the vast daily-activity data of the global population [7][12]
- By the end of 2025, the company had accumulated roughly 10 million examples in its data library, with 2,000 data-collection units actively gathering data [8]

Group 3: Innovative Solutions
- Sunday has developed a "Skill Transform" system that aligns raw observational data, removing human-specific features and generating high-fidelity training sets for robots [12]
- The company emphasizes a full-stack approach to data collection, processing, and model training, significantly improving data-utilization efficiency [29]
- The Memo robot's design incorporates compliant control and passive stability, ensuring safety and adaptability in varied environments [32][33]