具身智能之心
星际硅途 Releases the FoldPlanet-500 Dataset, Opening a New Era for Intelligent Garment-Folding Robots
具身智能之心· 2025-10-23 00:03
The following article is from 星际硅途 (author: 星际硅途), the official subscription account of Shanghai 星际硅途 Technology Co., Ltd.

Imagine this: morning sunlight pours into the bedroom, you have just taken the clean laundry out of the dryer, and a dexterous robot arm sets to work methodically, precisely identifying each garment type and smoothly executing the folding motions. Within a few minutes, the pile of clothes becomes neat stacks of tidy "little squares", perfectly stored away in the wardrobe.

None of this is out of reach. 星际硅途 expects that robots capable of highly complex garment folding will reach real-world deployment within a few years. The key foundation for this vision is a high-quality, structured, learnable dataset of garment-folding actions collected in real, generalized scenarios.

Today, 星际硅途 officially releases FoldPlanet-500 (折叠星球), an in-the-wild human garment-folding dataset.

2. Multimodal data, precisely aligned
① Visual perception: every folding task comes with multi-view, high-resolution, temporally clear video and image sequences. The multi-view video and image data precisely capture the key states and motion trajectories of each operation step.
② Motion capture: full-body 31-node motion-capture sampling precisely records the motion trajectories and posture changes of all human joints, producing high-precision motion data suited to coordinated control of a robot's torso and limbs, and serving as a core motion reference for embodied-intelligence training.
③ Semantic annotation: every folding task is accompanied by detailed, step-by-step, quantifiable ...
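Based on the three modalities described above, one time-aligned record in such a dataset might be organized roughly as follows. This is an illustrative sketch only; the field names are hypothetical and do not reflect the released FoldPlanet-500 schema.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FoldingFrame:
    """One time-aligned sample of a folding demonstration (hypothetical fields)."""
    timestamp: float                  # seconds since the episode started
    images: Dict[str, str]            # camera name -> image file path (multi-view)
    mocap_joints: List[List[float]]   # 31 nodes x (x, y, z) joint positions
    step_label: str                   # e.g. "align sleeves", "fold in half"

@dataclass
class FoldingEpisode:
    """A full demonstration: one garment, a sequence of aligned frames."""
    garment_type: str                 # e.g. "t-shirt", "trousers"
    frames: List[FoldingFrame]
```

The point of such a structure is simply that vision, motion capture, and semantic labels share one timeline, so each training sample can be sliced out with all three modalities already aligned.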
Zhiyuan Robotics at IROS 2025: International Challenge Concludes, Full Product Lineup Wins Fans with Live Demonstrations
具身智能之心· 2025-10-22 12:00
Core Insights
- The article highlights the successful hosting of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025) in Hangzhou, focusing on the integration of artificial intelligence and robotics technology [2]
- Zhiyuan Robotics showcased its full range of products, demonstrating its leadership in embodied intelligence technology and ecosystem development [2][12]

Product Demonstrations
- Zhiyuan Robotics presented its product lineup, including the Qiling series, Lingxi X2, and Expedition A2, showcasing their capabilities in various industrial applications [4]
- The Qiling G1 robot performed fully automated logistics operations without human intervention, collaborating with Demar Technology's intelligent sorting robot [4]
- The newly launched Qiling G2 robot features advanced capabilities with dual 7-degree-of-freedom robotic arms, achieving high-precision tasks and expanding humanoid robot applications [4]
- The Lingxi X2 demonstrated high maneuverability and interactive capabilities, while the Expedition A2 showcased its end-to-end operational capabilities in a simulated environment [6]

International Challenge
- The inaugural "AgiBot World Challenge @ IROS 2025," co-hosted by Zhiyuan Robotics and OpenDriveLab, attracted 431 teams from 23 countries, with a total prize pool of $560,000 [9]
- The Manipulation track featured intense competition, with 11 teams advancing to the finals, showcasing their skills on the Zhiyuan Robotics platform [9]
- The World Model track focused on AI's ability to predict the physical world, leading to several technological breakthroughs [11]

Strategic Direction
- Zhiyuan Robotics is driving the large-scale application of embodied intelligence technology through a dual strategy of "technology + ecosystem," aiming to empower various industries and promote human-robot collaboration [12]
Unitree Unveils Its Latest Robot: 1.8 Meters Tall, Dances and Does Kung Fu, but Its Looks Are Another Story
具身智能之心· 2025-10-22 06:02
Core Viewpoint
- The article discusses the launch of Unitree's new humanoid robot, H2, highlighting its design, features, and public reception.

Group 1: Product Features
- The H2 robot stands 180 cm tall and weighs 70 kg, making it 23 kg heavier than its predecessor, H1 [2][14]
- H2 features a bionic face, which aims to make it appear more human-like [5][6]
- The robot has 31 degrees of freedom, enhancing its movement capabilities compared to previous models [14][24]

Group 2: Public Reception
- The design of H2's face has received mixed reviews, with many users finding it unsettling, reminiscent of the NS-5 robot from the movie "I, Robot" [8][10]
- Despite the advanced features, some viewers commented on the robot's dance performance, describing it as lacking emotional expression and comparing it to a "zombie" [25][26]
- Overall, while there is excitement about the new robot, users are curious about its practical applications, such as household chores [38]

Group 3: Performance Demonstrations
- H2 showcased its capabilities through dance, martial arts, and runway walking, demonstrating improved agility and coordination [20][28][36]
- The robot's martial arts performance was noted to be impressive, indicating advancements in stability and coordination technology [33]
- The final presentation included a reference to Leonardo da Vinci's "Vitruvian Man," symbolizing ideal human proportions [40]
Stop Reinventing the Wheel! 原力灵机 Open-Sources Dexbotic: A One-Stop VLA Toolbox for Embodied Intelligence
具身智能之心· 2025-10-22 06:02
Core Insights
- The article discusses the rapid development of embodied VLA (Vision-Language-Action) models and the challenges faced by individual developers and small research teams in creating and maintaining a unified open-source framework for these models [4][7][29]

Group 1: VLA Development Challenges
- The current VLA development landscape is fragmented, with various teams using different deep learning frameworks and model architectures, leading to inefficiencies in model comparison and performance evaluation [4][7]
- Existing VLA models often do not leverage the capabilities of the latest LLMs (Large Language Models), which limits the potential of the "embodied brain" [4][7]
- There is a pressing need for a mature, unified open-source VLA framework to address these challenges, which led to the creation of Dexbotic [4][7]

Group 2: Dexbotic Framework Features
- Dexbotic integrates mainstream pre-trained models for manipulation and navigation policies, supporting both cloud and local training, making it user-friendly and ready to use [2][4]
- The framework introduces the Dexdata format to unify data from different sources, significantly reducing storage costs and simplifying data preparation for developers (a format sketch follows this section) [9][10]
- Dexbotic's architecture consists of three layers: data layer, model layer, and experimental layer, improving the efficiency of algorithm comparison and model iteration by over 50% [11][24]

Group 3: Performance Improvements
- Dexbotic's pre-trained models have shown significant performance improvements in various tasks, with DB-CogACT achieving an 18.2% increase in average success rate compared to the original CogACT model [21][22]
- The framework has also demonstrated strong performance in real-world tasks, with UR5e achieving a 100% success rate in specific tasks [29]

Group 4: Open Source and Community Engagement
- Dexbotic aims to facilitate collaboration and innovation in the field of embodied intelligence by providing an open-source platform that allows developers to contribute and share their work [30][32]
- The initiative encourages participation from both academic and industrial partners to advance the development of embodied intelligence technologies [30][32]
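The idea behind a unified data format, converting episodes from heterogeneous sources into one common record before training, can be illustrated with a small sketch. The fields below are hypothetical and are not Dexbotic's actual Dexdata specification.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class UnifiedStep:
    """One timestep of a manipulation episode in a common format (illustrative)."""
    images: Dict[str, bytes]   # camera name -> encoded frame
    proprio: List[float]       # joint positions / gripper state
    action: List[float]        # commanded action for this step
    instruction: str           # language goal for the episode

def convert_episode(raw_steps: List[Dict[str, Any]], instruction: str) -> List[UnifiedStep]:
    """Map a source-specific episode into the unified step format.

    Assumes each raw step exposes "images", "state", and "action" keys;
    real converters would differ per data source.
    """
    return [
        UnifiedStep(
            images=dict(step["images"]),
            proprio=list(step["state"]),
            action=list(step["action"]),
            instruction=instruction,
        )
        for step in raw_steps
    ]
```

Once every source is normalized this way, the same training code can iterate over all of them, which is the practical benefit the article attributes to the Dexdata format.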
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
具身智能之心· 2025-10-22 06:02
Core Insights
- The article presents RLinf-VLA, a unified and efficient framework for training Vision-Language-Action (VLA) models with reinforcement learning (RL), addressing the limitations of existing models that rely on supervised fine-tuning [2][53]
- The framework significantly enhances training efficiency and generalization, achieving high success rates in various simulation tasks and demonstrating superior performance in real-world applications compared to traditional supervised methods [5][53]

Framework Design
- The RLinf-VLA framework integrates multiple simulators, algorithms, and VLA architectures, optimizing resource allocation through flexible execution modes and system-level enhancements [4][53]
- It supports three GPU allocation strategies: colocated, disaggregated, and hybrid, allowing users to switch modes via configuration files and thus reducing system customization costs [10][11]

Model Compatibility
- The framework supports LoRA for efficient parameter tuning, reducing memory consumption and accelerating training while maintaining performance [12]
- It is compatible with OpenVLA and its extension OpenVLA-OFT, which have shown strong performance in various robotic operation benchmarks [12][22]

Multi-Simulator Support
- The framework emphasizes the importance of simulators in RL, using ManiSkill and LIBERO as primary simulators to cover diverse tasks [13]
- It provides a unified interface across simulators, facilitating the implementation of various tasks and supporting multiple RL algorithms, initially focusing on PPO and GRPO [13][14]

Algorithm Design
- The framework incorporates advanced techniques for advantage function and log-probability calculations, allowing flexible integration of block-level and action-level definitions [14][15]
- It supports various optimization strategies, including trajectory length normalization and effective action masking, to improve training stability and performance (a generic sketch follows this summary) [19][20]

Experimental Results
- RLinf-VLA demonstrated significant performance improvements, with success rates increasing by 45% to 70% in various tasks compared to baseline models [22][24]
- In LIBERO tasks, the framework achieved an average success rate of 98.11%, showcasing its capability for large-scale multi-task reinforcement learning [28]

High Efficiency Performance
- The framework's efficiency is evaluated in terms of throughput, achieving substantial improvements in training speed across different GPU configurations [30][35]
- The hybrid allocation mode outperformed traditional methods, demonstrating the benefits of pipeline overlapping for resource utilization [35][37]

Real-World Deployment
- The RLinf-VLA framework was successfully deployed in real-world environments, showing superior zero-shot generalization compared to supervised fine-tuning strategies [51][53]
- The experiments indicated that RL-trained models adapt better to real-world tasks, achieving higher success rates in object manipulation [51]

Conclusion
- The RLinf-VLA framework represents a significant advancement in embodied intelligence, providing a robust foundation for future research and development in VLA training [53]
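The "trajectory length normalization" and "effective action masking" mentioned above are standard tricks for stabilizing PPO-style training on padded action sequences. The sketch below shows a generic masked GAE computation under those assumptions; it is not RLinf-VLA's actual code, and the function and argument names are illustrative.

```python
import numpy as np

def masked_gae(rewards, values, dones, mask, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one padded trajectory.

    rewards, values, dones, mask: length-T sequences, where mask[t] = 1 for
    valid (non-padded) action steps and 0 otherwise. Returns advantages that
    are zero on padded steps and normalized over valid steps only.
    """
    rewards, values, dones, mask = map(np.asarray, (rewards, values, dones, mask))
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    last_gae = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        next_nonterminal = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * next_value * next_nonterminal - values[t]
        last_gae = delta + gamma * lam * next_nonterminal * last_gae
        advantages[t] = last_gae * mask[t]          # zero out padded steps
    valid = mask > 0
    adv_valid = advantages[valid]
    # normalize with statistics from valid steps only, so padding does not
    # distort the mean/std (the point of "effective action masking")
    advantages[valid] = (adv_valid - adv_valid.mean()) / (adv_valid.std() + 1e-8)
    return advantages
```

Normalizing per valid step rather than per raw trajectory length is one simple way to keep short and long episodes on a comparable scale; the framework's own formulation may differ.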
No More One-Sided Models: UniVid Unifies Video Understanding and Generation
具身智能之心· 2025-10-22 06:02
Editor: 机器之心

In the video generation and understanding space, models usually specialize: some focus solely on video generation, others on video understanding (question answering, classification, retrieval, and so on). Recently, the open-source project UniVid proposed a "fusion" direction: merging understanding and generation into one, so that a single unified model can both understand video and generate video.

This is like asking one brain to handle both "recognizing what is in a picture" and "drawing a picture": understand a piece of text, understand existing video content, and then "draw" a new, coherent video. Technically, this is extremely challenging.

What problem does UniVid aim to solve? UniVid attempts to merge video understanding and generation into a single, truly general Unified Video Model: a video multimodal model that can both understand and generate.

Core innovations
1. Unified structure: Adapter-based Unified Architecture
Paper title: UniVid: The Open-Sourc ...
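As a rough illustration of what an adapter-based unified architecture can look like, the sketch below pairs one shared frozen backbone with lightweight per-task adapters and output heads. This is a generic sketch, not UniVid's actual design; the module and head names are assumptions.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight bottleneck adapter applied on top of frozen backbone features."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))   # residual update

class UnifiedVideoModel(nn.Module):
    """Sketch: one shared (frozen) backbone, per-task adapters and heads."""
    def __init__(self, backbone: nn.Module, dim: int, vocab_size: int, latent_dim: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                    # keep the backbone frozen
        self.understand_adapter = Adapter(dim)
        self.generate_adapter = Adapter(dim)
        self.answer_head = nn.Linear(dim, vocab_size)  # e.g. video-QA token logits
        self.frame_head = nn.Linear(dim, latent_dim)   # e.g. video frame latents

    def forward(self, tokens, task: str):
        h = self.backbone(tokens)                      # (batch, seq, dim) features
        if task == "understand":
            return self.answer_head(self.understand_adapter(h))
        return self.frame_head(self.generate_adapter(h))
```

The appeal of this pattern is that the expensive backbone is trained once and shared, while the task-specific capacity lives in small adapters and heads that can be trained cheaply for understanding and generation respectively.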
Ask-to-Clarify: Resolving Instruction Ambiguity and Generating Actions End-to-End for Real-World Embodied Tasks
具身智能之心· 2025-10-22 03:04
Authors: Xingyao Lin et al.  Editor: 具身智能之心  This article is shared for academic purposes only; contact us for removal in case of infringement.

Preface and motivation
The ultimate goal of embodied agents is to become collaborators that actively interact with humans, rather than passive executors that merely follow instructions. This requires an agent to adjust its behavior based on human feedback.

In recent years, the development of Vision-Language-Action (VLA) models has offered a promising path toward this goal. However, most current VLA-based embodied agents operate in a simple, one-way mode: they receive an instruction and execute it directly, without any communication with the user. In real-world scenarios, where instructions are often ambiguous, this passive approach tends to fail. To address this, the paper proposes the Ask-to-Clarify framework, which first resolves instruction ambiguity by asking questions over multiple dialogue rounds, and then generates actions for real-world embodied tasks in an end-to-end manner.

Contributions of this work
Task and framework design: a new embodied-agent collaboration task and corresponding framework are proposed. The task requires the agent to actively resolve instruction ambiguity by asking questions before executing the instruction, and then to complete the task ...
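The clarify-then-act behavior described above can be summarized as a simple interaction loop. The sketch below is illustrative only; the `assess`/`act` interface is an assumption for exposition, not the paper's actual API.

```python
def ask_to_clarify_episode(policy, instruction, observation, user, max_rounds=3):
    """Run one clarify-then-act episode (illustrative sketch).

    `policy` is assumed to expose two calls:
      policy.assess(instruction, dialogue, observation) -> clarifying question or None
      policy.act(instruction, dialogue, observation)    -> low-level action
    `user` is a callable that answers a clarifying question.
    """
    dialogue = []
    for _ in range(max_rounds):
        question = policy.assess(instruction, dialogue, observation)
        if question is None:            # instruction judged unambiguous; stop asking
            break
        answer = user(question)         # human answers the clarifying question
        dialogue.append((question, answer))
    # generate the action end-to-end, conditioned on the accumulated dialogue
    return policy.act(instruction, dialogue, observation)
```

The key design point is that the same model decides both when to stop asking and what to do next, so clarification and action generation stay in one end-to-end loop rather than two separate systems.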
The 具身智能之心 Robot Motion Control Group Has Been Established
具身智能之心· 2025-10-22 03:04
If you would like to discuss industry and academic topics with the author, feel free to add the author's WeChat: oooops-life. The 具身智能之心 robot motion control group has been established; researchers working on humanoid, quadruped, and related directions are welcome to join! We focus on VLA, reinforcement learning, WBC, MPC, and related topics. Add the assistant's WeChat AIDriver005 to join the group, with the note: institution + name + motion control group; ...
How Are Reinforcement Learning and VLA Combined? An Analysis of Several Representative Works and Their Challenges
具身智能之心· 2025-10-22 03:04
Core Insights
- The article discusses the integration of reinforcement learning (RL) with Vision-Language-Action (VLA) models to enhance robotic capabilities, enabling robots to understand visual and linguistic instructions while optimizing their actions through trial and error [2][8]

Group 1: VLA and Reinforcement Learning Integration
- The combination of VLA models and RL allows robots to interpret tasks and adjust their actions based on feedback, improving their performance in complex environments (a minimal training-loop sketch follows this summary) [2][3]
- The GRAPE framework enhances the generalization of robotic policies by aligning preferences, breaking complex tasks into manageable stages, and optimizing actions through RL, yielding success-rate increases of 51.79% on seen tasks and 58.20% on unseen tasks [6][7]

Group 2: Addressing Generalization Challenges
- VLA models struggle to generalize in unfamiliar scenarios; the VLA-RL framework models robotic operation as a multi-turn dialogue and achieves higher success rates on 40 complex tasks than pure imitation learning [8][10]
- The ReWiND framework generates flexible reward functions from language descriptions, allowing robots to adapt to new tasks with learning that is twice as fast in simulation and five times faster in real-world settings [12][14]

Group 3: Fine-Tuning Strategies
- The ConRFT framework combines offline and online fine-tuning, achieving an average success rate of 96.3% across eight real-world tasks and significantly outperforming traditional supervised learning [15][18]
- The Dual-Actor framework uses a pre-trained VLA model to master basic actions before fine-tuning through RL, improving the robot's ability to perform complex assembly tasks with higher success rates [20][22]

Group 4: Safety and Efficiency
- Safety mechanisms are integrated into the RL process to prevent collisions and damage during robotic exploration, keeping learning secure and efficient [23][24]
- The article emphasizes the importance of designing efficient multi-modal encoders to address the challenges of integrating visual, linguistic, and action data, which can otherwise lead to information loss [27][28]
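As a concrete, deliberately simplified picture of how RL optimizes a pretrained VLA policy through trial and error, the sketch below runs a plain policy-gradient (REINFORCE) loop over a policy that maps observations and instructions to an action distribution. The environment and policy interfaces are assumptions for illustration, not the API of GRAPE, VLA-RL, ConRFT, or any other framework named above.

```python
import torch

def rl_finetune_vla(vla_policy, env, optimizer, episodes=1000, gamma=0.99):
    """REINFORCE-style fine-tuning of a pretrained VLA policy (sketch).

    Assumed interfaces:
      vla_policy(obs, instruction) -> torch action distribution
      env.reset() -> (obs, instruction)
      env.step(action) -> (obs, reward, done)
    """
    for _ in range(episodes):
        obs, instruction = env.reset()
        log_probs, rewards, done = [], [], False
        while not done:                                   # roll out one episode
            dist = vla_policy(obs, instruction)
            action = dist.sample()
            log_probs.append(dist.log_prob(action).sum())
            obs, reward, done = env.step(action)
            rewards.append(reward)
        returns, g = [], 0.0
        for r in reversed(rewards):                       # discounted returns
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        loss = -(torch.stack(log_probs) * returns).sum()  # policy-gradient loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return vla_policy
```

The frameworks surveyed in the article differ mainly in what sits around this loop: where the reward comes from (preferences, language-generated rewards, task stages), whether offline data is mixed in, and how safety constraints limit exploration.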
IROS 2025 AIR4S Workshop: AI + Robotics Is Reshaping the Future of Science
具身智能之心· 2025-10-21 07:20
Core Insights
- The article discusses the integration of embodied AI and robotics in scientific research, highlighting the shift from human-led exploration to AI and robot collaboration in scientific discovery [4][5]

Group 1: Workshop Overview
- The IROS 2025 AIR4S Workshop will focus on "Embodied AI and Robotics for Future Scientific Discovery," scheduled for October 24, 2025, in Hangzhou, China [6][20]
- The workshop aims to explore how AI and robots can participate in the entire research process, from literature review to hypothesis generation, experimental execution, data analysis, and publication [5][6]

Group 2: Expert Participation
- Notable experts from academia and industry will participate, including representatives from Unitree Robotics, MIT, Stanford University, Tencent Robotics X, and the University of Tokyo, discussing various aspects of embodied AI and its applications in scientific discovery [7][9]

Group 3: Research Papers and Innovations
- The workshop has received 17 papers covering cutting-edge topics such as AI for Science, robotic scientists, and laboratory automation; a novel AI Review mechanism will be introduced to assist in the paper review process [13][14]
- The integration of AI in scientific evaluation aims to enhance the efficiency and intelligence of research assessments [14]

Group 4: Community and Support
- The workshop is supported by various organizations, including NOKOV, Frontiers in Robotics and AI, and Lumina, promoting the cross-disciplinary development of embodied AI and research automation [15]
- The article encourages scholars, students, and industry partners interested in the intersection of AI, robotics, and scientific discovery to join the IROS 2025 AIR4S community for updates and discussions [17]