Latest from HKUST! Beyond Human Demonstrations: Diffusion-Based RL Generates "High-Quality, Low-Variance" Data for VLA Training
具身智能之心· 2025-10-23 04:00
Core Insights
- The article discusses the limitations of traditional human demonstration data in training Vision-Language-Action (VLA) models and introduces a novel diffusion-based reinforcement learning (RL) approach to generate high-quality training data [2][5].

Group 1: VLA Model and Data Generation
- VLA models integrate visual, language, and action information, but their performance is often constrained by the quality and scale of manually collected data [5].
- The proposed diffusion RL algorithm offers a semi-automated method for collecting high-quality data suitable for VLA training, enhancing model performance [5].

Group 2: Methodology and Results
- The study presents an improved diffusion policy optimization algorithm that generates high-quality, low-variance trajectories for VLA training [2].
- Evaluation on the LIBERO benchmark, which includes 130 long-horizon tasks, shows that the generated trajectories are smoother and more consistent than human demonstrations and outperform trajectories generated by standard Gaussian RL [2].
- Training VLA models solely on data generated by diffusion RL achieves an average success rate of 81.9%, a 5.3-percentage-point improvement over human data and a 12.6-percentage-point improvement over Gaussian RL data [2].

Group 3: Key Highlights
- The article emphasizes the potential of RL-driven robot trajectory generation and the adaptability of the general RL framework to any VLA architecture [6].
- It highlights performance breakthroughs that exceed human demonstrations, showcasing the effectiveness of the proposed approach [6].
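The low-variance claim can be illustrated with a toy reverse-diffusion sampler over an action trajectory. Everything below (the function name, the linear noise schedule, the toy noise predictor) is a hypothetical sketch to show the shape of the idea, not the paper's actual algorithm: shrinking the re-injected noise at each denoising step is what yields smoother, lower-variance action chunks than plain Gaussian sampling.

```python
import numpy as np

def denoise_action_chunk(noise_pred_fn, horizon=16, action_dim=7,
                         num_steps=10, noise_scale=0.5, seed=0):
    """Toy reverse-diffusion sampler for an action trajectory (illustrative).

    Starts from Gaussian noise and repeatedly removes the noise predicted
    by `noise_pred_fn`; a reduced `noise_scale` at re-injection produces
    smoother, lower-variance trajectories than standard Gaussian sampling.
    """
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal((horizon, action_dim))
    for t in range(num_steps, 0, -1):
        predicted_noise = noise_pred_fn(actions, t)
        actions = actions - predicted_noise / num_steps  # denoising update
        if t > 1:  # re-inject scaled-down noise before the next step
            actions = actions + (noise_scale / num_steps) * \
                rng.standard_normal(actions.shape)
    return actions

# Toy noise predictor that simply pulls the trajectory toward zero.
traj = denoise_action_chunk(lambda a, t: 0.3 * a)
print(traj.shape)  # (16, 7)
```

Setting `noise_scale` close to zero makes the sampler nearly deterministic, which is the intuition behind "low-variance" trajectory generation.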
Humanoid Robots Have Dropped Below 10,000 Yuan, and Some Students Still Don't Know Where to Start...
具身智能之心· 2025-10-23 04:00
Let's start with the impact. At a consumer-grade price point, both research institutions and individuals can afford one, and even buying several is no longer a heavy burden. As sales grow, more researchers will keep contributing, and many new ideas will flow in, which will be a huge boost for the community. Yet many students still have only a partial grasp of humanoid data collection, motion control, and the especially difficult VLA tasks, and some have not even found a way in. At this stage they are genuinely at a disadvantage; catching this wave means keeping up with and studying the latest material.

Today we saw the news about Songyan Dynamics' humanoid robot Bumi. It is undeniably adorable, and the price is even more attractive: 9,998 yuan. As the world's first high-performance robot priced under 10,000 yuan, it once again confirms the saying that price is not the barrier: the supply chain and the complete technology stack keep pushing hardware prices down. This robot now costs less than some high-end smartphones.

The 具身智能之心 Knowledge Planet community has long followed humanoid robots, reinforcement learning, VLA, and related topics. Over nearly a year of building, the community has established sections for technology-roadmap sharing, livestreams, Q&A, job hunting, and competitions, closing the loop across industry, academia, careers, and discussion.

1) Ongoing livestreams: the community has prepared many roundtable forums and livestreams, covering everything from hardware and data to algorithms, gradually sharing what the embodied industry is really ...
We Are Now Recruiting Product Managers in the Embodied Intelligence Field~
具身智能之心· 2025-10-23 04:00
Group 1
- The company is recruiting product managers in the fields of embodied intelligence and robotics [1].
- The company is open to collaboration in areas such as course development, corporate consulting, and training [1].
- Interested candidates are invited to get in touch via WeChat to discuss compensation and collaboration models [1].
Live from IROS: Unitree, Hesai, and 自变量 Cross Swords in Hangzhou, with Meituan Hosting from Center Stage
具身智能之心· 2025-10-23 00:03
Core Viewpoint
- The article emphasizes the importance of embodied intelligence in transforming the retail industry, with Meituan leading the way in integrating technology into real-world scenarios to enhance service efficiency and quality [9][10][11].

Group 1: Meituan's Strategy and Innovations
- Meituan's strategic shift from "retail" to "retail + technology" highlights the integration of technology as a means to empower retail scenarios [9][10].
- The company is a pioneer in drones and autonomous delivery vehicles, and the only one in China authorized by the Civil Aviation Administration to operate drones nationwide, even at night [16][21].
- Meituan's focus on "autonomy" aims to drive transformation in the retail sector, showcasing innovations like drone food delivery and rapid delivery services across varied environments [14][15][18].

Group 2: Insights from Industry Experts
- Industry leaders at the conference discussed the need for embodied intelligence to address real-world challenges, emphasizing that technology should serve practical purposes rather than being an end in itself [5][6][12].
- The concept of "Generative Adversarial Transduction" (GAT) was introduced, which allows machine learning models to iteratively correct one another, enhancing both learning and stability [25][26].
- The discussion also covered the importance of infrastructure in supporting the robotics industry, with a focus on quality, performance, and cost management in hardware development [38][42][46].

Group 3: Theoretical Frameworks and Future Directions
- Theoretical frameworks such as "Non-vector Space Control" and "Perceptive Control" were proposed, suggesting that robots should learn to act from sensory inputs rather than relying solely on pre-defined paths [29][33].
- The need for a foundational model for embodied intelligence was emphasized, distinguishing it from existing AI applications and highlighting the importance of understanding the physical world [50][51][52].
- The article concludes with a vision for the future of robotics, in which machines possess curiosity and the ability to adapt, ultimately leading to a harmonious coexistence with humans [106][108][110].
Class Is Officially in Session! A Hands-On Course on Embodied Goal-Oriented Navigation Algorithms Is Here~
具身智能之心· 2025-10-23 00:03
Core Insights
- Goal-oriented navigation empowers robots to autonomously complete navigation tasks from goal descriptions alone, marking a significant shift from traditional visual-language navigation [2].
- The technology has been successfully deployed in several verticals, enhancing service efficiency in delivery, healthcare, and hospitality [4].
- The evolution of goal-oriented navigation can be divided into three generations, each showcasing advances in methodology and technology [6][8][10].

Group 1: Technology Overview
- Goal-oriented navigation is a key aspect of embodied navigation, relying on language understanding, environmental perception, and path planning [2].
- The transition from explicit instructions to autonomous decision-making involves semantic parsing, environmental modeling, and dynamic decision-making [2].
- The technology has been integrated into delivery robots, service robots in healthcare and hospitality, and humanoid robots for various applications [4].

Group 2: Technical Evolution
- The first generation focuses on end-to-end methods using reinforcement and imitation learning, achieving breakthroughs in point navigation and image-goal navigation tasks [6].
- The second generation takes a modular approach, constructing semantic maps and decomposing tasks into exploration and goal localization [8].
- The third generation integrates large language models (LLMs) and vision-language models (VLMs) to enhance exploration strategies and improve open-vocabulary target matching [10].

Group 3: Challenges and Learning Opportunities
- The complexity of embodied navigation requires knowledge across multiple domains, making it challenging for newcomers to enter the field [11].
- A new course has been developed to address these challenges, providing a structured learning path and practical applications [11][12].
- The course aims to build a comprehensive understanding of goal-oriented navigation, covering theoretical foundations and practical implementations [12][13].
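The second-generation modular pipeline can be sketched as a simple perceive/map/explore loop: build a semantic map while exploring, then switch to goal localization once the target label appears in the map. All the callables, the map representation, and the frontier logic below are hypothetical stand-ins, not any specific system's API.

```python
import random

def goal_navigation_episode(goal_label, observe, detect, plan_to, frontiers,
                            max_steps=50):
    """Toy modular goal-navigation loop (illustrative).

    `observe(step)` returns (position, observation); `detect(obs)` returns a
    semantic label or None; `plan_to(pos)` stands in for a path planner.
    """
    semantic_map = {}  # position -> detected label
    for step in range(max_steps):
        pos, obs = observe(step)
        label = detect(obs)
        if label is not None:
            semantic_map[pos] = label  # environmental modeling
        if goal_label in semantic_map.values():
            # goal localization: plan a path to where the goal was seen
            goal_pos = next(p for p, l in semantic_map.items()
                            if l == goal_label)
            return plan_to(goal_pos)
        # exploration phase: head toward an unexplored frontier
        plan_to(random.choice(frontiers))
    return None  # goal never detected within the step budget
```

The point of the decomposition is that exploration and goal localization are separate sub-problems: the loop explores until the detector populates the map with the goal label, then hands off to the planner.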
Just In: ICCV Best Papers Announced, Zhu Junyan's Team Takes the Crown with Brick Structures
具身智能之心· 2025-10-23 00:03
Core Insights
- The article covers the recent International Conference on Computer Vision (ICCV), held in Hawaii, highlighting the award-winning papers and their contributions to computer vision [2][5][24].

Group 1: Award Winners
- The Best Paper Award went to a team from Carnegie Mellon University (CMU) for "Generating Physically Stable and Buildable Brick Structures from Text," led by noted AI scholar Zhu Junyan [3][7][11].
- The Best Student Paper Award went to a paper from the Technion, "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models," which introduces a novel image-editing method [28][30].

Group 2: Conference Statistics
- ICCV is one of the top three computer-vision conferences and is held biennially; this year it received 11,239 valid submissions and accepted 2,699 papers, a 24% acceptance rate and a significant increase over the previous edition [5].

Group 3: Research Contributions
- The CMU paper presents BrickGPT, the first method capable of generating physically stable, interconnected brick assembly models from text prompts; the work includes a large dataset of over 47,000 brick structures and 28,000 unique 3D objects with detailed descriptions [11][13].
- The FlowEdit paper from the Technion proposes an image-editing approach that bypasses the traditional image-to-noise inversion process, achieving higher-fidelity edits by establishing a direct mapping path between the source and target image distributions [32][34].

Group 4: Methodology and Results
- BrickGPT uses an autoregressive large language model trained on a dataset of brick structures, incorporating validity checks and a physics-aware rollback mechanism to ensure the stability of generated designs [13][19].
- Experimental results show that BrickGPT outperforms baseline models in validity and stability, achieving a 100% validity rate and 98.8% stability in generated structures [20][22].
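The validity-check-plus-rollback loop described in the summary might look like the following. This is a toy sketch of the control flow only: the brick proposer and both physics checks are invented stand-ins, not the paper's actual model or simulator.

```python
def generate_structure(propose_brick, is_valid, is_stable,
                       max_bricks=20, max_retries=5):
    """Toy autoregressive generation with validity check and rollback.

    `propose_brick(structure)` stands in for the autoregressive model;
    `is_valid` rejects placements (e.g. collisions, no connection point);
    `is_stable` stands in for a physics check on the whole structure.
    """
    structure = []
    retries = 0
    while len(structure) < max_bricks and retries < max_retries:
        brick = propose_brick(structure)
        if brick is None:  # model signals end of generation
            break
        if not is_valid(structure, brick):
            retries += 1  # reject the invalid placement, resample
            continue
        structure.append(brick)
        if not is_stable(structure):
            structure.pop()  # physics-aware rollback to the stable prefix
            retries += 1
    return structure
```

The rollback step is what distinguishes this from plain rejection sampling: an accepted brick can still be undone later if the physics check says the structure as a whole would collapse.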
星际硅途 Releases the FoldPlanet-500 Dataset, Opening a New Era for Intelligent Clothes-Folding Robots
具身智能之心· 2025-10-23 00:03
The following article is from 星际硅途 (author: 星际硅途), the official subscription account of Shanghai 星际硅途 Technology Co., Ltd.

Imagine this: morning sunlight streams into the bedroom. You have just taken the laundry out of the dryer, and a dexterous robot arm begins working methodically: precisely identifying each garment type and fluidly executing folding motions. In just a few minutes, the pile of clothes becomes neat stacks of small squares, perfectly stowed in the wardrobe.

This is not out of reach. 星际硅途 expects robots capable of highly complex clothes folding to reach practical deployment within a few years. The key foundation for this vision is a high-quality, structured, learnable dataset of real-world, generalizable clothes-folding actions.

Today, 星际硅途 proudly launches FoldPlanet-500 (折叠星球), an in-the-wild human dataset for clothes folding.

2 Multimodal data, precisely aligned
① Visual perception: every folding task comes with multi-angle, high-resolution, temporally clear video and image sequences; the multi-view video data precisely capture the key states and motion trajectories of each step.
② Motion capture: a full-body 31-node motion-capture setup precisely records the trajectories and pose changes of all human joints, producing high-precision motion data suited to coordinated control of a robot's torso and limbs, a core motion reference for embodied-intelligence training.
③ Semantic annotation: every folding task is annotated in a detailed, step-by-step, quantifi ...
Zhiyuan Robotics Shines at IROS 2025: International Challenge Concludes Successfully, Full Product Lineup Wins Fans with Live Demos
具身智能之心· 2025-10-22 12:00
Core Insights
- The article highlights the successful hosting of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025) in Hangzhou, with a focus on the integration of artificial intelligence and robotics [2].
- Zhiyuan Robotics showcased its full product range, demonstrating its leadership in embodied-intelligence technology and ecosystem development [2][12].

Product Demonstrations
- Zhiyuan Robotics presented its product lineup, including the Qiling series, Lingxi X2, and Expedition A2, showcasing their capabilities across industrial applications [4].
- The Qiling G1 robot performed fully automated logistics operations without human intervention, collaborating with Demar Technology's intelligent sorting robot [4].
- The newly launched Qiling G2 features dual 7-degree-of-freedom robotic arms, achieving high-precision tasks and expanding humanoid-robot applications [4].
- The Lingxi X2 demonstrated high maneuverability and interactive capabilities, while the Expedition A2 showcased end-to-end operation in a simulated environment [6].

International Challenge
- The inaugural "AgiBot World Challenge @ IROS 2025," co-hosted by Zhiyuan Robotics and OpenDriveLab, attracted 431 teams from 23 countries, with a total prize pool of $560,000 [9].
- The Manipulation track featured intense competition, with 11 teams advancing to the finals and demonstrating their skills on the Zhiyuan Robotics platform [9].
- The World Model track focused on AI's ability to predict the physical world, leading to several technological breakthroughs [11].

Strategic Direction
- Zhiyuan Robotics is driving the large-scale application of embodied-intelligence technology through a dual "technology + ecosystem" strategy, aiming to empower various industries and promote human-robot collaboration [12].
Unitree's Latest Robot: 1.8 Meters Tall, Can Dance and Do Kung Fu, but the Looks Defy Description
具身智能之心· 2025-10-22 06:02
Core Viewpoint
- The article covers the launch of Unitree's new humanoid robot, H2, highlighting its design, features, and public reception.

Group 1: Product Features
- The H2 stands 180 cm tall and weighs 70 kg, 23 kg heavier than its predecessor, H1 [2][14].
- H2 features a bionic face intended to make it appear more human-like [5][6].
- The robot has 31 degrees of freedom, improving its movement capabilities over previous models [14][24].

Group 2: Public Reception
- The design of H2's face has drawn mixed reviews, with many users finding it unsettling and reminiscent of the NS-5 robot from the film "I, Robot" [8][10].
- Despite the advanced features, some viewers found the robot's dance performance lacking in emotional expression, comparing it to a "zombie" [25][26].
- Overall, while there is excitement around the new robot, users are curious about its practical applications, such as household chores [38].

Group 3: Performance Demonstrations
- H2 showcased its capabilities through dance, martial arts, and runway walking, demonstrating improved agility and coordination [20][28][36].
- The martial-arts performance was noted as impressive, indicating advances in stability and coordination technology [33].
- The final presentation referenced Leonardo da Vinci's "Vitruvian Man," symbolizing ideal human proportions [40].
Stop Reinventing the Wheel! 原力灵机 Open-Sources Dexbotic: A One-Stop VLA Toolbox for Embodied Intelligence
具身智能之心· 2025-10-22 06:02
Core Insights
- The article discusses the rapid development of embodied VLA (Vision-Language-Action) models and the challenges individual developers and small research teams face in creating and maintaining a unified open-source framework for them [4][7][29].

Group 1: VLA Development Challenges
- The current VLA development landscape is fragmented, with teams using different deep-learning frameworks and model architectures, leading to inefficiencies in model comparison and performance evaluation [4][7].
- Existing VLA models often fail to leverage the capabilities of the latest large language models (LLMs), limiting the potential of the "embodied brain" [4][7].
- There is a pressing need for a mature, unified open-source VLA framework to address these challenges, which motivated the creation of Dexbotic [4][7].

Group 2: Dexbotic Framework Features
- Dexbotic integrates mainstream pre-trained models for manipulation and navigation policies, supports both cloud and local training, and is user-friendly and ready to use [2][4].
- The framework introduces the Dexdata format to unify data from different sources, significantly reducing storage costs and simplifying data preparation for developers [9][10].
- Dexbotic's architecture consists of three layers (data, model, and experiment), improving the efficiency of algorithm comparison and model iteration by over 50% [11][24].

Group 3: Performance Improvements
- Dexbotic's pre-trained models show significant performance gains across tasks, with DB-CogACT achieving an 18.2% increase in average success rate over the original CogACT model [21][22].
- The framework has also performed strongly in real-world tasks, with the UR5e achieving a 100% success rate on specific tasks [29].

Group 4: Open Source and Community Engagement
- Dexbotic aims to foster collaboration and innovation in embodied intelligence by providing an open-source platform where developers can contribute and share their work [30][32].
- The initiative encourages participation from both academic and industrial partners to advance embodied-intelligence technologies [30][32].
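The summary does not describe the Dexdata format itself, so purely as a hypothetical illustration of what "unifying data from different sources" can mean in practice, a per-source converter into one common episode schema might look like this (the source names, field names, and schema are all invented for the sketch):

```python
import json

def to_unified_record(source, episode):
    """Hypothetical converter into a single unified episode schema.

    Each source keeps its native field names; a small converter maps
    them onto one common record so downstream training code only ever
    sees (obs, actions, instruction, source).
    """
    converters = {
        "teleop": lambda e: {"obs": e["camera"], "actions": e["joints"],
                             "instruction": e.get("task", "")},
        "sim":    lambda e: {"obs": e["rgb"], "actions": e["ctrl"],
                             "instruction": e.get("goal", "")},
    }
    record = converters[source](episode)
    record["source"] = source
    return json.dumps(record)  # one JSON line per episode

print(to_unified_record("sim", {"rgb": [1], "ctrl": [0.5], "goal": "pick"}))
```

A single schema like this is what makes cross-source model comparison cheap: every training pipeline reads the same four fields regardless of where the episode came from.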