具身智能之心
Open-Source Hardware and Solutions for Mobile Manipulation & Dual-Arm Manipulation
具身智能之心· 2025-10-20 00:03
Core Viewpoint
- The article emphasizes the importance of open-source projects in advancing mobile and dual-arm robotic operations, highlighting their role in breaking down technical barriers and accelerating innovation in various applications, from household robots to industrial automation [3].

Group 1: Open-Source Projects Overview
- XLeRobot, developed by Nanyang Technological University, focuses on flexible movement and precise operation in complex environments, providing a reference framework for mobile and dual-arm control [4].
- AhaRobot from Tianjin University emphasizes autonomy and environmental adaptability in dual-arm operations, integrating perception, planning, and control modules for service robots [6].
- ManiGaussian++, released by Tsinghua University, optimizes dual-arm operation accuracy using Gaussian models, particularly in 3D environment perception and motion planning [8].
- H-RDT, a collaboration between Tsinghua University and Horizon Robotics, aims at efficient decision-making and real-time operations for mobile robots in various settings [11].
- RoboTwin 2.0, developed by Shanghai Jiao Tong University and the University of Hong Kong, integrates simulation and physical platforms for mobile and dual-arm operations [14].
- Open X-Embodiment, from Arizona State University, focuses on a generalized learning framework for robotic operations, supporting cross-scenario skill transfer [16].
- 3D FlowMatch Actor, a joint project by Carnegie Mellon University and NVIDIA, enhances dynamic adaptability in 3D space for mobile and dual-arm operations [19].
- OmniH2O, developed by Carnegie Mellon University, focuses on human-robot action mapping and humanoid operation, facilitating remote control and action teaching [24].
- TidyBot++, a collaboration between Princeton University and Stanford University, targets household organization tasks, integrating object recognition and dual-arm collaboration algorithms [27].
- robosuite, from the University of California, Berkeley, is a mature simulation platform for robotic operations, providing standardized tasks and evaluation tools [29].
- SO-ARM100, a standardized dual-arm operation hardware and software solution, aims to lower development barriers for educational and research purposes [32].
- GOAT, developed by UIUC and CMU, focuses on goal-directed movement and operation for robots, emphasizing robustness and versatility [34].
- Mobile ALOHA, from Stanford University, combines mobile chassis and dual-arm operations for low-cost, easily deployable service robots [35].
Flexibly Handle Diverse Objects with Only a Few Demonstrations! 阿米奥's Feng Qian Team Presents a Low-Cost, Precise Dexterous Manipulation Solution at IROS!
具身智能之心· 2025-10-20 00:03
The first author of this work is Feng Qian, co-founder and head of technology at 阿米奥. He completed both his master's and PhD at the Technical University of Munich under robotics authority Alois Knoll, and was an early employee and research scientist at 思灵机器人 (Agile Robots). At IROS 2025, Dr. Feng will present this work at the Deep Learning in Grasping and Manipulation workshop.

Research progress in robotic dexterous manipulation
Field pain points: dexterous manipulation (such as multi-fingered grasping) is key to realizing "human-like robots", but existing solutions face three core problems. When a robot encounters an unfamiliar object, how can it grasp it precisely with only a few demonstrations and a single-view observation? LensDFF, the solution proposed by 阿米奥机器人, gives a disruptive answer: it departs from the conventional route of relying on multi-view data and training an extra alignment network, and instead uses language features directly as "semantic anchors", aligning the 2D visual features extracted by CLIP into 3D space through a dynamic projection formula. This resolves cross-view feature inconsistency at the root and requires no fine-tuning at all. More importantly, it folds five grasp primitives (pinch / hook / tripod, etc.) into the few-shot demonstrations and pairs them with normal-vector-guided initialization plus low-dimensional eigengrasp optimization, enabling the DLR-HIT dexterous hand to ...
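The excerpt cuts off before the details of the projection step, but the general idea of lifting dense 2D features into 3D from a single view and ranking 3D points against a language embedding can be illustrated with a minimal sketch. Everything below is an assumption for illustration: a standard pinhole back-projection stands in for LensDFF's dynamic projection formula, the feature map and text embedding are random stand-ins for CLIP outputs, and all function names are hypothetical.

```python
import numpy as np

def lift_pixel_features_to_3d(feat_map, depth, K):
    """Back-project a per-pixel 2D feature map into 3D using a depth map.

    feat_map: (H, W, C) dense visual features (e.g. CLIP patch features
              upsampled to pixel resolution; a stand-in here).
    depth:    (H, W) metric depth from a single view.
    K:        (3, 3) camera intrinsics.
    Returns (N, 3) points and (N, C) features for valid-depth pixels.
    """
    H, W, _ = feat_map.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    valid = z > 0
    # Pinhole back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    points = np.stack([x, y, z], axis=-1)[valid]   # (N, 3)
    feats = feat_map[valid]                        # (N, C)
    return points, feats

def score_points_by_text(feats, text_feat):
    """Cosine similarity of each 3D point feature against a language
    'semantic anchor' (e.g. a CLIP text embedding for the target object)."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    t = text_feat / (np.linalg.norm(text_feat) + 1e-8)
    return f @ t                                   # (N,)

# Toy usage with random stand-ins for CLIP features and a flat depth map.
H, W, C = 48, 64, 512
rng = np.random.default_rng(0)
feat_map = rng.normal(size=(H, W, C)).astype(np.float32)
depth = np.full((H, W), 0.6, dtype=np.float32)
K = np.array([[60.0, 0, W / 2], [0, 60.0, H / 2], [0, 0, 1]])
pts, feats = lift_pixel_features_to_3d(feat_map, depth, K)
scores = score_points_by_text(feats, rng.normal(size=C).astype(np.float32))
print(pts.shape, scores.shape)
```

In this framing, the language embedding acts as the semantic anchor: the highest-scoring 3D points mark the candidate region from which a grasp would then be initialized and optimized.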
An End-to-End Foundation Model! VCoT-Grasp: A Large Robotic Grasp Detection Model Enhanced with Visual Chain-of-Thought
具身智能之心· 2025-10-19 13:50
Chain-of-Thought (CoT) is a technique that strengthens the reasoning ability of large language models through intermediate thinking steps. Visual Chain-of-Thought (VCoT) extends CoT from the text modality to the image modality, using images as the intermediate thinking steps to improve the reasoning ability of multimodal large models.

Figure: (a) multimodal-fusion methods; (b) modular methods that use an LLM/VLM for guidance; (c) end-to-end multimodal large-model methods with language reasoning; (d) our method, which introduces visual reasoning and uses the target's bounding-box image as the thinking step.

VCoT-Grasp builds an end-to-end foundation model and introduces a visual chain-of-thought to strengthen visual understanding. At run time, the model uses the target object's bounding-box image as the intermediate thinking step: it first predicts the target's bounding box as a coarse location, after which the image of the target region is cropped and fed back into the model to provide fine-grained ...
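The coarse-to-fine loop described above (predict a bounding box, crop the region, feed it back as the visual "thought") can be sketched as plain control flow. The stub model below is purely illustrative; the real VCoT-Grasp model, its interfaces, and its grasp parameterization are not specified in this excerpt.

```python
import numpy as np

def predict_bbox(model, image, instruction):
    """Stage 1 (coarse): predict the target object's bounding box from the
    full image and the language instruction. Placeholder signature."""
    return model["bbox_head"](image, instruction)  # (x1, y1, x2, y2)

def crop(image, bbox):
    x1, y1, x2, y2 = [int(v) for v in bbox]
    return image[y1:y2, x1:x2]

def vcot_grasp_step(model, image, instruction):
    # Visual chain-of-thought: the bbox crop is the intermediate "thought".
    bbox = predict_bbox(model, image, instruction)
    region = crop(image, bbox)
    # Stage 2 (fine): full image + cropped region -> grasp parameters.
    grasp = model["grasp_head"](image, region, instruction)
    return bbox, grasp

# Toy usage with stub heads standing in for the real model.
model = {
    "bbox_head": lambda img, txt: (10, 10, 50, 50),
    "grasp_head": lambda img, region, txt: {"center": (30, 30),
                                            "angle_deg": 45.0,
                                            "width_px": 20},
}
image = np.zeros((480, 640, 3), dtype=np.uint8)
print(vcot_grasp_step(model, image, "pick up the red mug"))
```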
ROSCon China 2025 Conference Agenda Announced!
具身智能之心· 2025-10-18 16:03
This article is reposted from 古月居 (Guyuehome), a professional ROS robotics knowledge community and industry service platform.

The robotics "technology feast" of 2025 has its date set! ROSCon China 2025 will open from October 31 to November 1 at 上海虹桥新华联索菲特大酒店 in Shanghai, and the long-awaited full agenda has now been officially released!

Whether you want to keep up with the ROS technology frontier or solve engineering deployment problems, this conference has you covered: over two days, core developers, industry leaders, and senior engineering teams will gather on site to deliver content spanning technical depth to practical deployment.

Who should attend? These groups should reserve a seat:
- Robot developers: get the latest ROS technology updates and resolve blocking issues in development;
- Corporate technology leads: connect with industrial deployment cases and find technical solutions that fit their business;
- University researchers: link up with industry resources so research results reach real applications faster;
- Robotics enthusiasts: get close to cutting-edge technology and broaden industry horizons.

The agenda is now public and seats are limited ...
HKUST(Guangzhou) & Tsinghua Jointly Propose Spatial Forcing: Implicit Spatial Alignment that Surpasses Mainstream 2D/3D VLA Models
具身智能之心· 2025-10-18 16:03
Core Insights
- The article discusses the limitations of current Vision-Language-Action (VLA) models that primarily rely on 2D visual data, lacking a deep understanding of real 3D space, which hampers their ability to perform tasks in the physical world [2][4]
- The proposed method, Spatial Forcing (SF), allows VLA models to develop spatial understanding without explicit 3D input by aligning visual features with a powerful 3D geometric representation generated by an external model [2][10]

Methodology
- The SF method employs an implicit spatial alignment strategy, enabling the model to autonomously acquire spatial understanding during training without the need for additional 3D sensors [2][13]
- A depth probing experiment was conducted to verify the presence of 3D information in the original VLA's visual features, revealing that without 3D input, the model cannot form accurate spatial perceptions [11][13]
- The training process involves aligning the VLA model's visual tokens with pixel-level spatial representations extracted from a pre-trained 3D model, optimizing both spatial alignment loss and action generation loss [16]

Performance Results
- The SF method significantly outperforms existing 2D and 3D VLA models in various tasks, achieving a training efficiency improvement of up to 3.8 times and a data utilization efficiency increase of up to 5.9 times [14]
- In experiments, the Spatial Forcing model achieved a success rate of 99.4% in spatial tasks, 99.6% in object tasks, and 98.8% in goal tasks, demonstrating its superior performance compared to other models [18]
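The methodology above describes a training objective that combines an implicit spatial-alignment term with the usual action-generation term. A hedged PyTorch sketch of such a combined loss is given below; the tensor shapes, the cosine-similarity form of the alignment, the projection head, and the weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def spatial_forcing_loss(vla_visual_tokens, teacher_3d_feats, pred_actions,
                         gt_actions, proj, align_weight=0.5):
    """Sketch of a Spatial-Forcing-style objective (assumed form).

    vla_visual_tokens: (B, N, D)  visual tokens from the VLA backbone
    teacher_3d_feats:  (B, N, D3) pixel-aligned features from a frozen,
                       pre-trained 3D geometry model (the "teacher")
    proj:              small head mapping D -> D3 so the two can be compared
    """
    aligned = proj(vla_visual_tokens)                      # (B, N, D3)
    # Alignment: pull each visual token toward the teacher's spatial feature.
    align_loss = 1.0 - F.cosine_similarity(aligned, teacher_3d_feats, dim=-1).mean()
    # Standard imitation loss on the policy's action output.
    action_loss = F.mse_loss(pred_actions, gt_actions)
    return action_loss + align_weight * align_loss

# Toy usage with random tensors and a linear projection head.
B, N, D, D3, A = 2, 196, 768, 384, 7
proj = torch.nn.Linear(D, D3)
loss = spatial_forcing_loss(torch.randn(B, N, D), torch.randn(B, N, D3),
                            torch.randn(B, A), torch.randn(B, A), proj)
print(loss.item())
```

The design point is that the 3D teacher is only used at training time: at inference the policy runs on 2D images alone, which matches the article's claim that no extra 3D sensors are required.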
From 300+ Papers: Is VLA the Necessary Path to General Embodied Intelligence?
具身智能之心· 2025-10-17 16:02
Core Insights
- The emergence of Vision Language Action (VLA) models signifies a shift from traditional strategy-based control to a paradigm of general robotic technology, transforming visual language models (VLM) from passive sequence generators to active agents capable of manipulation and decision-making in complex, dynamic environments [2]

Group 1: VLA Overview
- The article discusses a comprehensive survey on advanced VLA methods, providing a clear taxonomy and systematic review of existing research [2]
- VLA methods are categorized into several main paradigms: autoregressive, diffusion-based, reinforcement-based, hybrid methods, and specialized methods, with detailed examination of their motivations, core strategies, and implementations [2]
- The survey integrates insights from over 300 recent studies, outlining the opportunities and challenges that will shape the development of scalable, general VLA methods [2]

Group 2: Future Directions and Challenges
- The review addresses key challenges and future development directions to advance VLA models and generalizable robotic technologies [2]
- The live discussion will explore the origins of VLA, its research subdivisions, and the hot topics and future trends in VLA [5]

Group 3: Event Details
- The live event is scheduled for October 18, from 19:30 to 20:30, focusing on VLA as a prominent research direction in artificial intelligence [5]
- Key highlights of the event include the classification of VLA research fields, the integration of VLA with reinforcement learning, and the Sim2Real concept [6]
穹彻智能 (Qiongche Intelligent) Receives Alibaba Investment, Accelerating Full-Chain Embodied Intelligence Technology Breakthroughs
具身智能之心· 2025-10-17 08:12
Core Viewpoint
- Qiongche Intelligent (穹彻智能), led by Professor Lu Cewu, a leader in the field of embodied intelligence, combines academic excellence with industry experience, possessing full-stack capabilities from technology research and development to commercial delivery [1]

Group 1: Company Overview
- Qiongche Intelligent focuses on a force-based embodied intelligence brain technology, breaking through traditional trajectory control frameworks [1]
- The company has developed a comprehensive autonomous decision-making system covering perception, cognition, planning, and execution [1]
- It leverages multimodal large models and a rich accumulation of force perception data to achieve high-dimensional understanding and flexible operation of the physical world [1]

Group 2: Recent Developments
- Recently, Qiongche Intelligent announced the completion of a new round of financing, with Alibaba Group as the investor and several existing shareholders participating [1]
- The new funding will be used to accelerate technology product development, implement embodied applications, and expand industry ecosystems [1]
Exclusive | 穹彻智能 (Qiongche Intelligent) Closes a New Funding Round from Alibaba; Led by SJTU Professor Lu Cewu, It Breaks Through Embodiment-Free Data Collection and Links the Full Embodied Intelligence Chain
具身智能之心· 2025-10-17 07:46
Core Insights
- Qiongche Intelligent (穹彻智能) recently completed a new round of financing led by Alibaba Group, with multiple existing shareholders participating. The funds will be used to accelerate technology product development, implement embodied applications, and expand industry ecosystems [2][4].

Group 1: Company Overview
- Qiongche Intelligent was established at the end of 2023 and has previously completed several financing rounds totaling hundreds of millions across Pre-A++ and Pre-A+++ rounds [4].
- The company focuses on embodied intelligence technology, rapidly iterating its self-developed large models for the physical world, and launched the upgraded product Noematrix Brain 2.0 this year [4][8].

Group 2: Technological Advancements
- Qiongche Intelligent has made significant breakthroughs in key technology areas, including an embodiment-free data collection scheme (data collection that does not require a physical robot body), a universal end-to-end model scheme, and a large-scale deployment system for human-machine collaboration [4].
- The company aims to streamline the entire process from data collection to deployment, covering the complete technical chain from data acquisition and model pre-training to post-training [4].

Group 3: Market Position and Collaborations
- Qiongche has established partnerships with several leading companies in the retail and home sectors to promote the mass delivery of integrated hardware and software embodied intelligence solutions [6].
- The company plans to leverage its advanced large-model products and data-to-model closed-loop capabilities to continuously provide innovative and practical embodied intelligence solutions to clients and partners [6].

Group 4: Leadership and Vision
- Qiongche Intelligent is led by Professor Lu Cewu, a prominent figure in the field of embodied intelligence, who possesses both academic depth and industry experience, enabling the company to have full-stack capabilities from technology research and development to commercial delivery [8].
- The company's core technology is based on force-driven embodied intelligence, breaking through traditional trajectory control frameworks to build a comprehensive autonomous decision-making system that covers perception, cognition, planning, and execution [8].
VLA Can Bring Smarter Scenario-Level Applications to Reinforcement Learning......
具身智能之心· 2025-10-17 04:01
Core Insights
- The article discusses the importance of reinforcement learning (RL) in the development of embodied intelligent robots, highlighting its applications in various complex tasks such as stair climbing, running, and dancing [3][9]
- It emphasizes the challenges faced by newcomers in the field of reinforcement learning, particularly in producing quality research papers due to the complexity and breadth of the subject [6][10]
- To address these challenges, a specialized 1v6 mentoring course in reinforcement learning has been introduced, aimed at helping students produce publishable research papers [7][10]

Group 1: Reinforcement Learning Applications
- Reinforcement learning is crucial for gait control in humanoid and quadruped robots, enabling them to perform tasks in challenging environments [3][9]
- The VLA+RL approach for robotic arms is gaining popularity in academia, enhancing the efficiency and smoothness of robotic operations [4][9]

Group 2: Course Structure and Objectives
- The 1v6 mentoring course is designed for graduate students and others needing guidance on research papers, featuring weekly live sessions and dedicated teaching assistants [8][10]
- The course spans 14 weeks of intensive online training followed by 8 weeks of maintenance support, focusing on various aspects of research paper production, including idea confirmation, project implementation, and writing refinement [10][18]

Group 3: Course Content and Deliverables
- The curriculum includes topics such as reinforcement learning fundamentals, simulation environments, and writing guidance, with a focus on producing a research paper suitable for top conferences and journals [10][19]
- Students will receive structured templates and support for writing and submission processes, ensuring they meet the standards of leading academic publications [10][29]

Group 4: Instructor and Support
- The course is led by experienced instructors with backgrounds in embodied intelligence and robotics, providing both theoretical knowledge and practical insights [27]
- Continuous support is offered through a dedicated WeChat group for real-time Q&A, enhancing the learning experience [18][27]
AIR Research | X-VLA Open-Sourced, Setting New Records Across Robot Benchmarks
具身智能之心· 2025-10-17 00:04
Core Insights
- The article discusses the launch of the X-VLA model, a groundbreaking open-source model for embodied intelligence, achieving a significant milestone by completing a 120-minute autonomous folding task with only 0.9 billion parameters, setting a new performance benchmark in the field [3][19].

Group 1: Performance Breakthrough
- X-VLA is the first model to achieve a fully open-source solution for long-duration autonomous tasks, overcoming challenges in complex autonomous operations [8].
- The model demonstrates superior efficiency with only 0.9 billion parameters while achieving state-of-the-art (SOTA) performance across five authoritative simulation benchmarks [8][19].

Group 2: Innovative Technology
- The model introduces a Soft-Prompt mechanism to address the variability in robot platforms, enhancing adaptability and training efficiency [10].
- A multi-modal encoding strategy is employed to optimize resource allocation while ensuring information integrity during task execution [10].
- The action generation module utilizes flow-matching for probabilistic modeling of robot action sequences, improving trajectory smoothness and robustness in uncertain environments [10].

Group 3: Data Pre-training and Quality
- A balanced data sampling strategy is implemented to ensure equitable training across heterogeneous datasets, preventing model bias [12].
- A rigorous data preprocessing pipeline enhances the temporal consistency and overall quality of state-action sequences [12].
- The model's training is guided by a semantic-action alignment standard, ensuring that the learned behaviors are causally related rather than superficial associations [12].

Group 4: Training Process and Techniques
- The pre-training phase exhibits a linear growth trend in performance as model parameters and training data scale up, validating the effectiveness of the Soft-Prompt mechanism [15].
- During fine-tuning, X-VLA shows high data efficiency, requiring only small-scale task-specific data to achieve SOTA performance [16].
- Adaptive learning rate adjustments and a progressive warm-up strategy are employed to optimize training stability and efficiency [17].

Group 5: Experimental Results
- X-VLA has demonstrated strong performance on real robot platforms, successfully completing various tasks, including unlimited-duration autonomous folding, showcasing its capability in complex long-range tasks [19].
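The summary above attributes X-VLA's action generation to flow matching. As a reference point, a minimal flow-matching action head looks like the sketch below: train a network to predict the velocity that carries noise to the target action along a linear path, then integrate that velocity field at inference. The architecture, dimensions, and conditioning are assumptions for illustration, not the released X-VLA code.

```python
import torch
import torch.nn as nn

class FlowMatchingActionHead(nn.Module):
    """Minimal flow-matching action head (illustrative, not X-VLA's code)."""

    def __init__(self, action_dim=7, cond_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, action_dim),
        )

    def velocity(self, x_t, t, cond):
        # Predict the velocity field conditioned on the VLA embedding and time.
        return self.net(torch.cat([x_t, cond, t], dim=-1))

    def loss(self, actions, cond):
        # Linear probability path: x_t = (1 - t) * noise + t * action,
        # so the target velocity is (action - noise).
        noise = torch.randn_like(actions)
        t = torch.rand(actions.shape[0], 1)
        x_t = (1 - t) * noise + t * actions
        target_v = actions - noise
        return ((self.velocity(x_t, t, cond) - target_v) ** 2).mean()

    @torch.no_grad()
    def sample(self, cond, steps=10):
        # Integrate the learned velocity field from noise to an action.
        x = torch.randn(cond.shape[0], self.net[-1].out_features)
        for i in range(steps):
            t = torch.full((cond.shape[0], 1), i / steps)
            x = x + self.velocity(x, t, cond) / steps
        return x

# Toy usage: condition on a pooled VLA embedding, take one training step, sample.
head = FlowMatchingActionHead()
cond = torch.randn(4, 512)
print(head.loss(torch.randn(4, 7), cond).item(), head.sample(cond).shape)
```

Compared with diffusion-style heads, the appeal of flow matching here is that sampling reduces to a few deterministic integration steps, which is consistent with the smoothness and efficiency claims in the summary.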