具身智能之心
DemoGrasp: How Does One Demonstration Enable Universal Dexterous-Hand Grasping?
具身智能之心· 2025-10-10 00:02
Core Insights
- The article discusses DemoGrasp, a novel method for universal dexterous grasping that allows robots to learn grasping strategies from a single demonstration [2][3][6].

Group 1: Methodology
- DemoGrasp uses a simple and efficient reinforcement learning framework that enables any dexterous hand to learn a universal grasping strategy from just one successful grasping demonstration [6].
- The method edits the trajectory of robot actions to adapt to new objects and poses, determining where and how to grasp through adjustments to the wrist pose and hand joint angles [2][3] (a minimal sketch follows after this list).

Group 2: Performance and Validation
- In simulation experiments, DemoGrasp achieved a 95% success rate using the Shadow hand on objects from the DexGraspNet dataset, outperforming existing methods [2].
- The method shows excellent transferability, reaching an average success rate of 84.6% on six unseen object datasets despite being trained on only 175 objects [2].

Group 3: Applications and Capabilities
- The learned policy successfully grasped 110 previously unseen real-world objects, including small and thin items, and adapts to variations in spatial position, background, and lighting [3].
- DemoGrasp supports both RGB and depth inputs and can be extended to language-guided grasping tasks in cluttered environments [3].
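The trajectory-editing idea can be pictured with a minimal sketch: starting from one demonstrated grasp trajectory, a policy outputs small edits to the wrist pose and finger joint angles so the same trajectory adapts to a new object and pose, and grasp success serves as the reward. All names below (DemoTrajectory, edit_trajectory, env.replay, etc.) are illustrative assumptions, not the authors' code.

```python
import numpy as np

class DemoTrajectory:
    """A single demonstrated grasp: per-step wrist poses and hand joint angles."""
    def __init__(self, wrist_poses: np.ndarray, joint_angles: np.ndarray):
        self.wrist_poses = wrist_poses      # (T, 6)  xyz + rpy of the wrist
        self.joint_angles = joint_angles    # (T, J)  dexterous-hand joint angles

def edit_trajectory(demo: DemoTrajectory,
                    wrist_offset: np.ndarray,
                    joint_offsets: np.ndarray) -> DemoTrajectory:
    """Adapt the demo to a new object/pose by shifting the wrist and finger joints.

    wrist_offset:  (6,)  rigid offset applied to every wrist pose
    joint_offsets: (J,)  per-joint corrections applied to every hand configuration
    """
    return DemoTrajectory(demo.wrist_poses + wrist_offset,
                          np.clip(demo.joint_angles + joint_offsets, -2.0, 2.0))

def rollout(env, demo: DemoTrajectory, policy) -> float:
    """One RL episode: the policy proposes an edit from the observation, the edited
    trajectory is replayed in simulation, and grasp success is the sparse reward."""
    obs = env.reset()                               # new object, new pose
    wrist_offset, joint_offsets = policy(obs)       # action = trajectory edit
    success = env.replay(edit_trajectory(demo, wrist_offset, joint_offsets))
    return float(success)                           # 0/1 reward
```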
DexCanvas: Is It Really Impossible to Get Scale, Realism, and Force Sensing in Embodied Data All at Once?
具身智能之心· 2025-10-10 00:02
Core Viewpoint
- The article discusses the challenges and advances in dexterous manipulation for robotics, highlighting the need for high-quality, multi-modal data to improve robotic grasping capabilities, and introduces the DexCanvas dataset as a solution [1][15].

Group 1: Challenges in Dexterous Manipulation
- Dexterous manipulation remains a significant challenge because it requires precise control, high-dimensional motion planning, and real-time adaptation to dynamic environments [2][11].
- Existing hardware falls into two categories, two-finger grippers and multi-finger humanoid hands, with the latter better suited to complex tasks thanks to their higher degrees of freedom [2][3].
- Current learning methods include imitation learning and reinforcement learning, each with its own advantages and limitations in data requirements and training complexity [4][9].

Group 2: Data Collection and Quality Issues
- Data collection for dexterous manipulation is expensive and often lacks tactile and force information, and existing datasets are insufficient for large-scale pre-training [9][10].
- The article emphasizes the trade-off in data collection: achieving scale, realism, and tactile feedback simultaneously is difficult [6][7].
- The DexCanvas dataset addresses the lack of force and tactile information in existing datasets, providing a comprehensive solution for high-quality data collection [17][21].

Group 3: DexCanvas Dataset Introduction
- DexCanvas is a large-scale dataset launched by Lingqiao Intelligent Technology, designed to bridge the gap between cognitive and physical intelligence in robotics [15][16].
- The dataset includes complete multi-finger force/contact annotations optimized for systems with over 20 degrees of freedom, significantly enhancing data quality [17][21].
- DexCanvas offers a structured data-collection framework based on 22 types of human hand operation modes, integrating over 1,000 hours of real human demonstration data and 100,000 hours of physically simulated data [21][22].

Group 4: Data Generation and Enhancement
- The dataset generation process captures human demonstrations with high precision and uses physical simulation to recover the missing force-control data [25][27].
- DexCanvas expands the dataset by altering object properties and initial conditions, greatly increasing data volume while preserving force-control information [28][29] (see the sketch after this list).
- Unlike pure simulation, DexCanvas is grounded in real human demonstrations, allowing better generalization across robotic platforms and tasks [30].

Group 5: Industry Impact and Future Prospects
- The introduction of DexCanvas is expected to accelerate advances in robotics by providing the physical-interaction data that existing datasets lack [32].
- The article expresses anticipation for the open-sourcing of the dataset to further support research and development in related areas [32].
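The data-expansion step can be illustrated with a small sketch: each real human demonstration is replayed in a physics simulator under perturbed object properties and initial conditions, and only rollouts that still succeed are kept, together with the contact forces the simulator exposes. The function and field names here are assumptions for illustration, not the actual DexCanvas pipeline.

```python
import random

def expand_demonstration(sim, demo, num_variants=100):
    """Replay one real demonstration under randomized object properties and keep
    the successful variants, annotated with simulated contact forces."""
    augmented = []
    for _ in range(num_variants):
        # Perturb physical properties that motion capture alone cannot record.
        variant = {
            "mass_scale": random.uniform(0.5, 2.0),
            "friction": random.uniform(0.3, 1.2),
            "initial_pose_noise": random.uniform(0.0, 0.02),  # meters
        }
        sim.reset(demo.object_id, **variant)
        result = sim.replay(demo.hand_motion)      # retarget and replay the human motion
        if result.task_succeeded:
            augmented.append({
                "hand_motion": demo.hand_motion,
                "object_properties": variant,
                "contact_forces": result.contact_forces,  # per-finger force profiles
            })
    return augmented
```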
Qwen Is Finally Building Robots: Junyang Lin Announces an Embodied Intelligence Team!
具身智能之心· 2025-10-09 06:39
Core Insights
- Qwen, a leading open-source model family, is moving into robotics by forming a dedicated embodied-intelligence team, marking a shift from virtual to physical applications [1][7].
- The establishment of this team aligns with Alibaba Cloud's broader strategy to support the embodied-intelligence sector, which is gaining traction among global tech giants [7][11].

Group 1: Qwen's Development and Market Position
- Forming an internal robotics team is a significant step toward applying Qwen models in real-world scenarios, strengthening their capabilities in perception, planning, and execution [7].
- The Qwen series, particularly Qwen-VL, is already adopted by over 30 companies for its strengths in spatial understanding and long-context memory, making it a preferred foundation model in the embodied-intelligence field [5][7].
- The recent launch of Qwen3-VL optimizes fine-grained visual understanding and 3D perception, further solidifying its role in supporting embodied-intelligence applications [5][7].

Group 2: Industry Trends and Investments
- The robotics sector is drawing significant investment; SoftBank's recent $5.4 billion acquisition of ABB's robotics business highlights strategic moves in the "physical AI" domain [9][10].
- Citigroup projects that the global robotics market could reach $7 trillion by 2050, attracting substantial capital from various sources, including government funds [11].
- The integration of generative AI with robotics is expected to fundamentally change human-machine interaction, with major companies such as NVIDIA identifying it as a core growth opportunity [8][11].
How Should a Newcomer Choose Their First Embodied-AI Research Platform?
具身智能之心· 2025-10-09 04:00
Core Viewpoint
- Imeta-Y1 is a lightweight, cost-effective robotic arm designed for beginners and researchers in embodied intelligence, enabling low-cost, efficient algorithm validation and project development [2][5].

Group 1: Product Features
- The arm ships with a complete open-source toolchain and code examples, supporting a seamless workflow from data collection to model deployment [3][17] (a usage sketch follows after this list).
- It provides dual-language interfaces in Python and C++, so users can get started quickly regardless of programming background [3][18].
- ROS1 and ROS2 compatibility and URDF models are provided for smooth transitions between simulation and real-world deployment [3][19].
- The arm offers high-precision motion control, low power consumption, and an open hardware architecture, supporting seamless sim-to-real integration [5][6].

Group 2: Technical Specifications
- The arm weighs 4.2 kg, carries a rated load of 3 kg, and has 6 degrees of freedom, with a working radius of 612.5 mm and a repeatability of ±0.1 mm [8][19].
- It runs on a 24 V supply and communicates via CAN, with external interfaces for power and CAN connections [8][19].
- Joint motion ranges and maximum speeds are specified, ensuring versatility across applications [8][19].

Group 3: Development and Support
- The company provides a comprehensive open-source SDK, including drivers, API interfaces, example code, and documentation, to support rapid application development [26][29].
- A full-process toolchain covers data collection, model training, and inference deployment, compatible with mainstream frameworks such as TensorFlow and PyTorch [29][32].
- After-sales support is guaranteed, with a 24-hour response commitment for any issues users encounter [3][19][44].
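As a rough picture of what working with such an open-source arm SDK typically looks like, the sketch below connects over CAN, moves the arm through a joint-space waypoint, and reads back its state. Every identifier here (imeta_sdk, ArmClient, move_joints, get_state, etc.) is a hypothetical stand-in; the actual Imeta-Y1 API may differ.

```python
# Hypothetical usage sketch; the real Imeta-Y1 SDK names and signatures may differ.
from imeta_sdk import ArmClient  # assumed package and class names

def main():
    # The arm communicates over CAN at 24 V; the SDK is assumed to wrap the bus.
    arm = ArmClient(interface="can0")
    arm.enable()

    # 6-DoF joint-space target in radians, within the arm's working range.
    home = [0.0, -0.5, 0.8, 0.0, 0.6, 0.0]
    arm.move_joints(home, speed=0.2)          # fraction of maximum joint speed

    state = arm.get_state()
    print("joint positions:", state.joint_positions)
    print("payload (rated 3 kg max):", state.payload)

    arm.disable()

if __name__ == "__main__":
    main()
```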
From the Institute of Automation, Chinese Academy of Sciences! EmbodiedCoder: Parameterized Embodied Mobile Manipulation with Generative Models
具身智能之心· 2025-10-09 00:04
Author: Zefu Lin et al.  Editor: 具身智能之心. This article is shared for academic purposes only; in case of infringement, please contact us for removal.

1. Research Background

In robotics, enabling robots to carry out diverse tasks in complex, unstructured environments as skillfully as humans is a long-standing core goal. In recent years, vision-language-action (VLA) models have advanced this goal by mapping sensory inputs and natural-language instructions end-to-end to robot actions, but significant limitations remain:

To address these problems, researchers have proposed hierarchical policies that use a vision-language model (VLM) to decompose a task into subtasks and call predefined manipulation primitives (e.g., navigation, grasping). However, such methods are constrained by the primitive library and cannot handle real-world tasks that require fine-grained interaction, such as opening doors or pulling drawers; these tasks are hard to cover with a finite set of predefined primitives.

Earlier code-generation-based attempts also fall short: early methods only work for simple geometric tasks; some rely on learned models to handle physical constraints, reducing adaptability to new scenes; others cannot handle contact-rich manipulation or focus only on failure detection rather than extending manipulation capabilities. Mobile robots additionally face more complex issues such as retaining environmental information and planning for objects outside the field of view ...
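To make the hierarchical-policy baseline concrete, the sketch below shows how a VLM planner is typically wired to a fixed primitive library: the instruction is decomposed into subtasks, and each subtask is dispatched to a predefined primitive. This is a generic illustration of the approach the article critiques, not EmbodiedCoder's own interface; the planner and primitive names are assumptions.

```python
# Generic hierarchical baseline: a VLM decomposes the task, a fixed library executes it.

PRIMITIVES = {}

def primitive(name):
    """Register a manipulation primitive under a fixed name."""
    def register(fn):
        PRIMITIVES[name] = fn
        return fn
    return register

@primitive("navigate")
def navigate(robot, target: str):
    robot.navigate_to(target)

@primitive("grasp")
def grasp(robot, obj: str):
    robot.grasp_object(obj)

def run_task(vlm, robot, instruction: str, image):
    # The VLM returns an ordered list of (primitive_name, argument) pairs.
    plan = vlm.decompose(instruction, image)   # e.g. [("navigate", "table"), ("grasp", "cup")]
    for name, arg in plan:
        if name not in PRIMITIVES:
            # Tasks like opening a door or pulling a drawer fall outside the library,
            # which is exactly the limitation the article points out.
            raise ValueError(f"no primitive covers subtask: {name}")
        PRIMITIVES[name](robot, arg)
```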
From Robotic Arms to Humanoids: How Can Cross-Embodiment VLA Break the Deadlock?
具身智能之心· 2025-10-09 00:04
Core Insights
- The article discusses two significant advances in embodied intelligence and VLA (Vision-Language-Action) models, highlighting their potential to overcome existing challenges in the field [3][7].

Group 1: VLA-Adapter
- VLA-Adapter aims to map VLM (Vision-Language Model) features directly to the action space without relying heavily on robot data. The research team found that increasing parameter count and introducing pre-trained robot data did not significantly improve model performance on general benchmarks [3].
- The team's new mapping scheme lets the model achieve superior performance even at the 0.5-billion-parameter scale, cutting training costs and lowering the entry barrier for VLA models [3] (a minimal adapter sketch follows after this list).

Group 2: TrajBooster
- TrajBooster is the first whole-body humanoid manipulation VLA solution to address the data scarcity of bipedal humanoid tasks, which stems from the high cost of teleoperation data and the difficulty of reusing existing heterogeneous robot data for training [7].
- By centering on trajectories, TrajBooster efficiently exploits cross-embodiment data, achieving whole-body manipulation on bipedal robots with only 10 minutes of real-robot teleoperation data for fine-tuning [7].

Group 3: Contributors
- Wang Yihao, a fourth-year PhD student at Beijing University of Posts and Telecommunications, works on the VLA-Adapter project and has contributed significantly to embodied intelligence and VLA models [13].
- Liu Jiacheng, a second-year PhD student at Zhejiang University and Westlake University, leads the TrajBooster project, the only fully open-source work covering humanoid data collection, cross-embodiment data augmentation, VLA model training, and hardware deployment [13].
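The core idea of mapping VLM features into an action space with a small head can be sketched as follows. This is a minimal PyTorch illustration under my own assumptions about feature and action dimensions, not the actual VLA-Adapter architecture.

```python
import torch
import torch.nn as nn

class ActionAdapter(nn.Module):
    """Small head that maps frozen VLM features to a chunk of robot actions."""
    def __init__(self, vlm_dim=896, action_dim=7, chunk_len=8, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vlm_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, chunk_len * action_dim),
        )
        self.chunk_len, self.action_dim = chunk_len, action_dim

    def forward(self, vlm_features: torch.Tensor) -> torch.Tensor:
        # vlm_features: (batch, vlm_dim) pooled vision-language representation.
        out = self.net(vlm_features)
        return out.view(-1, self.chunk_len, self.action_dim)

# Training-step sketch: only the adapter is optimized, the VLM backbone stays frozen.
adapter = ActionAdapter()
optim = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
features = torch.randn(4, 896)          # stand-in for frozen VLM outputs
target_actions = torch.randn(4, 8, 7)   # stand-in for demonstration action chunks
optim.zero_grad()
loss = nn.functional.mse_loss(adapter(features), target_actions)
loss.backward()
optim.step()
```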
DiffusionNFT: A New Paradigm for Diffusion Reinforcement Learning, with a 25x Boost in Training Efficiency
具身智能之心· 2025-10-09 00:04
Editor: 机器之心

Professor Jun Zhu's team at Tsinghua University, the NVIDIA Deep Imagination research group, and Stefano Ermon's team at Stanford have jointly proposed a new reinforcement learning (RL) paradigm for diffusion models: Diffusion Negative-aware FineTuning (DiffusionNFT). For the first time, the method breaks the basic assumptions existing RL approaches make about diffusion models, optimizing directly on the forward (noising) process. It removes the dependence on likelihood estimation and specific samplers entirely while significantly improving training efficiency and generation quality. Co-first authors Kaiwen Zheng and Huayu Chen are PhD students in the Department of Computer Science at Tsinghua University.

Paper title: DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Paper link: https://arxiv.org/abs/2509.16117
Code repository: https://github.com/NVla ...
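As a very loose illustration of what "optimizing on the forward (noising) process" can look like, the sketch below fine-tunes a noise-prediction model on noised samples, weighting each sample's loss by a reward signal so that well-rewarded generations are reinforced and poorly-rewarded ones are down-weighted. This is a simplified reward-weighted stand-in under my own assumptions, not the DiffusionNFT objective from the paper.

```python
import torch

def forward_process_step(model, optimizer, x0, rewards, num_timesteps=1000):
    """One reward-weighted fine-tuning step on the forward (noising) process.

    x0:      (B, D) clean samples generated by the current policy
    rewards: (B,)   scalar feedback; higher-reward samples get larger weight
    """
    t = torch.randint(0, num_timesteps, (x0.shape[0],))
    alpha_bar = torch.cos(0.5 * torch.pi * t / num_timesteps) ** 2  # simple schedule
    noise = torch.randn_like(x0)

    # Forward process: noise the samples directly, no sampler or likelihood needed.
    x_t = alpha_bar.sqrt()[:, None] * x0 + (1 - alpha_bar).sqrt()[:, None] * noise

    # Reward-derived weights: emphasize positive samples, de-emphasize negative ones.
    weights = torch.softmax(rewards, dim=0)

    pred_noise = model(x_t, t)                        # assumed (x_t, t) -> noise signature
    per_sample = ((pred_noise - noise) ** 2).mean(dim=-1)
    loss = (weights * per_sample).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```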
We Are Looking for Partners in the Embodied Intelligence Field...
具身智能之心· 2025-10-08 02:49
Core Viewpoint
- The company is seeking collaboration with practitioners in the embodied intelligence field worldwide to strengthen its capabilities in technical services, training, course development, and research guidance [1].

Group 1: Collaboration Opportunities
- Partners and small companies are increasingly asking the company to empower them through solutions, data collection, technology upgrades, and corporate training [1].
- The company invites outstanding partners to join in driving significant industry progress [1].

Group 2: Compensation and Resources
- The company will offer high compensation and abundant industry resources to collaborators [2].

Group 3: Focus Areas
- Key focus areas include, but are not limited to: VLA, VLN, Diffusion Policy, reinforcement learning, VLA+RL, teleoperation, motion capture, sim2real, multimodal large models, simulation, motion control, end-to-end systems, and 3D perception [3].

Group 4: Job Description
- The positions primarily cover embodied course development, solution research and development, hardware development, and training collaboration, targeting both B-end (enterprises, universities, research institutes) and C-end (students, job seekers) audiences [4].

Group 5: Contact Information
- Interested parties can add WeChat oooops-life for further inquiries [5].
A Roundup of Companies Working on Embodied Perception at Home and Abroad!
具身智能之心· 2025-10-08 02:49
Core Insights
- The article surveys the emerging field of embodied intelligence, highlighting the development of general-purpose robotic "brain" systems and multi-modal perception and decision-making systems, which are attracting significant attention from both capital and industry [2][3].

Domestic Companies
- **Xinghai Map**: Founded in 2023, focuses on developing a "general embodied large model" using real-world data to create robots with fine manipulation capabilities. The company has completed 8 financing rounds [6].
- **WALL-A Model**: Set to launch in October 2024 as the largest-parameter general embodied-intelligence manipulation model worldwide, integrating visual, language, and motion-control signals [6].
- **Wall-OSS**: An open-source embodied-intelligence foundation model with strong generalization and reasoning capabilities [6].
- **UBTECH**: Established in 2012, a leader in humanoid-robot commercialization with comprehensive in-house research capabilities [10].
- **Thinker Model**: A multi-modal large model with 10 billion parameters, expected to top three international benchmark tests by 2025, enhancing robots' perception and task planning in complex environments [10].
- **Zhiyuan Robotics**: Founded in February 2023, aims to create world-class general embodied intelligent robot products [12].
- **Genie Operator-1**: Set for release in March 2025, integrates multi-modal large models and mixture-of-experts technology, improving task success rates by 32% over comparable models on the market [12].
- **Galaxy General**: Founded in May 2023, focuses on multi-modal large models driven by synthetic data [14].
- **VLA Model**: The world's first general embodied large model, built on a "brain + cerebellum" collaborative framework [14].
- **Qianxun Intelligent**: Established in 2024, specializes in AI and robotics with a strong technical foundation [16].
- **Spirit V1 VLA Model**: The first AI model to tackle long-horizon manipulation of deformable objects, supporting multi-task generalization [16].
- **Star Motion Era**: A new tech company incubated by Tsinghua University, focusing on general artificial intelligence applications [18].
- **ERA-42 Model**: The first end-to-end native embodied large model in China, capable of learning over 100 dynamic tasks through video training [18].

International Companies
- **Figure AI**: Focuses on developing embodied-intelligence large models and related infrastructure for various industries [20].
- **Noematrix Brain**: Combines advanced algorithms and data support for comprehensive capabilities in instruction reasoning and task planning [20].
- **Physical Intelligence**: A startup established in January 2023 that aims to build advanced intelligent software for robots [24].
- **π0 Model**: Released on October 31, 2024, a foundation model for robots that achieves fine-grained control through pre-training and fine-tuning [24].
- **Google DeepMind**: Merged with Google Brain in 2023, focusing on general artificial intelligence research [22].
- **Gemini Robotics**: A VLA model that lets robots perform complex tasks without task-specific training, improving their adaptability to environmental changes [22].
- **NVIDIA**: A leading GPU design company that has expanded into AI solutions [24].
- **Eureka System**: Based on GPT-4, it can automatically train robots in complex actions and optimize reinforcement-learning processes [24].
A Roundup of VLA Foundation Models and Large-Scale Training Tasks
具身智能之心· 2025-10-08 02:49
Core Insights
- The article summarizes several research papers on Vision-Language-Action (VLA) models and their training strategies, highlighting advances in embodied intelligence and robotics [2][3][5][7][9][11][13][15][17][19].

Group 1: Training Strategies and Model Improvements
- "Training strategies for efficient embodied reasoning" uses Chain-of-Thought (CoT) reasoning to improve the performance and generalization of VLA models, achieving a threefold increase in reasoning speed over standard methods [3].
- "CAST: Counterfactual labels improve instruction following in vision-language-action models" introduces a method for generating counterfactual labels that significantly improves the instruction-following ability of VLA models, with a 27% increase in navigation-task success rates [5] (a relabeling sketch follows after this list).
- "RoboBrain: A unified brain model for robotic manipulation" presents a new dataset, ShareRobot, that enhances robots' planning and trajectory-prediction capabilities, leading to state-of-the-art performance across tasks [7].

Group 2: Dataset Development and Evaluation
- The DROID dataset is introduced as a large-scale, diverse robot-manipulation dataset containing 76,000 demonstration trajectories collected over 350 hours, improving the performance and generalization of trained policies [9].
- ViSA-Flow proposes a framework for learning from large-scale video data, achieving state-of-the-art robot skill learning, particularly in low-data regimes [11].
- The CORTEXBENCH benchmark evaluates pre-trained visual representations for embodied AI, finding that no single representation excels across all tasks, but task-specific adaptation can yield significant performance gains [13].

Group 3: Generalist Robot Policies and Learning Frameworks
- "Effective tuning strategies for generalist robot manipulation policies" identifies the key factors influencing the performance of Generalist Manipulation Policies (GMPs) during fine-tuning, establishing a new benchmark for future research [15].
- The CACTI framework targets scalable multi-task learning for robotic systems, demonstrating effective training across kitchen tasks in both real and simulated environments [17].
- "R3M: A universal visual representation for robot manipulation" shows that pre-trained visual representations enable data-efficient learning in real-world environments, improving task success rates by over 20% relative to training from scratch [19].
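The counterfactual-relabeling idea from CAST can be pictured with a small sketch: for each recorded trajectory, a language model proposes alternative instructions, and only those the trajectory actually satisfies are added as extra training pairs. The helper names here are assumptions for illustration, not the paper's actual pipeline.

```python
def relabel_with_counterfactuals(trajectories, language_model, verifier):
    """Augment (instruction, trajectory) pairs with counterfactual instruction labels."""
    augmented = []
    for traj in trajectories:
        augmented.append((traj.instruction, traj))          # keep the original label
        # Ask a language model for alternative instructions describing the same behavior.
        candidates = language_model.propose_instructions(
            observations=traj.observations,
            original_instruction=traj.instruction,
            num_candidates=4,
        )
        for candidate in candidates:
            # Keep only candidates that the recorded trajectory actually satisfies.
            if verifier.is_consistent(candidate, traj):
                augmented.append((candidate, traj))
    return augmented
```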