具身智能之心
DexCanvas: Is the Scale-Realism-Force Trilemma of Embodied Data Really Unbreakable?
具身智能之心· 2025-10-10 00:02
Why is dexterous grasping so hard?

Over the past two years, the embodied intelligence field has made remarkable progress at the cognition, perception, and planning levels, but enabling robots to perform fine-grained hand manipulation in the physical world, executing complex dexterous operations the way humans do, remains a major challenge. The field has largely cracked human language understanding, object and scene recognition, and the planning of concrete task steps, yet flexible grasping and force-aware control still face many open problems. In real-world settings, dexterous grasping involves precise control, high-dimensional motion planning, and real-time adaptation to dynamic environments; this task complexity demands robust mechanical design and advanced control algorithms.

The hardware behind dexterous manipulation is chiefly the dexterous hand, which falls into two categories: two-finger grippers and multi-finger anthropomorphic hands. Two-finger grippers are widely used for their reliability, simplicity, and ease of control, but they typically offer only a single degree of freedom and struggle with complex tasks. Hence human-like dexterous hands with 20+ degrees of freedom have emerged; these anthropomorphic hands are far better suited to interacting with objects and environments designed for humans.

1) Existing dexterous grasping and data collection approaches

Although major robotics companies at home and abroad keep releasing massive datasets (million-scale trajectories, thousands of hours of demonstrations), these datasets lack force-control information. Dexterous-hand data never seems to escape this law: of scale, realism, and force sensing, you can only pick two. The data acquisition method itself rules out having all three at once! Current learning methods for dexterous grasping fall into two main categories: reinforcement learning and imitation learning. Imitation learning requires neither building a complex world model nor designing reward ...
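For readers unfamiliar with the imitation-learning route mentioned above, here is a minimal behavior-cloning sketch in PyTorch. The network shape, the state/action dimensions, and the randomly generated demonstration buffer are illustrative assumptions, not details from DexCanvas.

```python
# Minimal behavior-cloning sketch for grasp policies (illustrative only).
# Assumptions: demonstrations are (state, action) pairs already collected;
# dimensions and architecture are placeholders, not DexCanvas specifics.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

STATE_DIM, ACTION_DIM = 64, 22  # e.g. proprioception in, 22-DoF hand command out

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACTION_DIM),
)

# Fake demonstration buffer standing in for teleoperated grasp data.
states = torch.randn(1024, STATE_DIM)
actions = torch.randn(1024, ACTION_DIM)
loader = DataLoader(TensorDataset(states, actions), batch_size=64, shuffle=True)

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for epoch in range(10):
    for s, a in loader:
        loss = nn.functional.mse_loss(policy(s), a)  # regress expert actions
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Note how the loop needs no reward function or world model, which is exactly the appeal of imitation learning the summary points to; the trade-off is that the policy can only be as good as its demonstrations.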
Qwen Is Finally Doing Robotics: Junyang Lin Announces an Embodied Intelligence Team!
具身智能之心· 2025-10-09 06:39
Core Insights
- Qwen, a leading open-source model, is transitioning into robotics by forming a dedicated team for embodied intelligence, indicating a shift from virtual to physical applications [1][7]
- The establishment of this team aligns with Alibaba Cloud's broader strategy to support the embodied intelligence sector, which is gaining traction among global tech giants [7][11]

Group 1: Qwen's Development and Market Position
- Qwen's internal team formation for robotics is a significant step towards applying their models in real-world scenarios, enhancing their capabilities in perception, planning, and execution [7]
- The Qwen series models, particularly Qwen-VL, are being widely adopted by over 30 companies for their strengths in spatial understanding and long-context memory, making them a preferred foundational model in the embodied intelligence field [5][7]
- The recent launch of Qwen3-VL has optimized capabilities for fine-grained visual understanding and 3D perception, further solidifying its role in supporting embodied intelligence applications [5][7]

Group 2: Industry Trends and Investments
- The robotics sector is experiencing significant investment, with SoftBank's recent $5.4 billion acquisition of ABB's robotics business highlighting strategic moves in the "physical AI" domain [9][10]
- Citigroup projects that the global robotics market could reach $7 trillion by 2050, attracting substantial capital from various sources, including government funds [11]
- The integration of generative AI with robotics is expected to fundamentally change human-machine interaction, with major companies like NVIDIA identifying this as a core growth opportunity [8][11]
How Should a Newcomer Choose Their First Embodied AI Research Platform?
具身智能之心· 2025-10-09 04:00
A lightweight, cost-effective robotic arm built for embodied AI research

Still struggling over hardware choices for embodied intelligence work? Arms that are too expensive are out of reach, and the cheap ones are clumsy and hard to get started with. Imeta-Y1 is here: a lightweight, cost-effective robotic arm designed specifically for newcomers and early-stage researchers. Whether you are a student, an educator, or a developer just entering robotics, Imeta-Y1 lets you complete algorithm validation and project development at low cost and high efficiency.

It is especially friendly to beginners:
✅ A fully open-source toolchain with code examples takes you from data collection to model deployment in one pass;
✅ Python and C++ interfaces are both supported, so you can get started quickly in whichever language you prefer;
✅ ROS1 / ROS2 compatibility and a bundled URDF model enable seamless switching between simulation and the real arm (a simulation sketch follows the specs below);
✅ 24-hour after-sales response keeps you unblocked and supported while you learn!

The arm combines high-precision motion control, low-power design, and an open software/hardware architecture; it supports seamless joint debugging from simulation to the real machine and ships with a fully open-source SDK and toolchain, helping users quickly implement algorithm validation, data collection, model training, and deployment. Its compact structure and modular interfaces make it especially suitable for developing and promoting embedded-AI and robot-learning platforms.

| Spec | Value |
| --- | --- |
| Body weight | 4.2 kg |
| Rated payload | 3 kg |
| Degrees of freedom | 6 |
...
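As a hedged illustration of the URDF-plus-simulation workflow described above, the sketch below loads an arm's URDF into PyBullet and commands one joint. The file name `imeta_y1.urdf` is a hypothetical placeholder, since the vendor's actual SDK and file layout are not specified here.

```python
# Sketch: driving a 6-DoF arm URDF in PyBullet (simulation side of sim-to-real).
# "imeta_y1.urdf" is a hypothetical path; substitute the URDF the vendor ships.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                      # headless; use p.GUI for a window
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")
arm = p.loadURDF("imeta_y1.urdf", useFixedBase=True)  # hypothetical file

# Command joint 0 to 0.5 rad with position control, then step the simulation.
p.setJointMotorControl2(arm, jointIndex=0,
                        controlMode=p.POSITION_CONTROL, targetPosition=0.5)
for _ in range(240):                     # ~1 s at PyBullet's default 240 Hz
    p.stepSimulation()

print(p.getJointState(arm, 0)[0])        # current joint position
p.disconnect()
```

The same URDF can then back a real-arm driver over the vendor's Python or C++ interface, which is what "seamless switching between simulation and the real arm" amounts to in practice.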
Institute of Automation, CAS! EmbodiedCoder: Parametric Embodied Mobile Manipulation with Generative Models
具身智能之心· 2025-10-09 00:04
Authors: Zefu Lin et al.  Editor: 具身智能之心
This article is shared for academic purposes only; please contact us for removal in case of infringement.

1. Research Background

A long-standing core goal in robotics is for robots to complete diverse tasks as proficiently as humans in complex, unstructured environments. In recent years, vision-language-action (VLA) models have pushed this goal forward by mapping sensory input and natural-language instructions end-to-end to robot actions, but significant limitations remain.

To address these issues, researchers have proposed hierarchical policies that use a vision-language model (VLM) to decompose a task into subtasks and invoke predefined manipulation primitives (such as navigation and grasping); a minimal sketch of this pattern appears below. These methods, however, are constrained by the primitive library and cannot handle real-world tasks that require fine-grained interaction, such as opening a door or pulling out a drawer; tasks like these are hard to cover with a finite set of predefined primitives.

Earlier attempts based on code generation also fall short: early methods work only on simple geometric tasks; some rely on learned models to handle physical constraints, which reduces adaptability to new scenes; others cannot handle contact-rich manipulation, or focus only on failure detection rather than extending manipulation capability. Mobile robots additionally require solutions for retaining environment information, planning for objects outside the field of view, and even more complex ...
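To make the hierarchical VLM-plus-primitives pattern concrete, here is a minimal dispatcher sketch. The primitive names, the plan format, and the `call_vlm` stub are hypothetical illustrations of the pattern, not EmbodiedCoder's actual interface.

```python
# Sketch of a hierarchical policy: a VLM decomposes an instruction into
# subtasks, each dispatched to a predefined primitive. All names here are
# hypothetical; EmbodiedCoder's real interface is not reproduced.
from typing import Callable

def navigate(target: str) -> None:
    print(f"navigating to {target}")

def grasp(obj: str) -> None:
    print(f"grasping {obj}")

PRIMITIVES: dict[str, Callable[[str], None]] = {
    "navigate": navigate,
    "grasp": grasp,
}

def call_vlm(instruction: str) -> list[tuple[str, str]]:
    """Stub for a VLM planner: returns (primitive, argument) pairs."""
    return [("navigate", "kitchen table"), ("grasp", "red mug")]

def run(instruction: str) -> None:
    for name, arg in call_vlm(instruction):
        if name not in PRIMITIVES:  # the finite library is the bottleneck
            raise ValueError(f"no primitive for subtask '{name}'")
        PRIMITIVES[name](arg)

run("bring me the red mug")
```

The `ValueError` branch is where such systems hit the wall the article describes: a subtask like "pull the drawer open" simply has no matching entry in a fixed primitive library.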
From Robotic Arms to Humanoids: How Can Cross-Embodiment VLA Break Through?
具身智能之心· 2025-10-09 00:04
Core Insights
- The article discusses two significant advancements in the field of embodied intelligence and VLA (Vision-Language-Action) models, highlighting their potential to overcome existing challenges in the domain [3][7].

Group 1: VLA-Adapter
- VLA-Adapter aims to improve the direct mapping from VLM (Vision-Language Model) features to action space without heavily relying on robotic data. The research team found that increasing the parameter count and introducing pre-trained robotic data did not significantly enhance model performance on general benchmarks [3].
- The new mapping scheme proposed by the team allows the model to achieve superior performance even at a 0.5-billion-parameter scale, reducing training costs and lowering the entry barrier for VLA models (a hedged sketch of such a mapping follows this list) [3].

Group 2: TrajBooster
- TrajBooster is the first full-body humanoid operation VLA solution that addresses data scarcity issues for training VLA models on bipedal humanoid tasks. The scarcity arises from the high cost of remote operation data and the challenges of using existing heterogeneous robot data for training [7].
- By focusing on trajectory-centered methods, TrajBooster efficiently utilizes cross-body data, achieving full-body operation on bipedal robots with just 10 minutes of real-machine remote operation data for fine-tuning [7].

Group 3: Contributors
- Wang Yihao, a fourth-year PhD student at Beijing University of Posts and Telecommunications, is involved in the VLA-Adapter project and has contributed significantly to the field of embodied intelligence and VLA models [13].
- Liu Jiacheng, a second-year PhD student at Zhejiang University and West Lake University, leads the TrajBooster project, which is the only fully open-source work covering humanoid data collection, cross-body data enhancement, VLA model training, and hardware deployment [13].
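As a hedged illustration of the "VLM features to action space" mapping idea, the sketch below bolts a small adapter head onto frozen VLM features. The dimensions, chunk length, and architecture are assumptions for illustration, not VLA-Adapter's published design.

```python
# Sketch: a lightweight adapter mapping frozen VLM features to an action
# chunk. Dimensions and structure are illustrative, not VLA-Adapter's spec.
import torch
import torch.nn as nn

VLM_DIM, ACTION_DIM, CHUNK = 896, 7, 8   # assumed sizes

class ActionAdapter(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(VLM_DIM, 512), nn.GELU(),
            nn.Linear(512, ACTION_DIM * CHUNK),  # predict a chunk of actions
        )

    def forward(self, vlm_features: torch.Tensor) -> torch.Tensor:
        # vlm_features: (batch, VLM_DIM) pooled from a frozen VLM backbone
        out = self.head(vlm_features)
        return out.view(-1, CHUNK, ACTION_DIM)

adapter = ActionAdapter()
fake_features = torch.randn(4, VLM_DIM)   # stand-in for frozen VLM output
print(adapter(fake_features).shape)       # torch.Size([4, 8, 7])
```

Because only the small head is trained while the VLM stays frozen, the parameter count and training cost stay tiny, which is the general appeal of adapter-style approaches at sub-billion scales.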
DiffusionNFT: A New Paradigm for Diffusion Reinforcement Learning with a 25x Training Efficiency Gain
具身智能之心· 2025-10-09 00:04
Editor: 机器之心

A team led by Professor Jun Zhu at Tsinghua University, NVIDIA's Deep Imagination research group, and Stefano Ermon's team at Stanford have jointly proposed a brand-new reinforcement-learning (RL) paradigm for diffusion models: Diffusion Negative-aware FineTuning (DiffusionNFT). The method is the first to break through the basic assumption existing RL makes about diffusion models, optimizing directly on the forward (noising) process; it completely frees itself from likelihood estimation and dependence on specific samplers while significantly improving training efficiency and generation quality. Co-first authors Kaiwen Zheng and Huayu Chen are PhD students in Tsinghua University's Department of Computer Science.

Paper title: DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Paper link: https://arxiv.org/abs/2509.16117
Code repository: https://github.com/NVla ...
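For context on what "optimizing on the forward process" refers to, the standard DDPM forward (noising) process is the fixed Gaussian corruption below. This is textbook background on diffusion models, not DiffusionNFT's specific objective.

```latex
% Standard DDPM forward (noising) process; background only, not the
% DiffusionNFT objective itself.
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
```

Because this corruption is a fixed, sampler-free Gaussian, an objective defined on it needs neither likelihood estimates of the learned reverse process nor a particular sampling procedure, which is consistent with the advantages the summary claims.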
We Are Looking for Partners in the Embodied Intelligence Field......
具身智能之心· 2025-10-08 02:49
Core Viewpoint
- The company is seeking collaboration with global practitioners in the embodied intelligence field to enhance capabilities in various areas such as technical services, training, course development, and research guidance [1].

Group 1: Collaboration Opportunities
- There is an increasing demand from partners and small companies for the company to empower them through solutions, data collection, technology upgrades, and corporate training [1].
- The company is inviting outstanding partners to join in driving significant industry progress [1].

Group 2: Compensation and Resources
- The company will offer high compensation and abundant industry resources to collaborators [2].

Group 3: Focus Areas
- Key focus areas for collaboration include but are not limited to: VLA, VLN, Diffusion Policy, Reinforcement Learning, VLA+RL, remote operation, motion capture, sim2real, multimodal large models, simulation, motion control, end-to-end systems, and 3D perception [3].

Group 4: Job Description
- The positions are primarily aimed at embodied course development, solution research and development, hardware development, and training collaboration, targeting both B-end (enterprises, universities, research institutes) and C-end (students, job seekers) clients [4].

Group 5: Contact Information
- Interested parties can add WeChat oooops-life for further inquiries [5].
A Roundup of Companies Working on Embodied Perception in China and Abroad!
具身智能之心· 2025-10-08 02:49
Core Insights
- The article focuses on the emerging field of embodied intelligence, highlighting the development of general-purpose robotic brain systems and multi-modal perception decision-making systems, which are attracting significant attention from both capital and industry [2][3].

Domestic Companies
- **Xinghai Map**: Founded in 2023, focuses on developing a "general embodied large model" using real-world data to create robots with fine operational capabilities. The company has completed 8 rounds of financing [6].
- **WALL-A Model**: Set to launch in October 2024, it will be the largest-parameter-scale embodied intelligence general operation model globally, integrating visual, language, and motion control signals [6].
- **Wall-OSS**: An open-source embodied intelligence foundational model with strong generalization and reasoning capabilities [6].
- **UBTECH**: Established in 2012, it is a leader in humanoid robot commercialization with comprehensive self-research capabilities [10].
- **Thinker Model**: A multi-modal large model with 10 billion parameters, expected to achieve top rankings in three international benchmark tests by 2025, enhancing robots' perception and task planning in complex environments [10].
- **Zhiyuan Robotics**: Founded in February 2023, it aims to create world-class general embodied intelligent robot products [12].
- **Genie Operator-1**: Set for release in March 2025, it integrates multi-modal large models and hybrid expert technology, improving task success rates by 32% compared to market models [12].
- **Galaxy General**: Founded in May 2023, it focuses on multi-modal large models driven by synthetic data [14].
- **VLA Model**: The world's first general embodied large model, utilizing a "brain + cerebellum" collaborative framework [14].
- **Qianxun Intelligent**: Established in 2024, it specializes in AI and robotics with a strong technical foundation [16].
- **Spirit V1 VLA Model**: The first AI model to tackle long-range operations on flexible objects, supporting multi-task generalization [16].
- **Star Motion Era**: A new tech company incubated by Tsinghua University, focusing on general artificial intelligence applications [18].
- **ERA-42 Model**: The first end-to-end native embodied large model in China, capable of learning over 100 dynamic tasks through video training [18].

International Companies
- **Figure AI**: Focuses on developing embodied intelligence large models and related infrastructure for various industries [20].
- **Noematrix Brain**: Combines advanced algorithms and data support for comprehensive capabilities in instruction reasoning and task planning [20].
- **Physical Intelligence**: A startup established in January 2023, it aims to create advanced intelligent software for robots [24].
- **π0 Model**: Released on October 31, 2024, it is a foundational model for robots, achieving fine control capabilities through pre-training and fine-tuning [24].
- **Google DeepMind**: Merged with Google Brain in 2023, focusing on general artificial intelligence research [22].
- **Gemini Robotics**: A VLA model that allows robots to perform complex tasks without specialized training, enhancing their adaptability to environmental changes [22].
- **NVIDIA**: A leading GPU design company that has expanded into AI solutions [24].
- **Eureka System**: Based on GPT-4, it can automatically train robots to perform complex actions and optimize reinforcement learning processes [24].
A Roundup of VLA Foundation Models and Large-Scale Training Tasks
具身智能之心· 2025-10-08 02:49
Editor: 具身智能之心
This article is shared for academic purposes only; please contact us for removal in case of infringement.

Today we round up several papers on VLA foundation models and large-scale training tasks, ordered by publication year, with continuous updates to follow.......

Training strategies for efficient embodied reasoning
Paper year: 2025
Paper link: https://arxiv.org/abs/2505.08243

RoboBrain: A unified brain model for robotic manipulation from abstract to concrete
Paper year: 2025
Paper link: https://arxiv.org/abs/2502.21257

In recent years, multimodal large language models have demonstrated outstanding capabilities in multimodal context processing. However, their application in robot scenarios, particularly for long-horizon manipulation tasks, reveals significant limitations. These limitations stem from current MLLMs ...
Asked in an Interview: What Are the Embodied "Brain and Cerebellum" Algorithms......
具身智能之心· 2025-10-08 02:49
Core Insights
- The article discusses the evolution and current state of embodied intelligence, focusing on the roles of the brain and cerebellum in robotics, where the brain handles perception and planning while the cerebellum is responsible for execution [3][10].

Technical Evolution
- The development of embodied intelligence has progressed through several stages, starting from grasp pose detection to behavior cloning, and now to diffusion policy and VLA models, indicating a shift from low-level perception to high-level understanding and generalization [7][10].
- The first stage focused on grasp pose detection using point clouds or images for static object manipulation, but lacked context modeling for complex tasks [7].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations, but faced challenges in generalization and performance in multi-target scenarios [7].
- The third stage, emerging in 2023, introduced diffusion policy methods that enhance stability and generalization by modeling action sequences (a toy denoising sketch follows this list) [8].
- The fourth stage, anticipated in 2024, emphasizes the integration of VLA models with reinforcement learning and world models, enhancing robots' predictive capabilities and multi-modal perception [9][10].

Current Trends and Applications
- The integration of VLA with reinforcement learning improves robots' trial-and-error capabilities and self-improvement in long-term tasks, while the combination with world models allows for future prediction and better planning [10].
- The industry is witnessing a surge in products related to humanoid robots, robotic arms, and quadrupedal robots, serving various sectors such as industrial, home, dining, and medical rehabilitation [10].
- There is a growing demand for engineering capabilities as embodied intelligence transitions from research to deployment, necessitating skills in simulation and strategy training [14].

Educational Initiatives
- The article outlines a structured curriculum aimed at providing comprehensive knowledge of embodied intelligence algorithms, catering to both beginners and advanced learners [11][20].
- The course includes practical applications and supervision to enhance learning outcomes, focusing on various modules such as diffusion policy, VLA, and tactile sensing [11][14].
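As a hedged sketch of the diffusion-policy idea named above (denoising an action sequence conditioned on an observation), the code below runs a toy reverse loop with an untrained noise-prediction network. The dimensions, step count, and fixed step size are simplifying assumptions, not any paper's exact sampler.

```python
# Toy diffusion-policy inference: iteratively denoise an action sequence
# conditioned on an observation. Untrained network and simplified update;
# dimensions and schedule are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HORIZON, STEPS = 32, 7, 16, 10

class NoisePredictor(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM * HORIZON + 1, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM * HORIZON),
        )

    def forward(self, obs, noisy_actions, t):
        x = torch.cat([obs, noisy_actions.flatten(1), t], dim=-1)
        return self.net(x).view(-1, HORIZON, ACT_DIM)

eps_model = NoisePredictor()                 # would be trained on demonstrations
obs = torch.randn(1, OBS_DIM)                # stand-in for an encoded observation
actions = torch.randn(1, HORIZON, ACT_DIM)   # start from pure noise

with torch.no_grad():
    for k in reversed(range(STEPS)):         # simplified reverse (denoising) loop
        t = torch.full((1, 1), k / STEPS)
        eps = eps_model(obs, actions, t)
        actions = actions - 0.1 * eps        # crude denoising step, not DDPM-exact

print(actions.shape)                         # torch.Size([1, 16, 7])
```

Predicting a whole action horizon at once, rather than one action per step, is what gives diffusion policies the temporal stability the summary credits them with.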