具身智能之心
This "Whampoa Academy" of the embodied intelligence field is working on the following......
具身智能之心· 2025-09-26 10:42
Core Viewpoint
- The article emphasizes the development and enhancement of a community focused on embodied intelligence, aiming to provide resources, support, and networking opportunities for individuals interested in this field.

Group 1: Community Development
- The community is working on improving hardware solutions and addressing user feedback regarding product quality and pricing [2]
- There is an ongoing effort to streamline community resources and reduce gaps in information, with plans to present improved content after the holiday [2]
- The community has established a mechanism for job referrals, connecting members with potential employers in the embodied intelligence sector [5][13]

Group 2: Educational Resources
- The community has compiled over 30 technical routes for members, facilitating easier access to benchmarks, reviews, and learning paths [6]
- Various forums and live sessions are organized to discuss advancements in the embodied intelligence industry, covering topics such as robot simulation and data collection [6]
- A comprehensive list of open-source projects and datasets related to embodied intelligence is available, aiding members in their research and development efforts [29][35]

Group 3: Networking and Collaboration
- The community includes members from prestigious universities and leading companies in the field, fostering collaboration and knowledge sharing [13]
- Members are encouraged to engage with industry leaders and peers to discuss job opportunities and academic advancements [17][80]
- The community aims to cultivate future leaders in the industry by providing a platform for technical exchange and professional growth [12]
Easy to use and cost-effective! A lightweight robotic arm built for embodied-AI research
具身智能之心· 2025-09-26 02:24
A lightweight, cost-effective robotic arm built for embodied-AI research. Still struggling to find hardware for embodied AI? Expensive hardware is out of reach, while cheap robotic arms are hard to use. Is there a product that is both low-priced and high-quality?

Imeta-y1 has arrived! At low cost it supports paper-level validation in the embodied-AI field and development for research scenarios, meeting the needs of most practitioners and researchers.

This is a lightweight robotic arm designed specifically for education, research, and light-industrial scenarios. It combines high-precision motion control, a low-power design, and an open hardware/software architecture, supports seamless joint debugging from simulation to the real machine, and ships with a fully open-source SDK and toolchain to help users quickly carry out algorithm validation, data collection, model training, and deployment. Its compact structure and modular interfaces make it especially well suited to developing and promoting embedded-AI and robot-learning platforms.

| Weight | 631 g |
| --- | --- |
| Dimensions | see appendix |
| Stroke | 0~80 mm |
| Positioning accuracy | ±0.5 mm |
| External interface | Power + CAN (XT30 2+2) |

| Weight | 671 g |
| --- | --- |
| Dimensions | see appendix |
| Stroke | 0~80 mm |
| Positioning accuracy | ±0.5 mm |
| External interface | Power + CAN (XT30 2+2) |

| Body weight | 4.2 kg | Rated ...
The volume of papers coming out of the VLA direction is truly enormous......
具身智能之心· 2025-09-26 00:04
Imagine how wonderful it would be to issue instructions through language and have any action you want executed smoothly! Being able to complete long, continuous sequences of actions would be extremely convenient. So what exactly is VLA?

VLA breaks the single-task limitation of traditional methods, allowing robots to make autonomous decisions in diverse scenarios and flexibly handle unseen environments, with broad applications in manufacturing, logistics, and home services. VLA models have also become a research hotspot, driving frontier projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA, and this line of work has strengthened collaboration between academia and industry. Their adaptability shows in deployment across robotic arms, quadrupeds, humanoid robots, and other platforms, offering broad potential and practical value for intelligent robots of all kinds and making VLA a key driving force in the field.

Judging from this year's major robotics and AI conferences, VLA and its derivative directions account for nearly half of the embodied-AI output, especially long-horizon manipulation, generalization, few-shot learning, VLA+RL, and humanoid-related work.

From an industry perspective, embodied intelligence is booming both at home and abroad: teams such as Unitree, 智元, 星海图, 银河通用, and 逐际动力 are moving from the lab toward commercialization, tech giants such as Huawei, JD.com, and Tencent are actively entering the space, and together with companies abroad such as Tesla and Figure AI they are pushing the field forward. Many readers have left messages asking ...
RoboDexVLM: General Dexterous Robot Manipulation Based on a Hierarchical VLM Architecture
具身智能之心· 2025-09-26 00:04
Core Insights
- RoboDexVLM is an innovative robot task planning and grasp detection framework designed for collaborative robotic arms equipped with dexterous hands, focusing on complex long-sequence tasks and diverse object manipulation [2][6]

Group 1: Framework Overview
- The framework utilizes a robust task planner with a task-level recovery mechanism, leveraging visual language models to interpret and execute open-vocabulary instructions for completing long-sequence tasks [2][6]
- It introduces a language-guided dexterous grasp perception algorithm, specifically designed for zero-shot dexterous manipulation of diverse objects and instructions [2][6]
- Comprehensive experimental results validate RoboDexVLM's effectiveness, adaptability, and robustness in handling long-sequence scenarios and executing dexterous grasping tasks [2][6]

Group 2: Key Features
- The framework allows robots to understand natural language commands, enabling seamless human-robot interaction [7]
- It supports zero-shot grasping of various objects, showcasing the dexterous hand's capability to manipulate items of different shapes and sizes [7]
- The visual language model acts as the "brain" for long-range task planning, ensuring that the robot does not lose track of its objectives [7]

Group 3: Practical Applications
- RoboDexVLM represents the first general-purpose dexterous robot manipulation framework that integrates visual language models, breaking through the limitations of traditional and end-to-end methods [6][7]
- The framework's real-world performance demonstrates its potential in embodied intelligence and human-robot collaboration [6][7]
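The task-level recovery mechanism described above can be pictured as a plan-execute loop in which a failed skill triggers re-planning from the current observation rather than aborting the whole task. Below is a minimal mock sketch of that pattern; the planner, skill names, and recovery policy are hypothetical illustrations, not RoboDexVLM's actual implementation.

```python
import random

def make_mock_planner():
    """Hypothetical VLM planner stub: maps an instruction to a sequence of skill names."""
    def plan(instruction, observation):
        return ["locate_object", "grasp_object", "place_object"]
    return plan

def run_with_recovery(plan_fn, skills, instruction, observe, max_replans=2):
    """Plan-execute loop with task-level recovery: if a skill fails,
    re-plan from the current observation instead of aborting."""
    plan = plan_fn(instruction, observe())
    step, replans = 0, 0
    while step < len(plan):
        skill_ok = skills[plan[step]](observe())       # execute the low-level skill
        if skill_ok:
            step += 1
        elif replans < max_replans:
            replans += 1
            plan = plan_fn(instruction, observe())     # task-level recovery: fresh plan
            step = 0
        else:
            return False
    return True

# Hypothetical skills that succeed with some probability; observe() returns an empty state.
skills = {name: (lambda obs: random.random() > 0.2)
          for name in ["locate_object", "grasp_object", "place_object"]}
print(run_with_recovery(make_mock_planner(), skills, "put the cup on the shelf", dict))
```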
RoboSeek cracks the core challenge of long-horizon tasks: when manipulation meets "embodied cognition"
具身智能之心· 2025-09-26 00:04
[Figure: panels showing Task 1 through Task 8]

It is against this technical impasse that the RoboSeek framework offers a breakthrough. Its core innovation lies in a deep grounding of "embodied cognition theory", which overturns the traditional view that cognition is isolated from interaction and holds instead that an agent's cognitive ability arises from dynamic interaction with objects and the environment. Building on this idea, RoboSeek constructs a new architecture of interaction-driven joint optimization of perception and action: a dynamically evolving "attention space" captures task-critical information, a reinforcement-learning-driven embodied executor delivers precise motion control, the cross-entropy method iteratively optimizes key objectives, and a real-to-sim-to-real (real2sim2real) transfer pipeline closes the gap between simulation and reality.

The value of this design speaks for itself: across 8 long-horizon tasks on 2 different robot platforms, RoboSeek achieves an average success rate of 79%, far exceeding traditional baselines (all below 50%). It not only provides a stable and reliable solution for long-horizon robot manipulation, but also fills the gap between "embodied cognition theory" and actual robot operation, opening up ... for the application of general-purpose robots in real environments
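The cross-entropy method mentioned above is a standard derivative-free optimizer that repeatedly refits a sampling distribution to the best-scoring candidates. Here is a minimal generic sketch of that iterative optimization; the objective function, dimensions, and hyperparameters are hypothetical and not RoboSeek's actual configuration.

```python
import numpy as np

def cross_entropy_method(score_fn, dim, n_iters=10, pop_size=64, elite_frac=0.1,
                         init_mean=None, init_std=1.0):
    """Generic cross-entropy method: sample candidates from a Gaussian,
    keep the top-scoring 'elites', and refit the Gaussian to them."""
    mean = np.zeros(dim) if init_mean is None else np.asarray(init_mean, dtype=float)
    std = np.full(dim, init_std)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iters):
        samples = np.random.randn(pop_size, dim) * std + mean        # propose candidates
        scores = np.array([score_fn(s) for s in samples])            # evaluate each one
        elites = samples[np.argsort(scores)[-n_elite:]]              # keep the best
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6   # refit the distribution
    return mean

# Hypothetical usage: optimize a 3-D target position against a task score
# (here, negative distance to a goal point).
goal = np.array([0.3, 0.1, 0.2])
best = cross_entropy_method(lambda x: -np.linalg.norm(x - goal), dim=3)
print(best)  # converges toward the goal point
```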
The first code world model sets the AI community abuzz: it lets agents learn "true reasoning", and Meta has open-sourced it
具身智能之心· 2025-09-26 00:04
Core Viewpoint
- The article discusses the introduction of the Code World Model (CWM) by Meta, which represents a significant evolution in AI models aimed at improving code generation through world modeling techniques [2][5][31]

Group 1: Model Overview
- CWM is a 32-billion-parameter open-weight large language model (LLM) designed to advance code-generation research based on world models [7][12]
- It features a dense, decoder-only architecture with a context length of up to 131k tokens, demonstrating strong performance in general programming and mathematical tasks [8][9]

Group 2: Performance Metrics
- CWM achieved notable scores on various benchmarks: SWE-bench Verified (pass@1 65.8%), LiveCodeBench (68.6%), Math-500 (96.6%), and AIME 2024 (76.0%) [8][23]
- Compared with other models, CWM's performance is competitive, particularly in the 30B-parameter range [9][30]

Group 3: Training Methodology
- The model was trained on extensive observation-action trajectories in a Python interpreter and agent-based Docker environments, aiming to improve code understanding beyond training on static code [12][22]
- Meta has made available checkpoints from the mid-training, SFT, and reinforcement learning phases to support further research [13]

Group 4: Research Implications
- CWM serves as a robust testing platform to explore the potential of world modeling in enhancing reasoning and planning capabilities in code generation [15][31]
- The research indicates that world models can benefit agent-based coding by allowing stepwise simulation of Python code execution, which enhances reasoning from such simulations [16][31]

Group 5: Future Directions
- Meta envisions that the code world model will bridge the gap between linguistic reasoning and executable semantics, with ongoing research needed to fully leverage its advantages across tasks [31]
- The model aims to improve reinforcement learning by enabling agents familiar with environmental dynamics to focus on learning actions that yield rewards [31]
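For intuition, the "observation-action trajectories in a Python interpreter" mentioned above can be pictured as step-by-step records of interpreter state as code runs. The sketch below collects such a trace with Python's standard sys.settrace hook; the trace format is purely illustrative and not Meta's actual data schema.

```python
import sys

def collect_execution_trace(fn, *args):
    """Run fn(*args) and record (line number, local variables) at each executed line of fn."""
    trace = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            trace.append({"line": frame.f_lineno, "locals": dict(frame.f_locals)})
        return tracer  # keep tracing line events in this frame

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, trace

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, trace = collect_execution_trace(demo, 3)
print(result)     # 3
print(trace[:2])  # first two (line, locals) observations of the execution
```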
New work at CoRL 2025! ControlVLA: robots learn a task after seeing it just 10 times, another capability upgrade for the "通智大脑"!
具身智能之心· 2025-09-25 09:54
Core Insights
- The article discusses the development of ControlVLA, a novel framework that allows robots to learn complex tasks from minimal human demonstrations, achieving a success rate exceeding 75%, nearly four times higher than traditional methods [1][10][15]

Group 1: Research Background
- Robots face significant challenges in performing tasks in real-world scenarios, especially with limited demonstrations. Existing few-shot learning methods often rely on simulation-augmented data or pre-built modules, which struggle with the gap between simulation and reality [7][8]
- Recent advances in Vision-Language-Action (VLA) models show promise in enhancing robot performance across multiple tasks and environments, but adapting these models efficiently to specific tasks in data-scarce situations remains a challenge [8][9]

Group 2: ControlVLA Framework
- ControlVLA integrates pre-trained VLA models with object-centric representations to enable efficient few-shot fine-tuning for robot manipulation tasks. The framework employs a ControlNet-style architecture to preserve the rich prior knowledge of VLA models while focusing on task-critical objects [9][10]
- The workflow of ControlVLA consists of three main steps (a sketch of the ControlNet-style pattern follows this summary):
  1. Pre-training a large-scale VLA model on diverse manipulation datasets to learn conditional distributions from visual and language instructions to action spaces [12]
  2. Extracting object-centric representations from demonstration videos to capture geometric and positional features of the relevant objects [12]
  3. Fine-tuning the model with a dual attention mechanism that incorporates object information while preserving the pre-trained policy [12]

Group 3: Experimental Results
- The research team tested ControlVLA on the Astribot S1 robot, demonstrating that it can complete both short-horizon and complex long-horizon tasks with only 10-20 demonstrations [14][15]
- In experiments on eight real-world tasks, ControlVLA achieved an overall success rate of 76.7%, significantly surpassing the traditional method's 20.8% [15][19]
- On long-sequence tasks, ControlVLA maintained an average success rate of 60%, roughly three times better than the existing best methods, showcasing its ability to reduce error accumulation during task execution [19][24]

Group 4: Generalization and Cost Efficiency
- ControlVLA demonstrated robust generalization, maintaining a success rate of 60%-70% when tested with unseen objects and new backgrounds, indicating its adaptability in dynamic environments [24][26]
- The framework substantially reduces the cost of collecting real demonstrations: it reached an 80% success rate on the OrganizeToy task with only 20 demonstrations, while other methods required 100 demonstrations to reach similar performance [21][26]
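The "ControlNet-style" idea referenced above typically means keeping the pre-trained backbone frozen and injecting the new conditioning (here, object-centric tokens) through a branch whose output projection is initialized to zero, so fine-tuning starts exactly at the pre-trained behavior. Below is a minimal PyTorch sketch of that pattern; the module names, dimensions, and attention layout are hypothetical and not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ZeroInitCrossAttention(nn.Module):
    """Cross-attention branch whose output projection starts at zero, so the
    frozen backbone's behavior is unchanged at the start of fine-tuning."""
    def __init__(self, dim: int, obj_dim: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, kdim=obj_dim, vdim=obj_dim,
                                          batch_first=True)
        self.out_proj = nn.Linear(dim, dim)
        nn.init.zeros_(self.out_proj.weight)   # zero init: new branch contributes nothing
        nn.init.zeros_(self.out_proj.bias)     # until fine-tuning updates it

    def forward(self, hidden, obj_tokens):
        attended, _ = self.attn(hidden, obj_tokens, obj_tokens)
        return hidden + self.out_proj(attended)   # residual add on top of frozen features

# Hypothetical usage: frozen backbone features plus object-centric tokens.
backbone_features = torch.randn(2, 64, 512)   # (batch, sequence, dim) from a frozen VLA
object_tokens = torch.randn(2, 4, 256)        # (batch, num_objects, obj_dim)

branch = ZeroInitCrossAttention(dim=512, obj_dim=256)
fused = branch(backbone_features, object_tokens)
print(torch.allclose(fused, backbone_features))  # True before any training step
```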
A look at how VLA is applied and implemented across different scenarios, drawn from more than 300 works......
具身智能之心· 2025-09-25 04:00
Core Insights
- The article discusses the emergence of Vision-Language-Action (VLA) models, marking a shift in robotics from traditional policy-based control toward a more generalized robotic technology paradigm that enables active decision-making in complex environments [2][5][20]
- It emphasizes the integration of large language models (LLMs) and vision-language models (VLMs) to enhance robotic manipulation, providing greater flexibility and precision in task execution [6][12]
- The survey outlines a clear classification system for VLA methods, categorizing them into autoregressive, diffusion, reinforcement-learning, hybrid, and specialized methods, while also addressing the unique contributions and challenges within each category [7][10][22]

Group 1: VLA Model Overview
- VLA models represent a significant advancement in robotics, unifying perception, language understanding, and executable control within a single modeling framework [15][20]
- The article categorizes VLA methods into five paradigms: autoregressive, diffusion, reinforcement learning, hybrid, and specialized, detailing their design motivations and core strategies [10][22][23]
- The integration of LLMs into VLA systems transforms them from passive input parsers into semantic intermediaries, enhancing their ability to handle long and complex tasks [29][30]

Group 2: Applications and Challenges
- VLA models have practical applications across various robot embodiments, including robotic arms, quadrupeds, humanoid robots, and autonomous vehicles, showcasing their deployment in diverse scenarios [8][20]
- The article identifies key challenges in the VLA field, such as data limitations, inference speed, and safety concerns, which must be addressed to accelerate the development of VLA models and general robotic technology [8][19][20]
- Reliance on high-quality datasets and simulation platforms is crucial for effective training and evaluation of VLA models, addressing data scarcity and the risks of real-world testing [16][19]

Group 3: Future Directions
- The survey outlines future research directions for VLA, including addressing data limitations, improving inference speed, and strengthening safety measures to facilitate the advancement of general embodied intelligence [8][20][21]
- It highlights the importance of developing scalable and efficient VLA models that can adapt to various tasks and environments, emphasizing the need for ongoing innovation in this rapidly evolving field [20][39]
- The article concludes by underscoring the potential of VLA models to bridge the gap between perception, understanding, and action, positioning them as a key frontier in embodied artificial intelligence [20][21][39]
Personalized Real-to-Sim-to-Real navigation with 3DGS captured from mobile devices
具身智能之心· 2025-09-25 00:04
Group 1
- The core issue in embodied AI is the sim-to-real dilemma: high simulation fidelity conflicts with cost, making it hard to transfer policies that succeed in simulation to real-world applications [2]
- The potential of 3D Gaussian Splatting (GS) has been underexplored; recent advances enable high-fidelity 3D representations from standard devices, closing the gap between low-cost real-scene reconstruction and embodied navigation [3][4]
- The proposed method, EmbodiedSplat, is a four-stage pipeline that captures real scenes with low-cost mobile devices and reconstructs them with high fidelity for effective training and deployment [4][6]

Group 2
- The first stage captures RGB-D data with an iPhone 13 Pro Max and the Polycam app, which simplifies the process and lowers the barrier to entry [11]
- The second stage focuses on mesh reconstruction, using DN-Splatter for 3D GS training to ensure geometric consistency and minimize reconstruction errors [11][12]
- The third stage performs simulation training with a composite reward function that balances success, path efficiency, and collision avoidance, using a two-layer LSTM for decision-making [10][13]

Group 3
- The fourth stage is real-world deployment, where the Stretch robot connects to a remote server for policy inference, enabling real-time navigation based on observed images [14][17]
- Experiments validate the zero-shot performance of pre-trained policies in new environments, revealing that scene scale significantly impacts performance [20][22]
- Fine-tuning pre-trained policies yields substantial performance improvements across environments, demonstrating the effectiveness of personalized adjustment [25][28]

Group 4
- The study highlights the limited success of zero-shot transfer from simulation to the real world, with significant performance gaps observed [32]
- Fine-tuning enhances the transferability of policies, with success rates increasing significantly after adjustment [32][35]
- The necessity of large-scale pre-training is emphasized, as it provides foundational knowledge that aids adaptation to new environments and helps overcome real-world challenges [35][44]
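The composite reward in stage three (balancing success, path efficiency, and collision avoidance) is a standard pattern in navigation RL. Below is a minimal sketch of such a reward; the weights and term structure are hypothetical illustrations, not the paper's actual coefficients.

```python
def navigation_reward(reached_goal: bool, dist_to_goal: float, prev_dist_to_goal: float,
                      collided: bool, w_success: float = 10.0, w_progress: float = 1.0,
                      w_collision: float = 0.5, step_penalty: float = 0.01) -> float:
    """Composite reward: success bonus + progress toward the goal
    - collision penalty - a small per-step cost (path efficiency)."""
    reward = -step_penalty                                      # small cost per step
    reward += w_progress * (prev_dist_to_goal - dist_to_goal)   # reward for closing the distance
    if collided:
        reward -= w_collision                                   # discourage collisions
    if reached_goal:
        reward += w_success                                     # terminal success bonus
    return reward

# Example: the agent moved 0.2 m closer to the goal without colliding.
print(navigation_reward(False, 1.8, 2.0, False))  # ~0.19
```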
ARTS 2025 brings together leading experts | Speakers announced for the 3rd Autonomous Robotics Seminar, meet at Zhejiang University, registration now open!
具身智能之心· 2025-09-25 00:04
ARTS 2025. To promote exchange among front-line young scholars and engineers in autonomous robotics, foster integration between academia and industry along with industry-academia-research collaboration, and build a deep, focused platform for technical exchange, the Chinese Association of Automation has decided to host the Autonomous Robotic Technology Seminar (ARTS).

ARTS advocates a scientific spirit of rational criticism, daring to question, and pragmatism, and actively promotes a free and equal exchange of ideas. ARTS focuses mainly on sensing and perception, autonomous navigation, state estimation, mobile-robot localization and mapping, motion planning, modeling and control, multi-robot systems, embodied intelligence, and medical and bionic robotics.

The 3rd ARTS will open at Zhejiang University on October 18-19, 2025. You are cordially invited to attend and to offer comments and suggestions on the organization of the conference! This year's meeting features more than 40 invited young leading researchers and a diverse program including an academic debate, an academic talk show, the ARTS scholarship, and company visits. Note: seats are limited; please register early to avoid being turned away once the event is full. For details of the offline meeting, scan the QR code to join the [ARTS 2025 discussion group].

1. Organizing bodies
Host: Chinese Association of Automation
Organizers: College of Control Science and Engineering, Zhejiang University; School of Automation and Sensing, Shanghai Jiao Tong University
Co-organizer: 深蓝学院

2. Conference agenda | | 9:20-9:50 ...