具身智能之心
This "Whampoa Military Academy" of the embodied intelligence field is working on the following......
具身智能之心· 2025-09-26 10:42
This year's National Day and Mid-Autumn Festival fall together, so first of all, happy holidays to everyone in advance. I hope everyone gets a real chance to unwind, and Feng-ge himself could honestly use a break too. Having run the community and its media for this long, I am online almost every day, handling members' questions at all hours.

So what have we been working on recently? Mainly pushing forward hardware, the community, and business cooperation with a number of companies.

Many members have been telling Feng-ge that hardware is expensive and hard to use. We are working hard to find suitable options and will roll them out soon. We are also advancing the testing and development of several embodied products, hoping to offer everyone a few platforms that are genuinely good to use. When they are ready, they will be announced first in our 具身智能之心知识星球 community.

We also want to round out the community and shrink the blind spots and incomplete parts of the curriculum. The body of material is large and scattered and takes a lot of time to organize, so we will make one more push before the long holiday and present better content afterwards.

We have also been receiving a steady stream of recruiting needs from universities in the embodied direction, especially for RAs, PhD students, and postdocs. Interested students should start preparing now for 2026 admissions and job hunting, get to know the professors, and follow our regular recruiting posts. Students in autumn campus recruiting or general hiring can send us their resumes at any time, and we will refer them internally right away.

All of this content will be deposited into our embodied community first. The 具身智能之心知识星球 keeps working to become a very large embodied intelligence and robotics community, in the hope of solving problems when people need help most and, when job hunting, being able to ...
Easy to use, high cost-performance! A lightweight robotic arm built for embodied AI research
具身智能之心· 2025-09-26 02:24
A lightweight, cost-effective robotic arm built for embodied AI research. Still struggling over hardware for embodied work? Expensive hardware is out of reach, and cheap arms are hard to use; is there a product that is low-priced yet high-quality?

The Imeta-y1 has arrived! At low cost it can support paper validation in the embodied field and development for research scenarios, meeting the needs of most practitioners and researchers. This is a lightweight robotic arm designed specifically for education, research, and light-industrial settings.

The arm combines high-precision motion control, a low-power design, and an open hardware/software architecture. It supports seamless joint debugging from simulation to the real machine and ships with a fully open-source SDK and toolchain, helping users move quickly through algorithm validation, data collection, model training, and deployment. Its compact structure and modular interfaces make it especially suitable for developing and promoting embedded-AI and robot-learning platforms.

| Spec | Variant 1 | Variant 2 |
| --- | --- | --- |
| Weight | 631 g | 671 g |
| Dimensions | see appendix | see appendix |
| Stroke | 0-80 mm | 0-80 mm |
| Positioning accuracy | ±0.5 mm | ±0.5 mm |
| External interface | Power + CAN, XT30 2+2 | Power + CAN, XT30 2+2 |

| Arm body weight | 4.2 kg |
| --- | --- |

Rated ...
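Since the spec sheet lists a power + CAN (XT30 2+2) external interface and an open-source SDK, a common way to drive such hardware from Python is over a CAN bus. The sketch below uses the python-can package; the channel, arbitration ID, and payload layout are made-up placeholders, since the Imeta-y1's actual protocol is not given here.

```python
# Minimal sketch of sending a command over a CAN bus with python-can.
# The channel, arbitration ID, and payload layout are made-up examples;
# consult the Imeta-y1 SDK for the arm's real protocol.
import struct
import can

def send_gripper_position(bus: can.BusABC, position_mm: float) -> None:
    """Encode a target gripper opening (0-80 mm) into a hypothetical frame."""
    payload = struct.pack("<f", position_mm)          # 4-byte little-endian float
    msg = can.Message(arbitration_id=0x101,           # hypothetical node ID
                      data=payload,
                      is_extended_id=False)
    bus.send(msg)

if __name__ == "__main__":
    # socketcan on Linux; adapt the interface/channel to your adapter.
    with can.Bus(interface="socketcan", channel="can0") as bus:
        send_gripper_position(bus, 40.0)              # open halfway
```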
RoboDexVLM: general dexterous robot manipulation with a hierarchical VLM architecture
具身智能之心· 2025-09-26 00:04
RoboDexVLM is a novel robot task-planning and grasp-detection framework for collaborative manipulators equipped with dexterous hands.

Existing methods typically focus on simplified and restricted manipulation tasks, often overlooking the complexity of grasping diverse objects in a long-horizon manner. In contrast, the RoboDexVLM framework exploits the dexterous hand's ability to grasp objects of different shapes and sizes while executing tasks from natural-language instructions.

The core components of the method are as follows. First, a robust task planner with a task-level recovery mechanism is designed; it uses a vision-language model so the system can parse and execute open-vocabulary instructions to complete long-sequence tasks. Second, a language-guided dexterous grasp-perception algorithm based on robot kinematics and formal methods is proposed, designed for zero-shot dexterous manipulation over diverse objects and instructions. Comprehensive experimental results validate the effectiveness, adaptability, and robustness of RoboDexVLM in handling long-horizon scenarios and performing dexterous grasps. These results highlight the framework's ability to operate in complex environments and demonstrate its potential for open-vocabulary dexterous manipulation.

Paper title: RoboDexVLM: Visual Language Model-Enabled Task Planning an ...
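To make the planner's control flow concrete, here is a minimal sketch of how a VLM-driven task planner with task-level recovery could be wired together. The function names and the Skill interface are hypothetical illustrations of the idea, not the authors' API.

```python
# Minimal sketch of a VLM-driven task planner with task-level recovery.
# The function names and the Skill interface below are hypothetical,
# not the RoboDexVLM authors' actual API.
from dataclasses import dataclass

@dataclass
class Skill:
    name: str          # e.g. "grasp", "place", "open_drawer"
    target: str        # open-vocabulary object description

def plan_with_vlm(instruction: str, image) -> list[Skill]:
    """Ask a vision-language model to decompose an open-vocabulary
    instruction into a sequence of primitive skills (stub)."""
    raise NotImplementedError

def execute_skill(skill: Skill) -> bool:
    """Run one skill on the robot and report success (stub)."""
    raise NotImplementedError

def run_task(instruction: str, get_image, max_replans: int = 3) -> bool:
    plan = plan_with_vlm(instruction, get_image())
    replans = 0
    while plan:
        skill = plan.pop(0)
        if execute_skill(skill):
            continue  # skill succeeded, move on to the next one
        # Task-level recovery: on failure, ask the VLM to re-plan the
        # remaining task from the current observation.
        replans += 1
        if replans > max_replans:
            return False
        plan = plan_with_vlm(instruction, get_image())
    return True
```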
The paper output in the VLA direction is truly enormous......
具身智能之心· 2025-09-26 00:04
Imagine how pleasant it would be to issue commands in plain language and have any action you want executed smoothly! If long sequences of actions could be carried out continuously, it would be extremely convenient. So what exactly is VLA?

VLA (vision-language-action) models break the single-task limitation of traditional methods, allowing robots to make decisions autonomously in diverse scenarios and adapt flexibly to environments they have never seen, with broad applications in manufacturing, logistics, and home services. VLA has become a research hotspot and has driven a series of frontier projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA, fostering collaboration between academia and industry. Its adaptability shows in its applicability to many platforms, including manipulator arms, quadrupeds, and humanoids, giving it broad potential and practical value for all kinds of intelligent robots and making it a key driving force in the field.

Judging from this year's top robotics and AI conferences, VLA and its derivative directions account for nearly half of the embodied-AI output, especially long-horizon manipulation, generalization, few-shot learning, VLA+RL, and humanoid-related work.

From an industry perspective, embodied intelligence is booming at home and abroad: teams such as Unitree, 智元, 星海图, 银河通用, and 逐际动力 are moving from the lab toward commercialization, tech giants such as Huawei, JD, and Tencent are actively entering the space, and together with overseas companies such as Tesla and Figure AI they are pushing the field forward.

Many students have left messages in the backend asking ...
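As a rough picture of what "vision-language-action" means in code, here is a minimal sketch of a VLA policy interface: an image and a language instruction go in, a chunk of low-level robot actions comes out. The class and loop below are illustrative placeholders, not the API of pi0, RT-2, OpenVLA, or any other released model.

```python
# Minimal sketch of the vision-language-action (VLA) interface:
# observation + instruction in, low-level robot actions out.
# The policy class is an illustrative placeholder, not the API of
# pi0, RT-2, OpenVLA, or any other released model.
import numpy as np

class VLAPolicy:
    def predict_actions(self, image: np.ndarray, instruction: str) -> np.ndarray:
        """Return an action chunk, e.g. shape (horizon, dof)."""
        raise NotImplementedError

def control_loop(policy: VLAPolicy, robot, instruction: str, steps: int = 100):
    for _ in range(steps):
        image = robot.get_camera_image()            # current RGB observation
        actions = policy.predict_actions(image, instruction)
        for action in actions:                      # execute the predicted chunk
            robot.apply_action(action)
```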
RoboSeek cracks the core challenge of long-horizon tasks: when manipulation meets "embodied cognition"
具身智能之心· 2025-09-26 00:04
[Figure: the eight long-horizon benchmark tasks, Task 1-Task 8]

It is against this technical impasse that the RoboSeek framework offers a breakthrough. Its core innovation comes from deeply operationalizing "embodied cognition theory", which overturns the traditional view that cognition is isolated from interaction and instead holds that an agent's cognitive abilities arise from dynamic interaction with objects and the environment. Building on this idea, RoboSeek constructs a new architecture of interaction-driven joint optimization of perception and action: a dynamically evolving "attention space" captures task-critical information, a reinforcement-learning-driven embodied executor delivers precise action control, the cross-entropy method iteratively optimizes key objectives, and a real-to-sim-to-real (real2sim2real) transfer pipeline closes the gap between simulation and reality.

The value of this design is clear: across tests on 8 long-horizon tasks and 2 different robot platforms, RoboSeek achieves a 79% average success rate, far above traditional baselines (all below 50%). It not only provides a stable and reliable solution for long-horizon robot manipulation, but also fills the gap between "embodied cognition theory" and practical robot operation, and, for the application of general-purpose robots in real environments, opens up ...
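Since the write-up singles out the cross-entropy method as the optimizer for RoboSeek's key objectives, a generic cross-entropy method (CEM) loop is sketched below. The objective function and dimensionality are toy placeholders; this is not RoboSeek's actual implementation.

```python
# Generic cross-entropy method (CEM) optimizer: the kind of iterative,
# sampling-based optimization the article says RoboSeek applies to its
# key objectives. The objective below is a toy placeholder.
import numpy as np

def cem_optimize(objective, dim, iters=10, pop=64, elites=8, seed=0):
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))    # sample candidates
        scores = np.array([objective(s) for s in samples])  # evaluate them
        elite = samples[np.argsort(scores)[-elites:]]       # keep the best
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Toy usage: maximize a simple quadratic objective.
best = cem_optimize(lambda x: -np.sum((x - 0.5) ** 2), dim=3)
```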
The first code world model takes the AI community by storm: it lets agents learn "real reasoning", open-sourced by Meta
具身智能之心· 2025-09-26 00:04
Core Viewpoint
- The article discusses the introduction of the Code World Model (CWM) by Meta, which represents a significant evolution in AI models aimed at improving code generation through world-modeling techniques [2][5][31].

Group 1: Model Overview
- CWM is a 32-billion-parameter open-weight large language model (LLM) designed to enhance code-generation research based on world models [7][12].
- It features a dense, decoder-only architecture with a context length of up to 131k tokens, demonstrating strong performance on general programming and mathematical tasks [8][9].

Group 2: Performance Metrics
- CWM achieved notable scores on various benchmarks: SWE-bench Verified (pass@1 65.8%), LiveCodeBench (68.6%), Math-500 (96.6%), and AIME 2024 (76.0%) [8][23].
- Compared with other models, CWM's performance is competitive, particularly in the 30B-parameter range [9][30].

Group 3: Training Methodology
- The model was trained on extensive observation-action trajectories in a Python interpreter and an agent-based Docker environment, focusing on improving code understanding beyond static-code training [12][22]. (A toy tracing sketch of this idea follows this summary.)
- Meta has released checkpoints from the mid-training, SFT, and reinforcement-learning phases to support further research [13].

Group 4: Research Implications
- CWM serves as a robust testing platform for exploring the potential of world modeling to enhance reasoning and planning capabilities in code generation [15][31].
- The research indicates that world models can benefit agent-based coding by allowing stepwise simulation of Python code execution and reasoning over such simulations [16][31].

Group 5: Future Directions
- Meta envisions the code world model bridging the gap between linguistic reasoning and executable semantics, with ongoing research needed to fully leverage its advantages across tasks [31].
- The model also aims to improve reinforcement learning by enabling agents familiar with environmental dynamics to focus on learning actions that yield rewards [31].
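To picture what "observation-action trajectories in a Python interpreter" can look like, here is a toy tracing sketch built on the standard sys.settrace hook; it only illustrates the idea of recording interpreter state step by step and is not Meta's CWM data pipeline.

```python
# Illustrative sketch of collecting "observation-action" style traces of
# Python execution with the standard sys.settrace hook. This only
# pictures the idea of world-modeling code execution; it is not Meta's
# actual CWM data pipeline.
import sys

def trace_program(source: str):
    frames = []

    def tracer(frame, event, arg):
        if event == "line":
            # Record which line is about to run and the visible variables
            # at that point (the "state" before the step).
            state = {k: v for k, v in frame.f_locals.items()
                     if not k.startswith("__")}
            frames.append((frame.f_lineno, state))
        return tracer

    code = compile(source, "<traced>", "exec")
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)
    return frames

# Toy usage: trace a tiny loop and inspect the recorded states.
trace = trace_program("x = 0\nfor i in range(3):\n    x += i\n")
for lineno, local_vars in trace:
    print(lineno, local_vars)
```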
Latest work at CoRL 2025! ControlVLA: robots learn after watching just 10 times, another upgrade for the "general intelligence brain"!
具身智能之心· 2025-09-25 09:54
Core Insights
- The article discusses the development of ControlVLA, a novel framework that allows robots to learn complex tasks from minimal human demonstrations, achieving a success rate exceeding 75%, nearly four times higher than traditional methods [1][10][15].

Group 1: Research Background
- Robots face significant challenges performing tasks in real-world scenarios, especially with limited demonstrations. Existing few-shot learning methods often rely on simulation-augmented data or pre-built modules, which struggle with the gap between simulation and reality [7][8].
- Recent advances in Vision-Language-Action (VLA) models show promise for enhancing robot performance across multiple tasks and environments, but efficiently adapting these models to specific tasks in data-scarce situations remains a challenge [8][9].

Group 2: ControlVLA Framework
- ControlVLA integrates pre-trained VLA models with object-centric representations to enable efficient few-shot fine-tuning for robot manipulation tasks. The framework employs a ControlNet-style architecture to retain the rich prior knowledge of the VLA model while focusing on task-critical objects [9][10]. (A generic sketch of this conditioning pattern follows this summary.)
- The ControlVLA workflow consists of three main steps:
  1. Pre-train a large-scale VLA model on diverse manipulation datasets to learn the conditional distribution from visual observations and language instructions to the action space [12].
  2. Extract object-centric representations from demonstration videos to capture the geometric and positional features of task-relevant objects [12].
  3. Fine-tune the model with a dual attention mechanism that incorporates object information while preserving the pre-trained policy [12].

Group 3: Experimental Results
- The research team tested ControlVLA on the Astribot S1 robot, demonstrating that it can efficiently complete both short-horizon and complex long-horizon tasks with only 10-20 demonstrations [14][15].
- Across eight real-world tasks, ControlVLA achieved an overall success rate of 76.7%, significantly surpassing the traditional method's 20.8% [15][19].
- On long-sequence tasks, ControlVLA maintained an average success rate of 60%, roughly three times better than the best existing methods, showcasing its ability to reduce error accumulation during task execution [19][24].

Group 4: Generalization and Cost Efficiency
- ControlVLA demonstrated robust generalization, maintaining a 60%-70% success rate on unseen objects and new backgrounds, indicating its adaptability in dynamic environments [24][26].
- The framework substantially reduces the cost of collecting real manipulation demonstrations: it reached an 80% success rate on the OrganizeToy task with only 20 demonstrations, while other methods required 100 to reach similar performance [21][26].
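The ControlNet-style pattern mentioned in Group 2, a frozen pre-trained backbone plus a zero-initialized conditioning branch, can be sketched generically as follows. This is an illustration of the pattern in PyTorch, not ControlVLA's actual network or dual attention mechanism.

```python
# Generic sketch of a ControlNet-style conditioning branch: the pre-trained
# backbone stays frozen, and a zero-initialized projection injects new
# object-centric features so fine-tuning starts from the original behavior.
# This illustrates the pattern only; it is not ControlVLA's actual network.
import torch
import torch.nn as nn

class ControlBranch(nn.Module):
    def __init__(self, backbone: nn.Module, cond_dim: int, hidden_dim: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False           # keep pre-trained priors intact
        self.cond_proj = nn.Linear(cond_dim, hidden_dim)
        self.zero_out = nn.Linear(hidden_dim, hidden_dim)
        nn.init.zeros_(self.zero_out.weight)  # zero-init: no effect at step 0
        nn.init.zeros_(self.zero_out.bias)

    def forward(self, obs_tokens: torch.Tensor, object_feats: torch.Tensor):
        base = self.backbone(obs_tokens)                 # frozen features
        ctrl = self.zero_out(torch.relu(self.cond_proj(object_feats)))
        return base + ctrl                               # residual injection

# Toy usage with a stand-in backbone.
backbone = nn.Linear(256, 256)
model = ControlBranch(backbone, cond_dim=64, hidden_dim=256)
out = model(torch.randn(2, 256), torch.randn(2, 64))
```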
From 300+ papers: how VLA is applied and implemented across different scenarios......
具身智能之心· 2025-09-25 04:00
A new survey jointly produced by Lanzhou University, the Chinese Academy of Sciences, the National University of Singapore, and other institutions!

Pure Vision Language Action (VLA) Models: A Comprehensive Survey
Paper link: https://arxiv.org/pdf/2509.19012

The emergence of Vision-Language-Action (VLA) models marks a paradigm shift in robotics, from traditional policy-based control toward general-purpose robotics, and repositions Vision-Language Models (VLMs) from passive sequence generators into active agents that manipulate and make decisions in complex, dynamic environments.

Robotics has long been an important area of scientific research. Historically, robots have relied mainly on pre-programmed instructions and hand-designed control policies to decompose and execute tasks. These approaches are typically applied to simple, repetitive tasks, such as factory ...
Personalized Real-to-Sim-to-Real navigation with 3DGS captured on mobile devices
具身智能之心· 2025-09-25 00:04
Group 1
- The core issue of embodied AI is the sim-to-real dilemma: high fidelity in simulation conflicts with cost, making it difficult to transfer policies that succeed in simulation to real-world applications [2].
- The potential of 3D Gaussian Splatting (GS) has been under-explored; recent advances enable high-fidelity 3D representations from standard devices, addressing the gap between low-cost real-scene reconstruction and embodied navigation [3][4].
- The proposed method, EmbodiedSplat, is a four-stage pipeline that captures real scenes with low-cost mobile devices and reconstructs them in high fidelity for effective training and deployment [4][6].

Group 2
- The first stage captures RGB-D data with an iPhone 13 Pro Max and the Polycam app, which simplifies the process and lowers the operational barrier [11].
- The second stage performs mesh reconstruction, using DN-Splatter for 3D GS training to ensure geometric consistency and minimize reconstruction error [11][12].
- The third stage is simulation training with a composite reward function that balances success, path efficiency, and collision avoidance, using a two-layer LSTM for decision-making [10][13]. (An illustrative reward sketch follows this summary.)

Group 3
- The fourth stage is real-world deployment: the Stretch robot connects to a remote server for policy inference, navigating in real time from observed images [14][17].
- Experiments validate the zero-shot performance of pre-trained policies in new environments, revealing that scene scale significantly impacts performance [20][22].
- Fine-tuning pre-trained policies yields substantial performance improvements across environments, demonstrating the effectiveness of personalized adaptation [25][28].

Group 4
- The study highlights the limited success of zero-shot transfer from simulation to real-world scenarios, with significant performance gaps observed [32].
- Fine-tuning enhances the transferability of policies, with success rates increasing significantly after adaptation [32][35].
- The necessity of large-scale pre-training is emphasized, as it provides foundational knowledge that aids in adapting to new environments and overcoming real-world challenges [35][44].
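The composite reward described in the third stage, balancing success, path efficiency, and collision avoidance, can be illustrated with a small sketch like the one below; the terms and weights are placeholders rather than EmbodiedSplat's published formulation.

```python
# Illustrative composite navigation reward balancing success, path
# efficiency, and collision avoidance. The weights and exact terms are
# placeholders, not EmbodiedSplat's published reward formulation.
def navigation_reward(reached_goal: bool,
                      dist_to_goal: float,
                      prev_dist_to_goal: float,
                      collided: bool,
                      success_bonus: float = 10.0,
                      progress_weight: float = 1.0,
                      collision_penalty: float = 0.5,
                      step_penalty: float = 0.01) -> float:
    reward = progress_weight * (prev_dist_to_goal - dist_to_goal)  # move closer
    reward -= step_penalty                                         # favor short paths
    if collided:
        reward -= collision_penalty                                # discourage contact
    if reached_goal:
        reward += success_bonus                                    # terminal success
    return reward

# Toy usage: a step that closes 0.3 m of distance without a collision.
r = navigation_reward(False, 2.2, 2.5, False)
```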
ARTS 2025 star-studded lineup | Speakers for the 3rd Autonomous Robotics Seminar announced; meet at Zhejiang University, registration open!
具身智能之心· 2025-09-25 00:04
ARTS 2025

To promote exchange among front-line young scholars and engineers in autonomous robotics, to foster the blending of academia and industry along with industry-university-research cooperation, and to build a deep, focused platform for technical exchange, the Chinese Association of Automation has decided to host the Autonomous Robotic Technology Seminar (ARTS).

ARTS advocates a scientific spirit of rational criticism, willingness to question, and pragmatism, and actively pursues a free and equal exchange of ideas. ARTS focuses mainly on sensing and perception, autonomous navigation, state estimation, mobile-robot localization and mapping, motion planning, modeling and control, multi-robot systems, embodied intelligence, and medical and bionic robotics.

The 3rd ARTS will open at Zhejiang University on October 18-19, 2025. You are sincerely invited to attend and to offer comments and suggestions on the organization of the event! This year's conference features more than 40 invited young leading researchers, along with a diverse program including an academic debate, academic talk shows, ARTS scholarships, and company visits.

Friendly reminder: seats are limited; please register early so you are not shut out once registration fills up. For details of the offline event, scan the QR code to join the [ARTS 2025 discussion group].

I. Organizers
Host: Chinese Association of Automation
Organizers: College of Control Science and Engineering, Zhejiang University; School of Automation and Perception, Shanghai Jiao Tong University
Co-organizer: 深蓝学院

II. Agenda
| | 9:20-9:50 ...