具身智能之心
The first open-source diffusion VLA: Unified DVLA! SOTA performance plus a 4x speedup
具身智能之心· 2025-11-07 00:05
Author: Jiayi Chen et al. Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement. More resources are available in the embodied-intelligence community 具身智能之心知识星球. The Diffusion Large Language Model (DLLM) is one of the hottest recent topics in the large-model community. For VLA, our motivation is to fully exploit the DLLM's natural advantage in unifying generation and understanding, bringing future-frame generation and action prediction together in a single framework. For a unified VLA model that both generates and understands, the core question we focus on is how to make image generation and action prediction mutually beneficial. To that end we propose the Joint Discrete Denoising Diffusion Process (JD3P): the denoising processes of the different modalities are unified along a single denoising trajectory, and hybrid attention lets the actions continuously benefit from the image denoising process throughout. To fully exploit the DLLM's advantages at inference time, we design a prefix KV cache and a confidence-based decoding mechanism, ...
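The excerpt does not spell out the decoding algorithm, but the general shape of confidence-based parallel decoding for a masked discrete-diffusion model can be sketched as follows. Everything here is a hypothetical stand-in, not the paper's implementation: the threshold, the random "model", and all names are assumptions for illustration only.

```python
import numpy as np

MASK = -1  # sentinel for a still-masked token

def fake_model(tokens, vocab=16, rng=None):
    """Stand-in for the diffusion LM: returns a probability
    distribution over the vocabulary for every position."""
    rng = rng or np.random.default_rng(0)
    logits = rng.normal(size=(len(tokens), vocab))
    return np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

def confidence_decode(length=8, vocab=16, threshold=0.2, max_steps=20):
    """Each step, unmask only the positions the model is confident
    about; repeat until every position is committed."""
    rng = np.random.default_rng(0)
    tokens = np.full(length, MASK)
    for _ in range(max_steps):
        masked = np.where(tokens == MASK)[0]
        if masked.size == 0:
            break
        probs = fake_model(tokens, vocab, rng)
        conf = probs[masked].max(axis=1)            # confidence per masked slot
        take = masked[conf >= threshold]
        if take.size == 0:                          # always commit at least one
            take = masked[[conf.argmax()]]
        tokens[take] = probs[take].argmax(axis=1)   # unmask the confident slots
    return tokens

print(confidence_decode())
```

Because several positions can clear the threshold in one step, the sequence typically finishes in far fewer steps than one-token-at-a-time decoding, which is the source of the claimed speedup.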
Tsinghua and Peking University introduce Motion Transfer: robots learn skills end-to-end directly from human data
具身智能之心· 2025-11-07 00:05
Author: 机器之心. The authors of this paper are from Tsinghua University, Peking University, Wuhan University, and Shanghai Jiao Tong University. The lead authors are Yuan Chengbo (master's student, Tsinghua), Zhou Rui (undergraduate, Wuhan University), and Liu Mengzhen (PhD student, Peking University); the corresponding author is Gao Yang, assistant professor at Tsinghua's Institute for Interdisciplinary Information Sciences. Recently, Google DeepMind released its next-generation embodied foundation model, Gemini Robotics 1.5. One of its core highlights is an end-to-end action-transfer algorithm called the Motion Transfer Mechanism (MT): without retraining, it can "carry over" the skills of robots with different embodiments. The official technical report, however, mentions it only in passing, leaving the details a mystery. While the industry was still puzzling over what MT really looks like, a joint team from Tsinghua, Peking University, and other universities pushed the same idea a step further: performing action transfer directly on human VR data! Even better, they immediately released the full technical report, training code, and weights, all open-source and reproducible. Below is a quick breakdown of this new "human → robot" zero-shot action-transfer paradigm. What is MotionT ...
From the perspective of career transitions and research, which directions are best suited for a first paper?
具身智能之心· 2025-11-06 11:47
Group 1
- The article discusses suitable research directions for publishing papers in embodied intelligence, including VLN, VLA, reinforcement learning, and real2sim2real [1]
- For researchers currently working on SLAM, VLN and VLA are recommended as good entry points, especially for those with robotic arms [1]
- The article emphasizes the importance of having a good research idea, noting that new researchers may need to navigate various challenges to develop innovative concepts [1]

Group 2
- A new paper-guidance service has been launched, offering customized one-on-one mentoring on advanced topics such as multimodal large models, VLA, reinforcement learning, and more [2]
- The mentoring team consists of PhD holders and researchers from top universities and companies, providing end-to-end support from topic selection to publication strategy [2]
- The service aims to bridge the gap between academia and industry, focusing not only on paper publication but also on practical application value [3]

Group 3
- The article promotes a free matching service for the first ten inquiries, allowing students to meet in depth with mentors matched to their research direction and academic background [5]
We have started a discussion group for reproducing embodied-AI papers
具身智能之心· 2025-11-06 11:47
Group 1
- The company has established a technical communication group focused on replicating open-source projects such as VLA, VLN, and DP, addressing issues related to performance metrics and data collection [1]
- The group aims to give users a platform to share experience and avoid common pitfalls in the replication process [1]

Group 2
- Interested individuals can join by adding the designated assistant on WeChat, providing their name and a note indicating their interest in replication [2]
Does it take cutting the suit open to prove it? The whole internet is arguing: is XPeng's humanoid robot actually a person?
具身智能之心· 2025-11-06 05:28
Editor: 机器之心. Has physical AI reached the point of creating illusions? Is this a robot or a human? Since yesterday, much of the global internet has been discussing XPeng's humanoid robot IRON, and everyone's inner Sherlock Holmes instantly awakened. Xiaohongshu users speculated that the robot in the gait demo at the launch event was actually a human in a suit. Facing the flood of discussion, XPeng seemed entirely unbothered: under a netizen's comment saying "100% a real person inside," He Xiaopeng replied, "Thanks for the recognition." IRON stands about 1.78 m tall, taller than robots such as 1X's NEO, and weighs 70 kg. Its hands use miniature harmonic joints and offer 22 degrees of freedom (DOF), only 5 fewer than a human hand, which means it can already handle fine everyday tasks such as folding clothes, wiping tables, and tidying up. The full body has 65 degrees of freedom, 10 more than NEO, including human-like spinal motion. On November 6, at its new Guangzhou headquarters, XPeng Motors held AI Day 202 ...
XPeng AI Day announcements yesterday | Looks, algorithms, and compute all maxed out! "IRON: the most human-like humanoid robot yet?!"
具身智能之心· 2025-11-06 03:27
Author: 具身智能之心. Just yesterday, XPeng released its all-new-generation humanoid robot IRON, a platform aimed at real-world scenarios, easy generalization, and easy data collection. How is IRON configured? 1) Battery: the first all-solid-state battery, 30% lighter with 30% longer endurance. 2) Chips: three Turing AI chips delivering 2250 TFLOPS of compute. 3) Physical-world foundation model: a high-level brain-and-cerebellum combination of VLT + VLA + VLM. IRON can stand, sit, squat, lie down, and crawl, and uses soft materials as human-like skin; once a robot has human-like bones, muscles, and skin, the design space opens up considerably. Each hand has 22 degrees of freedom, and the dexterous hands are very capable. The face is still a 3D curved display rather than a bionic face, so it avoids looking "uncanny," though the internal mechanics are fully on display. XPeng's humanoid-robot strategy: while developing autonomous driving, XPeng has also been laying out robotics scenarios, hoping to build a "car - Robotaxi - humanoid robot" ecosystem. IRON first appeared back in 2024; this year brings the all-new generation. XPeng plans mass production in 2026, focusing on home and industrial scenarios.
Everyone is studying embodied AI, but quite a few students are stuck on these things...
具身智能之心· 2025-11-06 00:03
Renewal discounts for returning members are here too! Last night we held a small online meeting and chatted with everyone about how things have been going lately. Some students are grasping the key parts and progressing quickly along the community's roadmap; even with low-cost hardware they achieve good results, and some have already deployed ACT and pi0. But quite a few students are stuck on things like compute, data collection, model optimization, and hands-on projects. On compute: we have shared many lightweight methods that still deliver strong, even SOTA, performance, which suits students short on compute. On data collection: start with basic teleoperation and focus on data quality; noisy data can keep a model from training well, especially when most of the data is noise. If you do not have enough data, try the real2sim2real family of methods. On model optimization: students working with robotic arms can try RL + VLA schemes, but for humanoids and other high-DoF embodiments, do not jump in lightly; good results are hard to achieve. Good open-source projects have already been collected inside the community, and you can reproduce them by following the tutorials. The above is shared from our embodied community, 具身智能之心知识星球; students who need to get started or level up are welcome to join us. Over nearly a year of building, the community has put together technical-roadmap sharing, livestreams, Q&A, job hunting, ...
BAAI's embodied framework Thor goes open-source: toward human-level whole-body control that "stands firm" under strong adversarial forces
具身智能之心· 2025-11-06 00:03
Core Viewpoint
- The article discusses the development of the BAAI Thor framework, which aims to give humanoid robots the ability to perform complex physical interactions in real-world environments, achieving human-level whole-body reactions and dynamic stability [7][8][31]

Group 1: Challenges in Humanoid Robot Control
- Humanoid robots face two main challenges in moving from performers to laborers: the lack of human-like reaction mechanisms and the complexity of high-dimensional coordinated control [9]
- Without effective human-like reaction mechanisms, robots perform poorly under large external forces, often relying on rigid resistance strategies that can lead to instability [9][10]
- The high dimensionality of the control problem complicates policy optimization: many degrees of freedom and strong coupling between joints make learning and adaptation difficult [10][11]

Group 2: BAAI Thor Framework
- The BAAI Thor framework combines biomechanical principles with novel network structures so that humanoid robots can respond in a coordinated, stable way during high-intensity force interactions [8][12]
- The framework has two core components: the Force-Adaptive Trunk Tilt Reward (FAT2), which guides the robot to adjust its posture according to external forces, and a decoupled network structure that tackles the high-dimensional coordination challenge [13][17]

Group 3: Experimental Validation
- Tested on the Unitree G1 robot, the framework enabled the robot to pull a car weighing roughly 1400 kg, demonstrating whole-body coordination and dynamic balance under extreme load [18][20]
- Thor outperformed a range of baseline algorithms on force-interaction tasks, reaching a peak pulling force of 167.7 N, about 48% of the robot's weight, a 68.9% improvement over the best baseline method [26][30]
- Quantitative analysis showed that the FAT2 reward function contributed roughly 80%-90% of the performance gains by substantially improving the robot's adaptive posture adjustment [30]
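The summary describes FAT2 only qualitatively: the robot should adjust its trunk posture according to the external force. Purely as an illustration of that idea, here is a minimal reward sketch in which the desired lean angle grows with the applied force; the gain `k`, the width `sigma`, and the Gaussian shape are all assumptions, not Thor's actual reward.

```python
import numpy as np

def fat2_reward(trunk_tilt, ext_force, k=0.005, sigma=0.1):
    """Hypothetical FAT2-style reward. The desired trunk pitch (rad)
    grows with the horizontal external force (N), so the robot is
    rewarded for leaning *into* a pull or push rather than standing
    rigidly upright. Constants are illustrative assumptions."""
    target_tilt = np.clip(k * ext_force, -0.5, 0.5)  # force-dependent lean target
    return float(np.exp(-((trunk_tilt - target_tilt) ** 2) / sigma ** 2))

# Standing upright under no force is ideal ...
print(fat2_reward(0.0, 0.0))                              # 1.0
# ... but upright under a 160 N pull is penalized, while leaning in is not.
print(fat2_reward(0.0, 160.0) < fat2_reward(0.5, 160.0))  # True
```

The key property is that the reward's optimum moves with the measured force, so a policy trained against it learns posture adjustment as a function of interaction intensity rather than a single fixed stance.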
New from Peking University & BAAI! RoboOS-NeXT: "memory + hierarchical architecture" for general multi-robot collaboration
具身智能之心· 2025-11-06 00:03
Core Insights
- The article discusses the RoboOS-NeXT framework, which addresses the challenges of multi-robot collaboration by integrating a unified memory system with a hierarchical architecture for effective task execution and fault tolerance [1][4][23]

Group 1: Challenges in Multi-Robot Collaboration
- Current multi-robot collaboration faces a "triple dilemma": reliance on single-robot memory, difficulty adapting to heterogeneous robots, and lack of fault-recovery capability [2][3]
- Existing solutions either fail to accumulate long-term experience or struggle with dynamic task allocation and fault tolerance [2][3]

Group 2: RoboOS-NeXT Framework
- RoboOS-NeXT employs a "spatio-temporal entity unified memory" (STEM) and a "brain-cerebellum architecture" to enable global memory sharing and dynamic task execution [3][4]
- The framework consists of two core components: STEM for information integration and the brain-cerebellum model for planning and execution [4][9]

Group 3: Core Components of RoboOS-NeXT
- STEM integrates spatial, temporal, and entity memories, giving all robots a unified interface and eliminating information silos [6][7][8]
- The brain-cerebellum architecture separates global planning from local execution, ensuring efficient task decomposition and precise action control [9][10]

Group 4: Execution Workflow
- Execution proceeds in four steps: task decomposition, dynamic scheduling, distributed execution, and dynamic memory updating [10][12]
- This workflow keeps tasks completing efficiently even in the face of robot failures or tool malfunctions [10][12]

Group 5: Experimental Results
- RoboOS-NeXT performed strongly across scenarios, showing lifelong adaptability, collaboration scalability, and fault-recovery capability [13][14][15]
- In adaptability tests, RoboOS-NeXT maintained a success rate above 75% on long-sequence tasks, while the memory-free baseline failed completely [13][14]
- The framework also showed significant gains in execution efficiency, with average execution steps per task reduced by 20%-70% compared to the baseline [17][18]

Group 6: Key Conclusions and Future Directions
- Unified memory is essential for collaboration, enabling lifelong adaptability and robust scheduling [23][25]
- Future enhancements may include multi-modal memory integration, end-to-end task optimization, and real-time performance improvements [25][26]
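The four-step workflow named in Group 4 (decompose, schedule, execute, update memory) can be sketched as a toy loop with a shared store and failure-driven reassignment. The classes, names, and reassignment policy below are illustrative stand-ins, not the RoboOS-NeXT API.

```python
from dataclasses import dataclass, field

@dataclass
class SharedMemory:
    """Toy stand-in for STEM: one store visible to every robot."""
    facts: dict = field(default_factory=dict)

@dataclass
class Robot:
    name: str
    healthy: bool = True

    def execute(self, subtask, memory):
        if not self.healthy:
            raise RuntimeError(f"{self.name} failed")
        memory.facts[subtask] = f"done by {self.name}"   # dynamic memory update

def run_task(subtasks, robots, memory):
    """Decompose -> schedule -> execute -> update memory,
    reassigning a subtask when the chosen robot fails."""
    log = []
    for sub in subtasks:                    # task already decomposed into subtasks
        for robot in robots:                # dynamic scheduling: first able robot
            try:
                robot.execute(sub, memory)  # distributed execution
                log.append((sub, robot.name))
                break
            except RuntimeError:
                continue                    # fault tolerance: try the next robot
        else:
            log.append((sub, None))         # no robot could take it
    return log

mem = SharedMemory()
robots = [Robot("arm-1", healthy=False), Robot("dog-1")]
print(run_task(["fetch cup", "open door"], robots, mem))
# arm-1 fails, so dog-1 picks up both subtasks; the shared memory
# records who completed what, which is what enables fault recovery.
```

The point of the shared store is that any robot can see what has already been done, so a failure mid-plan loses no progress.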
Multi-task, all-scenario, cross-embodiment general mobility: Galaxy General Robotics releases a panoramic navigation foundation model
具身智能之心· 2025-11-06 00:03
Core Viewpoint
- The article discusses advances in navigation models for robots, focusing on the launch of NavFoM (Navigation Foundation Model) by Galaxy General Robotics, a significant leap in robotic navigation capability that allows for more autonomous, adaptable robots across environments [3][9][27]

Group 1: Technological Advancements
- NavFoM is the world's first cross-embodiment panoramic navigation foundation model, unifying tasks such as vision-and-language navigation, object-goal navigation, visual tracking, and autonomous driving in a single framework [3][9]
- NavFoM allows robots to autonomously perceive their environment and make navigation decisions in unknown settings, moving beyond simple following tasks [9][10]
- The model employs a unified learning paradigm that enables knowledge sharing across tasks and robot forms, improving the efficiency of training and deployment [13][14]

Group 2: Key Features
- NavFoM supports both indoor and outdoor scenarios, operates zero-shot without mapping or additional training data, and adapts to various robot types, including quadrupeds, wheeled humanoids, drones, and cars [11][12]
- The model incorporates two key innovations: TVI tokens for understanding time and direction, and the BATS strategy for efficient sampling of video data, enabling real-time responses while conserving computational resources [17][19]
- The training dataset includes over 8 million cross-task navigation samples and 4 million open-ended question-answer pairs, significantly enhancing its learning capability [21][23]

Group 3: Application and Impact
- NavFoM has demonstrated state-of-the-art performance on various international benchmarks, generalizing across tasks and environments without task-specific fine-tuning [25]
- The model has successfully driven various robot forms through complex tasks, a significant step toward embodied intelligence in navigation systems [25][27]
- NavFoM is positioned as the foundation of a comprehensive navigation system supporting applications from indoor navigation to urban environments, substantially expanding what these robots can do [29][30]
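The summary describes BATS only as an "efficient sampling" strategy that keeps video-token usage within a compute budget. One plausible shape for such a scheme, sparse samples over older history plus a dense recent window, is sketched below; the 50/50 split, the function name, and the whole design are assumptions for illustration, not the paper's algorithm.

```python
def budget_sample(num_frames, budget):
    """Hypothetical budget-aware frame sampler in the spirit of BATS:
    keep the most recent frames densely and thin out older history so
    the total number of sampled frames never exceeds the budget."""
    if num_frames <= budget:
        return list(range(num_frames))       # short history: keep everything
    recent = budget // 2                     # dense recent window
    old = budget - recent                    # sparse slots for older history
    recent_idx = list(range(num_frames - recent, num_frames))
    stride = (num_frames - recent) / old
    old_idx = [int(i * stride) for i in range(old)]
    return old_idx + recent_idx

print(budget_sample(10, 20))   # fewer frames than budget: all kept
print(budget_sample(100, 8))   # 4 sparse old frames + the 4 newest
```

Capping the sample count like this makes per-step inference cost constant regardless of how long the robot has been navigating, which is what allows real-time response on a fixed compute budget.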