具身智能之心
The State of Robot Learning: An Insider's Share from the PI Team (from Data Collection to VLA to RL)
具身智能之心· 2025-12-23 00:03
Editor: 具身智能之心

This time we study a blog post written by someone at PI. It lays out much of the current state of robot learning, all drawn from genuine frontline experience; anyone working hands-on in the field will recognize a lot of it. It tells many hard truths, is of very high quality, and is well worth a careful read, and its views on IL, DAgger, and RL are all first-hand. Enjoy.

Essentially, as of now (December 2025), all robot learning systems are pure behavior cloning (BC, also known as imitation learning) systems. Humans provide (near-)optimal task demonstrations, and a machine learning model tries to imitate those actions. Formally, the policy is trained in a supervised fashion: given the robot's state s (e.g., camera images, robot joint angles, and possibly a task-description text), the policy predicts the demonstrated action a, usually an action chunk, e.g., roughly the next 1 second of actions at 50 Hz. The original post aims to describe what the modern behavior-cloning stack consists of, where it falls short, and the (incomplete/clumsy) workarounds ...
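To make the supervised objective described above concrete, here is a minimal behavior-cloning sketch in PyTorch. It is an illustration only: the 512-dimensional fused observation, the two-layer MLP, and the L2 chunk loss are assumptions made for this example, not details of PI's actual stack.

```python
import torch

# Minimal behavior-cloning sketch: given an encoded observation (images, joint
# angles, and task text fused into one feature vector), the policy regresses a
# 1-second action chunk at 50 Hz (CHUNK_LEN x ACT_DIM). Sizes are placeholders.
OBS_DIM, CHUNK_LEN, ACT_DIM = 512, 50, 7

policy = torch.nn.Sequential(
    torch.nn.Linear(OBS_DIM, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, CHUNK_LEN * ACT_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_step(obs, demo_chunk):
    """One supervised step: imitate the demonstrated action chunk.
    obs: [B, OBS_DIM] fused state features; demo_chunk: [B, CHUNK_LEN, ACT_DIM]."""
    pred = policy(obs).view(-1, CHUNK_LEN, ACT_DIM)
    loss = torch.nn.functional.mse_loss(pred, demo_chunk)  # simple L2 imitation loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Dummy batch just to show the expected shapes.
print(bc_step(torch.randn(8, OBS_DIM), torch.randn(8, CHUNK_LEN, ACT_DIM)))
```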
A Global Survey of Dexterous Hands and Speculation on New Trends!
具身智能之心· 2025-12-23 00:03
Original link | https://www.zhihu.com/pin/1984008846390355375
Author: CyberSoma  Editor: 具身智能之心

Core robot technology? Miniaturization is probably the word I have used most often lately. When WuJI HAND went viral a while back, the trend toward integrating miniature direct-drive motors was obvious; after all, motor volume is still what constrains how well a humanoid robot's arm can accommodate a dexterous hand.

2. Perception: from single-modality touch to multimodal intelligent fusion. In my answer to "Do you think embodied intelligence is the necessary path to AGI (artificial general intelligence)?", I compared the embodied-intelligence roundtable discussions from this March and late November: the panelists not only stressed that multimodal perception learns more efficiently than vision-language large models, but also highlighted the importance of multimodal data. This goes beyond humanoid robots; perception in dexterous hands urgently needs an upgrade.

3. Scenario-specific vertical specialization. Among the top ten robotics advances of November 2025, Armstrong Robotics plans to build a general-purpose kitchen robot starting from dishwashing, i.e., focusing on winning customers within one vertical scenario. Today's dexterous hands are still mostly general-purpose; they need to go deeper into verticals such as household service, industrial assembly, and medical rehabilitation. Thoroughly dominating any one of these markets beats pointless involution, after all ...
This Robotic Arm Smoothly Runs pi0 and pi0.5, and Supports the Lerobot Framework~
具身智能之心· 2025-12-23 00:03
Now adapted to Lerobot! Precise hands-on operation that even beginners can unlock. Following the pi0 and pi0.5 tasks, the Imeta-Y1 lightweight robotic arm now supports Lerobot, smoothly picking up a block and placing it precisely inside a tape ring, and the accompanying code will be officially open-sourced. If you want to take algorithms from paper to practice quickly, this arm is worth a look. From recognizing and grasping, to stable transport, to aligned placement, every step shows continued algorithm iteration and the arm's stable execution. Bring research closer to practice and validate ideas faster; Imeta-Y1 grows with you, so you can go steadier and further on the road of embodied intelligence.

A lightweight, cost-effective robotic arm built for embodied-AI research. Still struggling to pick hardware for embodied intelligence? Expensive arms are unaffordable, and cheap ones are hard to use and hard to get started with? Don't worry, Imeta-Y1 is here: a lightweight, cost-effective arm designed for beginners and early-stage researchers. It combines high-precision motion control, low-power design, and an open software/hardware architecture, supports seamless simulation-to-real co-debugging, and ships with a fully open-source SDK and toolchain to help users quickly move through algorithm validation, data collection, model training, and deployment. Its compact structure and modular interfaces make it especially well suited to embedded-AI and robot-learning platform development. ✅ 24-hour after-sales response, so problems never block your progress. Whether you are a student, ...
AAAI 2026 Highlight! 原力无限 Tackles Embodied Intelligence's Stubborn "Generalization" Problem, Defining a New Paradigm for Causal AI
具身智能之心· 2025-12-23 00:03
If "large models" taught robots how to talk, then generalization is the threshold that decides whether robots can leave the lab and truly enter ordinary households. So why, today, does a robot that performs perfectly in its training scene suddenly turn "dumb" the moment you change the room, the lighting, or the color of the cup? The root cause is that traditional AI usually learns only surface-level correlation, not the causality behind things. Recently, AAAI 2026, a top international AI conference, accepted a study completed jointly by 原力无限 with the University of Hong Kong, the University of Macau, and Wuhan University: "DSAP: Enhancing Generalization in Goal-Conditioned Reinforcement Learning". Carried out by 原力无限's research team together with scholars from leading universities, the work proposes, for the first time, a structure-aware agent framework based on a causal graph (DSAP). It marks the successful introduction of causal reasoning into the embodied-intelligence "brain" and offers new theoretical support for the out-of-distribution (OOD) generalization problem that plagues the industry.

Industry pain point 01: why can't robots transfer what they learn? If you change the table to a blue one, the AI might ...
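To illustrate the general idea of using a causal graph to suppress spurious correlations, here is a toy sketch. The variable names, the hand-specified graph, and the masking rule are all invented for this example; this is not the DSAP method, whose details are not given in this excerpt.

```python
import numpy as np

# Toy illustration: a hand-specified causal graph over observation variables,
# used to mask out spurious features before they reach a goal-conditioned policy.
STATE_VARS = ["gripper_pos", "cup_pos", "cup_color", "table_color", "lighting"]
GOAL_VAR = 1  # the goal is defined over cup_pos

# A[i, j] = 1 means variable i causally influences variable j.
A = np.zeros((5, 5), dtype=int)
A[0, 1] = 1  # gripper_pos -> cup_pos (moving the gripper moves the cup)

def causal_ancestors(A, target):
    """Indices of variables with a directed path into `target` (plus target itself)."""
    keep, frontier = {target}, {target}
    while frontier:
        frontier = {i for j in frontier for i in np.flatnonzero(A[:, j])} - keep
        keep |= frontier
    return sorted(keep)

def masked_obs(obs, A, target):
    """Zero out every variable that is not a causal ancestor of the goal variable."""
    mask = np.zeros_like(obs)
    mask[causal_ancestors(A, target)] = 1.0
    return obs * mask

obs = np.array([0.10, 0.40, 0.90, 0.20, 0.70])
print(masked_obs(obs, A, GOAL_VAR))  # cup_color / table_color / lighting are zeroed,
                                     # so recoloring the table cannot change the action
```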
This Year Has Probably Produced a Whole Batch of VLA+RL Papers?!
具身智能之心· 2025-12-22 10:23
Core Insights
- The article emphasizes integrating Reinforcement Learning (RL) with Vision-Language-Action (VLA) models to enhance their generalization, particularly in out-of-distribution (OOD) scenarios, where reported performance improvements reach up to 42.6% [2].

Group 1: Research Directions
- Future research should focus on the combination of VLA and RL, and readers are encouraged to collaborate with research assistants for guidance on starting projects in these areas [3].
- Several notable recent VLA+RL works are highlighted, showcasing significant advances in the field [5][10].

Group 2: Notable Papers and Projects
- A list of representative papers from the last two years is provided, including titles such as "NORA-1.5" and "Balancing Signal and Variance", covering various aspects of VLA and RL integration [5][10].
- Links to project homepages and paper PDFs are shared for further exploration of these works [6][9][12].

Group 3: Tools and Frameworks
- The article mentions tools such as Rlinf, which supports a growing number of VLA+RL methods, indicating a trend toward more robust and versatile research tooling [2][11].
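As a rough illustration of what "RL on top of a BC-pretrained VLA policy" can look like, here is a minimal REINFORCE-with-baseline sketch over action chunks. The architecture, dimensions, and sparse success reward are assumptions made for this example; none of the listed papers is claimed to use exactly this update.

```python
import torch

# Minimal sketch (not any specific paper's method): fine-tune a BC-pretrained,
# VLA-style chunk policy with REINFORCE plus a mean-reward baseline on episodic
# task success. The policy maps fused observation features to an action chunk.
class ChunkPolicy(torch.nn.Module):
    def __init__(self, obs_dim=512, chunk=50, act_dim=7):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, chunk * act_dim))
        self.chunk, self.act_dim = chunk, act_dim
        self.log_std = torch.nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        mean = self.net(obs).view(-1, self.chunk, self.act_dim)
        return torch.distributions.Normal(mean, self.log_std.exp())

policy = ChunkPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-5)

def rl_finetune_step(obs, actions, rewards):
    """obs: [B, 512] fused features; actions: [B, 50, 7] executed chunks;
    rewards: [B] sparse episodic task success (0/1)."""
    dist = policy.dist(obs)
    logp = dist.log_prob(actions).sum(dim=(1, 2))  # log-prob of the whole chunk
    advantage = rewards - rewards.mean()           # simple baseline
    loss = -(advantage.detach() * logp).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Dummy rollout batch just to show the call shape.
print(rl_finetune_step(torch.randn(8, 512), torch.randn(8, 50, 7),
                       torch.tensor([1., 0., 1., 1., 0., 0., 1., 0.])))
```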
New SOTA for Complex Spatial Reasoning, with a 55% Performance Gain! SpatialDreamer, New Work from Sun Yat-sen University
具身智能之心· 2025-12-22 01:22
Core Insights
- The article introduces SpatialDreamer, a framework developed by researchers from Sun Yat-sen University and MBZUAI that improves performance on complex spatial tasks through active mental imagery and spatial reasoning [1][4].

Group 1: Limitations of Current Models
- Despite significant advances in multimodal large language models (MLLMs) for scene understanding, their performance remains limited on complex spatial reasoning tasks that require mental simulation [2].
- Existing methods rely primarily on passive observation of spatial data and lack the distinctly human ability to imagine actively and update internal representations dynamically [3].

Group 2: SpatialDreamer Framework
- SpatialDreamer simulates human spatial cognition through a closed-loop reasoning process with three steps: exploration, imagination, and reasoning [6].
- In the exploration phase, the model chooses optimal egocentric actions based on the current scene, such as "move forward 0.75 meters" or "turn left 45 degrees" [6].
- In the imagination phase, a world model generates images from the new viewpoint after the action is executed [6].
- In the reasoning phase, the model integrates all accumulated visual evidence to produce a final answer [6].

Group 3: GeoPO Policy Optimization
- To address sparse rewards in long-horizon reasoning tasks, the team introduced GeoPO, a policy optimization method combining tree-structured sampling with geometric consistency constraints [8].
- Tree-structured sampling allows multiple action branches at each step, supporting backtracking and multi-path exploration [8].
- A multi-level reward design merges task-level and step-level rewards to provide fine-grained feedback [8].
- A geometric penalty mechanism penalizes redundant or conflicting actions, encouraging efficient trajectory generation [8].

Group 4: Performance Validation
- SpatialDreamer was validated on multiple spatial reasoning benchmarks, achieving state-of-the-art (SOTA) results with average accuracies of 93.9% on real images and 92.5% on synthetic images in the SAT benchmark [13].
- On MindCube-Tiny it reached 84.9% overall accuracy, surpassing the Qwen2.5-VL-7B baseline by more than 55% [13].
- On VSI-Bench it led in tasks such as object counting, relative direction, and path planning, with an average accuracy of 62.2% [13].

Group 5: Significance of SpatialDreamer
- SpatialDreamer's significance lies not only in improving spatial reasoning accuracy but also in demonstrating that MLLMs can strengthen reasoning through "imagination", a notable step toward human-like spatial intelligence [14].
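The closed loop can be sketched schematically as below. The functions propose_action, world_model_render, and answer_from_evidence are invented stand-ins for the MLLM policy, the generative world model, and the final reasoning call; this is a structural sketch of the explore-imagine-reason cycle, not SpatialDreamer's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    views: list = field(default_factory=list)  # accumulated (action, image) pairs

def propose_action(question, evidence):
    """Exploration: pick an egocentric action, e.g. 'move forward 0.75 m'."""
    return ({"type": "move_forward", "meters": 0.75}
            if len(evidence.views) % 2 == 0
            else {"type": "turn_left", "degrees": 45})

def world_model_render(current_view, action):
    """Imagination: the world model synthesizes the post-action viewpoint."""
    return f"{current_view}|{action['type']}"   # placeholder for a generated image

def answer_from_evidence(question, evidence):
    """Reasoning: fuse all imagined views into a final answer (stubbed out)."""
    return f"answer after {len(evidence.views)} accumulated views"

def spatial_dream_loop(question, initial_view, max_steps=3):
    evidence, view = Evidence(views=[(None, initial_view)]), initial_view
    for _ in range(max_steps):
        action = propose_action(question, evidence)   # explore
        view = world_model_render(view, action)       # imagine
        evidence.views.append((action, view))
    return answer_from_evidence(question, evidence)   # reason

print(spatial_dream_loop("Is the chair left of the table?", "rgb_frame_0"))
```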
Surpassing π0.5: MiVLA Uses Human-Robot Mutual Imitation Pretraining to Crack VLA Models' Generalization and Data Bottlenecks
具身智能之心· 2025-12-22 01:22
Author: Zhenhan Yin et al.  Editor: 具身智能之心

In the field of robot vision-language-action (VLA) models, data scarcity and weak generalization have long been the two core pain points: real robot data is expensive to collect and covers limited scenarios, simulated data suffers from the sim-to-real gap, and human data runs into the embodiment gap, so existing approaches struggle to achieve both data scale and transfer performance. MiVLA, jointly proposed by teams from Tongji University and the University of Electronic Science and Technology of China, centers on "human-robot mutual imitation pretraining" as its core innovation. For the first time, without any real robot data, training only on a fusion of simulated robot data and human video data, it achieves generalization surpassing existing state-of-the-art models, offering a low-cost, highly scalable new path toward general-purpose robot policy learning.

Why does the VLA pretraining paradigm need to be rebuilt? Current VLA training is caught in a double bind: on one hand, schemes that rely on real robot data are limited by the data bottleneck; on the other, schemes that rely on simulation data alone or human data alone are limited by the "modality gap" ...
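As a rough sketch of what fused training on simulated robot data plus human video data can look like, here is a generic mixed-source batch sampler with hand-to-robot retargeting. This is not the MiVLA recipe, whose mutual-imitation details are not given in this excerpt; every name below is a placeholder.

```python
import random

# Generic mixed-source pretraining sketch: batches are drawn from simulated
# robot trajectories and human videos, with human hand motion retargeted into
# the same action space so one policy can be supervised on both sources.
SIM_ROBOT = [{"obs": f"sim_{i}", "action_chunk": [0.0] * 7} for i in range(1000)]
HUMAN_VID = [{"obs": f"hum_{i}", "hand_pose": [0.1] * 21} for i in range(1000)]

def retarget_hand_to_robot(hand_pose):
    """Placeholder retargeting: map a 21-D hand pose to a 7-DoF arm action."""
    return hand_pose[:7]

def sample_batch(batch_size=32, sim_ratio=0.5):
    batch = []
    for _ in range(batch_size):
        if random.random() < sim_ratio:
            item = random.choice(SIM_ROBOT)
            batch.append((item["obs"], item["action_chunk"]))
        else:
            item = random.choice(HUMAN_VID)
            batch.append((item["obs"], retarget_hand_to_robot(item["hand_pose"])))
    return batch  # every element now has the same (obs, 7-DoF action) form

print(len(sample_batch()), sample_batch()[0])
```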
Wang Qian of 自变量: Embodied Intelligence Is an Independent Foundation Model for the Physical World | MEET2026
具身智能之心· 2025-12-22 01:22
Core Viewpoint
- The article discusses the debate over whether embodied intelligence should be viewed as an application or as an independent foundational model, asserting that it is a foundational model designed specifically for the physical world, parallel to language and multimodal models [6][12][60].

Group 1: Differences Between Physical and Virtual Worlds
- There is a fundamental difference between the physical world, characterized by randomness and continuous processes, and the virtual world, which is highly reproducible and low in randomness [2][10].
- Existing models based on language and visual modalities are inadequate for accurately representing the complexity and randomness of physical interactions [16][22].

Group 2: Need for a Separate Foundational Model
- A separate foundational model for embodied intelligence is necessary because of the unique characteristics of the physical world, where identical conditions can still lead to unpredictable outcomes [10][11].
- Current architectures and training methods struggle to capture the high randomness of physical events, necessitating a new approach to model design [12][20].

Group 3: Future of Multimodal Models
- Viewing embodied intelligence as an independent foundational model can lead to significant changes in model architecture and data utilization [9][23].
- Learning and perception in the physical world differ fundamentally from those in the virtual world, suggesting that future multimodal models should incorporate these differences [24][29].

Group 4: Scaling Laws and Data Utilization
- The article emphasizes the importance of scaling laws in developing large models for robotics, where data acquisition and utilization are critical [46][51].
- A phased approach to training, using both pre-training and post-training data, is recommended to enhance model performance [48][52].

Group 5: Hardware and AI Integration
- AI-driven hardware definition is crucial for embodied intelligence, with software and hardware evolving together [53][54].
- Embodied intelligence could drive exponential growth in resources and capabilities, with a transformative impact on the future of artificial general intelligence (AGI) [59][60].
This Embodied-Intelligence Community of Nearly 3,000 Members Has Recently Shared a Lot More Content~
具身智能之心· 2025-12-22 01:22
Group 1
- The article highlights growth and development across the embodied intelligence sector, including increased financing, production trials, and innovative product designs [2][3][4].
- In financing, beyond a few star companies, the number of component companies has increased and their financing amounts have grown [2].
- In production, several companies are beginning pilot projects, many startups are seeking funding backed by orders, and leading humanoid robot companies are exploring industrial-grade product deployment [2].

Group 2
- In product design, robotic arm products are gradually converging, while structural and size innovations continue in mobile manipulation and humanoid robots, with companies focusing on cost reduction and supply chain management [2].
- Robot deployment is advancing: companies such as Digua Robotics have launched the S600 to support edge-side deployment, and Thor is being applied in humanoid robots and mobile manipulation [4].
- Compute of over 2000 TOPS is becoming a reference configuration in the industry [4].

Group 3
- The community actively plans research reports and welcomes newcomers interested in embodied intelligence, having built up various sharing platforms over the past year [7].
- It offers continuous live sharing sessions, roundtable forums, and a comprehensive technical roadmap for beginners [8][13].
- It also provides industry overviews and project proposals valuable to those already working on related research [15][16].

Group 4
- The community has established a job-referral mechanism with multiple embodied-intelligence companies, connecting job seekers and employers [18].
- Members get access to exclusive learning videos and documents, enhancing the learning experience [23].
- The community has compiled extensive resources, including open-source projects, datasets, and technical learning routes, to support both newcomers and advanced learners [19][30].
Create Value with Us! 具身智能之心 Is Recruiting Editors, Operations, and Sales Staff (Intern or Full-Time)
具身智能之心· 2025-12-21 10:05
具身智能之心 is a leading technical content platform in the embodied-intelligence field, producing a large volume of content for the industry on frontier technology, courses, industry overviews, financing, products, and policy. The platform is now in a growth phase, and to meet business needs we are recruiting editors, operations, and sales staff from our followers to keep creating value for the field with us, both full-time and intern (for internships, all roles except the editor position must be on-site).

Editor position: responsible for daily content creation and editing on our official-account platform. We hope you have a solid professional foundation and content-creation experience on platforms such as Zhihu and WeChat official accounts.

Operations position: responsible for running our official account, Xiaohongshu, and community groups, and improving follower engagement and reach. We hope you have some operations experience and a good sense of how self-media platforms work.

Sales position: responsible for sales and promotion of the platform's courses, hardware, and other products. We hope you have some sales experience and an understanding of embodied-intelligence users' needs and the market.

If you are interested in growing with us, add Feng Ge on WeChat: oooops-life ...