具身智能之心
The best time to move into embodied AI was yesterday; the second best is now...
具身智能之心· 2025-12-01 10:00
OpenArm is a dual-arm task framework, and several companies have started manufacturing compatible hardware. It lacks mobility, but it can handle tasks such as folding clothes and pick-and-place; for data collection, the VR setup is the more comfortable option. XLerobot has some mobility, though not much, making it a good fit for entry-level research and individual developers, and it can be adapted to some mobile-manipulation tasks.

We have recently been consolidating the key modules of embodied-AI research for our readers: industry landscape, robot hardware, algorithms, and deployment schemes, all gathered inside our community. So far we have mapped out the companies working on embodied "brains" and robot bodies (it turns out the hardware side is getting crowded too...), along with the more active embodied-AI labs, to help readers evaluate the field and plan graduate study. Beyond that, there are many industry research reports for judging the development and cycles of embodied AI.

On hardware, we recommend a few research-friendly platforms: the SO-100 series, the OpenArm series, and the XLerobot series. The SO-100 and its upgraded versions can run some VA and VLA algorithms and already cover the common functions. Other development platforms cost more and require a real budget; see the models from 方舟无限, 星海图, and Unitree (宇树).

On algorithms, we have so far gathered material on VLA (training, training-free approaches, VLA+RL, VLA + world models, VLA lightweighting, deployment, etc.), VLN (temporal language, object navigation, point navigation, etc.), and locomotion control (reinforcement ...
PolyU, Tsinghua, and others release the first embodied procedural survey: teaching robots steps, error correction, and question answering from a first-person view
具身智能之心· 2025-12-01 10:00
Core Viewpoint
- The article presents a comprehensive overview of the Egocentric Procedural AI Assistant (EgoProceAssist), which aims to assist people in performing daily procedural tasks from a first-person perspective. It identifies three core technical tasks for such an assistant: Egocentric Procedural Error Detection, Egocentric Procedural Learning, and Egocentric Procedural Question Answering [6][32].

Summary by Sections

Motivation
- The article emphasizes the prevalence of procedural tasks in daily life, which require a specific sequence of steps to achieve desired outcomes. It highlights the potential of an AI assistant to improve safety and efficiency in performing these tasks, especially in high-risk scenarios [6][8].

New Classification System
- A novel classification system is introduced, categorizing the three core tasks of the AI assistant and summarizing the existing methods, datasets, and evaluation metrics relevant to each task [2][6].

Egocentric Procedural Error Detection
- This section outlines the key existing technologies for detecting procedural errors from a first-person perspective. It distinguishes methods that require only video data from those that use multimodal data, and emphasizes the challenges that set procedural error detection apart from general anomaly detection [9][11][12].

Egocentric Procedural Learning
- The article discusses approaches to procedural learning, categorized by supervision level: unsupervised, weakly supervised, and self-supervised. It highlights the importance of identifying key steps in procedural tasks to improve error detection and planning [14][16].

Egocentric Procedural Question Answering
- This section summarizes current technologies for answering procedural questions from a first-person perspective, noting the challenges posed by occlusions and scene changes. It emphasizes that models need strong understanding and memory capabilities to respond effectively to user queries [17][20].

Supplementary Experiments
- The article presents supplementary experiments evaluating existing VLMs and AI agents on procedural error detection and learning tasks. The results indicate significant limitations in their ability to assist with first-person procedural tasks [23][25].

Challenges
- Several challenges remain in building EgoProceAssist, including data scarcity, limited understanding of long-term procedural activities, and heavy reliance on manual annotations, all of which hinder real-time assistance [29][30][31].

Conclusion
- The research concludes by reiterating the significance of the proposed AI assistant and its core tasks, while acknowledging the field's open challenges and limitations. It aims to provide a foundation for future research directions in egocentric AI applications [32].
Hardware included! The most complete hands-on VLA tutorial is here
具身智能之心· 2025-12-01 03:12
Core Viewpoint
- The article discusses the challenges and advancements in the VLA (Vision-Language-Action) field, emphasizing the importance of real-robot data collection and the complexities of training and deploying VLA models.

Group 1: Data Collection
- Real-robot data collection is crucial for VLA models, with methods including teleoperation, VR, and full-body motion capture [2][8]
- The effectiveness of data collection methods and ensuring high-quality data remain significant challenges, particularly in real-to-sim-to-real pipelines [8]

Group 2: VLA Training
- Training VLA models typically requires simulation debugging before real-robot deployment, especially when real-robot data is scarce [10]
- Techniques for fine-tuning models and achieving good results with small datasets are critical, as many students struggle to train models effectively [10]

Group 3: VLA Model Deployment
- After training, VLA models often require "slimming" because their large parameter counts make deployment on edge chips difficult [12]
- Lightweight operations such as quantization and distillation are essential to shrink the parameter footprint while preserving performance [12]

Group 4: Educational Initiatives
- The article introduces a hands-on course designed to help students learn VLA effectively, covering hardware, data collection, algorithms, and deployment [14][16]
- The course targets several audiences: job seekers in the field, beginners looking to level up, and embodied-intelligence researchers [27]
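The quantization step mentioned above can be illustrated with a toy sketch. This is not the course's actual toolchain: real deployments use framework quantizers (e.g. PyTorch or ONNX) and also calibrate activations. The function names and the round-trip check below are ours, showing only the core idea of affine int8 post-training quantization of a weight vector.

```python
def quantize_int8(weights):
    """Affine per-tensor quantization of float weights to int8.

    Returns (int8 values, scale, zero_point). Minimal sketch of the
    idea behind post-training quantization: map the float range
    [min, max] onto the integer range [-128, 127].
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant tensor
    zero_point = round(-lo / scale) - 128
    return (
        [max(-128, min(127, round(w / scale) + zero_point)) for w in weights],
        scale,
        zero_point,
    )

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the int8 representation."""
    return [(v - zero_point) * scale for v in q]

w = [0.5, -1.2, 0.0, 3.3, -0.7]
q, s, z = quantize_int8(w)
w_hat = dequantize(q, s, z)
# Round-trip error is bounded by the quantization step (the scale).
print(max(abs(a - b) for a, b in zip(w, w_hat)) <= s)  # True
```

Storing int8 values instead of float32 cuts weight memory roughly 4x, which is the kind of saving that makes edge-chip deployment feasible; distillation attacks the parameter count itself rather than the precision.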
How is deployment of VLA+RL solutions going?
具身智能之心· 2025-12-01 03:12
Core Viewpoint
- The article discusses the challenges and advancements in deploying VLA (Vision-Language-Action) algorithms together with Reinforcement Learning (RL) in robotics, focusing on improving performance and efficiency in real-world applications [3][4].

Group 1: VLA Architecture and Model Challenges
- The article highlights the remaining pain points in VLA architectures and models, indicating that significant room for improvement remains [4][8].

Group 2: Full-Body Motion Control for Robots
- It explores potential advances in full-body motion control for robots, emphasizing the need for better performance in tasks such as dancing [4][8].

Group 3: VLA and RL Integration
- The discussion covers how to effectively combine VLA with RL for real-robot deployment, including the choice of compute boards ("板子") and strategies for lightweight implementation [4][8].

Group 4: Expert Contributions
- The article features insights from experts in the field, including representatives from companies and academic institutions, who share their perspectives on the topics above [9][11][13].

Group 5: Additional Resources
- A more in-depth analysis and further technical details are available on the "Embodied Intelligence Heart" knowledge platform, which includes exclusive content and Q&A sessions [19].
Uproar! ICLR resets rebuttals with one click, and researchers everywhere are furious
具身智能之心· 2025-12-01 03:12
Editor: 机器之心 | Author: 正在吃瓜的

Starting last night, ICLR sent out its latest notice: the Area Chairs (ACs) of all papers will be reassigned, and all reviews and scores will be reset to their pre-discussion state. The AI community, at home and abroad, erupted once again. Although the hole exposed by the ICLR doxxing incident has been patched at the platform level, the aftershocks are only beginning; a Pandora's box has been thrown wide open.

On social media, multiple authors posted screenshots of the notification email in protest. Some had just finished large-scale supplementary experiments and a long rebuttal, painstakingly persuading two reviewers to raise their scores from 4 to 8, only to have everything reset to its initial state with one click, wiping out nights of work.

Others publicly asked whether this is not a textbook case of punishing the innocent: a collective penalty on the vast majority of authors who neither exploited the vulnerability nor broke the rebuttal rules, yet are being zeroed out alongside a handful of bad actors. "You hold the complete logs and metadata and could screen for suspicious behavior yourselves. Why make everyone take the blame?"

Still others worry that if all the scores are now ...
ICRA 2026 | The first embodied learning challenge set in real-world scenes! Up to $70,000 in prizes
具身智能之心· 2025-12-01 03:12
Core Insights
- The REAL-I competition, endorsed by IEEE ICRA, aims to advance research in embodied intelligence and data-driven robotic manipulation [1][5][34]
- The competition offers a $90,000 prize pool, along with opportunities for participants to publish high-value papers based on competition data [1][30][32]

Competition Structure
- The competition runs in two phases: a Simulation Track for qualifying and a Real-Robot Stage for final evaluation [15][25][26]
- Each task is scored out of 100 points, focusing on real-world industrial tasks such as parcel weighing, parts sorting, and full-cycle plate transporting [15][16][19][21]

Event Details
- The competition is organized by Lejoin Intelligence and the Beijing Institute of General Artificial Intelligence, in collaboration with top global universities [34]
- The on-site event is scheduled for June 1, 2026, with the competition phases following a timeline leading up to that date [27][24]
A first mover! Shanghai Jiao Tong University establishes the world's first embodied intelligence major
具身智能之心· 2025-11-30 07:06
Sources: Shanghai Jiao Tong University, 量子位, etc.

The first university to take the plunge is here! This month, Shanghai Jiao Tong University officially announced plans to add an undergraduate major in embodied intelligence (no longer filed under robotics engineering or other related majors). Embodied AI is a cross-disciplinary field covering a great deal of ground, and launching a dedicated major now shows that the education sector has taken note of the industry's rapid growth and its talent shortage. Barring surprises, the embodied-intelligence wind will soon sweep through most universities.

The major will sit under the computer science category of the School of Artificial Intelligence, grant an engineering degree, and run for four years. Planned enrollment is about 30 students per year, of whom roughly 25 (about 83%) are expected to continue on to graduate study. The program will be led by Prof. Cewu Lu (卢策吾), currently deputy dean of SJTU's School of Artificial Intelligence.
Wall Street's awkward TPU hype leaves academia baffled: Kaiming He was already a TPU programming expert five years ago, hardly news~
具身智能之心· 2025-11-30 03:03
Core Viewpoint
- The article discusses the implications of Meta's potential multi-billion dollar TPU order from Google, highlighting the competitive dynamics between Google and Nvidia, and questioning the perceived advantages of both companies in the AI hardware market [1][3][22].

Group 1: Market Reactions
- Following the news of Meta's TPU order, Nvidia's stock dropped sharply, losing over $300 billion in market value, while Google's stock rose, adding approximately $150 billion in market capitalization [1][2]
- The Wall Street Journal interpreted this as Google challenging Nvidia's market dominance [3]

Group 2: Technical Insights
- Industry experts argue that neither Google nor Nvidia has a strong competitive moat, with major companies like Meta and OpenAI already using TPUs for their projects [4][11]
- OpenAI developed Triton to bypass Nvidia's CUDA, achieving performance comparable to cuBLAS with minimal code [12][13]
- A cost analysis shows Nvidia's H100 is significantly more cost-effective than Google's TPU v6e, at roughly a 5:1 ratio in token output per dollar spent [14][15]

Group 3: Strategic Implications
- Google's strategy in selling TPUs is not primarily profit-driven; it aims to secure production capacity and favorable pricing through long-term contracts with major clients like Meta and Apple [21][22]
- This approach lets Google leverage its partnerships to lock in chip supply, potentially sidelining smaller chip companies [25][29]
- The article draws parallels to Apple's past strategy of securing display panels, suggesting Google is employing a similar tactic in the TPU market [27][28]
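The tokens-per-dollar comparison above boils down to simple arithmetic: throughput divided by rental cost. The sketch below shows the calculation shape only; the throughput and hourly-price numbers are made-up placeholders chosen to reproduce a 5:1 ratio, not the article's measured H100 or TPU v6e figures.

```python
def tokens_per_dollar(tokens_per_second: float, price_per_hour: float) -> float:
    """Tokens generated per dollar of rental cost.

    tokens/sec * 3600 sec/hour gives tokens per hour; dividing by the
    hourly price yields tokens per dollar.
    """
    return tokens_per_second * 3600 / price_per_hour

# Placeholder numbers (NOT real benchmarks), picked to illustrate a 5:1 gap.
chip_a = tokens_per_dollar(tokens_per_second=5000, price_per_hour=4.0)
chip_b = tokens_per_dollar(tokens_per_second=2000, price_per_hour=8.0)
print(chip_a / chip_b)  # 5.0
```

The point of framing cost this way is that a chip with lower raw throughput can still win if its rental price drops proportionally faster, which is why long-term capacity contracts matter in the article's strategic argument.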
New from Peking University, EvoVLA: sharply reduces robot hallucinations, long-horizon success rate jumps 10%
具身智能之心· 2025-11-30 03:03
Editor: 新智元

[Intro] Embodied AI's "ChatGPT moment" has not arrived yet, but robot "hallucinations" have come first. On long-horizon tasks requiring dozens of steps, existing VLA models frequently "pretend to work," wrongly believing a task is complete. Targeting this pain point, a Peking University team proposes the self-evolving VLA framework EvoVLA. The model uses Gemini to generate "hard negative" samples for contrastive learning and, combined with geometric exploration and long-horizon memory, raises the success rate on the complex-task benchmark Discoverse-L by 10.2% while cutting the hallucination rate from 38.5% to 14.8%.

Embodied AI is on the eve of an explosion. From Google's RT-X to the open-source community's OpenVLA, generalist robot policies have shown astonishing zero-shot generalization. Yet when we shift our gaze from simple pick-and-place to long-horizon manipulation tasks that require dozens of steps, existing VLA models expose an embarrassing, fatal weakness: they have learned to ...
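Contrastive learning with hard negatives, the general mechanism the summary describes, can be sketched with an InfoNCE-style loss: the model is pushed to score the true (anchor, positive) pair above negatives, and a "hard" negative that closely resembles the positive yields a larger loss and thus a stronger training signal. This is a generic illustration under our own toy embeddings; EvoVLA's actual loss and representations may differ.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log softmax of the positive among all candidates,
    using cosine similarity scaled by a temperature."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    logits = [cos(anchor, positive) / temperature] + [
        cos(anchor, n) / temperature for n in negatives
    ]
    m = max(logits)  # stabilize the log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

anchor   = [1.0, 0.0, 0.2]   # toy embedding of the current observation
positive = [0.9, 0.1, 0.1]   # embedding of the truly completed state
hard_neg = [0.8, 0.3, -0.4]  # looks similar but the task is NOT done
easy_neg = [-1.0, 0.0, 0.0]  # obviously unrelated state

# The hard negative produces a larger loss, i.e. a stronger gradient.
print(info_nce(anchor, positive, [hard_neg]) >
      info_nce(anchor, positive, [easy_neg]))  # True
```

Intuitively, "pretending the task is done" is exactly the failure a hard negative targets: a state that looks finished but is not gets explicitly pushed away from the anchor.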
Latest from Peking University! MobileVLA-R1: beyond robot arms, how capable is VLA on mobile robots?
具身智能之心· 2025-11-30 03:03
Core Insights
- The article introduces MobileVLA-R1, a new framework for quadruped robots that bridges the gap between high-level semantic reasoning and low-level action control, addressing the stability and interpretability challenges of existing methods [1][2][21].

Group 1: Need for Reconstruction of the VLA Framework
- Current quadruped robots face two main challenges: a semantic-control gap that makes command execution unstable, and a lack of traceable reasoning that complicates error diagnosis [2]
- MobileVLA-R1's breakthrough lies in decoupling reasoning from action execution, letting robots "think clearly" before "acting accurately," which improves both interpretability and control robustness [2][23]

Group 2: Implementation of MobileVLA-R1
- MobileVLA-R1 employs a structured CoT dataset, a two-stage training paradigm, and multi-modal perception fusion to achieve coherent reasoning, stable control, and strong generalization [4][6]
- The structured CoT dataset includes 18K episode-level samples, 78K step-level samples, and 38K navigation-specific samples, filling the gap in reasoning supervision from instruction to action [4][5]

Group 3: Performance Evaluation
- In navigation tasks, MobileVLA-R1 achieved success rates of 68.3% and 71.5% on the R2R-CE and RxR-CE datasets respectively, outperforming existing methods by an average of 5% [10]
- For quadruped control, it achieved an average success rate of 73% across six locomotion and manipulation tasks, significantly surpassing the baseline models [12][13]

Group 4: Real-World Deployment
- MobileVLA-R1 was tested on the Unitree Go2 quadruped robot in various environments, demonstrating robust adaptation to complex scenarios with success rates of 86%-91% on complex instructions [14][18]
- Integrating depth and point-cloud encoders improved navigation success rates by 5.8%, highlighting the importance of 3D spatial information for scene understanding [19][20]

Group 5: Key Conclusions and Future Directions
- MobileVLA-R1 innovatively integrates chain-of-thought reasoning with reinforcement learning, addressing the industry's dilemma of choosing between interpretability and execution stability [21][23]
- Future directions include expanding the action space for more precise tasks, reducing reasoning latency through model optimization, and enhancing self-supervised learning to decrease reliance on labeled data [23]
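To make the "structured CoT dataset" concrete, here is a minimal sketch of what a step-level record pairing a reasoning trace with an action target could look like. The field names and the example values are illustrative guesses of ours, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class CoTStep:
    """One hypothetical step-level CoT sample: an instruction and
    observation, the chain-of-thought text supervising this step,
    and the low-level action target it should lead to."""
    instruction: str                             # natural-language command
    observation: str                             # reference to a camera frame
    reasoning: str                               # chain-of-thought supervision
    action: list = field(default_factory=list)   # e.g. velocity targets

sample = CoTStep(
    instruction="Walk to the red door and stop.",
    observation="frame_000123.png",
    reasoning="The door is ahead and slightly left; turn left, then move forward.",
    action=[0.4, 0.1, 0.0],  # illustrative forward / lateral / yaw velocities
)
print(sample.reasoning)
```

Structuring supervision this way is what the summary calls "filling the gap" between instruction and action: every intermediate action carries an explicit, traceable rationale rather than being learned end-to-end in silence.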