具身智能之心
Roundup of Embodied-Domain Work Combining LLMs with Reinforcement Learning and World Models
具身智能之心· 2025-07-30 00:02
Core Insights
- The article surveys recent advances in embodied intelligence, focusing on the integration of large language models (LLMs) with reinforcement learning and world models across a range of AI applications [2][3].

Group 1: UniSim and Real-World Simulators
- UniSim aims to learn general real-world interactive simulators through generative modeling, showing that diverse natural datasets can enhance the learning of realistic simulations [3].
- High-level vision-language policies and low-level reinforcement learning policies trained in the simulated environment can be applied directly to real-world scenarios without additional training [3].

Group 2: Causal World Models
- The Google DeepMind study asserts that robust agents must learn causal models to generalize across varying distributions, providing a clear answer to a long-standing question in the field [5].

Group 3: MAMBA Framework
- MAMBA introduces an efficient world-model approach for meta-reinforcement learning, achieving up to 15x better sample efficiency while performing well on high-dimensional tasks [8].

Group 4: EMMA and Multimodal Agents
- EMMA leverages LLMs trained in text-based worlds to guide visual-world training, improving task success rates by 20%-70% over existing vision-language-model agents [10].

Group 5: Text2Reward Framework
- Text2Reward automatically generates and optimizes dense reward functions using LLMs, achieving over 94% success rates on novel motion behaviors and refining policy performance through human feedback [13][14].

Group 6: Online Continual Learning
- The proposed online continual learning frameworks (Behavior-IL and Environment-IL) let agents learn continuously in real-world settings without relying on task-boundary information, significantly outperforming existing methods [17][18].

Group 7: AMAGO Framework
- AMAGO addresses generalization and long-term memory challenges in reinforcement learning, demonstrating superior scalability and performance on complex tasks [21].

Group 8: PDDL and Planning with LLMs
- The research presents a novel paradigm for task planning with pre-trained LLMs, effectively integrating human feedback and reducing the need for extensive manual correction in planning tasks [22][23].
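The PDDL-based planning paradigm in Group 8 can be pictured as a correction loop: the LLM drafts a PDDL model, a validator (or human) reports problems in natural language, and the LLM revises the draft. A minimal sketch of that loop, assuming a hypothetical `llm` callable and `validate` function; the prompts and loop structure are illustrative, not the paper's exact pipeline:

```python
def build_pddl_model(task_description, llm, validate, max_rounds=3):
    """Draft a PDDL model with an LLM, then iteratively repair it.

    `llm` is any callable prompt -> text; `validate` returns a list of
    natural-language error messages (empty when the model is acceptable).
    Both are placeholders for the components described in the summary.
    """
    pddl = llm(f"Translate this task into PDDL:\n{task_description}")
    for _ in range(max_rounds):
        errors = validate(pddl)
        if not errors:
            break
        # The LLM converts natural-language critiques back into PDDL edits,
        # which is what reduces the manual-correction burden.
        pddl = llm(f"Fix this PDDL given the feedback:\n{pddl}\nFeedback: {errors}")
    return pddl
```

Because `llm` and `validate` are injected, the loop works with any model backend or PDDL syntax checker.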
A Survey of Semantic Mapping for Indoor Embodied Intelligence: Progress, Challenges, and Future Directions
具身智能之心· 2025-07-30 00:02
Authors: Sonia Raychaudhuri, Angel X. Chang
Affiliation: Simon Fraser University, Canada
Paper: Semantic Mapping in Indoor Embodied AI – A Survey on Advances, Challenges, and Future Directions
Link: https://arxiv.org/pdf/2501.05750

Main Contributions
- Comprehensive survey: a full review of semantic-mapping methods for indoor navigation, covering everything from traditional approaches to the latest deep-learning advances.
- Classification framework: a taxonomy based on map structure (spatial grids, topological graphs, dense geometric maps, and hybrid maps) and semantic encoding (explicit vs. implicit features), helping researchers understand and compare methods.
- Challenges and directions: identifies current challenges in semantic mapping, such as high memory demands and computational inefficiency, and proposes future research directions, including the development of open-vocabulary, queryable, ...
From the Institute of Automation, Chinese Academy of Sciences: A Vision-Tactile-Language-Action Model and Its Dataset Construction
具身智能之心· 2025-07-30 00:02
Core Viewpoint
- The article presents a Vision-Tactile-Language-Action (VTLA) model designed to enhance robot manipulation, particularly in contact-rich scenarios, by integrating visual and tactile inputs with language instructions [2].

Group 1: Model Development
- The VTLA framework addresses the gap in applying vision-language models (VLMs) to language-conditioned robotic manipulation, especially beyond visually dominated tasks [2].
- A low-cost multimodal dataset of visual-tactile-action-instruction pairs was created in a simulated environment, designed specifically for fingertip insertion tasks [2].

Group 2: Performance and Results
- The VTLA model achieved over a 90% success rate on unseen hole types, significantly outperforming traditional imitation learning methods and existing multimodal baselines [2].
- Real-world peg-in-hole (shaft-hole) assembly experiments validated the model's strong simulation-to-reality (Sim2Real) transfer ability [2].
Interview with Dr. Luo Jianlan, Chief Scientist of AgiBot (Zhiyuan Robotics): Data Collection, Simulation, Scenarios, and Engineering for Embodied Intelligence
具身智能之心· 2025-07-30 00:02
Core Viewpoint
- In the interview, Dr. Luo Jianlan emphasizes the importance of real-world data in developing embodied intelligence, covering the challenges and strategies in data collection, model training, and application deployment.

Data Discussion
- The company collaborates with multiple sensor suppliers on the joint development of visual, tactile, and high-density sensors, while building a cross-platform data collection API for standardized data input [2].
- Achieving a 95% success rate for robots in real-world applications remains a significant challenge, particularly in household tasks [2].
- The company trains its multimodal large models on 100% real-machine data, agreeing that simulation environments have scalability limitations [2][3].
- The cost of collecting real-world data is not the main issue; the lack of standardized data-collection mechanisms is the core challenge [6].
- Both autonomous driving and robotics face data scarcity and performance-optimization difficulties, with high success rates needed in open environments [7].

Evaluation of Embodied Large Models
- There is currently no universal benchmark for evaluating embodied intelligence models, given the large differences in software and hardware environments across companies [9].
- Different large models are evaluated primarily by their technical routes and the challenges each faces in the current landscape [9][10].
- The company aims to establish a unified real-machine testing platform to enable model evaluation across different scenarios [9].

Embodied Intelligence Applications and Implementation
- Robot deployment involves four steps: task modeling, scene migration, scene adaptation, and safety verification, with hardware-software collaboration emphasized throughout [18].
- High success rates are crucial, but challenges in generalization, robustness, and real-time performance must also be addressed [20].
- Industrial environments, being structured and having clear commercial demand, are seen as the most promising setting for the initial large-scale deployment of embodied intelligence [21].

Future Outlook for Embodied Intelligence
- The company is aiming for a "DeepSeek moment": future models with near-100% success rates and high-speed execution capabilities [24].
- The transition to a data-driven paradigm, away from traditional hypothesis-driven approaches, is recognized as a significant shift in the field [25].
- Brain-like architectures are acknowledged as promising, with ongoing exploration of combining computation with physical capabilities in future intelligent systems [26].
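The interview mentions a cross-platform data collection API for standardized input from visual, tactile, and proprioceptive sensors. As a minimal sketch of what such a standardized record might look like: all field names below are hypothetical illustrations, since the interview does not describe the actual schema:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SensorFrame:
    """One time-synchronized sample from a robot platform.

    The field names are illustrative; the interview only states that a
    cross-platform API standardizes visual, tactile, and other sensor input.
    """
    timestamp_ns: int
    images: Dict[str, bytes]      # view name -> encoded image, e.g. "wrist_cam"
    tactile: List[float]          # flattened tactile-array readings
    joint_positions: List[float]  # proprioception, one entry per joint

def validate_frame(frame: SensorFrame, expected_joints: int) -> bool:
    """Reject malformed frames before they reach a training pipeline."""
    return (frame.timestamp_ns > 0
            and len(frame.joint_positions) == expected_joints
            and all(isinstance(v, bytes) for v in frame.images.values()))
```

A fixed schema like this is what makes data from heterogeneous platforms poolable for training a single multimodal model.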
Roundup of Embodied-Domain Work Combining LLMs with Reinforcement Learning and World Models
具身智能之心· 2025-07-29 06:15
Core Viewpoint
- The article reviews recent advances in embodied intelligence, focusing on the integration of large language models (LLMs) with reinforcement learning and world models, and highlights several notable research papers from 2024 [2][3].

Group 1: UniSim
- UniSim aims to learn general real-world interactive simulators through generative modeling, showing that natural datasets offer diverse advantages for learning simulators [3].
- Integrating varied datasets enables simulation of both high-level commands and low-level controls, allowing zero-shot application in real-world scenarios [3].

Group 2: Robust Agents
- The Google DeepMind study asserts that causal reasoning is essential for robust, general AI, concluding that agents capable of satisfying regret bounds must learn approximate causal models [5].
- This finding has significant implications for transfer learning and causal inference [5].

Group 3: MAMBA
- MAMBA introduces an efficient world-model approach for meta-reinforcement learning, addressing the sample-efficiency problems of current methods [8].
- The framework achieves up to 15x better sample efficiency on high-dimensional tasks [8].

Group 4: EMMA
- EMMA leverages LLMs trained in text-based worlds to guide the training of visual-world agents, enhancing their ability to interact with dynamic environments [10].
- The approach improves success rates by 20%-70% across diverse tasks compared with existing VLM agents [10].

Group 5: Text2Reward
- Text2Reward automates the generation of dense reward functions using LLMs, addressing the difficulty of reward-function design in reinforcement learning [13][14].
- The method outperforms baselines on 13 of 17 tasks and achieves over 94% success on novel motion behaviors [14].
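The Text2Reward idea in Group 5, prompting an LLM to emit executable dense reward code, can be illustrated with the kind of function such a system might generate for a simple reaching instruction. The reward below is a hand-written stand-in for LLM output, with illustrative shaping terms, not a reward taken from the paper:

```python
import math

def reach_reward(gripper_xyz, target_xyz, action):
    """A dense reward an LLM might generate from the instruction
    'move the gripper to the target'; all coefficients are illustrative."""
    dist = math.dist(gripper_xyz, target_xyz)
    reward = -dist                               # dense distance shaping
    reward -= 0.01 * sum(a * a for a in action)  # small action-magnitude penalty
    if dist < 0.05:                              # sparse success bonus
        reward += 10.0
    return reward
```

Because the reward is ordinary code rather than a learned model, it can be inspected, unit-tested, and iteratively refined from human feedback, which is the property the framework exploits.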
Group 6: Online Continual Learning
- The research proposes two frameworks for continual learning in interactive instruction-following agents, emphasizing that agents must learn incrementally as they explore their environments [17][18].
- A confidence-aware moving average mechanism is introduced to update parameters without relying on task-boundary information [18].

Group 7: AMAGO
- AMAGO is a scalable contextual reinforcement learning framework that addresses challenges in generalization, long-term memory, and meta-learning [21].
- The framework allows parallel training of long-sequence transformers, enhancing scalability and performance on complex tasks [21].

Group 8: PDDL-based Planning
- The study presents a novel paradigm for task planning with pre-trained LLMs, building explicit world models through PDDL [22][23].
- Letting LLMs convert between PDDL and natural language enables efficient model correction and significantly reduces the need for human intervention [22][23].
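The confidence-aware moving average mechanism mentioned in Group 6 can be sketched as an exponential moving average whose mixing weight depends on the model's confidence, so that updates happen without any task-boundary signal. The interpolation scheme below is an assumption for illustration, not the paper's exact formula:

```python
def cama_update(avg, new, confidence, slow=0.999, fast=0.9):
    """Blend `new` parameters into running average `avg`.

    `confidence` in [0, 1]: low confidence keeps the slow momentum `slow`
    (conservative update), high confidence moves toward `fast` (larger step).
    The linear interpolation between the two momenta is illustrative.
    """
    m = slow + (fast - slow) * confidence  # confidence 0 -> slow, 1 -> fast
    return [m * a + (1.0 - m) * n for a, n in zip(avg, new)]
```

Because the step size is driven by confidence rather than by an explicit "new task" flag, the same update rule runs continuously as the agent explores.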
The ERMV Framework: Data Augmentation for Manipulation Tasks that Significantly Boosts VLA Cross-Scene Success Rates
具身智能之心· 2025-07-28 13:19
Authors: Chang Nie et al.

Research Background
Robot imitation learning relies heavily on 4D multi-view sequential images (images spanning multiple viewpoints plus a temporal dimension), but high-quality data is costly to collect and scarce, severely limiting the generalization and adoption of embodied policies such as vision-language-action (VLA) models. Data augmentation is an effective remedy for data scarcity, yet no existing 4D multi-view sequence-editing method targets manipulation tasks.

Existing methods have clear limitations: traditional augmentation methods (e.g., CACTI, ROSIE) edit only single static images and cannot meet VLA models' need for spatiotemporally continuous 4D data; multi-view editing methods depend on fixed camera positions and struggle with the dynamically changing multi-camera setups of robot manipulation; and video-generation models, constrained by the compute cost of dense spatiotemporal attention, have small working windows and accumulate errors over long sequences.

Core Challenges and Solution
ERMV (Editing Robotic Multi-View 4D data) is a novel data-augmentation framework that, based on a single frame ...
Nearly 2,000 Members! What Has This "Whampoa Academy" of the Embodied-Intelligence Field Accomplished?
具身智能之心· 2025-07-28 13:19
Core Viewpoint
- The article emphasizes creating an engaging learning environment for AI and embodied-intelligence education, supporting students across industry, academia, and job searching [1].

Group 1: Community and Resources
- The community provides cutting-edge academic content, expert roundtables, open-source code solutions, and timely job information, enabling a comprehensive learning experience [2].
- A job-referral mechanism with multiple embodied-intelligence companies lets members submit resumes directly to the companies they target [2].
- A collection of over 30 technical routes helps beginners find benchmarks, reviews, and learning pathways, significantly reducing search time [2][3].

Group 2: Target Audience
- Newcomers get technical stacks and routes to help them get started in the field [3].
- Those already engaged in related research receive industry frameworks and project proposals to deepen their knowledge and skills [5].

Group 3: Community Composition
- Members come from renowned universities and leading embodied-intelligence companies, including Stanford University, Tsinghua University, Xiaomi, and Fourier Robotics [9].

Group 4: Learning and Development
- The community has compiled nearly 40 open-source projects and 60 embodied-intelligence datasets, along with mainstream simulation platforms and various technical learning routes [9].
- Regular sharing and discussion sessions address common questions and challenges members face in their learning [11].

Group 5: Benefits of Joining
- Members gain access to exclusive learning videos, job recommendations, and opportunities to connect with industry peers, expanding their professional networks [12][14].
- The community offers a supportive environment for asking questions and receiving guidance on career choices and research directions [69].
AI Lab Releases the "Intern" Embodied Full-Stack Engine, Pushing Robot Brains into the Mass-Production Era
具身智能之心· 2025-07-28 13:19
Core Viewpoint
- Shanghai AI Laboratory has launched the "Intern-Robotics" embodied full-stack engine, addressing key challenges in the embodied-intelligence sector and driving a shift from fragmented development to full-stack mass production [3][4][9].

Group 1: Technological Innovations
- Intern-Robotics integrates virtual simulation modeling, real-virtual data connectivity, and unified training-testing, covering the full chain of embodied intelligence from data collection to application [4][10].
- A single model can adapt to more than 10 robot embodiments, greatly improving the efficiency of training and deployment across robot types [6][9].
- Data collection costs have been reduced to 0.06% of previous solutions by combining real-machine data with virtually synthesized data [6][10].

Group 2: Addressing Industry Challenges
- The field faces three main bottlenecks: lack of unified standards, high data costs, and long R&D cycles; Intern-Robotics provides systematic solutions to each [9][10].
- The engine supports six major tasks and over 20 datasets, enabling efficient training and evaluation and significantly shortening the development cycle [10][11].

Group 3: Collaborative Initiatives
- The "Embodied Intelligence Photosynthesis Plan" empowers training centers, robotics companies, and developer communities, fostering innovation and technology breakthroughs [5][20].
- The plan has already attracted 15 organizations, including leading robotics companies, to collaborate on development and training with Intern-Robotics [5][20].

Group 4: Engine Components
- Intern-Robotics comprises three core engines (simulation, data, and training-testing) that together meet the full-stack production needs of embodied intelligence [11][14].
- The simulation engine allows easy switching of scenarios, robots, and evaluation metrics, significantly lowering the learning curve for developers [13][14].
- The data engine combines physical simulation with generative AI to produce high-quality, low-cost data, enhancing the diversity and quality of training datasets [14][15].
Can't Find the Right Company or Role? The 具身智能之心 Job-Seeking Group Is Here!
具身智能之心· 2025-07-28 07:14
Group 1
- The article announces the formal launch of a job-seeking community for the embodied-intelligence industry, in response to reader requests [1].
- The community will primarily discuss topics related to the industry, including companies, product development, job seeking, and career transitions [1].
- Anyone interested in networking with industry peers and staying current on the industry is encouraged to join [1].
The Direction of Embodied Intelligence, as Seen at This Year's WAIC25!
具身智能之心· 2025-07-28 07:14
Core Insights
- The article highlights the development direction of embodied intelligence showcased at the World Artificial Intelligence Conference (WAIC) 2025, with particular focus on embodied intelligence and autonomous driving, noting a significant increase in participating companies and in the diversity of product forms [1][8].

Group 1: Embodied Intelligence Developments
- The event featured various mobile-manipulation applications, including service and industrial robots, though challenges in cognition and recognition under human intervention were noted [3].
- Companies like Lingxin and Aoyi Technology showcased their dexterous hands; overall shipment performance is positive, and tactile and force-control solutions are becoming standardized [7].
- Many humanoid robots demonstrated remote-controlled operation; claims of autonomous navigation and decision-making still lack stability [8].

Group 2: Industry Trends and Community Engagement
- A transition from demo showcases to a more integrated industrial model was observed, with companies covering the full stack from data to policy to system deployment and advancing commercialization [8].
- The article introduces the "Embodied Intelligence Heart Knowledge Planet," a community facilitating technical exchange among nearly 200 companies and institutions in the field [10][20].
- The community offers technical routes, open-source projects, and job sharing, serving both newcomers and experienced researchers in embodied intelligence [15][19][21].

Group 3: Educational and Research Resources
- The community has compiled a comprehensive list of over 30 technical routes and various resources for learning and research, including datasets and simulation platforms [21][22].
- Regular discussions and roundtables address common questions and share insights on the latest advances in the field [23][24].

Group 4: Job Opportunities and Networking
- The community provides job recommendations and networking opportunities, connecting members with industry leaders and potential employers [24][19].
- Members can freely ask questions about career choices and research directions, fostering a supportive environment for professional growth [77].