A Roundup of Work Combining LLMs with Reinforcement Learning and World Models in Embodied AI
具身智能之心· 2025-07-30 00:02
Core Insights
- The article surveys recent advances in embodied intelligence, focusing on how large language models (LLMs) are combined with reinforcement learning and world models across a range of AI applications [2][3].

Group 1: UniSim and Real-World Simulators
- UniSim aims to learn a general-purpose real-world interactive simulator through generative modeling, showing that diverse natural datasets can improve the realism of learned simulations [3].
- High-level vision-language policies and low-level reinforcement learning policies trained entirely in the simulated environment can be applied directly to real-world scenarios without additional training [3].

Group 2: Causal World Models
- The Google DeepMind study argues that robust agents must learn causal models in order to generalize across shifting distributions, giving a clear answer to a long-standing question in the field [5].

Group 3: MAMBA Framework
- MAMBA introduces an efficient world-model approach to meta-reinforcement learning, improving sample efficiency by up to 15x while remaining effective on high-dimensional tasks [8].

Group 4: EMMA and Multimodal Agents
- EMMA uses LLMs trained in text-based worlds to guide the training of visual-world agents, yielding a 20%-70% improvement in task success rates over existing vision-language-model agents [10].

Group 5: Text2Reward Framework
- The Text2Reward framework automatically generates and optimizes dense reward functions with LLMs, reaching success rates above 94% on new motion behaviors and further improving policies through human feedback [13][14]; a minimal code sketch of this LLM-generated-reward pattern follows this summary.

Group 6: Online Continual Learning
- The proposed online continual learning frameworks (Behavior-IL and Environment-IL) let agents learn continuously in real-world settings without relying on task-boundary information, significantly outperforming existing methods [17][18].

Group 7: AMAGO Framework
- AMAGO addresses generalization and long-term memory challenges in reinforcement learning, demonstrating superior scalability and performance on complex tasks [21].

Group 8: PDDL and Planning with LLMs
- The research presents a new paradigm for task planning with pre-trained LLMs that integrates human feedback effectively and reduces the extensive manual corrections otherwise needed in planning tasks [22][23].
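The Text2Reward item above describes LLMs writing dense reward functions as executable code that is then dropped into an ordinary RL loop. The sketch below illustrates only that pattern: the hard-coded reward string, the 1-D "reach the goal" toy environment, and the compile_reward helper are hypothetical stand-ins for illustration, not the paper's actual prompts, environments, or API.

```python
"""Minimal sketch of the Text2Reward pattern: an LLM writes a dense reward
function as Python source, which is compiled and used inside an RL loop.
The reward string, toy env, and helper names are illustrative assumptions."""
import math
import random

# In Text2Reward this string would come from an LLM prompted with the task
# description and an abstraction of the environment; here it is hard-coded.
LLM_GENERATED_REWARD_SRC = """
def dense_reward(state, action):
    # Encourage moving toward the goal; lightly penalize large actions.
    dist = math.sqrt((state['x'] - state['goal_x']) ** 2)
    return -dist - 0.01 * abs(action)
"""

def compile_reward(src):
    """Exec the LLM-generated source and return the reward callable."""
    namespace = {"math": math}
    exec(src, namespace)
    return namespace["dense_reward"]

def toy_reach_env_step(state, action):
    """A 1-D 'reach the goal' toy environment: the action nudges x."""
    state = dict(state)
    state["x"] += 0.1 * action
    return state

if __name__ == "__main__":
    reward_fn = compile_reward(LLM_GENERATED_REWARD_SRC)
    state = {"x": 0.0, "goal_x": 1.0}
    total = 0.0
    for _ in range(20):
        action = random.uniform(-1.0, 1.0)   # placeholder for an RL policy
        state = toy_reach_env_step(state, action)
        total += reward_fn(state, action)    # dense shaping signal every step
    print("return under LLM-written reward:", round(total, 3))
```

In the actual framework the generated reward would then be iteratively refined from policy rollouts and human feedback; the loop above only shows where such a function slots in.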
An Exclusive Interview with Luo Jianlan, Chief Scientist of Zhiyuan Robotics: Data Collection, Simulation, Scenarios, and Engineering for Embodied Intelligence
具身智能之心· 2025-07-30 00:02
Core Viewpoint
- In the interview, Dr. Luo Jianlan stresses the importance of real-world data for the development of embodied intelligence and discusses the challenges and strategies in data collection, model training, and application deployment.

Data Discussion
- The company works with multiple sensor suppliers on the joint development of visual, tactile, and high-density sensors, and is building a cross-platform data-collection API for standardized data input [2]
- Achieving a 95% success rate for robots in real-world applications, particularly household tasks, remains a significant challenge [2]
- The company trains its multimodal large models entirely on real-robot data, agreeing that simulation environments have scalability limits [2][3]
- The cost of collecting real-world data is not the main issue; the lack of standardized data-collection mechanisms is the core challenge [6]
- The company acknowledges data scarcity and the difficulty of performance optimization in both autonomous driving and robotics, emphasizing the need for high success rates in open environments [7]

Evaluation of Embodied Large Models
- There is currently no universal benchmark for evaluating embodied intelligence models, because software and hardware environments differ significantly across companies [9]
- Different large models are evaluated mainly by their technical routes and the challenges each faces in the current landscape [9][10]
- The company aims to build a unified real-robot testing platform so that models can be evaluated across different scenarios [9]

Embodied Intelligence Applications and Implementation
- Robot deployment follows four steps: task modeling, scene migration, scene adaptation, and safety verification, with hardware-software collaboration emphasized throughout [18]
- High success rates are crucial, but generalization, robustness, and real-time performance must also be addressed [20]
- Industrial environments, being structured and backed by clear commercial demand, are seen as the most promising setting for the first large-scale deployment of embodied intelligence [21]

Future Outlook for Embodied Intelligence
- The company is aiming for a "DeepSeek moment," targeting near-100% success rates and high-speed execution in future models [24]
- The shift to a data-driven paradigm is recognized as a major change in the field, moving away from traditional hypothesis-driven approaches [25]
- The potential of brain-like architectures is acknowledged, with ongoing exploration of how to combine computation with physical capability in future intelligent systems [26]
A Conversation with Yao Maoqing, President of Zhiyuan's Embodied Business Division: Intensive Deliveries in the Second Half, Several Thousand Units Shipping This Year
硬AI· 2025-07-29 15:50
Core Viewpoint
- The embodied intelligence industry is moving from demonstrations to practical applications, and the second half of the year is a critical period for delivering results [1]

Group 1: Company Strategy and Market Position
- Zhiyuan has secured a contract with China Mobile worth 78 million yuan, indicating strong market demand for its humanoid robots in service sectors [2]
- The company aims to offer an integrated hardware-and-software experience, closer to Apple's model than to Android's open-interface approach [10]
- Zhiyuan focuses on real-world data collection, arguing that synthetic data cannot fully capture the complexities of physical interaction [6]

Group 2: Product Development and Supply Chain
- The company expects to ship several thousand units this year, but faces supply-chain challenges, particularly for core components such as joints and reducers [4]
- Zhiyuan is committed to a fully self-developed approach, integrating body, brain, and cognitive functions into a closed-loop product-development system [5]
- The company is exploring both open-scenario and real-world data collection to improve the diversity and quality of its data [7]

Group 3: Market Trends and Future Outlook
- The second half of the year is seen as a window of opportunity for embodied intelligence, with significant market validation expected [2]
- The company sees a vast potential market for embodied intelligence applications and predicts that specialized companies will emerge across market segments [2][10]
- Zhiyuan is also considering entering the quadruped robot market to diversify its product line and better understand market needs [13]

Group 4: Cost Management and ROI
- The company believes that as industrial applications scale up, manufacturing costs will fall, making products more acceptable to clients [11]
- Zhiyuan focuses on achieving a reasonable return on investment (ROI) rather than merely cutting costs [11]

Group 5: Competitive Landscape
- The entry of automotive companies into embodied intelligence is seen as a natural progression, but Zhiyuan remains focused on its core business [10]
- While automotive firms have advantages in supply chains and management, the market for embodied intelligence is viewed as far larger than that for electric vehicles [10]
The "Strongest Brains" Dialogue at WAIC: How Do Robots Move Into the Real World?
Nan Fang Du Shi Bao· 2025-07-29 14:46
Core Insights
- The 2025 World Artificial Intelligence Conference (WAIC) showcased advances in robotics, pointing to a future of human-robot coexistence with applications ranging from labor to companionship [2][12]
- A major forum, "New Opportunities for Embodied Intelligence from a Global Perspective," brought together leading experts from academia and industry to discuss where research meets practical application in embodied intelligence [4]

Group 1: Technological Developments
- Yao Maoqing of Zhiyuan Robotics presented a comprehensive framework for embodied intelligence that integrates the robot body, motion intelligence, interaction intelligence, and operational intelligence into a closed-loop system [5][7]
- Zhiyuan Robotics launched "Genie Envisioner," an open-source world-model platform for dual-arm robots that provides an end-to-end path for robots to go from perception to action [7][10]
- The company also unveiled "Qiyuan," its first universal embodied base model, capable of adapting to a variety of heterogeneous robot bodies [9]

Group 2: Practical Applications
- Zhiyuan Robotics has deployed its technology in four key areas: industrial manufacturing, warehousing and logistics, power inspection, and interactive guidance, with significant gains in operational efficiency [10]
- Robots running the GE-Act platform have completed tasks such as making sandwiches and operating microwaves with success rates above industry averages, showcasing unprecedented precision and robustness [10]

Group 3: Academic Contributions
- Leading academics discussed breakthroughs in embodied intelligence, with Google scientist Stefan Schaal addressing the challenges of high-performance AI-driven robotic operation [11]
- Tsinghua University researcher Su Hang explained how base models can help robots move from virtual training to real-world deployment [11]
- A roundtable discussion gathered experts' views on the opportunities and challenges in the industry's transition from automation to intelligence [11]
Effective Data for Assisted Driving Is Hard to Collect? The First Mass-Produced, Interactive World Model Is Here
Nan Fang Du Shi Bao· 2025-07-29 13:59
Core Insights
- The core issue in end-to-end autonomous driving is the need for massive data collection and coverage of high-risk scenarios, which creates a data bottleneck for training models [2][4]

Group 1: Company Developments
- SenseTime's "Jueying Kaiwu" world model is the first interactive, generative world-model product platform in the assisted-driving sector, aimed at easing data-collection challenges [4]
- The platform can generate millions of scene samples and create real-time interactive training environments, significantly benefiting the assisted-driving industry [4]
- The model is notably efficient: with a single A100 GPU it can generate data equivalent to what 500 mass-produced vehicles would collect [4]

Group 2: Industry Challenges
- A lack of training data is a major barrier to the widespread adoption of intelligent robots, with even leading companies producing only limited real-world data [5]
- The growth of visual data generation is lagging behind computational power, creating a mismatch with model data requirements [5][6]
- Large-scale 4D spatial reconstruction capability is essential for creating realistic training scenarios, including high-risk collision scenarios [7]

Group 3: Future Implications
- World models can enable autonomous evolution of driving behavior by simulating various real-world changes and generating multimodal data [7]
- The relationship between humans and AI must be carefully managed to preserve human uniqueness in the era of human-machine coexistence [8][9]
- Defining rules and values for AI interaction is crucial to ensuring that robots develop intelligence within acceptable boundaries [9]
SenseTime Debuts Its "Wuneng" Embodied Intelligence Platform
Group 1
- Perception, navigation, and interaction are the three core capabilities of embodied intelligence, with perception as the foundation for machines to explore the real world [2]
- SenseTime's "Wuneng" embodied intelligence platform integrates advanced visual AI technology to provide recognition and understanding capabilities for a variety of hardware terminals [2]
- Navigation is described as the "skeleton" that lets machines act in the real world, with SenseTime's technology enabling precise path planning and navigation for robots and other devices [2]

Group 2
- The "Wuneng" platform enables robots to interact with the real world, showcasing capabilities described as warmth, depth, long memory, and stability [2]
- SenseTime, together with various domestic partners, launched the "SenseTime Computing Power Mall" to provide flexible, autonomous domestic computing-power options [3]
- The "SenseTime Computing Power Mall" aims to lower the barriers to AI applications and promote the independent, controllable development of China's AI industry [3]
LeCun Steps In, Building a Video World Model to Challenge NVIDIA's COSMOS
机器之心· 2025-07-29 09:58
Core Viewpoint
- The article discusses the development and advantages of DINO-world, a new video world model designed to predict future frames more efficiently and effectively across diverse environments, in the context of artificial intelligence and machine learning [9][10].

Data Challenges
- Acquiring large-scale, high-quality video datasets is costly, especially when action annotations are required; current successful world-model applications remain limited to specific fields such as autonomous driving and video games [5].
- Accurately modeling physical laws and behaviors in unconstrained, partially observable environments remains a significant challenge even over short time scales, and advanced pixel-based generative models consume enormous compute, with training budgets reaching up to 22 million GPU-hours for models like COSMOS [6].

Model Development
- DINO-world pre-trains the video world model in the latent space of a frozen visual encoder (DINOv2), then fine-tunes it with action data for planning and control [9].
- This architecture consumes far fewer resources during both training and inference than current state-of-the-art models [10].

Training and Evaluation
- DINO-world was trained on roughly 60 million uncleaned web videos, enabling it to learn features that transfer across domains [11].
- On the VSPW segmentation prediction task, DINO-world improved mean Intersection over Union (mIoU) by 6.3% when predicting future frames, outperforming the second-best model [13].

Methodology
- The frame encoder does not model pixels directly; it uses latent representations of video patches, which significantly lowers the computational cost of training the predictor [19]; a minimal sketch of this latent next-frame-prediction setup follows this summary.
- The training objective is next-frame prediction, which allows efficient parallelization and focuses the loss on the most relevant tokens [27].

Action-Conditioned Fine-Tuning
- DINO-world can be adapted to action-conditioned tasks by adding an action module that updates the query vector with the corresponding action; this module can be trained on a small dataset of action-conditioned trajectories [30][33].

Experimental Results
- DINO-world demonstrated superior performance on dense prediction tasks across datasets including Cityscapes, VSPW, and KITTI, validating the effectiveness of the proposed paradigm [37][38].
- In intuitive-physics tests the model showed a strong understanding of physical behavior, comparable to larger models such as V-JEPA [40][41].

Planning Evaluation
- The action-conditioned model was trained on offline trajectories and showed significant performance improvements over models trained from scratch, particularly in more complex environments [44].
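The methodology above reduces to predicting the next frame's latent tokens from past frames under a frozen encoder. The PyTorch sketch below shows that pattern in miniature; the tiny convolutional "encoder", the predictor size, and the smooth-L1 loss are placeholders chosen for illustration, not the paper's DINOv2 encoder or actual architecture.

```python
"""Minimal sketch of latent next-frame prediction with a frozen encoder.
The conv 'encoder' stands in for DINOv2 and the small transformer stands in
for DINO-world's predictor; shapes and the loss are illustrative only."""
import torch
import torch.nn as nn

class FrozenEncoder(nn.Module):
    """Placeholder for a frozen image encoder producing patch latents."""
    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # 16x16 patches
        for p in self.parameters():
            p.requires_grad = False                               # frozen

    def forward(self, frames):                  # frames: (B, T, 3, H, W)
        b, t, c, h, w = frames.shape
        x = self.conv(frames.reshape(b * t, c, h, w))             # (B*T, D, h', w')
        x = x.flatten(2).transpose(1, 2)                          # (B*T, P, D)
        return x.reshape(b, t, -1, x.shape[-1])                   # (B, T, P, D)

class LatentPredictor(nn.Module):
    """Predicts the next frame's latents; causal over frames, not pixels."""
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, layers)

    def forward(self, latents):                 # latents: (B, T, P, D)
        b, t, p, d = latents.shape
        x = latents.reshape(b, t * p, d)
        # Frame-level causal mask: tokens of frame i attend only to frames <= i.
        frame_id = torch.arange(t).repeat_interleave(p)
        mask = frame_id[None, :] > frame_id[:, None]
        out = self.backbone(x, mask=mask)
        return out.reshape(b, t, p, d)

encoder, predictor = FrozenEncoder(), LatentPredictor()
opt = torch.optim.AdamW(predictor.parameters(), lr=1e-4)

frames = torch.randn(2, 5, 3, 64, 64)           # stand-in for a short video clip
with torch.no_grad():
    z = encoder(frames)                         # no gradients through the encoder
pred = predictor(z[:, :-1])                     # outputs for frames 0..T-2
loss = nn.functional.smooth_l1_loss(pred, z[:, 1:])  # each predicts the next frame
opt.zero_grad()
loss.backward()
opt.step()
print("next-frame latent loss:", loss.item())
```

Training only the predictor on top of frozen latents is what keeps the compute budget small relative to pixel-space generative models.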
A Roundup of Work Combining LLMs with Reinforcement Learning and World Models in Embodied AI
具身智能之心· 2025-07-29 06:15
Core Viewpoint
- The article reviews recent advances in the field of embodied intelligence, focusing on the integration of large language models (LLMs) with reinforcement learning and world models, and highlights several notable research papers from 2024 [2][3].

Group 1: UniSim
- UniSim aims to learn a general real-world interactive simulator through generative modeling, showing that natural datasets offer diverse advantages for learning simulators [3].
- By integrating varied datasets, the simulator can handle both high-level commands and low-level controls, enabling zero-shot application in real-world scenarios [3].

Group 2: Robust Agents
- The Google DeepMind study argues that causal reasoning is essential for robust, general AI, concluding that agents capable of satisfying regret bounds must learn approximate causal models [5].
- This finding has significant implications for transfer learning and causal inference [5].

Group 3: MAMBA
- MAMBA introduces an efficient world-model approach for meta-reinforcement learning, addressing the sample-efficiency problems of current methods [8].
- The framework improves sample efficiency by up to 15x on high-dimensional tasks [8].

Group 4: EMMA
- EMMA uses LLMs trained in text-based worlds to guide the training of visual-world agents, improving their ability to interact with dynamic environments [10].
- The approach improves success rates by 20%-70% across diverse tasks compared with existing VLM agents [10].

Group 5: Text2Reward
- The Text2Reward framework automates the generation of dense reward functions with LLMs, addressing the difficulty of reward-function design in reinforcement learning [13][14].
- The method outperforms baselines on 13 of 17 tasks and exceeds 94% success on new motion behaviors [14].

Group 6: Online Continual Learning
- The research proposes two frameworks for continual learning in interactive instruction-following agents, arguing that agents should learn incrementally as they explore their environments [17][18].
- A confidence-aware moving-average mechanism updates parameters without relying on task-boundary information [18]; a minimal sketch of one possible such update follows this summary.

Group 7: AMAGO
- AMAGO is a scalable contextual reinforcement learning framework that addresses generalization, long-term memory, and meta-learning [21].
- The framework trains long-sequence Transformers in parallel, improving scalability and performance on complex tasks [21].

Group 8: PDDL-based Planning
- The study presents a new paradigm for task planning with pre-trained LLMs, centered on building explicit world models in PDDL [22][23].
- The framework greatly reduces human intervention by letting LLMs translate between PDDL and natural language, making model correction efficient [23].
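Group 6 above mentions a confidence-aware moving average for updating parameters without task-boundary information, but does not spell out the mechanism. The sketch below is therefore only one plausible reading: an exponential moving average of model weights whose blend rate is scaled by the model's prediction confidence. All names and the confidence formula are assumptions for illustration, not the paper's exact rule.

```python
"""One plausible reading of a 'confidence-aware moving average' update:
keep a slow copy of the model and blend in the online weights at a rate
scaled by how confident the current prediction is. Illustrative assumption."""
import copy
import torch
import torch.nn as nn

def prediction_confidence(logits):
    """Use the max softmax probability as a crude confidence score in [0, 1]."""
    return torch.softmax(logits, dim=-1).max(dim=-1).values.mean().item()

@torch.no_grad()
def confidence_aware_ema(slow_model, online_model, confidence, base_rate=0.05):
    """Blend online weights into the slow model; higher confidence -> larger step."""
    rate = base_rate * confidence
    for slow_p, online_p in zip(slow_model.parameters(), online_model.parameters()):
        slow_p.mul_(1.0 - rate).add_(online_p, alpha=rate)

# Toy online-learning loop: a stream of batches with no task-boundary signal.
online = nn.Linear(8, 3)
slow = copy.deepcopy(online)               # the continually-averaged model
opt = torch.optim.SGD(online.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(5):
    x = torch.randn(16, 8)
    y = torch.randint(0, 3, (16,))
    logits = online(x)
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    conf = prediction_confidence(logits.detach())
    confidence_aware_ema(slow, online, conf)   # boundary-free consolidation
    print(f"step {step}: loss={loss.item():.3f} confidence={conf:.2f}")
```

Under this reading, the slowly averaged copy resists drift from noisy, low-confidence updates while the online copy keeps adapting to the stream.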
WAIC 2025 Observations: The Compute Race Escalates as Models Seek Paths to Deployment
经济观察报· 2025-07-28 13:36
Core Insights
- The 2025 World Artificial Intelligence Conference (WAIC) showed a shift in focus from pure technical parameters to practical applications and commercial value in AI technology [2][14]
- The compute race is evolving into a comprehensive systems-engineering challenge spanning performance, compatibility, storage, and energy efficiency [4][10]
- AI companies are increasingly pairing their models with real-world applications to unlock new data sources and enhance AI capabilities [15][16]

Computing Power Infrastructure
- Companies such as Huawei and China Digital are pushing the limits of computing power, with Huawei's Atlas 900 A3 SuperPoD delivering 300 PFLOPS [2][4]
- The financial sector is backing AI infrastructure, with firms such as Chip Xin Leasing investing 8 billion yuan in AI-related projects [4]
- Demand for private deployment of large models is rising because of data-security concerns, signaling a shift in market needs [5][6]

Model and Application Development
- AI model developers are focusing on deep integration with industry scenarios to create real business value, moving beyond mere technical showcases [14][17]
- Companies such as Step Leap Star are launching new models aimed at cutting costs and improving efficiency, and are working with multiple chip manufacturers to enhance compatibility [17][18]
- Data storage and management are increasingly important, with companies such as Dawning Storage addressing challenges in data accessibility and efficiency [8][9]

AI in Creative Industries
- AI-generated content (AIGC) is transforming creative processes, with companies such as Digital Kingdom introducing platforms that streamline content creation [20][21]
- AI is positioned as a "super assistant" for creators, boosting productivity while letting them focus on core creative work [21]

Consumer-Focused AI Products
- New AI products, such as the TicNote AI recording pen, target individual users, packaging complex AI capabilities in user-friendly form [23]
- The overarching goal of these AI advances is to contribute to real GDP growth across society, industries, and nations [24]
My Company Just Told Me My Contract Won't Be Renewed...
自动驾驶之心· 2025-07-28 13:21
Core Viewpoint
- The autonomous driving industry faces significant profitability challenges; even leading companies struggle to achieve stable profits because of high operational costs and regulatory constraints [3][4]

Group 1: Industry Challenges
- Given the complexity of the technology and high implementation costs, traditional solutions (such as human labor) remain more cost-effective in certain scenarios [2][4]
- The overall job market for autonomous driving has cooled compared with previous years, with noticeably fewer openings, especially for Level 4 positions, leading to increased competition [5][6]
- The industry's profitability model is still unclear, and companies are under significant survival pressure [2][3]

Group 2: Job Market Insights
- Demand for talent in the autonomous driving sector has shifted; current hiring requires not only solid engineering skills but also experience in mass production and practical application [6][8]
- Job openings are fewer than in previous years, and candidate requirements have become more stringent and practical [5][6]

Group 3: Specific Applications and Opportunities
- Certain specific applications, such as logistics in ports, mines, and campuses, are more mature but face cost-effectiveness challenges and limited market size [4]
- Companies are encouraged to explore opportunities in related fields, such as robotics and industrial automation, as the autonomous driving sector continues to evolve [8]