A Roundup of Work Combining LLMs with Reinforcement Learning and World Models in Embodied AI

Core Viewpoint
- This article surveys recent progress in embodied intelligence, focusing on work that combines large language models (LLMs) with reinforcement learning and world models, and highlights several notable research papers from 2024 [2][3].

Group 1: UniSim
- UniSim aims to learn a general-purpose simulator of real-world interaction through generative modeling, observing that different natural datasets each contribute distinct strengths to simulator learning [3].
- By combining diverse datasets, it can simulate the outcomes of both high-level commands and low-level controls, enabling zero-shot transfer to real-world scenarios [3].

Group 2: Robust Agents
- This Google DeepMind study argues that causal reasoning is essential for robust, general AI: any agent that satisfies a regret bound across a wide range of distribution shifts must have learned an approximate causal model [5].
- The result has significant implications for transfer learning and causal inference [5].

Group 3: MAMBA
- MAMBA introduces an efficient world-model approach to meta-reinforcement learning, targeting the poor sample efficiency of current methods [8].
- The framework reports up to a 15-fold improvement in sample efficiency, including on high-dimensional tasks [8].

Group 4: EMMA
- EMMA uses an LLM expert operating in a parallel text-based world to guide the training of an agent in the visual world, improving the agent's ability to interact with dynamic environments [10].
- The approach raises success rates by 20%-70% across diverse tasks compared with existing VLM agents [10].

Group 5: Text2Reward
- The Text2Reward framework uses LLMs to generate dense reward functions automatically, addressing the difficulty of designing reward functions by hand in reinforcement learning [13][14].
- On 13 of 17 manipulation tasks, policies trained with the generated rewards perform comparably to or better than those trained with expert-written rewards, and the method learns novel locomotion behaviors with success rates above 94% [14].

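
To make the term "dense reward function" from the Text2Reward summary concrete, here is a minimal hand-written sketch of the kind of shaped reward code such a framework might emit for an instruction like "reach the cube and lift it". It is not taken from Text2Reward: the `State` fields, the weights, and the 0.1 m lift target are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical observation: 3-D positions in metres."""
    gripper_pos: tuple
    cube_pos: tuple


def dense_reward(state: State, lift_target: float = 0.1) -> float:
    """Shaped reward: approach the cube, then lift it toward lift_target."""
    # Stage 1: penalise gripper-to-cube distance so every step gives a signal.
    reach_dist = math.dist(state.gripper_pos, state.cube_pos)
    reach_term = -reach_dist
    # Stage 2: reward cube height, clipped to [0, lift_target] and normalised.
    height = min(max(state.cube_pos[2], 0.0), lift_target)
    lift_term = height / lift_target
    # The 1.0 and 2.0 weights are illustrative assumptions.
    return 1.0 * reach_term + 2.0 * lift_term


def sparse_reward(state: State, lift_target: float = 0.1) -> float:
    """Sparse baseline: 1 only once the cube has reached the target height."""
    return 1.0 if state.cube_pos[2] >= lift_target else 0.0
```

The contrast with `sparse_reward` is the point of the sketch: the dense version distinguishes a gripper that is near the cube from one that is far away, whereas the sparse version returns 0 for both until the task is already solved.
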
Group 6: Online Continual Learning
- The research proposes two continual learning frameworks for interactive instruction-following agents, motivated by the need for agents to learn incrementally as they explore their environments [17][18].
- It introduces a confidence-aware moving average mechanism that updates parameters without relying on task boundary information [18].

Group 7: AMAGO
- AMAGO is a scalable in-context reinforcement learning framework that targets generalization, long-term memory, and meta-learning [21].
- It trains long-sequence Transformers in parallel, which improves scalability and performance on complex tasks [21].

Group 8: PDDL-based Planning
- The study presents a new paradigm for task planning with pre-trained LLMs, centered on constructing explicit world models in the Planning Domain Definition Language (PDDL) [22][23].
- Because the LLM can translate between PDDL and natural language, humans can correct the model efficiently, which significantly reduces the human intervention required [23].
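
As a rough illustration of what an explicit PDDL world model and its natural-language counterpart look like, the sketch below stores one action schema and renders it both as PDDL and as plain English. Everything here is an assumption for illustration: the `Action` class, the helper names, and the blocks-world style `pick-up` example are hypothetical, and the template-based `to_english` is only a stand-in for the translation step that the study assigns to an LLM.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical container for one PDDL action schema."""
    name: str
    parameters: list      # e.g. ["?b"]
    preconditions: list   # predicate strings, e.g. "(clear ?b)"
    effects: list         # predicate strings; "(not ...)" marks a delete


def to_pddl(action: Action) -> str:
    """Serialise the action into standard PDDL :action syntax."""
    params = " ".join(action.parameters)
    pre = " ".join(action.preconditions)
    eff = " ".join(action.effects)
    return (
        f"(:action {action.name}\n"
        f"  :parameters ({params})\n"
        f"  :precondition (and {pre})\n"
        f"  :effect (and {eff}))"
    )


def _phrase(pred: str) -> str:
    """Turn "(clear ?b)" into "clear b"; mark negated predicates."""
    words = pred.replace("(", " ").replace(")", " ").replace("?", "").split()
    if words and words[0] == "not":
        return "no longer " + " ".join(words[1:])
    return " ".join(words)


def to_english(action: Action) -> str:
    """Template-based stand-in for the LLM's PDDL-to-language step."""
    pre = "; ".join(_phrase(p) for p in action.preconditions)
    eff = "; ".join(_phrase(e) for e in action.effects)
    return f"Action '{action.name}' requires: {pre}. Afterwards: {eff}."


# Illustrative blocks-world style action (not from the source).
pick_up = Action(
    name="pick-up",
    parameters=["?b"],
    preconditions=["(clear ?b)", "(hand-empty)"],
    effects=["(holding ?b)", "(not (hand-empty))"],
)
```

Calling `to_pddl(pick_up)` yields the schema a planner would consume, while `to_english(pick_up)` yields a sentence a non-expert could review. That readable form is what makes it practical for a person to spot and correct a wrong precondition without reading PDDL directly.
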