Reinforcement Learning
A Roundup of Work Combining LLMs with Reinforcement Learning and World Models in Embodied AI
具身智能之心· 2025-07-29 06:15
Core Viewpoint
- The article surveys recent advances in embodied intelligence, focusing on how large language models (LLMs) are combined with reinforcement learning and world models, and highlights several notable research papers from 2024 [2][3].

Group 1: UniSim
- UniSim aims to learn a general-purpose real-world interactive simulator through generative modeling, showing that naturally occurring datasets each contribute different strengths to simulator learning [3].
- By integrating these diverse datasets, the simulator can emulate both high-level commands and low-level controls and can be applied zero-shot to real-world scenarios [3].

Group 2: Robust Agents
- The study from Google DeepMind argues that causal reasoning is essential for robust, general AI, concluding that any agent capable of satisfying a regret bound must have learned an approximate causal model [5].
- This finding has significant implications for transfer learning and causal inference [5].

Group 3: MAMBA
- MAMBA introduces an efficient world-model approach for meta-reinforcement learning, targeting the sample-efficiency problems of current methods [8].
- The framework improves sample efficiency by up to 15x on high-dimensional tasks [8].

Group 4: EMMA
- EMMA uses an LLM trained in a text-based world to guide the training of a visual-world agent, improving its ability to interact with dynamic environments [10].
- The approach yields a 20%-70% improvement in success rate across diverse tasks compared with existing VLM agents [10].

Group 5: Text2Reward
- The Text2Reward framework uses LLMs to automatically generate dense reward functions, addressing the difficulty of hand-designing rewards in reinforcement learning (a hedged sketch of this idea appears after this list) [13][14].
- The method outperforms baselines on 13 of 17 tasks and achieves over 94% success on new motion behaviors [14].

Group 6: Online Continual Learning
- The research proposes two frameworks for continual learning in interactive instruction-following agents, emphasizing that agents should learn incrementally as they explore their environments [17][18].
- A confidence-aware moving-average mechanism is introduced to update parameters without relying on task-boundary information [18].

Group 7: AMAGO
- AMAGO is a scalable contextual reinforcement learning framework that tackles generalization, long-term memory, and meta-learning [21].
- The framework trains long-sequence transformers in parallel, improving scalability and performance on complex tasks [21].

Group 8: PDDL-based Planning
- The study presents a new paradigm for task planning with pre-trained LLMs, building an explicit world model expressed in PDDL [22][23].
- By letting the LLM translate between PDDL and natural language, the framework substantially reduces the human effort needed to correct the model [23].
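To make the Text2Reward idea above concrete, here is a minimal sketch of the kind of dense reward function an LLM might emit for a simple reaching task. The observation keys (`ee_pos`, `goal_pos`) and the weighting constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def llm_generated_reward(obs: dict, action: np.ndarray) -> float:
    """Dense reward for a reaching task, in the style an LLM might emit
    when prompted with an environment description and a task goal.
    Field names (ee_pos, goal_pos) and weights are illustrative only."""
    ee_pos = np.asarray(obs["ee_pos"])      # end-effector position, shape (3,)
    goal_pos = np.asarray(obs["goal_pos"])  # target position, shape (3,)

    dist = float(np.linalg.norm(ee_pos - goal_pos))
    reach_reward = -dist                                        # dense shaping: closer is better
    action_penalty = -0.01 * float(np.square(action).sum())     # discourage large, jerky actions
    success_bonus = 5.0 if dist < 0.05 else 0.0                 # sparse completion bonus

    return reach_reward + action_penalty + success_bonus
```

A policy trained with such a shaped reward typically learns faster than one trained on a sparse success signal alone, which is the motivation for automating reward design.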
A Hard-Core 30-Minute "Argument": This Large-Model Roundtable Laid Bare the AI Industry's Disagreements
机器之心· 2025-07-28 04:24
Core Viewpoint
- The article covers a heated debate among industry leaders at the WAIC 2025 forum on the evolution of large-model technology, focusing on training paradigms, model architectures, and data sources, and highlighting a marked shift from pre-training to reinforcement learning as the dominant approach in AI development [2][10][68].

Group 1: Training Paradigms
- The forum highlighted a paradigm shift in AI from a pre-training-dominated model to one that emphasizes reinforcement learning, marking a significant evolution in AI technology [10][19].
- OpenAI's transition from pre-training to reinforcement learning is seen as a critical development, with some experts suggesting the pre-training era is nearing its end [19][20].
- The balance between the two remains a key topic, with experts stressing that pre-training still provides the foundation on which reinforcement learning builds [25][26].

Group 2: Model Architectures
- The Transformer architecture has dominated AI since 2017, but its limitations are becoming apparent as parameter counts grow and context windows expand [31][32].
- Two main exploration paths exist: optimizing existing Transformer architectures and developing entirely new paradigms, such as Mamba and RetNet, that aim to improve efficiency and performance [33][34].
- As the industry shifts toward agent-based applications that require models to interact autonomously with their environments, future architectures may return to RNN-style structures [38].

Group 3: Data Sources
- The article discusses the looming scarcity of high-quality data, predicting that existing reserves may be exhausted by 2028, which could stall the development of large models [41][42].
- Synthetic data is being explored as a remedy, with companies such as Anthropic and OpenAI using model-generated data to supplement training [43][44].
- Concerns about the reliability of synthetic data are raised, underscoring the need for validation mechanisms that ensure training-data quality [45][50].

Group 4: Open Source vs. Closed Source
- The debate between open-source and closed-source models continues, with open-source models such as DeepSeek gaining traction and challenging the dominance of closed-source offerings [60][61].
- Open-source initiatives are seen as a way to improve resource-allocation efficiency and drive industry evolution, even if they do not always yield the best-performing models [63][64].
- The future may bring a hybrid of open-source and closed-source approaches that addresses challenges such as model fragmentation and misuse [66][67].
An Overview of Large Model Development
2025-07-28 01:42
Summary of Key Points from Conference Call Records

Industry Overview
- The call discusses the development of large-model technology, pointing to a shift from research to application, with 2025 seen as a critical turning point for the industry [1][2].
- Globally, the U.S. leads in computing power while China excels in efficiency [1][5].

Core Insights and Arguments
- The capital market's attitude toward AI investment has shifted from funding research to demanding certainty and stability; the noted pessimism about domestic large models may be corrected, creating potential upside [1][6].
- The accuracy of large models has improved through real-time data integration and stronger retrieval-augmented generation, and synthetic data is expected to surpass human-accumulated data by 2028-2030 [3][16][17].
- Context windows have grown substantially, allowing models to process longer text and improving overall performance and accuracy [9].
- Agents and collective intelligence are advancing rapidly; agents can already complete complex tasks more efficiently than a typical intern, indicating strong commercial potential [12][14].

Important but Overlooked Content
- GPT-4.5 validated the continued effectiveness of the scaling law, and reasoning time has a significant impact on model performance [1][5][8].
- Low-precision training has reduced computing costs while facing challenges such as loss of gradient precision; models like DeepSeek R1 have achieved large-scale training at FP8 precision (a hedged mixed-precision sketch follows this summary) [19].
- AI application revenue is growing quickly, with AI search and programming expanding rapidly and users showing a stronger willingness to pay for AI applications than for traditional ones [25][26].
- In finance, collective intelligence has shown advantages: collaboration among agents yields higher returns in stock trading than a single model [15].

Conclusion
- Large-model technology is at a pivotal moment, with significant gains in efficiency, accuracy, and commercial viability; the AI sector is poised for explosive growth and investment opportunities [1][27].
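As a concrete reference for the low-precision point above, here is a minimal PyTorch sketch of mixed-precision training with loss scaling. It illustrates the general idea (fp16 autocast plus a gradient scaler to avoid underflow); DeepSeek R1's FP8 recipe involves custom kernels and scaling strategies that are not shown here.

```python
import torch
from torch import nn

# Minimal mixed-precision training loop (fp16 autocast with loss scaling).
# Requires a CUDA GPU; the model and data here are placeholders.
model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so small gradients survive fp16

for step in range(100):
    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")

    with torch.cuda.amp.autocast(dtype=torch.float16):  # forward pass in low precision
        loss = nn.functional.mse_loss(model(x), target)

    optimizer.zero_grad(set_to_none=True)
    scaler.scale(loss).backward()   # scale up before backward to avoid gradient underflow
    scaler.step(optimizer)          # unscale; skip the update if overflow is detected
    scaler.update()
```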
SenseTime Technology (商汤科技) 20250727
2025-07-28 01:42
Summary of Key Points from the Conference Call

Company and Industry Involved
- Company: SenseTime Technology (商汤科技)
- Industry: Artificial intelligence (AI) and its applications across various sectors

Core Insights and Arguments
1. Advancements in AI technology: Chinese large-model technology performs strongly in reasoning capability, open-source ecosystems, cost efficiency, and vertical applications, but continued technological breakthroughs and algorithmic originality are needed [2][3][41].
2. Shanghai's AI ecosystem: Shanghai has built a parallel system of open-source and commercial large models, with 82 models registered nationally, positioning AI as a new growth engine for the city's economy [2][5].
3. Sustainability challenges: The AI industry faces sustainability challenges, particularly the energy consumption of data centers, which is projected to account for 8% of global electricity use by 2030 [2][8].
4. Economic impact of AI investment: Investment in computing power and AI yields significant economic benefits, with a 1% increase in the computing-power index correlating with 1.8‰ of GDP growth [3][13].
5. Policy support for AI development: Stronger policy support, including intellectual-property and fiscal measures, is called for to create a favorable environment for AI development [3][4].

Other Important but Possibly Overlooked Content
1. AI's role in reducing carbon emissions: AI can significantly cut carbon emissions in heavy industry and improve factory energy efficiency, with successful deployments already seen in Singapore and ASEAN [3][11].
2. Challenges in AI training: Training large models is energy-intensive, and inference-phase energy consumption grows sharply with usage, potentially becoming a major source of energy demand [8][9].
3. Future directions for AI models: Large-model technology may evolve to accept natural-language feedback and to develop autonomous online agents capable of self-learning [25][26].
4. Open-source vs. closed-source dynamics: The ongoing competition between open-source and closed-source models will shape the AI ecosystem, with open-source models driving efficiency and collaboration [37][39].
5. SenseTime's innovations: SenseTime has made significant strides in AI, particularly with its SenseNova large model, which aims to unlock general AI task capabilities at low cost and facilitate widespread AI adoption across industries [41][59].

This summary captures the key points of the call: the advances, challenges, and future directions of AI technology, particularly for SenseTime and the broader industry.
Alibaba's Qwen Proposes GSPO, a New Reinforcement Learning Algorithm
news flash· 2025-07-27 15:20
Core Insights
- The article reports that Tongyi Qwen has introduced the Group Sequence Policy Optimization (GSPO) algorithm to enhance reinforcement learning (RL) capabilities [1].

Group 1
- GSPO defines importance ratios at the sequence level, distinguishing it from previous RL algorithms [1].
- Clipping, rewarding, and optimization are likewise performed at the sequence level (a hedged sketch of a sequence-level ratio appears below) [1].
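Based only on the description in this news flash, here is a minimal sketch of what a sequence-level importance ratio with PPO-style clipping could look like. The length normalization and the exact form of the objective are assumptions drawn from common practice, not details confirmed by the article.

```python
import torch

def sequence_importance_ratio(logp_new: torch.Tensor,
                              logp_old: torch.Tensor,
                              seq_mask: torch.Tensor) -> torch.Tensor:
    """Sequence-level importance ratio.

    logp_new / logp_old: per-token log-probs under the current / behavior policy,
    shape (batch, seq_len). seq_mask marks valid (non-padding) tokens.
    The length normalization is an assumption, used here to keep ratios
    comparable across responses of different lengths.
    """
    token_diff = (logp_new - logp_old) * seq_mask
    seq_len = seq_mask.sum(dim=-1).clamp(min=1)
    return torch.exp(token_diff.sum(dim=-1) / seq_len)   # one ratio per sequence

def clipped_sequence_objective(ratio: torch.Tensor,
                               advantage: torch.Tensor,
                               eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipping applied per sequence rather than per token."""
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return torch.minimum(ratio * advantage, clipped * advantage).mean()
```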
Why Did So Many AI Application Companies Exhibiting at the China Internet Conference Independently Choose the Same Development Model?
Mei Ri Jing Ji Xin Wen· 2025-07-26 16:19
Core Viewpoint
- The China Internet Conference, held July 23-25, showcased numerous AI-related technologies, and many exhibiting companies have opted for an open-source development model, signaling a significant shift in commercial strategies and business models in the tech industry [1][2].

Group 1: Open Source Development
- Open source represents a technical route distinct from closed source, reflecting substantial differences in business strategy and profit distribution [2].
- Companies are choosing open source to enhance collaboration and innovation, enabling faster development and a more vibrant ecosystem [6].

Group 2: Robotics and AI Applications
- A robotics company showcased bipedal robots that perform dynamic movements, focusing on providing open interfaces so that schools and developers can customize functionality [3][5].
- The robots use 3D-printed shells and electric motors, underscoring the importance of motor torque in robotic performance [5].
- The company aims to supply foundational technology to humanoid-robot manufacturers, including notable firms such as Boston Dynamics [5].

Group 3: Xiaomi's Open Source Initiatives
- Xiaomi presented its Vela operating system, now fully open source, aimed at improving other manufacturers' development efficiency and promoting interoperability among devices [6].
- The Xiaomi AIoT training box, also open source, is designed for education, allowing partner institutions to adopt the system in collaborative projects [9].
- A holographic digital human at the event runs on DeepSeek's open-source code, enabling cost-effective deployment without licensing fees; the main expense is the computing power needed for training [9].
A New Two-Stage End-to-End SOTA! HKUST's FiM: Rethinking Trajectory Prediction from a Planning Perspective (ICCV'25)
自动驾驶之心· 2025-07-26 13:30
Core Viewpoint
- The article presents a novel approach to trajectory prediction in autonomous driving built on a "first reasoning, then forecasting" strategy that integrates intention reasoning to improve prediction accuracy and reliability [2][4][47].

Group 1: Methodology
- The method introduces an intention reasoner based on a query-centric inverse reinforcement learning (IRL) framework, which explicitly incorporates behavioral intentions as spatial guidance for trajectory prediction [2][5][47].
- A bidirectional selective state-space model (Bi-Mamba) captures sequential dependencies in trajectory states, improving the accuracy and confidence of the predictions [9][47].
- Participant behavior is modeled on a grid-level graph representation, and the task is formalized as a Markov decision process (MDP) to define future intentions [5][6][21].

Group 2: Experimental Results
- Extensive experiments on large-scale datasets such as Argoverse and nuScenes show that the method significantly improves trajectory-prediction confidence and achieves performance competitive with state-of-the-art models [2][33][36].
- It outperforms existing models on several metrics, including the Brier score and minFDE6, indicating robustness in complex driving scenarios (a hedged sketch of these metrics follows this summary) [33][35][36].
- Integrating a spatial-temporal occupancy grid map (S-T OGM) strengthens the model's ability to predict future interactions among participants, further improving prediction quality [9][39].

Group 3: Contributions
- The article highlights the critical role of intention reasoning in motion prediction, establishing a promising baseline model for future trajectory-prediction research [47].
- The reward-driven intention-reasoning mechanism provides valuable prior information for trajectory generation, addressing the inherent uncertainty of driving behavior [8][47].
- The work underscores the potential of reinforcement-learning paradigms for modeling driving behavior, paving the way for advances in autonomous-driving technology [5][47].
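For reference, here is a minimal sketch of the two metrics cited above: minFDE over k candidate trajectories and a Brier-style penalty on the probability assigned to the best candidate. The exact normalization used by the Argoverse and nuScenes benchmarks may differ slightly from this sketch.

```python
import numpy as np

def min_fde(pred_trajs: np.ndarray, gt_traj: np.ndarray) -> float:
    """Minimum final displacement error over k predicted trajectories.

    pred_trajs: (k, T, 2) candidate future trajectories; gt_traj: (T, 2).
    minFDE6 corresponds to k = 6 candidates.
    """
    final_errors = np.linalg.norm(pred_trajs[:, -1, :] - gt_traj[-1, :], axis=-1)
    return float(final_errors.min())

def brier_min_fde(pred_trajs: np.ndarray, probs: np.ndarray, gt_traj: np.ndarray) -> float:
    """Brier-style penalty added to minFDE: (1 - p)^2 for the probability p
    assigned to the best candidate (a common Argoverse-style formulation;
    the benchmark's official definition may differ in detail)."""
    final_errors = np.linalg.norm(pred_trajs[:, -1, :] - gt_traj[-1, :], axis=-1)
    best = int(final_errors.argmin())
    return float(final_errors[best] + (1.0 - probs[best]) ** 2)
```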
A Treat for Developers! One Machine Covers Humanoid Motion Control, Reinforcement Learning, and VLN/VLA
具身智能之心· 2025-07-25 07:11
Core Viewpoint
- TRON1 is a cutting-edge research platform for education and scientific research, featuring a modular design that supports multiple robot forms and algorithms to meet diverse research needs [1].

Group 1: Product Features
- TRON1 supports humanoid gait development and is well suited to reinforcement learning research; the EDU version allows external cameras to be added for navigation and perception tasks [6][24].
- The platform supports development in both C++ and Python, making it accessible to users without C++ experience [6].
- A "three-in-one" modular design allows quick switching among bipedal, point-foot, and wheeled locomotion [1].

Group 2: Technical Specifications
- The platform is compatible with major simulation platforms such as NVIDIA Isaac, MuJoCo, and Gazebo, improving validation efficiency and lowering research barriers [9].
- TRON1 can be fitted with a robotic arm for a range of mobile manipulation tasks, supporting single-arm and dual-foot configurations [11].
- It integrates LiDAR and depth cameras for 3D mapping, localization, navigation, and dynamic obstacle avoidance [13].

Group 3: Hardware and Performance
- The standard and EDU versions share similar mechanical parameters, with a weight limit of approximately 10 kg and a maximum wheeled speed of 5 m/s [26].
- The platform is driven by an 8-core Arm Cortex-A78AE CPU and an NVIDIA Ampere-architecture GPU delivering 157 TOPS (sparse) / 78 TOPS (dense) of AI compute [16][19].
- The battery supports a maximum power of 1000 W, with a runtime of over 2 hours under rated conditions [26].

Group 4: User Support and Development
- Comprehensive user manuals and development guides are provided, ensuring ease of use and support for new users [29][33].
- The platform includes one year of after-sales service following acceptance, with paid maintenance and parts support available thereafter [40].
NVIDIA's Latest! ThinkAct: Few-Shot Adaptation and Long-Horizon Planning for Complex Embodied Tasks
具身智能之心· 2025-07-24 09:53
Core Insights
- The article introduces ThinkAct, a dual-system framework that enhances the reasoning capabilities of multi-modal large language models (MLLMs) in physical environments by connecting high-level reasoning with low-level action execution [4][9][12].
- ThinkAct targets the limitations of existing VLA models, which struggle with long-horizon planning and with adapting to complex tasks, by using reinforced visual latent planning [4][6][9].

Group 1: Framework and Methodology
- ThinkAct structures the VLA reasoning task so that the model receives visual observations and textual instructions and predicts actions, effectively linking abstract planning to low-level control (a hedged structural sketch appears after this summary) [12][21].
- Reinforcement learning is used to strengthen the MLLM's reasoning, encouraging it to reason through the task before generating low-level actions [13][19].
- A novel action-aligned visual feedback mechanism captures long-term goals and encourages visual grounding during planning [14][18].

Group 2: Performance Evaluation
- ThinkAct achieves strong results across robotic manipulation tasks, reaching a top success rate of 84.4% on the LIBERO benchmark and outperforming models such as DiT-Policy and CoT-VLA [25][26].
- In the SimplerEnv evaluation, ThinkAct beat baseline action models by significant margins, with overall scores of 71.5%, 65.1%, and 43.8% across different settings [25].
- The framework also excels at embodied reasoning, showing advantages in long-horizon and multi-step planning on the EgoPlan-Bench2 and RoboVQA benchmarks [26][27].

Group 3: Qualitative Insights
- Qualitative examples illustrate ThinkAct's reasoning process and task execution, showing how it decomposes instructions into meaningful sub-goals and visualizes planned trajectories [30][31].
- Reinforcement-learning fine-tuning significantly improves its reasoning, giving it a better understanding of tasks and environments than cold-start models [31][32].

Group 4: Adaptability and Error Correction
- ThinkAct demonstrates effective few-shot adaptation, generalizing to unseen environments and new skills from only a handful of demonstrations [35][37].
- It can detect execution errors and perform self-correction, using structured reasoning to reconsider the task and generate a corrective plan when a failure occurs [37][38].
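As a rough illustration of the dual-system idea described above, the sketch below shows a slow reasoner that produces a latent plan from the observation and instruction, and a fast action policy that conditions on that plan at every control step. The class and method names are hypothetical stand-ins, not ThinkAct's actual interface, and the reinforcement-learning and action-aligned visual feedback components are omitted.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LatentPlan:
    """Compact plan produced by the high-level reasoner (illustrative container)."""
    embedding: np.ndarray   # visual latent plan conditioning the action policy
    sub_goals: list         # decomposed textual sub-goals, for inspection

class SlowReasoner:
    """Stand-in for the MLLM: reasons over image + instruction, emits a latent plan."""
    def plan(self, image: np.ndarray, instruction: str) -> LatentPlan:
        # In the real system this would be an MLLM forward pass; here it is a stub.
        return LatentPlan(embedding=np.zeros(256), sub_goals=[instruction])

class FastActionPolicy:
    """Stand-in for the low-level controller conditioned on the latent plan."""
    def act(self, image: np.ndarray, plan: LatentPlan) -> np.ndarray:
        return np.zeros(7)  # e.g. a 7-DoF end-effector command

def dual_system_step(reasoner, policy, image, instruction, t, plan, replan_every=10):
    """Reason at a slow cadence, act at every control step."""
    if plan is None or t % replan_every == 0:   # slow system: occasional re-planning
        plan = reasoner.plan(image, instruction)
    action = policy.act(image, plan)            # fast system: per-step control
    return action, plan
```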
The Future of AI May Be Hidden in the Evolutionary Code of Our Brains | 红杉Library
红杉汇· 2025-07-24 06:29
Core Viewpoint
- The article discusses the evolution of the human brain and its implications for artificial intelligence (AI), arguing that understanding the brain's evolutionary breakthroughs may unlock new advances in AI capabilities [2][7].

Summary by Sections

Evolutionary Breakthroughs
- The brain's evolution is framed as five major breakthroughs, each of which can be linked to AI development [8]:
1. First breakthrough, reflex action: primitive brains with a few hundred neurons could already distinguish good from bad stimuli [8].
2. Second breakthrough, reinforcement learning: the brain gained the ability to estimate how likely an action is to achieve a goal, paralleling how AI learns from rewards [8].
3. Third breakthrough, the neocortex: mammals gained the ability to plan and mentally simulate actions, akin to slow, deliberate thinking in AI models [9].
4. Fourth breakthrough, theory of mind: primates came to understand others' intentions and emotions, an area where AI is still maturing [10].
5. Fifth breakthrough, language: language, as a learned social system, lets humans share complex knowledge, a capability AI is only beginning to grasp [11].

AI Development
- Current AI systems have made strides in areas such as language understanding but still lag in emotional intelligence and autonomous planning [10][11].
- The article illustrates AI's possible future through a hypothetical robot's evolution, showing how it might progress from simple reflexes to complex emotional understanding and communication [13][14].

Historical Context
- The narrative emphasizes that major evolutionary changes often arise from unexpected events, suggesting that future AI breakthroughs may likewise emerge from unforeseen circumstances [15][16].