Reinforcement Learning (RL)
Building the GitHub for RL Environments: Prime Intellect's Will Brown & Johannes Hagemann
Sequoia Capital· 2026-02-10 13:00
If data is the bottleneck, if having the real expertise is the bottleneck, would you rather have the smartest person in history work at your company, or someone who's been there for 30 years? Sometimes you really want the person who's been there for 30 years. There's a lot of expertise that comes from really understanding a problem deeply and interacting with it over a long time. And this is really what happens in training that is almost impossible to replicate in a short prompt. You really want the abili ...
2025 AI year in review: after reading 200 papers, which AGI narratives are DeepMind, Meta, DeepSeek, and the other Chinese and US giants telling?
36Kr· 2026-01-12 08:44
Core Insights
- The article reviews the evolution of artificial intelligence (AI) in 2025, highlighting a shift from merely increasing model parameters to enhancing model intelligence through foundational research in fluid reasoning, long-term memory, spatial intelligence, and meta-learning [2][4].

Group 1: Technological Advancements
- In 2025, significant progress was observed in fluid reasoning, long-term memory, spatial intelligence, and meta-learning, driven by the diminishing returns of scaling laws [2][3].
- The bottleneck in current AI lies in the need for models not only to possess knowledge but also to think and remember effectively, revealing a significant imbalance in AI capabilities [2][4].
- The introduction of test-time compute revolutionized reasoning, allowing AI to engage in deeper, more deliberate processing during inference [6][10].

Group 2: Memory and Learning Enhancements
- The Titans architecture and Nested Learning emerged as breakthroughs in memory, enabling models to update their parameters in real time during inference and thus overcoming a limitation of traditional transformer models [19][21].
- Memory can be categorized into three types: context as memory, RAG-processed context as memory, and memory internalized into parameters, with significant advances in RAG and parameter-adjustment methods [19][27].
- Sparse memory fine-tuning and on-policy distillation mitigate catastrophic forgetting, allowing models to retain old knowledge while integrating new information [31][33].

Group 3: Spatial Intelligence and World Models
- Advances in video generation models such as Genie 3 demonstrated improved physical understanding and consistency in generated environments [35][36].
- The World Labs initiative, led by Stanford professor Fei-Fei Li, focuses on generating 3D environments from multimodal inputs, a more structured approach to AI-generated content [44][46].
- Meta's V-JEPA 2 model emphasizes predictive learning, allowing models to grasp physical rules through prediction rather than mere observation, enhancing their understanding of causal relationships [50][51].

Group 4: Reinforcement Learning Innovations
- Reinforcement learning (RL) advanced significantly with the rise of verifiable rewards and sparse reward metrics, improving performance in areas like mathematics and coding [11][12].
- The GRPO algorithm gained popularity, simplifying the RL pipeline by eliminating the need for a critic model, reducing computational cost while maintaining effectiveness [15][16].
- Exploration of RL's limits revealed a ceiling effect: RL can amplify existing model capabilities, but further breakthroughs will require innovations in foundation models or algorithm architectures [17][18].
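To make concrete why GRPO needs no critic model: advantages are computed relative to a group of sampled answers to the same prompt, so the group itself supplies the baseline. A minimal Python sketch (function name and reward values are illustrative, not from any specific implementation):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages in the spirit of GRPO: each sampled
    completion is scored against the mean and standard deviation of its
    own group, so no separate critic (value) model supplies a baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored the same: no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# One prompt, four sampled answers scored by a verifiable 0/1 reward:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get +1.0, wrong -1.0
```

Because the baseline comes from sibling samples rather than a learned value network, only the policy model needs to be held in memory during training, which is the cost saving the article refers to.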
The first RL paradigm for text-to-3D generation arrives, tackling geometric and physical plausibility
具身智能之心· 2025-12-20 16:03
Core Viewpoint
- The article discusses the application of reinforcement learning (RL) to text-to-3D generation, exploring its effectiveness and challenges in this complex domain [4][5].

Group 1: Research Background
- A collaboration across multiple universities investigates the potential of RL to improve 3D generation [4].
- The study asks whether RL can enhance the reasoning and generation capabilities of 3D autoregressive models, building on its success in large language models (LLMs) and 2D image generation [5].

Group 2: Challenges in 3D Generation
- Key challenges include designing rewards that capture semantic alignment, geometric consistency, and visual quality [6].
- Existing RL algorithms may not suit autoregressive 3D generation, and benchmarks specifically assessing "3D reasoning capability" are lacking [6].

Group 3: Reward Design Layer
- Aligning with human preference signals is crucial for overall 3D quality, and specialized reward models often outperform large multimodal models [10].
- Token-level strategies are more effective than sequence-level operations in 3D autoregressive generation [11].

Group 4: Benchmark Layer
- The MME-3DR benchmark evaluates 3D reasoning, focusing on consistency and interpretability under challenging constraints [15].
- RL training significantly improved performance across tasks, particularly for mechanical structures and non-rigid biological entities [16].

Group 5: RL Paradigm Layer
- The research proposes a hierarchical RL paradigm (Hi-GRPO) that treats 3D generation as a coarse-to-fine process, strengthening the model's implicit 3D reasoning [18][19].
- The findings highlight the importance of respecting structural priors when designing reward models [20].

Group 6: Performance Insights
- While RL can enhance model performance, handling complex geometries and rare concepts remains difficult, indicating limits of current 3D RL capabilities [22].
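The token-level versus sequence-level distinction above can be sketched in a few lines: a sequence-level scheme broadcasts one scalar reward to every generated token, while a token-level scheme distributes credit unevenly. All names and weights below are illustrative, not the paper's method:

```python
def sequence_level_credit(reward, n_tokens):
    # Sequence-level: one scalar reward broadcast identically to every token.
    return [reward] * n_tokens

def token_level_credit(reward, token_weights):
    # Token-level: credit for the same reward is distributed by per-token
    # weights (e.g. higher weight on tokens that fix coarse structure).
    total = sum(token_weights)
    return [reward * w / total for w in token_weights]

print(sequence_level_credit(1.0, 3))       # every token gets the full signal
print(token_level_credit(1.0, [1, 1, 2]))  # the structurally decisive token gets more
```

In a coarse-to-fine generation process, this kind of uneven credit lets the policy learn which early tokens actually determined geometric quality, which is one plausible reading of why token-level operations helped here.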
Accuracy cut in half: large models' visual abilities "fail" as soon as they leave daily life
36Kr· 2025-12-09 06:59
Core Insights
- The EgoCross project evaluates cross-domain first-person (egocentric) video question answering, revealing the limitations of existing MLLMs in specialized fields such as surgery, industry, extreme sports, and animal perspectives [1][3][4].

Group 1: Project Overview
- EgoCross is the first cross-domain EgocentricQA benchmark, covering four high-value professional fields with nearly 1,000 high-quality QA pairs [3][9].
- It provides both closed (CloseQA) and open (OpenQA) evaluation formats, filling a significant gap in assessing models in these specialized areas [3][9].

Group 2: Model Evaluation
- Eight mainstream MLLMs were tested; even the best performers scored below 55% CloseQA accuracy and below 35% OpenQA accuracy in cross-domain scenarios [4][9].
- Reinforcement learning (RL) methods significantly improved performance, with an average accuracy gain of 22% [10][16].

Group 3: Task and Domain Challenges
- There is a significant domain shift between everyday activities and specialized fields: models perform well on daily tasks but struggle in professional contexts [8][9].
- Prediction tasks degraded more severely than basic identification tasks [13][16].

Group 4: Improvement Strategies
- Three improvement methods were explored: prompt learning, supervised fine-tuning (SFT), and reinforcement learning (RL), with RL yielding the largest gains [15][16].
- Current models generalize poorly across domains, indicating the need for further development toward more capable multimodal systems [16].
Horizon Robotics RAD: an end-to-end driving policy from large-scale reinforcement learning on 3DGS
自动驾驶之心· 2025-11-29 02:06
Core Insights
- The article presents a reinforcement learning (RL) approach to end-to-end (e2e) policy learning for autonomous driving, using 3D Gaussian Splatting (3DGS) to build sensor-level training environments [1][2].
- The method reduces the collision rate roughly threefold compared with pure imitation learning (IL) [1].
- Limitations of the 3DGS environment include a lack of interaction, reliance on log replay, and poor rendering of non-rigid pedestrians and low-light scenes [1].

Summary by Sections

Methodology
- The approach has three phases: training a basic Bird's Eye View (BEV) perception model; freezing perception and training a planning head with IL; and generating a sensor-level environment with 3DGS for mixed RL and IL training [3][5][6].
- Training proceeds by pre-training the perception models, IL training on human expert data, and finally RL fine-tuning to increase sensitivity to critical risk scenarios [10][12].

State and Action Space
- The state space includes encoders for BEV features, static map elements, traffic-participant information, and planning-related features [7].
- The action space uses discrete lateral and longitudinal movements, with 61 actions in each dimension [8].

Reward Function
- The reward penalizes collisions and deviations from expert trajectories, with specific thresholds for dynamic and static collisions as well as positional and heading deviations [17][19].
- Auxiliary tasks on behaviors such as deceleration and acceleration stabilize training and accelerate convergence [20][23].

Experimental Results
- The method outperforms other IL-based algorithms, demonstrating the advantage of closed-loop training in dynamic environments [28][29].
- The optimal RL-to-IL data ratio is 4:1, contributing to improved performance metrics [28].

Conclusion
- Integrating 3DGS into training environments yields practical engineering improvements and better performance in autonomous driving applications [1][2].
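The reward structure described above (hard collision penalties plus threshold-gated deviation penalties) can be sketched as follows. Every weight and threshold here is invented for illustration; the paper's actual values and reward terms differ:

```python
def shaped_reward(dynamic_collision, static_collision,
                  pos_dev_m, heading_dev_rad,
                  pos_thresh=0.5, heading_thresh=0.1):
    """Hypothetical reward in the spirit of the RAD design: hard penalties
    for collisions, soft penalties once deviation from the expert
    trajectory crosses a threshold. All constants are illustrative."""
    r = 0.0
    if dynamic_collision:
        r -= 2.0   # hitting a moving agent is penalized hardest
    if static_collision:
        r -= 1.0   # hitting a static obstacle
    if pos_dev_m > pos_thresh:
        r -= 0.5   # drifted too far from the expert trajectory
    if heading_dev_rad > heading_thresh:
        r -= 0.5   # heading diverged from the expert
    return r

print(shaped_reward(False, False, 0.1, 0.05))  # within tolerance: no penalty
```

Threshold-gating the deviation terms keeps the policy free to explore small departures from the expert while the collision terms supply the safety-critical signal that pure IL lacks.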
Ilya Sutskever's landmark 30,000-word interview: AI bids farewell to the era of scaling and returns to the essence of the "age of research"
创业邦· 2025-11-27 03:51
Core Insights
- The AI industry is transitioning from a "Scaling Era" back to a "Research Era," emphasizing fundamental innovation over mere model-size expansion [4][7][40].
- Current AI models score highly on evaluations but lack true generalization, like students who ace tests without deep understanding [10][25].
- SSI's strategy is to develop safe superintelligence without commercial pressure, aiming for a deeper understanding of AI's alignment with human values [15][16].

Group 1: Transition from Scaling to Research
- 2012 to 2020 was a "Research Era" and 2020 to 2025 a "Scaling Era"; with computational power now vastly increased, the field is returning to research [4][7][40].
- Ilya Sutskever argues that simply scaling models will not yield further breakthroughs, since data and resources are finite and new learning paradigms are needed [7][39].

Group 2: Limitations of Current Models
- Current models resemble students who have drilled extensively but lack the intuition of true experts, so they perform poorly in novel situations [10][25].
- Reliance on pre-training and reinforcement learning has produced models that excel on benchmarks but struggle with real-world complexity, often introducing new errors while fixing existing ones [20][21].

Group 3: Pursuit of Superintelligence
- SSI aims to avoid the "rat race" of commercial competition, focusing instead on building a safe superintelligence that can care for sentient life [15][16].
- Ilya emphasizes the importance of a value function in AI, akin to human emotions, which guides decision-making and learning efficiency [32][35].

Group 4: Future Directions and Economic Impact
- Once continuous-learning challenges are overcome, explosive economic growth is predicted, leading to a diverse ecosystem of specialized AI companies [16][18].
- Ilya suggests human roles may evolve to integrate with AI, maintaining balance in a world shaped by superintelligent systems [16][18].
Ilya's latest 20,000-word interview: human emotions are not baggage but the "ultimate algorithm" AI is missing
36Kr· 2025-11-26 04:26
Core Insights
- The discussion centers on the limitations of current AI models and new pathways toward superintelligence, emphasizing the gap between evaluation performance and real-world performance [3][4][20].
- Ilya Sutskever argues for a return to a research-focused paradigm as the diminishing returns of pure scaling become evident [3][34].
- A "value function" is introduced as the critical element behind human-like learning efficiency, which current AI lacks [3][5][6].

Group 1: Current AI Limitations
- Current models perform well on evaluation tests but make basic errors in practice, indicating a lack of true understanding and generalization [4][18][20].
- Over-optimizing reinforcement learning (RL) for evaluations has produced models that excel at competitive programming yet struggle with real-world problem-solving [4][21].
- Sutskever compares such models to competitive programmers: skilled at specific problems but lacking the broader intuition and creativity of more versatile learners [4][22].

Group 2: Human Learning Insights
- Human learning is highly sample-efficient: people master complex skills from minimal data, which is attributed to innate value functions that guide decision-making [5][6][40].
- Evolutionary advantages in areas like vision and motor skills suggest humans possess learning algorithms superior to current AI systems [5][38].
- Emotional and intuitive feedback is central to human learning, and AI currently lacks it [6][30][31].

Group 3: Strategic Directions for SSI
- Sutskever's new company, SSI, explores safe superintelligence and advocates a gradual release of AI capabilities to raise public awareness about safety [7][52].
- The shift from secretive development to a transparent, gradual release strategy is seen as essential for fostering a collaborative safety environment [7][52].
- SSI's focus on research over immediate market competition is intended to prioritize safety and ethics in AI development [52][54].

Group 4: Research Paradigm Shift
- The transition from the scaling era (2020-2025) back to a research-focused approach is necessary as the limits of scaling become apparent [34][46].
- Scaling has been beneficial but has homogenized ideas, necessitating a return to innovative research [34][46].
- Breakthroughs may come from more efficient use of computational resources and novel approaches rather than sheer scale [35][46].
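The "value function" Sutskever invokes has a standard RL formalization: an internal estimate of how well things are going that supplies feedback at every step, rather than a single signal at the end of a long task. The following is a textbook TD(0) sketch, not anything from the interview itself:

```python
def td_update(values, s, r, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference (TD(0)) update: the value estimate for
    state s moves toward the immediate reward plus the discounted value
    of the successor state. The running estimate acts as step-by-step
    feedback, much like the emotional 'sense of progress' discussed."""
    v = values.get(s, 0.0)
    target = r + gamma * values.get(s_next, 0.0)
    values[s] = v + alpha * (target - v)
    return values[s]

values = {}
td_update(values, "try_approach_A", r=1.0, s_next="done")
print(values["try_approach_A"])  # moved a step toward the observed return
```

The contrast with evaluation-optimized RL is that a good value function lets a learner correct course mid-trajectory, which is the sample-efficiency property the interview attributes to human emotion.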
Z Event | NeurIPS 2025 special session: RL x Agents, writing the last predictions for AGI's 2026
Z Potentials· 2025-11-25 03:28
Core Insights
- The article emphasizes the growing importance of reinforcement learning (RL) and agents for large models, marking a shift from merely generating text to taking actions through decision-making [1][2].

Group 1: Event Overview
- The NeurIPS 2025 event aims to create a relaxed setting for researchers and engineers from organizations such as OpenAI, DeepMind, and Meta FAIR to discuss RL, decision-making, and the underlying capabilities of large models [1].
- There will be no formal presentations; informal discussion of technology, ideas, and experience is meant to foster a collaborative atmosphere [1].

Group 2: Focus on RL and Agents
- RL is back in focus, moving beyond traditional fine-tuning to let models improve through interaction with the environment [2].
- Building executable agents requires a robust Action Layer, essential for models to perform tasks effectively [2][3].

Group 3: Industry Developments
- Platforms like Composio are building the next generation of AI agents by creating an Action Layer that integrates tools and APIs into a unified interface, highlighting the infrastructure operational agents need [3].
- Funds such as Hattrick Capital, early supporters of AI advances, are driving investment in AI infrastructure, particularly agents and robotics [4].
GEO from an AI startup perspective: how to drive traffic, how to measure results, and where the startup opportunities are
Founder Park· 2025-08-10 01:33
Core Insights
- GEO (Generative Engine Optimization) is not a completely new concept but an evolution of SEO for the era of AI search and LLMs [2][4].
- Debate continues over GEO's potential as a business: some see a new frontier, others merely an extension of SEO [4][5].
- The article stresses understanding GEO's principles, strategies for content optimization, and effectiveness monitoring [5].

Group 1: Understanding GEO
- GEO is fundamentally about optimizing content for AI retrieval and summarization, making it easy for AI systems to access and understand [10][30].
- The shift from traditional SEO to GEO changes how content is ranked and surfaced, since LLMs generate structured responses that complicate traditional ranking [9][14].
- Effective GEO spans content optimization, evaluation metrics, and commercial GEO experiments [9][10].

Group 2: Content Optimization Strategies
- RAG (Retrieval-Augmented Generation) workflows are central to GEO, which puts a premium on clear structure and readability [19][20].
- Content should be easy to retrieve and quote, with clarity and minimal ambiguity of expression [21][22].
- Visibility tactics include using specific terminology, avoiding vague references, and employing structured-data formats such as Schema.org [27][28].

Group 3: Agent Optimization Strategies
- AEO (Agentic Engine Optimization) is a subset of GEO focused on optimizing content for agent-based interaction [30].
- Content should be task-oriented and contextually rich so agents can understand it and act on it [31][32].
- Clear definitions and user-friendly documentation are crucial for effective agent task completion [33][34].

Group 4: Practical Implementation of GEO
- A closed loop of content creation, exposure, retention, and optimization is vital for successful GEO [36].
- Authority signals (E-E-A-T) build trust with AI systems, which prefer credible, expert sources [37].
- Continuous content updates and engagement with external authoritative sources enhance visibility and credibility in AI-driven environments [38][39].

Group 5: Measuring GEO Effectiveness
- Evaluating content visibility and citation across AI search platforms is essential for understanding impact [39][40].
- Methods such as SERP detection and AI citation monitoring can assess content performance [40][41].
- Analyzing user behavior and conversion rates from AI-driven traffic reveals how well GEO strategies work [44][46].

Group 6: GEO Tools and Companies
- Tools and companies are emerging in the GEO space, focused on improving visibility and citation in AI search environments [49][50].
- Platforms like Profound and Goodie AI optimize content for AI retrieval and improve brand exposure [56][57].
- The competitive landscape for GEO tooling is evolving toward integrating AI capabilities into traditional SEO practice [66][68].
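The point that retrievable content should be explicit and self-contained can be shown with a toy retriever. A real RAG pipeline would use embedding similarity; this bag-of-words cosine stand-in and the sample strings are purely illustrative:

```python
from collections import Counter
import math

def cosine(a, b):
    """Bag-of-words cosine similarity; a stand-in for the embedding
    similarity a real RAG retriever would use."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

query = "how to reset the api key"
vague = "it can be changed there as described earlier"
explicit = "to reset the api key open settings and click reset api key"

# The self-contained chunk scores far higher for retrieval.
print(cosine(query, explicit) > cosine(query, vague))  # True
```

The vague chunk depends on surrounding context ("it", "there", "earlier") that a retriever never sees, which is exactly the ambiguity-reduction advice in Group 2 above.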
China Humanoid Robot: WAIC 2025 takeaways: broader applications, with wheel-based robot demos more common than bipedal
2025-07-29 02:31
Summary of WAIC 2025 Takeaways

Industry Overview
- The conference showcased significant advances in the AI and robotics industry, with a 35% increase in venue size to 70,000 sqm, a 31% increase in ticket prices to Rmb168 per day, 800 exhibitors (up 60% year over year), and over 1,200 speakers [1][2].

Core Insights
1. **Application Scenarios**: More targeted exploration of applications across manufacturing, logistics, retail, and elderly care indicates a shift toward early commercialization [2][7].
2. **Product Improvements**: Humanoid robots showed meaningful product improvements, moving from static displays to interactive task demonstrations [2][8].
3. **Prototype Trends**: A noticeable shift toward AGV-style wheeled bases suggests a pragmatic path to near-term commercial viability, which may weigh on stocks tied to planetary roller screw components [2][9].
4. **Cost Trends**: Cost curves for humanoid robots are declining but not dramatically; the lowest ASP reported was Rmb40,000 for Unitree's new model [2][14].
5. **Manipulation Challenges**: Manipulation remains the core challenge, with success rates, robustness, and reliability still problematic [2][12].

Notable Exhibitors and Innovations
- **Noematrix**: Wheel-based prototypes performing varied tasks, indicating a focus on practical applications [7][18].
- **Galbot**: Retail-automation robots handling complex tasks at efficiency levels comparable to human workers [17][18].
- **AgiBot**: Multiple humanoid robots targeting logistics, customer interaction, and other applications [17].
- **Unitree**: Advances in dynamic locomotion and improved autonomous capabilities [20].

Future Outlook
- The exhibition reinforced a constructive long-term view on humanoid robots, with a technology inflection point approaching but not yet realized [3][12].
- Upcoming updates to Tesla's Gen 3 Optimus are expected to be significant for the sector [3].

Investment Recommendations
- **Sanhua Intelligent Controls**: Buy, on growth potential in auto/EV thermal management and HVAC systems [21].
- **Zhejiang Supcon Technology Co.**: Buy, with strong market share in process automation and potential for vertical expansion [22].
- **Best Precision**: Neutral, with expectations of becoming a competitive supplier for humanoid robots [23].
- **Leader Harmonious Drive Systems**: Neutral, with potential growth in harmonic reduction gear applications [26].
- **Shanghai Baosight Software**: Neutral, with concerns over reliance on related-party transactions [27].

Conclusion
- WAIC 2025 highlighted significant advances in humanoid robotics and a clear trend toward practical application and commercialization; the investment landscape appears promising for select companies, though manipulation and cost efficiency remain challenges.