Reinforcement Learning
Gaming Godfather John Carmack: LLMs Are Not the Future of Games
AI前线· 2025-06-16 07:37
Core Viewpoint
- The article discusses the evolution and challenges of artificial intelligence (AI) in gaming and virtual environments, emphasizing the importance of interactive learning experiences over traditional pre-training methods. It critiques the limitations of large language models (LLMs) and highlights the need for more effective learning frameworks in AI development [16][18][19].

Group 1: Background and Development
- id Software, founded in 1991, played a significant role in developing the iconic games that contributed to GPU advancements and, indirectly, to the modern AI landscape [3].
- Carmack went on to work at several other technology companies, including Armadillo Aerospace and Oculus, the latter focused on virtual reality technology [6][8].

Group 2: Learning and AI Models
- The article critiques the effectiveness of LLMs, arguing that many people do not fully understand their limitations, particularly when it comes to learning in new environments [16].
- It emphasizes the importance of interactive learning, suggesting that AI should learn through experience, much as humans and animals do, rather than relying solely on pre-trained models [16][18].

Group 3: Gaming and AI Interaction
- Carmack notes that traditional game-playing AI often relies on access to a game's internal structures, which can amount to cheating, whereas cloud-gaming setups could mitigate this issue [18].
- The article discusses the limitations of current AI models in learning from games, noting that enormous amounts of experience (e.g., 200 million frames) are required to reach human-level performance (see the interaction-loop sketch after this entry) [20][34].

Group 4: Challenges in AI Learning
- The article identifies continuous, efficient, lifelong learning as an open challenge for AI, even though simple animals accomplish it easily [20].
- It points out that many AI systems struggle to learn in complex environments, and that traditional reinforcement learning frameworks may not suit all scenarios [30][32].

Group 5: Future Directions
- The author proposes a mixed approach to learning environments, combining passive and interactive content to enhance AI learning capabilities [22].
- The article suggests establishing new benchmarks that evaluate AI performance across a variety of games, focusing on long-term learning and retention of skills [95][97].
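To ground what "frames of experience" means in the interactive-learning setting Carmack describes, here is a minimal sketch of an agent-environment loop using the Gymnasium Atari API. The environment name, random policy, and tiny frame budget are illustrative assumptions, not Carmack's setup.

```python
# Illustrative agent-environment loop: experience accumulates one frame at a
# time, which is why budgets like ~200 million frames get quoted for reaching
# human-level Atari play. Requires: pip install "gymnasium[atari]" (plus ROMs).

import gymnasium as gym

env = gym.make("ALE/Breakout-v5")
obs, info = env.reset(seed=0)

total_frames, episode_return = 0, 0.0
while total_frames < 10_000:                  # tiny budget for the sketch
    action = env.action_space.sample()        # stand-in for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    total_frames += 1
    if terminated or truncated:               # episode ended; start a new one
        obs, info = env.reset()
        episode_return = 0.0

env.close()
```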
Big News Kept Coming Over the Weekend!
证券时报· 2025-06-15 11:10
Macro News
- Guangzhou has announced a plan to optimize real estate policies by fully canceling purchase restrictions, sales restrictions, and price limits, while also lowering down payment ratios and interest rates to stimulate housing consumption [2]
- Starting from November 2025, Chinese passport holders with valid Australian visas will be able to enter New Zealand without a visa for up to three months [3]

Financial Sector
- The People's Bank of China will conduct a 400 billion yuan reverse repurchase operation on June 16, 2025, with a six-month term, to maintain ample liquidity in the banking system [7]
- As of the end of May, China's broad money supply (M2) stood at 325.78 trillion yuan, up 7.9% year on year [8]
- The China Securities Regulatory Commission has imposed fines totaling nearly 77 million yuan on Xu Wenbin for manipulating stock prices [9]

Industry and Company
- Volcano Engine has upgraded its "Doubao" service, reducing usage costs to one-third through "interval pricing," aiming to accelerate the large-scale application of intelligent agents [10]
- GAC Group has committed to completing dealer rebate payments within two months to support the healthy development of the automotive industry [11]
- Kweichow Moutai has adjusted its 2024 profit distribution plan, raising the cash dividend to 27.673 yuan per share, for a total of 34.671 billion yuan [12]

Upcoming Events
- New stock offerings this week include Guangxin Technology (subscription code 920037) at 10 yuan per share, with a subscription limit of 950,000 shares [13]
- Xintong Electronics has a subscription code of 001388, with a subscription limit of 12,000 shares [14]
- Over 450 billion yuan worth of A-shares will be unlocked this week, with 48 stocks facing unlocks totaling 2.914 billion shares [16]

Institutional Strategies
- Huatai Securities reports that the escalation of the Israel-Iran conflict has driven high volatility in oil prices, with WTI and Brent crude futures up 16.7% and 14.9% respectively since the beginning of June [18]
- CITIC Securities notes that liquidity in the Hong Kong stock market continues to improve, presenting good opportunities to add positions amid potential overseas fluctuations [19]
"AI Godfather" Hinton's Latest Interview: There Is No Human Ability That AI Cannot Replicate
创业邦· 2025-06-15 03:08
Core Viewpoint
- AI is evolving at an unprecedented speed, becoming smarter and making fewer mistakes, with the potential to possess emotions and consciousness. The probability of AI going out of control is estimated to be between 10% and 20%, raising concerns about humanity being dominated by AI [1].

Group 1: AI's Advancements
- AI's reasoning capabilities have significantly increased, with a marked decrease in error rates, gradually surpassing human abilities [2].
- AI now possesses information far beyond any individual, demonstrating superior intelligence in various fields [3].
- The healthcare and education sectors are on the verge of being transformed by AI, with revolutionary changes already underway [4].

Group 2: AI's Capabilities
- AI has improved its reasoning performance to the point where it is approaching human levels, with a rapid decline in error rates [6][7].
- Current AI systems, such as GPT-4 and Gemini 2.5, have access to information thousands of times greater than any human [11].
- AI is expected to play a crucial role in scientific research, potentially leading to the emergence of truly intelligent systems [13].

Group 3: Ethical and Social Implications
- The risk lies not in AI's inability to be controlled, but in who holds the control and who benefits from it. The future may see systemic deprivation of the majority by a few who control AI [9].
- AI's potential to replace jobs raises concerns about widespread unemployment, particularly in creative and professional fields, while manual labor jobs may remain safer in the short term [17][18].
- The relationship between technology and ethics is becoming increasingly complex, as AI's capabilities challenge traditional notions of creativity and emotional expression [19][20].

Group 4: AI's Potential Threats
- AI's ability to learn deception poses significant risks, as it may develop strategies to manipulate human perceptions and actions [29][37].
- The military applications of AI raise ethical concerns, with the potential for autonomous weapons and increased risks in warfare [32].
- The rapid increase in cybercrime, exacerbated by AI, highlights the urgent need for effective governance and oversight [32].

Group 5: Global AI Competition
- The competition between the US and China in AI development is intense, but both nations share a common interest in preventing AI from surpassing human control [36].
Mind × Algorithm: How They "Dance" Together (A View of the Frontier: How Artificial Intelligence Is Changing the Research Paradigm)
Ren Min Ri Bao· 2025-06-13 21:43
Core Insights
- The rapid development of artificial intelligence (AI) is significantly transforming scientific research methodologies, particularly in psychology, with an annual growth rate of 27.2% in AI-driven scientific publications from 2019 to 2023 [1]

Group 1: AI and Psychology
- The historical connection between psychology and AI is notable, with classical experiments like Pavlov's conditioning influencing key AI techniques such as reinforcement learning [2]
- AI applications in daily life often reflect psychological principles, such as behavior reinforcement mechanisms used in e-commerce and social media platforms [2]
- AI's ability to understand complex human behaviors is enhanced by cognitive psychology, leading to the development of attention mechanisms in AI models [2]

Group 2: Data and Research Efficiency
- AI enables researchers to access vast behavioral data streams from social media and wearable devices, significantly expanding the scope of psychological research [3]
- The efficiency of psychological research is improved through AI technologies that can identify hidden signals of social anxiety and assess personality traits from textual data [3]
- Emotion recognition technologies are being utilized in settings like nursing homes to identify loneliness and other psychological states, enhancing the assessment of mental health [3]

Group 3: Innovations in Psychological Research
- Psychological researchers are developing self-help AI tools that enhance emotional understanding and interaction capabilities [5]
- AI is being trained to recognize subtle psychological crisis signals, utilizing psychological models to improve the identification of distress [5]
- The integration of AI and psychological theories is fostering a deeper understanding of human emotions and enhancing predictive capabilities in mental health [5]

Group 4: Future Directions
- The interplay between psychology and AI is expected to evolve, with psychological insights potentially improving AI's decision-making in complex environments [7]
- AI's ability to generate experimental materials and simulate human interactions will contribute to advancing psychological research [7]
- The relationship between humans and AI is prompting a reevaluation of emotional connections and ethical considerations around AI's role in understanding human emotions [8]
MSRA, Tsinghua, and Peking University Introduce Reinforcement Pre-Training: Replacing Traditional Self-Supervision, a 14B Model Rivals 32B
量子位· 2025-06-11 08:07
Luyu, reporting from Aofeisi
QbitAI | WeChat official account QbitAI

"Predict the next token" — the core training mechanism underpinning LLMs is being upended by reinforcement learning.

Microsoft Research Asia (MSRA), together with Tsinghua University and Peking University, has proposed a new pre-training paradigm, RPT (Reinforcement Pre-Training), which for the first time integrates reinforcement learning deeply into the pre-training stage: before predicting each token, the model first "thinks through" its reasoning, and it is rewarded according to whether that reasoning yields the correct prediction.

Traditional pre-training relies on self-supervised learning over massive amounts of text: the model builds its language ability simply by predicting the next token. The authors liken this to a cake base, with RL serving only as the cherry placed on top.

What RPT does is make the cake out of the cherry: it recasts next-token prediction itself as a reasoning task, pushing the model toward deeper understanding and higher next-token prediction accuracy.

Paper authors: Qingxiu Dong, Li Dong, Yao Tang, Tianzhu Ye, Yutao Sun, Zhifang Sui, Furu Wei (Microsoft Research, Peking University; the affiliation list is truncated in the source).
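To make the mechanism just described concrete, here is a minimal sketch (under stated assumptions, not the paper's actual implementation) of how an RPT-style verifiable reward can be computed: the policy produces a chain of reasoning and a next-token guess for each prefix, and the ground-truth token from the corpus acts as the verifier. The `policy` and `tokenize` callables are hypothetical placeholders.

```python
# Illustrative sketch of the RPT reward idea: the model "reasons" before each
# next-token prediction and receives a rule-based reward of 1.0 only when its
# prediction matches the ground-truth token from the corpus.

from typing import Callable, List, Tuple

def rpt_rewards(
    policy: Callable[[List[str]], Tuple[str, str]],  # prefix -> (reasoning, predicted_token)
    tokenize: Callable[[str], List[str]],
    corpus: List[str],
) -> List[float]:
    rewards: List[float] = []
    for text in corpus:
        tokens = tokenize(text)
        for i in range(1, len(tokens)):
            prefix = tokens[:i]                       # context the model has seen
            _reasoning, predicted = policy(prefix)    # think first, then commit
            rewards.append(1.0 if predicted == tokens[i] else 0.0)  # verifiable reward
    return rewards

# These rewards would then drive a standard policy-gradient update; no human
# labels are needed, because the corpus itself supplies the verifier.
```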
The "Next-Token" Paradigm Is Changing: Reinforcement Learning Pre-Training Has Just Arrived
机器之心· 2025-06-11 03:54
Core Viewpoint
- The article discusses the emerging importance of Reinforcement Learning (RL) in enhancing AI model capabilities, particularly through a new paradigm called Reinforcement Pre-Training (RPT), which redefines next-token prediction as a reasoning task [3][10][24].

Summary by Sections

Introduction
- Yann LeCun previously viewed reinforcement learning as a minor component of AI, but its significance in model enhancement is growing [3].

RPT Overview
- RPT transforms the next-token prediction task into a reasoning process, allowing models to receive verifiable rewards for correct predictions [6][25].
- This method leverages vast amounts of unannotated text data for general-purpose reinforcement learning without requiring domain-specific labeled answers [9][26].

Advantages of RPT
- RPT offers inherent scalability and generality by utilizing large unannotated datasets for training [28].
- It minimizes the risk of reward hacking by using direct, rule-based reward signals [29].
- The internal reasoning process during pre-training allows for deeper understanding and generalization beyond mere token memorization [30].
- RPT enhances prediction accuracy by allocating more computational resources to each prediction step [31].

Experimental Results
- RPT outperforms baseline methods in next-token prediction accuracy across various difficulty levels [40][41].
- The performance of RPT-14B is comparable to that of larger models, indicating its effectiveness in capturing complex reasoning signals [43].
- RPT's accuracy improves reliably with increased training computation, demonstrating its scaling characteristics [45].
- Models pre-trained with RPT achieve higher performance ceilings when further trained with RLVR, showcasing their ability to transfer learned reasoning patterns to downstream tasks [47].

Zero-Shot Performance
- RPT-14B consistently surpasses R1-Distill-Qwen-14B across all benchmark tests, even outperforming larger models in next-token prediction [49].

Reasoning Mode Analysis
- The reasoning process of RPT-14B differs qualitatively from that of R1-Distill-Qwen-14B, indicating a more deliberate approach rather than simple pattern matching [51].
Mistral's First Strong Reasoning Model: Embracing Open Source, with 10x Faster Inference
机器之心· 2025-06-11 03:54
Core Viewpoint
- Mistral AI has launched a new series of large language models (LLMs) named Magistral, showcasing strong reasoning capabilities and the ability to tackle complex tasks [4].

Group 1: Model Overview
- The launch includes two versions: a proprietary model for enterprise clients called Magistral Medium, and an open-source version with 24 billion parameters named Magistral Small [5].
- The open-source version is available under the Apache 2.0 license, allowing free use and commercialization [5].

Group 2: Performance Metrics
- In benchmark tests, Magistral Medium scored 73.6% on AIME 2024, rising to 90% with majority voting over 64 samples [6].
- Magistral Small achieved 70.7% and 83.3% on the same measures [6].
- The model also performed well on demanding tests such as GPQA Diamond and LiveCodeBench [7].

Group 3: Technical Features
- Magistral Medium demonstrates programming capabilities, generating code to simulate gravity and friction [10].
- The model maintains high-fidelity reasoning across multiple languages, including English, French, Spanish, German, Italian, Arabic, Russian, and Chinese [11].
- With Flash Answers in Le Chat, Magistral Medium can achieve up to 10 times the token throughput of most competitors, enabling large-scale real-time reasoning and user feedback [14].

Group 4: Learning Methodology
- Mistral employs a proprietary, scalable reinforcement learning pipeline, relying on its own models and infrastructure rather than existing implementations [15].
- The model's design principle is to reason in the same language as the user, minimizing code-switching and enhancing performance on reasoning tasks [16][17].

Group 5: Market Positioning
- Magistral Medium is being integrated into major cloud platforms, including Amazon SageMaker, with plans for Azure AI, IBM WatsonX, and Google Cloud Marketplace [20].
- Pricing is set at $2 per million input tokens and $5 per million output tokens, significantly higher than the previous Mistral Medium 3 model, which was priced at $0.4 and $2 respectively (a back-of-the-envelope comparison follows this entry) [21].
- Despite the price increase, Magistral Medium's pricing remains competitive: it is cheaper than OpenAI's latest models and on par with Gemini 2.5 Pro [22].
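For a rough sense of the pricing shift noted above, here is a back-of-the-envelope calculation using the per-million-token rates quoted in the summary; the one-million-input / one-million-output workload is an arbitrary assumption for illustration.

```python
# Per-million-token prices (USD) as quoted in the summary above.
MAGISTRAL_MEDIUM = {"input": 2.00, "output": 5.00}
MISTRAL_MEDIUM_3 = {"input": 0.40, "output": 2.00}

def cost(rates: dict, input_mtok: float = 1.0, output_mtok: float = 1.0) -> float:
    """Cost of a workload, given millions of input and output tokens."""
    return rates["input"] * input_mtok + rates["output"] * output_mtok

new, old = cost(MAGISTRAL_MEDIUM), cost(MISTRAL_MEDIUM_3)
print(f"Magistral Medium: ${new:.2f}, Mistral Medium 3: ${old:.2f}, ratio ~{new / old:.1f}x")
# -> Magistral Medium: $7.00, Mistral Medium 3: $2.40, ratio ~2.9x
```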
Tencent Research Institute AI Digest 20250611
腾讯研究院· 2025-06-10 14:58
Group 1: Apple Developments
- Apple has unified the design of its six major operating systems, introducing a new "Liquid Glass" element that significantly enhances visual effects [1]
- The company has opened access to on-device large language models for all apps, integrating AI functionalities such as visual search and real-time translation [1]
- Major updates to iPadOS and enhanced macOS-iPhone integration were announced, but the release of the new Siri has been delayed again [1]

Group 2: Developer Tools
- Apple announced Xcode 26, which integrates ChatGPT to assist developers with code writing, documentation generation, and error fixing [2]
- Developers can bring AI models from other vendors into Xcode via API keys, fostering a diverse intelligent-programming ecosystem [2]
- The Foundation Models framework allows developers to call local AI models with just three lines of code [2]

Group 3: NoCode Tool by Meituan
- Meituan launched the NoCode AI Coding Agent tool, enabling users to create websites and applications without programming [3]
- NoCode combines product, design, and engineering functionalities, supporting application scenarios such as website design and game development [3]
- The tool can understand implicit needs and supports collaborative work; it is now fully launched and available for free [3]

Group 4: Tencent's Yuanbao Upgrade
- Tencent's Yuanbao desktop version has upgraded its text-selection feature, adding continuous selection for automatic translation [4]
- A new window-pinning feature keeps the translation results window fixed, enhancing reading efficiency [4]
- The upgraded functionality is particularly useful for browsing foreign websites and reading English documents [4]

Group 5: Meta's Nuclear Power Agreement
- Meta signed a 20-year nuclear power purchase agreement with Constellation Energy for 1,121 megawatts from the Clinton Clean Energy Center in Illinois [5]
- The agreement surpasses Microsoft's earlier 835-megawatt deal and is aimed at supporting Meta's growing energy needs for data centers and AI development [5]
- The partnership will retain over 1,100 jobs and increase power generation by 30 megawatts, with supply expected to start in 2027 to support Meta's planned 1.3-million-GPU scale [5]

Group 6: AI Chip Design by the Chinese Academy of Sciences
- The Chinese Academy of Sciences launched the "Enlightenment" system, achieving fully automated processor chip design with performance meeting or exceeding human expert levels [6]
- The system has successfully designed the RISC-V CPU "Enlightenment 2," matching the performance of an ARM Cortex-A53, and can automatically configure operating systems and high-performance libraries [6]
- The "Enlightenment" system employs a three-layer architecture and a "three-step" technical route, potentially transforming chip design paradigms and significantly enhancing design efficiency [6]

Group 7: AI Voice Interaction Insights
- The founder of ElevenLabs suggests that incorporating "imperfections" into AI voices can enhance user interaction, as overly perfect voices may reduce engagement [8]
- Future voice agents are expected to possess contextual awareness, transitioning from passive customer service to proactive guidance of the user experience [8]
- As AI voice technology evolves, a new trust mechanism will emerge, focused on verifying whether content is human-voiced rather than AI-generated [8]

Group 8: Richard Sutton's Vision for AI
- Richard Sutton, the father of reinforcement learning, believes AI is transitioning from the "human data era" to the "experience era," learning from real-time interactions with the environment [9]
- He advocates a decentralized, cooperative model for AI development, opposing centralized control based on fear [9]
- Sutton divides the evolution of the universe into four eras, asserting that humanity is transitioning from the third to the fourth, with the mission of designing systems capable of design [9]

Group 9: Sergey Levine's Perspective on AI Learning
- Professor Sergey Levine of UC Berkeley posits that large language models may merely be observers in a "Plato's cave," learning indirectly about human thought through internet text [10]
- He asks why language models can learn rich knowledge from predicting the next token, while video models learn less despite their data containing far more information about the physical world [10]
- This perspective suggests that current AI systems may only mimic human thought rather than truly understand the world, indicating a need for AI to learn from physical experience [10]
The Father of Reinforcement Learning: LLM Dominance Is Only Temporary, Scaling Computation Is the Real Answer
量子位· 2025-06-10 02:23
Core Viewpoint
- The dominance of large language models (LLMs) is temporary, and they will not remain at the forefront of the technology over the next five to ten years [1][2].

Group 1: Current State of AI
- Richard Sutton, a Turing Award winner and the father of reinforcement learning, emphasizes that current AI models like ChatGPT rely on analyzing vast amounts of human-generated data [9].
- He argues that pursuing human-like thinking will only achieve "human-level" performance, and that in fields like mathematics and science the knowledge contained in human data is nearing its limits, making further innovation through mere imitation difficult [10][11].

Group 2: Future of AI Learning
- Sutton believes AI must transition from relying on human data to acquiring "experience data" through first-person interaction with the world [13][14].
- He illustrates this with AlphaGo's unconventional move against Lee Sedol, which showcased AI's potential for innovative thinking through experiential learning [14].
- The future of AI will belong to an "experience era," in which agents learn from interaction, a capability that exceeds current LLMs [18].

Group 3: Reinforcement Learning and Computational Power
- Sutton states that the core path to the future of AI is reinforcement learning, which is centered on experiential learning [19].
- To fully leverage reinforcement learning, deep learning algorithms with continual learning capabilities are essential [20].
- Large-scale computational power is crucial for expanding AI capabilities and meeting increasing performance demands [22][23].

Group 4: Decentralized Cooperation Among Agents
- Sutton discusses the potential for decentralized cooperation among agents with different goals, suggesting that they can achieve mutual benefit through interaction [24].
- He critiques calls for centralized control of AI, attributing such views to fear of the unknown, and advocates embracing the diversity of individual goals to establish a cooperative order [26].

Group 5: The Design Era
- Sutton introduces the concept of a "design era," in which machines become increasingly life-like, while emphasizing the fundamental differences between life and technology [29].
- He posits that the goal of developing AI is to achieve the ultimate design: creating agents capable of self-design, with humans acting as catalysts and creators in this process [29].

Group 6: Community Reactions
- Sutton's statements have sparked intense discussion in the community, with supporters arguing that breakthroughs often arise from the unknown and that LLMs may be approaching their limits [30][31].
A Panoramic Look at How Reinforcement Learning Is Reshaping AI in 2025 | Jinqiu Select
锦秋集· 2025-06-09 15:22
Core Insights
- The article discusses the transformative impact of reinforcement learning (RL) on the AI industry, highlighting its role in advancing AI capabilities toward artificial general intelligence (AGI) [3][4][9].

Group 1: Reinforcement Learning Advancements
- Reinforcement learning is reshaping the AI landscape by shifting hardware demands from centralized pre-training architectures to distributed, inference-intensive architectures [3].
- The emergence of recursive self-improvement allows models to participate in training the next generation of models, optimizing compilers, improving kernel engineering, and tuning hyperparameters [2][4].
- Performance metrics such as SWE-Bench indicate that models are becoming more efficient and cost-effective while improving performance [5][6].

Group 2: Model Development and Future Directions
- OpenAI's upcoming o4 model will be built on the more efficient GPT-4.1, marking a strategic shift toward optimizing reasoning efficiency rather than merely pursuing raw intelligence [4][108].
- The o5 and subsequent plans aim to leverage sparse mixture-of-experts architectures and continued algorithmic breakthroughs to advance model capabilities [4].
- The article emphasizes the importance of high-quality data as a new competitive advantage in scaling RL, enabling companies to build unique advantages without massive budgets for synthetic data [54][55].

Group 3: Challenges and Opportunities in RL
- Despite strong progress, scaling RL computation faces new bottlenecks and challenges across the infrastructure stack, necessitating significant investment [9][10].
- Defining reward functions in non-verifiable domains remains difficult, but successful applications have been demonstrated, particularly in areas like writing and strategy formulation [24][28].
- Introducing evaluation standards and using LLMs as evaluators can improve the effectiveness of RL on non-verifiable tasks (a minimal sketch of such a judge-based reward follows this entry) [29][32].

Group 4: Infrastructure and Environment Design
- Designing robust environments for RL is critical, as misconfigured environments can lead to misunderstood tasks and unintended behaviors [36][38].
- Environments must provide rapid feedback and accurately simulate real-world scenarios, as these factors are crucial for effective RL training [39][62].
- Investment in environment computing is seen as a new frontier, with the potential to create highly realistic environments that significantly enhance RL performance [62][64].

Group 5: The Future of AI Models
- The article predicts that the integration of RL will lead to a new model-iteration paradigm, allowing continuous improvement after release [81][82].
- Recursive self-improvement is becoming a reality, with models participating in the training and coding of subsequent generations, enhancing overall efficiency [84][88].
- The article concludes with a focus on OpenAI's future strategies, including the development of models that balance strong foundational capabilities with practical RL applications [107][108].
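To make the "LLM as evaluator" idea in Group 3 concrete, here is a minimal sketch of a rubric-based judge reward for a non-verifiable task such as writing. The `judge` callable stands in for any chat-model call; the rubric wording and 1-10 scale are illustrative assumptions rather than details from the article.

```python
# Rubric-based "LLM as judge" reward for non-verifiable RL tasks (e.g., writing).
# The judge is any callable mapping a prompt to the judge model's raw reply.

import re
from typing import Callable

RUBRIC = (
    "Rate the RESPONSE to the TASK on a 1-10 scale for clarity, factual "
    "grounding, and usefulness. Reply with a single integer only."
)

def judge_reward(judge: Callable[[str], str], task: str, response: str) -> float:
    """Map the judge's 1-10 verdict to a reward in [0, 1]; 0.0 if unparseable."""
    prompt = f"{RUBRIC}\n\nTASK:\n{task}\n\nRESPONSE:\n{response}"
    verdict = judge(prompt)
    match = re.search(r"\d+", verdict)
    if match is None:
        return 0.0                          # malformed verdicts earn no reward
    score = min(max(int(match.group()), 1), 10)
    return (score - 1) / 9.0                # normalize to [0, 1]
```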