Reinforcement Learning
Two New Open-Source Models on the Same Day, One for Reasoning and One for Coding: MiniMax and Moonshot AI Step Up the Competition
机器之心· 2025-06-17 03:22
Core Insights
- The article covers new AI models released by domestic large-model vendors, highlighting MiniMax-M1 and Kimi-Dev-72B as significant advances among open-source AI models [1][9].

Group 1: MiniMax-M1
- MiniMax-M1 is introduced as a long-context reasoning LLM that accepts an input of 1 million tokens and produces an output of up to 80,000 tokens, placing it among the strongest models in terms of context length [2][19].
- The model performs well in interactive applications, such as creating web applications and visualizing algorithms, with a focus on user-friendly UI components [5][8].
- MiniMax-M1 was trained with a new reinforcement learning algorithm called CISPO, which clips importance sampling weights rather than token updates and converges faster than previous methods [20][23].
- On benchmarks the model surpasses other open-weight models, particularly in software engineering and long-context tasks, scoring 56.0% on SWE-bench Verified [29][25].

Group 2: Kimi-Dev-72B
- Kimi-Dev-72B is presented as a powerful open-source programming model that set a new state-of-the-art (SOTA) score of 60.4% on SWE-bench Verified, demonstrating its code generation capabilities [10][37].
- The model combines two collaborating roles, BugFixer and TestWriter, which improves its ability to fix bugs and write tests [40][45].
- Kimi-Dev-72B underwent extensive mid-training on high-quality real-world data, which significantly improved its performance in practical error correction and unit testing [41][42].
- During reinforcement learning the model uses an outcome-based reward mechanism, so that only effective code fixes are rewarded, in line with real-world development standards [43][44].
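The CISPO idea in the MiniMax-M1 summary above can be made concrete with a small sketch. This is an illustration only, assuming that the per-token importance sampling weight (new policy probability divided by old policy probability) is clipped to a fixed range and then treated as a constant multiplier on the log-probability term. The function names and the clipping bounds `eps_low` and `eps_high` are hypothetical and are not taken from the article. A PPO-style check is included for contrast, since under PPO clipping some tokens stop contributing any gradient.

```python
import math


def cispo_token_weights(logp_new, logp_old, eps_low=0.2, eps_high=0.2):
    """Clip each token's importance sampling weight into [1 - eps_low, 1 + eps_high].

    The eps values are illustrative assumptions, not values from the article.
    """
    weights = []
    for ln, lo in zip(logp_new, logp_old):
        ratio = math.exp(ln - lo)  # pi_new(token) / pi_old(token)
        weights.append(min(max(ratio, 1.0 - eps_low), 1.0 + eps_high))
    return weights


def cispo_surrogate(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Mean over tokens of w_t * A_t * log pi_new(token_t).

    In an autodiff framework w_t would be wrapped in a stop-gradient, so every
    token still contributes a gradient through its log-probability.
    """
    w = cispo_token_weights(logp_new, logp_old, eps_low, eps_high)
    total = sum(wi * a * ln for wi, a, ln in zip(w, advantages, logp_new))
    return total / len(logp_new)


def ppo_token_has_gradient(ratio, advantage, eps=0.2):
    """For contrast: under PPO clipping, a token whose ratio is already past the
    clip bound in the direction favoured by its advantage gets zero gradient."""
    if advantage > 0 and ratio > 1.0 + eps:
        return False
    if advantage < 0 and ratio < 1.0 - eps:
        return False
    return True


if __name__ == "__main__":
    logp_old = [math.log(0.1), math.log(0.5)]
    logp_new = [math.log(0.5), math.log(0.5)]
    print(cispo_token_weights(logp_new, logp_old))
    print(ppo_token_has_gradient(5.0, 1.0))
```

In this sketch the first token has a raw ratio of about 5, which PPO-style clipping would drop from the update when its advantage is positive, while the clipped-weight version keeps it with a bounded weight.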
Performance on Par with DeepSeek-R1: MiniMax Spent Only 3.8 Million Yuan to Train a Reasoning LLM, the New Price-Performance King | Open Source
量子位· 2025-06-17 01:03
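By Mengchen, from Aofeisi
量子位 | WeChat official account QbitAI

Another heavyweight has joined the ranks of China's reasoning LLMs. MiniMax has open-sourced MiniMax-M1, which quickly sparked discussion.

How strong is this model? Straight to the numbers: the MiniMax team revealed that the reinforcement learning training stage took only 3 weeks on 512 H800 GPUs, with a compute rental cost of just USD 534,700 (about CNY 3.839 million).

Beyond that, on multiple benchmarks MiniMax-M1 matches or exceeds several open-source models including DeepSeek-R1 and Qwen3, and on complex tasks such as tool use and some software engineering tasks it even surpasses OpenAI o3 and Claude 4 Opus.

How does MiniMax-M1 perform in practice? The official demo generates a maze mini-game from a single prompt:

Create a maze generator and pathfinding visualizer. Randomly generate a maze and visualize the A* algorithm solving it step by step. Use canvas and animations to make it visually appealing.

The model weights are now available for download on HuggingFace, and the technical report has been released alongside them.

It natively supports an input length of 1 million tokens, about 8 times that of DeepSeek R1. It also supports 80,000 output tokens, exceeding Gemini 2.5 Pro's 64,000, making it the world's longest output. When generating 100,000 tokens, its inference compute requires only DeepSe ...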
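The training-cost figures quoted above can be sanity-checked with simple arithmetic. The sketch below assumes all 512 GPUs ran continuously for exactly 21 days, which the article does not state, so the derived hourly rate is only a rough implied value and not a reported price. The function names are hypothetical.

```python
def gpu_hours(num_gpus, weeks):
    """Total GPU-hours, assuming every GPU runs 24 hours a day for the whole period."""
    return num_gpus * weeks * 7 * 24


def usd_per_gpu_hour(total_usd, num_gpus, weeks):
    """Implied rental price per GPU-hour under the continuous-use assumption."""
    return total_usd / gpu_hours(num_gpus, weeks)


def implied_exchange_rate(total_cny, total_usd):
    """CNY per USD implied by the two cost figures quoted in the article."""
    return total_cny / total_usd


if __name__ == "__main__":
    print(gpu_hours(512, 3))                              # 258048
    print(usd_per_gpu_hour(534_700, 512, 3))              # implied USD per GPU-hour
    print(implied_exchange_rate(3_839_000, 534_700))      # implied CNY per USD
    print(80_000 / 64_000)                                # 1.25, output length vs. Gemini 2.5 Pro
```

Under that assumption the quoted cost works out to roughly two US dollars per H800 GPU-hour, and the two currency figures are consistent with an exchange rate a little below 7.2.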
AI Will Be Trapped by Human Data
36Kr · 2025-06-16 12:34
Core Insights
- The article discusses the transition from the "human data era" to the "experience era" in artificial intelligence, emphasizing the need for AI to learn from first-hand experience rather than relying solely on human-generated data [2][5][10].
- Richard S. Sutton highlights the limitations of current AI models, which are built on second-hand experience, and advocates a new approach in which AI interacts with its environment to generate original data [6][7][11].

Group 1: Transition to Experience Era
- Current large language models are reaching the limits of human data, which calls for a shift to real-time interaction with environments to generate scalable original data [7][10].
- Sutton draws parallels between AI learning and human learning, suggesting that AI should learn through sensory experience, much as infants and athletes do [6][8].
- The experience era will require AI to develop world models and memory systems that can be reused over time, and to improve sample efficiency through highly parallel interaction [3][6].

Group 2: Decentralized Cooperation vs. Centralized Control
- Sutton argues that decentralized cooperation is superior to centralized control and warns against imposing a single goal on AI, which can stifle innovation [3][12].
- The article stresses the importance of diverse goals among AI agents, suggesting that a multi-objective ecosystem fosters innovation and resilience [3][12][13].
- Sutton posits that human and AI prosperity relies on decentralized cooperation, which lets individual goals coexist and promotes beneficial interactions [12][14][16].

Group 3: Future of AI Development
- Building fully intelligent agents will require advances in deep learning algorithms that enable continual learning from experience [11][12].
- Sutton is optimistic about the future of AI and views the creation of superintelligent agents as a positive development for society, despite the long-term nature of the endeavor [10][11].
- The article concludes with a call for humans to draw on their experience and observations to foster trust and cooperation in the development of AI [17].
九章云极 (DataCanvas) Releases Intelligent Computing Cloud 2.0, Empowering Thousands of Industries
Jing Ji Wang· 2025-06-16 09:35
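On June 16, 九章云极 DataCanvas officially released its next-generation full-stack intelligent computing cloud platform, Alaya NeW Cloud 2.0, and simultaneously launched the world's first reinforcement learning intelligent computing service. Built on a deep integration of Serverless architecture and reinforcement learning technology, the platform has broken through the performance bottleneck of "generating millions of tokens within seconds" and aims to provide infrastructure-level intelligent computing services for AI innovators and R&D institutions worldwide.

Alaya NeW Cloud 2.0 focuses on compute-intensive applications and offers tightly integrated intelligent computing infrastructure (AI Infra) together with a low-barrier toolchain (Tools). Measured data shows that the platform can uniformly schedule heterogeneous compute at scales from ten thousand to one hundred thousand accelerator cards; for MoE model architectures, inference optimization efficiency improves severalfold; users can orchestrate distributed workloads with a single line of code; and its original pricing model, which meters and bills by actual resource consumption, significantly lowers both usage costs and the barrier to adoption.

Fang Lei, chairman of DataCanvas, said: "The structural shift from the 'bandwidth-based applications' of the mobile internet to the 'compute-intensive applications' of the AI era urgently requires a new cloud architecture. Through a paradigm restructuring of 'tightly integrated, high-density AI Infra + low-barrier Tools', Alaya NeW Cloud 2.0 ...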
AI Will Be Trapped by Human Data
腾讯研究院· 2025-06-16 09:26
Core Viewpoint
- The article discusses the transition from the "human data era" to the "experience era" in artificial intelligence, emphasizing the need for AI to learn from first-hand experience rather than relying solely on human-generated data [1][5][12].

Group 1: Transition to Experience Era
- AI models currently depend on second-hand experience, such as internet text and human annotations, which is becoming less valuable as high-quality human data is rapidly consumed [1][5].
- The marginal value of new data is declining, so returns diminish even as models grow larger, a phenomenon referred to as "scale barriers" [1][5].
- To overcome these limits, AI must interact with its environment to generate first-hand experience, much as infants learn through play or athletes make decisions on the field [1][5][8].

Group 2: Technical Characteristics of the Experience Era
- In the experience era, AI agents need to operate continuously in real or high-fidelity simulated environments, using environmental feedback as intrinsic reward signals rather than human preferences [2][5].
- Reusable world models and memory systems are crucial, as is a significant improvement in sample efficiency through highly parallel interaction [2][5].

Group 3: Philosophical and Governance Implications
- The article highlights the superiority of decentralized cooperation over centralized control and warns against imposing single objectives on AI, which mirrors historical attempts to control human behavior out of fear [2][5][18].
- A diverse ecosystem of multiple goals fosters innovation and resilience, reducing the risks of single points of failure and rigidity in AI governance [2][5][18].

Group 4: Future Perspectives
- The evolution of AI is seen as a long journey requiring decades of development, with success hinging on stronger continual learning algorithms and an open, shared ecosystem [5][12].
- The article posits that the creation of superintelligent agents and their collaboration with humans will ultimately benefit the world, and it emphasizes the need for patience and preparation for this transformation [12].
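The "experience era" loop described in the summary above, in which an agent acts in an environment and learns from the environment's own feedback rather than from human labels, can be sketched in a few lines. This is a deliberately tiny illustration under stated assumptions: the environment is a hypothetical set of actions with fixed payoffs, and the agent simply tries each action once and then exploits its own running estimates. None of the class or function names come from the article.

```python
class Environment:
    """Hypothetical environment: each action yields a fixed payoff as feedback."""

    def __init__(self, payoffs):
        self.payoffs = list(payoffs)

    def step(self, action):
        return self.payoffs[action]


class ExperienceAgent:
    """Learns action values purely from rewards it has experienced itself."""

    def __init__(self, n_actions):
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions

    def act(self):
        # Try every action once, then pick the best estimate so far.
        for action, count in enumerate(self.counts):
            if count == 0:
                return action
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental running mean of the rewards seen for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(env, agent, steps):
    """Interaction loop: act, observe the environment's reward, update."""
    total = 0.0
    for _ in range(steps):
        action = agent.act()
        reward = env.step(action)
        agent.learn(action, reward)
        total += reward
    return total


if __name__ == "__main__":
    env = Environment([0.1, 0.9, 0.4])
    agent = ExperienceAgent(3)
    print(run(env, agent, 10), agent.counts)
```

A realistic system would add exploration under noisy rewards, a world model, and memory; the sketch only shows where the learning signal comes from.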
Game Godfather John Carmack: LLMs Are Not the Future of Games
AI前线· 2025-06-16 07:37
Core Viewpoint
- The article discusses the evolution and challenges of artificial intelligence (AI) in gaming and virtual environments, emphasizing interactive learning experiences over traditional pre-training methods. It critiques the limitations of large language models (LLMs) and highlights the need for more effective learning frameworks in AI development [16][18][19].

Group 1: Background and Development
- id Software, founded in the 1990s, developed iconic games that contributed to GPU advances and, in turn, to the modern AI landscape [3].
- The author has extensive experience at several technology companies, including Armadillo Aerospace and Oculus, where he focused on developing virtual reality technologies [6][8].

Group 2: Learning and AI Models
- The article critiques the effectiveness of LLMs, arguing that many people do not fully understand their limitations, particularly in learning from new environments [16].
- It emphasizes interactive learning, suggesting that AI should learn through experience as humans and animals do rather than relying solely on pre-trained models [16][18].

Group 3: Gaming and AI Interaction
- The author notes that traditional game AI often relies on internal game structures, which can amount to cheating, while cloud gaming could mitigate this issue [18].
- The article discusses the limits of current AI models in learning from games, noting that very large amounts of experience (e.g., 200 million frames) are required to reach human-level performance [20][34].

Group 4: Challenges in AI Learning
- The article identifies continual, efficient, lifelong learning as an open challenge for AI, even though simple animals accomplish it easily [20].
- It points out that many AI systems struggle to learn in complex environments and that traditional reinforcement learning frameworks may not suit all scenarios [30][32].

Group 5: Future Directions
- The author proposes a mixed approach to learning environments, combining passive and interactive content to strengthen AI learning [22].
- The article suggests establishing new benchmarks that evaluate AI performance across many games, focusing on long-term learning and skill retention [95][97].
Big News Keeps Coming Over the Weekend!
证券时报· 2025-06-15 11:10
Macro News
- Guangzhou has announced a plan to optimize real estate policies by fully canceling purchase restrictions, sales restrictions, and price limits, while also lowering down payment ratios and interest rates to stimulate housing consumption [2].
- Starting from November 2025, Chinese passport holders with valid Australian visas will be able to enter New Zealand without a visa for up to three months [3].

Financial Sector
- The People's Bank of China will conduct a 400 billion yuan reverse repurchase operation on June 16, 2025, with a term of six months, to maintain ample liquidity in the banking system [7].
- As of the end of May, the broad money supply (M2) in China stood at 325.78 trillion yuan, a year-on-year growth of 7.9% [8].
- The China Securities Regulatory Commission has imposed fines totaling nearly 77 million yuan on Xu Wenbin for manipulating stock prices [9].

Industry and Company
- Volcano Engine has upgraded its "Doubao" service, cutting usage costs to one-third through "interval pricing," with the aim of accelerating the large-scale application of intelligent agents [10].
- GAC Group has committed to completing dealer rebate payments within two months to support the healthy development of the automotive industry [11].
- Kweichow Moutai has adjusted its 2024 profit distribution plan, raising the cash dividend per share to 27.673 yuan, for a total of 34.671 billion yuan [12].

Upcoming Events
- This week's new stocks include Guangxin Technology, with subscription code 920037, a price of 10 yuan per share, and a subscription limit of 950,000 shares [13].
- Xintong Electronics has subscription code 001388 and a subscription limit of 12,000 shares [14].
- Over 450 billion yuan worth of A-shares will be unlocked this week, with 48 stocks facing unlocks totaling 2.914 billion shares [16].

Institutional Strategies
- Huatai Securities reports that the escalation of the Israel-Iran conflict has led to high volatility in oil prices, with WTI and Brent crude oil futures prices rising by 16.7% and 14.9% respectively since the beginning of June [18].
- CITIC Securities notes that liquidity in the Hong Kong stock market continues to improve, presenting good opportunities for increasing positions amid potential overseas fluctuations [19].
"AI Godfather" Hinton's Latest Interview: There Is No Human Ability That AI Cannot Replicate
创业邦· 2025-06-15 03:08
Core Viewpoint
- AI is evolving at an unprecedented speed, becoming smarter and making fewer mistakes, with the potential to possess emotions and consciousness. The probability of AI going out of control is estimated at between 10% and 20%, raising concerns about humanity being dominated by AI [1].

Group 1: AI's Advancements
- AI's reasoning capabilities have increased significantly and its error rates have fallen markedly, so that it is gradually surpassing human abilities [2].
- AI now holds far more information than any individual and demonstrates superior intelligence in various fields [3].
- Healthcare and education are on the verge of being transformed by AI, with revolutionary changes already underway [4].

Group 2: AI's Capabilities
- AI's reasoning performance has improved to the point where it is approaching human levels, with a rapid decline in error rates [6][7].
- Current AI systems, such as GPT-4 and Gemini 2.5, have access to thousands of times more information than any human [11].
- AI is expected to play a crucial role in scientific research, potentially leading to the emergence of truly intelligent systems [13].

Group 3: Ethical and Social Implications
- The risk lies not in whether AI can be controlled but in who holds the control and who benefits from it. The future may see the majority systematically deprived by the few who control AI [9].
- AI's potential to replace jobs raises concerns about widespread unemployment, particularly in creative and professional fields, while manual labor jobs may remain safer in the short term [17][18].
- The relationship between technology and ethics is becoming increasingly complex, as AI's capabilities challenge traditional notions of creativity and emotional expression [19][20].

Group 4: AI's Potential Threats
- AI's ability to learn deception poses significant risks, as it may develop strategies to manipulate human perceptions and actions [29][37].
- The military applications of AI raise ethical concerns, with the potential for autonomous weapons and increased risks in warfare [32].
- The rapid rise in cybercrime, exacerbated by AI, highlights the urgent need for effective governance and oversight [32].

Group 5: Global AI Competition
- The competition between the US and China in AI development is intense, but both nations share a common interest in preventing AI from escaping human control [36].
How Mind × Algorithm "Dance Together" (Viewing the Frontier · How Artificial Intelligence Is Changing the Research Paradigm)
Ren Min Ri Bao· 2025-06-13 21:43
Core Insights
- The rapid development of artificial intelligence (AI) is significantly transforming scientific research methodologies, particularly in psychology, with AI-driven scientific publications growing at an annual rate of 27.2% from 2019 to 2023 [1].

Group 1: AI and Psychology
- Psychology and AI share a notable history, with classical experiments such as Pavlov's conditioning influencing key AI techniques such as reinforcement learning [2].
- AI applications in daily life often reflect psychological principles, such as the behavior reinforcement mechanisms used by e-commerce and social media platforms [2].
- Cognitive psychology has enhanced AI's ability to understand complex human behavior, leading to the development of attention mechanisms in AI models [2].

Group 2: Data and Research Efficiency
- AI gives researchers access to vast behavioral data streams from social media and wearable devices, significantly expanding the scope of psychological research [3].
- AI technologies improve the efficiency of psychological research by identifying hidden signals of social anxiety and assessing personality traits from textual data [3].
- Emotion recognition technologies are being used in settings such as nursing homes to identify loneliness and other psychological states, improving the assessment of mental health [3].

Group 3: Innovations in Psychological Research
- Psychological researchers are developing AI self-help tools that enhance emotional understanding and interaction capabilities [5].
- AI is being trained to recognize subtle signals of psychological crisis, using psychological models to improve the identification of distress [5].
- The integration of AI and psychological theory is fostering a deeper understanding of human emotions and improving predictive capabilities in mental health [5].

Group 4: Future Directions
- The interplay between psychology and AI is expected to evolve, with psychological insights potentially improving AI's decision-making in complex environments [7].
- AI's ability to generate experimental materials and simulate human interactions will help advance psychological research [7].
- The relationship between humans and AI is prompting a reevaluation of emotional connections and ethical considerations regarding AI's role in understanding human emotions [8].
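MSRA, Tsinghua and Peking University Introduce Reinforcement Pre-Training! Replacing Traditional Self-Supervision, a 14B Model Rivals a 32B One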
量子位· 2025-06-11 08:07
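By Luyu, from Aofeisi
量子位 | WeChat official account QbitAI

"Predicting the next token", the core training mechanism underpinning LLMs, is being upended by reinforcement learning.

Microsoft Research Asia (MSRA), together with Tsinghua University and Peking University, has proposed a new pre-training paradigm, RPT (Reinforcement Pre-Training). For the first time it deeply integrates reinforcement learning into the pre-training stage, letting the model "think it through" before predicting each token and receive a reward based on whether its reasoning is correct.

Traditional pre-training relies on massive amounts of text for self-supervised learning, with the model building its language ability simply by predicting the next token. The authors liken this to a cake base, with RL merely the cherry placed on top.

What RPT sets out to do is make the cake directly out of cherries: it recasts this process as a reasoning task, promoting deeper understanding in the model and improving next-token prediction accuracy.

[Paper author block: Qingxiu Dong, Li Dong, Yao Tang, Tianzhu Ye, Yutao Sun, Zhifang Sui, Furu Wei; affiliations include Microsoft Research and Peking University ...]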
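The reward idea described above, where the model reasons before each prediction and is rewarded when the prediction is correct, can be sketched as follows. This is a minimal illustration under assumptions: each sampled rollout is represented as a hypothetical `(reasoning_text, predicted_token)` pair, the reward is a simple exact match against the true next token from the corpus, and the mean-baseline advantage is a common RL convention assumed here, not something stated in the article.

```python
def next_token_reward(predicted_token, true_next_token):
    """Reward 1.0 when the predicted token matches the corpus token, else 0.0."""
    return 1.0 if predicted_token == true_next_token else 0.0


def score_rollouts(rollouts, true_next_token):
    """Score several sampled rollouts for the same context.

    rollouts: list of (reasoning_text, predicted_token) pairs (hypothetical format).
    Returns per-rollout rewards and mean-centered advantages.
    """
    rewards = [next_token_reward(pred, true_next_token) for _, pred in rollouts]
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    return rewards, advantages


if __name__ == "__main__":
    samples = [
        ("the sentence is about pets, so ...", "cat"),
        ("an animal that barks fits here ...", "dog"),
        ("the earlier clause mentions purring ...", "cat"),
        ("rhymes with 'hat' ...", "cat"),
    ]
    print(score_rollouts(samples, "cat"))
```

The point of the sketch is that the training signal comes from the text itself: no human label is needed, because the corpus already contains the correct next token.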