Continual Learning
Stanford and NVIDIA unveil test-time reinforcement learning: a fine-tuned open-source model beats top closed-source models for just a few hundred dollars
36Kr · 2026-01-27 09:17
Another advance in continual learning for large models! New research from Stanford, NVIDIA, and other institutions proposes a fresh approach to open scientific problems: Test-Time Training to Discover (TTT-Discover). Built on the open-source model gpt-oss-120b, it reaches SOTA in multiple domains, outperforming both human experts and closed-source frontier models. The method abandons "test-time scaling" (Test-time Scaling), which only orchestrates prompts around a frozen model. Instead, at test time it applies reinforcement learning (RL) to update the model's weights for a single concrete problem. This "test-time training" lets the model learn in real time from its failed attempts on that problem, updating its parameters for a directed evolution of its capabilities. Mathematics: the work gives a new bound on the Erdős minimum overlap problem and proposes an autocorrelation inequality. Reinforcement learning at test time: overall, the paper's core idea is Reinforcement Learning at Test Time, embodied in two points. 1. Learning objective (Learning Objective): unlike traditional RL, which raises the "average reward" across all tasks for the sake of generalization, TTT-Discover adopts an entropic objective function (Entropic Objective). Kern ...
Stanford and NVIDIA unveil test-time reinforcement learning: a fine-tuned open-source model beats top closed-source models for just a few hundred dollars
量子位· 2026-01-27 02:33
Core Insights - The article discusses a new approach called Test-Time Training to Discover (TTT-Discover), which aims to solve open scientific problems by incorporating reinforcement learning during the testing phase of model evaluation [1][2]. Group 1: Methodology - TTT-Discover is based on the open-source model gpt-oss-120b and achieves state-of-the-art (SOTA) performance across multiple domains, outperforming human experts and closed-source models [3]. - Unlike traditional methods that rely on "Test-time Scaling" through prompt scheduling, TTT-Discover updates model weights during the testing phase to learn from specific problems [4][5]. - This "test-time training" allows the model to gain real-time experience from failed attempts, leading to a directed evolution of its capabilities [6]. Group 2: Learning Objectives - TTT-Discover employs an Entropic Objective, which focuses on maximizing the reward for the best actions rather than average rewards across all tasks, aiming for a single optimal solution instead of multiple mediocre ones [9][10][11]. - The method introduces a reuse mechanism inspired by PUCT, maintaining historical attempts in a buffer to prioritize the most promising states while balancing exploration [12]. Group 3: Implementation and Results - The model generates a "private dataset" through continuous action generation and feedback reception, addressing the out-of-distribution (OOD) problem by creating data specific to the problem at hand [13][14]. - TTT-Discover's approach contrasts with traditional test-time search methods, which do not update model weights and thus do not enhance the model's capabilities [15][16]. - The algorithm involves a cycle of selecting potential solutions, generating new attempts, and evaluating results, with the model's weights updated after each iteration to improve performance [17][18][27]. 
Group 4: Performance Metrics - In experimental settings, TTT-Discover demonstrated a speed improvement of approximately 2 times compared to the best human implementations in kernel engineering tasks [27]. - The testing cost for a single problem is estimated to be several hundred dollars, showcasing the efficiency of the approach [27]. Group 5: Future Directions - TTT-Discover is primarily applicable to continuous reward scenarios, with future work needed to extend its capabilities to sparse, binary, and unverifiable reward problems [29].
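The loop the summary describes — keep past attempts in a buffer, pick a promising one with a PUCT-inspired rule, generate new attempts from it, score them against a continuous reward, and update the model after each round in favor of the single best attempt rather than the batch average — can be pictured with a toy optimizer. This is a minimal illustrative sketch, not the paper's implementation: the "model" is just a random-walk proposer, a lone step-size parameter stands in for the model weights, and all names (`generate`, `score`, `puct_select`, `test_time_train`) are assumptions.

```python
import math
import random

# Toy stand-ins for the policy and the task; names are illustrative, not the paper's API.
def generate(weights, state):
    """Propose a new candidate solution by perturbing a prior state."""
    return state + random.gauss(0.0, 1.0) * weights["step"]

def score(candidate):
    """Continuous reward: higher is better, with the optimum at 3.0."""
    return -abs(candidate - 3.0)

def puct_select(buffer, c=1.4):
    """PUCT-inspired selection: prefer high-reward states, but keep an exploration bonus."""
    total = sum(entry["visits"] for entry in buffer)
    def priority(entry):
        return entry["reward"] + c * math.sqrt(math.log(total + 1) / (entry["visits"] + 1))
    return max(buffer, key=priority)

def test_time_train(iterations=200, samples_per_iter=8):
    weights = {"step": 1.0}                 # stand-in for the model's weights
    buffer = [{"state": 0.0, "reward": score(0.0), "visits": 1}]
    best = buffer[0]
    for _ in range(iterations):
        parent = puct_select(buffer)        # reuse a promising historical attempt
        parent["visits"] += 1
        attempts = [generate(weights, parent["state"]) for _ in range(samples_per_iter)]
        scored = [{"state": a, "reward": score(a), "visits": 1} for a in attempts]
        # Entropic-style objective: commit to the single best attempt, not the batch mean.
        top = max(scored, key=lambda e: e["reward"])
        buffer.append(top)
        if top["reward"] > best["reward"]:
            best = top
        else:
            # Crude stand-in for a weight update: shrink the step when no progress is made.
            weights["step"] *= 0.99
    return best

random.seed(0)
result = test_time_train()
print(round(result["reward"], 3))
```

The point of the sketch is the shape of the loop (select from buffer → generate → evaluate → update), which is what distinguishes this from test-time search over a frozen model.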
After its Hong Kong listing, Zhipu continues to push ahead with an A-share IPO
21世纪经济报道 (21st Century Business Herald) · 2026-01-26 08:08
Core Viewpoint - The company, Zhipu, is progressing with its A-share IPO plan after successfully listing on the Hong Kong Stock Exchange on January 8, 2026, indicating a dual listing strategy in both A and H shares [2][4]. Group 1: IPO Progress and Plans - Zhipu's IPO counseling report was submitted by its counseling institution, China International Capital Corporation (CICC), indicating ongoing efforts to prepare for the A-share listing [2][5]. - The third phase of the IPO counseling is scheduled from October 1, 2025, to December 31, 2025, focusing on comprehensive due diligence and understanding the company's operational and financial status [4][8]. - The company initially planned to list on A-shares but adjusted its strategy to first enter the Hong Kong market before pursuing the A-share listing [4][11]. Group 2: Financial Performance - Zhipu reported significant losses relative to its revenue, with net losses of 1.43 billion yuan in 2022, 7.88 billion yuan in 2023, 29.58 billion yuan in 2024, and 23.58 billion yuan in the first half of 2025 [13]. - Revenue figures for Zhipu were 57.4 million yuan in 2022, 124.5 million yuan in 2023, 312.4 million yuan in 2024, and 190.9 million yuan in the first half of 2025, indicating a growth trend despite ongoing losses [13]. - The company's gross margin was reported at 54.6% in 2022, 64.6% in 2023, 56.3% in 2024, and 50.0% in the first half of 2025, reflecting fluctuations in profitability [13]. Group 3: Strategic Focus - Zhipu aims to grow its revenue from the Model as a Service (MaaS) platform while maintaining its local deployment revenue base, indicating a strategy for scaling operations [13]. - The company plans to launch a new-generation model, GLM-5, in 2026, focusing on advanced model architecture and learning paradigms [14]. - The strategic direction includes exploring online learning and continual learning to enhance model adaptability and evolution [14].
"Innovation has become difficult" at OpenAI, a departed executive reveals
36Kr · 2026-01-23 13:12
Group 1 - OpenAI is facing an innovation dilemma due to rising costs and growth pressures, which have affected its appetite for risk and hindered cross-team collaboration [3][8] - The rise of Google is attributed to OpenAI's failure to maintain its competitive edge, suggesting that OpenAI should have continued to lead the market [3][4] - The AI industry is experiencing a convergence among top companies, making it difficult for researchers to pursue innovative paths outside mainstream machine learning paradigms [3][4] Group 2 - The talent war in the AI sector has become dramatic, with frequent job changes among researchers, leading to less time spent on actual work [4][42] - Innovation is not solely driven by star researchers; the company's ability to foster a sense of personal responsibility and an environment that allows exploration is crucial [4][5] - The lack of focus, rather than a shortage of computing power, is identified as a key barrier to innovation within AI labs [5][19] Group 3 - The timeline for achieving Artificial General Intelligence (AGI) is projected around 2029, with critical areas of focus being architectural innovation and continuous learning [5][30] - Reinforcement learning is making a comeback, as historical patterns show that good ideas often resurface, but the challenge lies in determining the right timing for their importance [5][24] Group 4 - OpenAI's organizational structure is limiting its ability to support certain research directions, leading to a realization that some desired research cannot be pursued within the current framework [9][10] - The industry is witnessing a lack of diversity in approaches, with many companies following similar technological paths, which is seen as a regrettable trend [15][17] Group 5 - The current competitive landscape is characterized by a few major AI companies using similar technological foundations, resulting in minimal differentiation among their products [15][17] - The pressure to deliver results and maintain competitiveness is causing organizations to shy away from risk-taking, which is essential for genuine innovation [18][19] Group 6 - The significant resource barriers in AI research are hindering innovative attempts, as many promising ideas lack the necessary funding for large-scale experimentation [20][21] - The balance between exploration and exploitation is a critical issue in optimizing AI agents and should also be reflected in organizational decision-making [21][22] Group 7 - The importance of world models in AI training is emphasized, suggesting that integrating world understanding with reinforcement learning could lead to significant advancements [27][30] - Continuous learning and the integration of training and operational phases are identified as essential capabilities that are currently lacking in AI models [30][31] Group 8 - The rapid evolution of AI technology necessitates a cautious approach to its deployment, as the implications of new advancements can have far-reaching effects on society [37][38] - The ongoing discourse around AI technologies is marked by a mix of excitement and concern, highlighting the need for responsible discussions about their impact [40][41]
Express | The "new labs" wave surges: Applied Compute, founded by a former OpenAI team, nears a funding round at a $1.3 billion valuation
Z Potentials· 2026-01-21 05:52
Core Insights - Applied Compute, a startup founded by three former OpenAI researchers, is negotiating to raise new funding at a valuation of $1.3 billion, more than doubling its previous valuation of approximately $500 million from less than three months ago [1][2]. Group 1: Company Overview - Applied Compute focuses on helping businesses customize AI models using their own data, specifically targeting sectors like finance and law [3]. - The company employs reinforcement learning techniques to optimize model performance by rewarding desired behaviors and penalizing others [3]. - Applied Compute collaborates with clients such as DoorDash, Cognition, and Mercor to develop AI agents that can perform tasks on behalf of employees [3]. Group 2: Funding and Valuation - The current funding round may raise up to $70 million, with venture capital firm Kleiner Perkins reportedly leading the investment [2]. - The company has previously raised $80 million from investors including Sequoia Capital, Benchmark, and Lux Capital [2]. - The investment interest in startups focused on research, often referred to as "new labs," is growing, as these companies aim to advance model and product development in ways that larger labs like OpenAI and Anthropic may overlook [2]. Group 3: Financial Performance - As of November last year, Applied Compute achieved an annualized revenue of $12.8 million [5]. - The company is still in the early stages of revenue generation, indicating potential for growth as it scales its operations [5]. Group 4: Industry Context - Applied Compute is not the only well-funded startup aiming for customization; other companies like Thinking Machines Lab are also pursuing similar goals but face challenges such as employee turnover [4]. - The trend of researchers establishing their own companies continues, driven by the desire to innovate beyond the constraints of larger organizations [2].
Anthropic's stunning overnight upgrade: Claude gains "permanent memory"! A sea change for office workers worldwide
程序员的那些事· 2026-01-21 00:51
Core Viewpoint - Anthropic is set to revolutionize AI with the introduction of "permanent memory" in Claude Cowork, transforming it from a simple chatbot into a powerful AI collaborator capable of long-term memory and task execution [1][3][44]. Group 1: Permanent Memory and Knowledge Base - Claude Cowork will feature a new "knowledge base" that allows it to retain information over time, categorizing data instead of relying on a chaotic general memory [12][14][19]. - This knowledge base will enable Claude to actively retrieve relevant background information when responding to queries, enhancing its contextual understanding [16][20]. - Users will have the ability to manage multiple distinct knowledge bases, allowing for tailored interactions based on specific tasks [19][20]. Group 2: Integration of Cowork and Chat Modes - The Cowork mode will become the primary interface for Claude, integrating chat functionalities while emphasizing workflow and productivity [21][28]. - Traditional chat features will still be available but will be folded into the Cowork mode, which will serve as the default workspace for users [22][23]. Group 3: User Interface and Automation Enhancements - The user interface will undergo significant changes, including a dedicated Artefacts sidebar for managing and reusing outputs, moving away from transient interactions [29][30]. - Enhanced automation capabilities will be introduced through the MCP Registry, allowing Claude to dynamically manage remote connectors and improve task execution [33][39]. Group 4: Experience Layer Upgrades - New features such as a web voice mode and upgraded Pixelate functionality will enhance user experience, indicating a shift towards multi-modal and frequent usage [40][41][44]. - These updates suggest that Claude is evolving into a more interactive and capable AI partner, rather than just a question-and-answer tool [44]. 
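The "multiple distinct knowledge bases" described above amount to routing each query to the most relevant store before the assistant answers. A toy sketch of that routing idea, using keyword overlap in place of real embedding retrieval; the class and function names are illustrative assumptions, not Anthropic's API:

```python
def tokenize(text):
    """Lowercased word set; a crude stand-in for embedding similarity."""
    return set(text.lower().split())

class KnowledgeBase:
    """A named, categorized store of notes, as opposed to one chaotic general memory."""
    def __init__(self, name):
        self.name = name
        self.notes = []

    def add(self, note):
        self.notes.append(note)

    def relevance(self, query):
        """Best keyword overlap between the query and any stored note."""
        q = tokenize(query)
        return max((len(q & tokenize(n)) for n in self.notes), default=0)

def route(bases, query):
    """Pick the knowledge base whose notes best match the query."""
    return max(bases, key=lambda b: b.relevance(query))

# Two distinct bases, tailored to different tasks.
work = KnowledgeBase("project-alpha")
work.add("quarterly report deadline is friday")
personal = KnowledgeBase("travel")
personal.add("flight to tokyo departs monday")

print(route([work, personal], "when is the report deadline").name)  # project-alpha
```

The design point the sketch illustrates is separation: retrieval scoped to the right base keeps unrelated context from bleeding into an answer.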
Group 5: Industry Context and Future Implications - The introduction of permanent memory aligns with a broader industry trend towards continuous learning in AI, with expectations that 2026 will be a pivotal year for advancements in this area [47][51][65]. - The competition among AI assistants may enter a new phase as companies like Anthropic push the boundaries of what AI can achieve with persistent memory and task execution capabilities [45][56].
[Complete, error-free version] What did Tang Jie, Yang Qiang, Lin Junyang, and Yao Shunyu actually say last weekend?
机器人圈· 2026-01-13 09:41
Core Viewpoint - The article discusses the vibrant developments in China's AI sector at the beginning of 2026, highlighting key figures in the field and their contributions to the evolution of large models and AI applications. Group 1: Event Highlights - The event featured prominent figures in AI, including Professor Tang Jie, Yang Zhilin, Lin Junyang, and Yao Shunyu, marking a significant gathering in Beijing [1]. - The presence of foundational figures like Zhang Bo and Yang Qiang indicates the event's importance in shaping the future of the large model industry [1]. Group 2: Observations on AI Development - The year 2025 was noted as a breakthrough year for open-source models in China, with a 10 to 20 times increase in coding activities [6]. - The discussion emphasized the differentiation of AI models, with a focus on enterprise applications and coding, inspired by developments in Silicon Valley [7][8]. Group 3: Model Differentiation - Yao Shunyu pointed out the clear division between To C (consumer) and To B (business) models, with a growing trend towards vertical integration and layered applications [9][12]. - The article highlights that while consumer applications may not require the highest intelligence, business applications benefit significantly from stronger models, leading to a willingness to pay for superior performance [10][11]. Group 4: Future Paradigms in AI - The conversation shifted to the next paradigm in AI, focusing on autonomous learning and self-improvement, with various interpretations of what this entails [23][24]. - Yao Shunyu mentioned that the bottleneck for autonomous learning is not methodology but rather the data and tasks involved, indicating a need for context and environment to enhance AI capabilities [23][25]. Group 5: Agent Strategy - The potential for agents to automate human tasks significantly was discussed, with expectations that by 2026, agents could handle workloads equivalent to one or two weeks of human effort [39][40]. 
- The article suggests that the development of agents is closely tied to advancements in model capabilities and the complexity of interaction environments [45][46].
US AI races far ahead, China trails by an average of 7 months: new Epoch AI report released
36Kr · 2026-01-08 07:53
Core Insights - The report from Epoch AI indicates that Chinese AI models are, on average, 7 months behind their American counterparts, with a minimum gap of 4 months and a maximum of 14 months [1][4]. Group 1: AI Development Comparison - The average 7-month lag is attributed to the differences between open-source and closed-source models, with the gap closely aligning with the overall performance disparity between these two categories [2]. - The comprehensive capability index (ECI) used in the report evaluates language understanding, reasoning, and multi-task performance, quantifying the time needed for Chinese AI to reach parity with U.S. capabilities [4]. - The progress of U.S. AI is characterized by a rapid update cycle, with significant advancements occurring in quick succession, unlike the more sporadic improvements seen in Chinese AI models [6][9]. Group 2: Trends in AI Model Development - Chinese AI models are primarily advancing by increasing parameter sizes and utilizing Mixture of Experts (MoE) architectures, as seen in models like Baichuan2 and Qwen-14B [8]. - The gap between Chinese and American AI has been narrowing, with projections indicating a reduction from 10-12 months in 2023 to a stable 7 months by 2025, reflecting consistent progress in China [9]. - The trend of open-sourcing in Chinese AI models contrasts with the closed-source approach of leading U.S. models, which may be a limiting factor for China's advancements [10][11]. Group 3: Future Directions - The next significant leap in AI capabilities is expected to revolve around integrating reasoning and action, enabling self-reflection and planning within AI systems [15]. - The ability to allow AI to self-learn and evolve without retraining is anticipated to be a core competency for the next generation of AI [16]. 
- The race to achieve these advancements will likely redefine the leading edge of AI technology, with the first entity to cross this threshold gaining a significant competitive advantage [17].
On IPO day one, an internal letter from Zhipu's founding initiator surfaces: 2026 goals made explicit, Liang Wenfeng mentioned
新浪财经 (Sina Finance) · 2026-01-08 02:37
Core Insights - The core message of the news is that Zhipu AI has officially launched and is set to introduce its next-generation model, GLM-5, with a vision to become a leading global player in large models by 2026 [1][2]. Group 1: Company Vision and Goals - Zhipu AI aims to become an international leader in large models by 2026, as stated by its founder and chief scientist, Tang Jie [1][2]. - The company is focusing on the persistent pursuit of AGI technology and the exploration of its upper limits, which are seen as critical for future improvements [3]. Group 2: Upcoming Developments - The GLM-5 model is expected to be released soon, featuring significant scaling and new technological improvements to enhance user experience and task completion [1][3]. - The company plans to explore new model architectures to address the limitations of the widely used Transformer architecture, which has shown inefficiencies in handling long contexts and memory mechanisms [2][3]. Group 3: Research and Development Focus - There is a need to develop a more generalized Reinforcement Learning (RL) paradigm that can handle long-term tasks beyond the current capabilities of RLVR, which relies on manually constructed environments [4]. - The company is also focusing on continuous learning and autonomous evolution, moving away from static AI models that become outdated post-deployment, aiming for a paradigm that allows for ongoing learning from interactions with the world [5].
Sebastian Raschka's 10,000-word year-end review: 2025, the year of "reasoning models"
机器之心· 2026-01-02 09:30
Core Insights - The AI field continues to evolve rapidly, with significant advancements in reasoning models and algorithms such as RLVR and GRPO, marking 2025 as a pivotal year for large language models (LLMs) [1][4][19] - DeepSeek R1's introduction has shifted the focus from merely stacking parameters to enhancing reasoning capabilities, demonstrating that high-performance models can be developed at a fraction of previously estimated costs [9][10][12] - The importance of collaboration between humans and AI is emphasized, reflecting on the boundaries of this partnership and the evolving role of AI in various tasks [1][4][66] Group 1: Reasoning Models and Algorithms - The year 2025 has been characterized as a "year of reasoning," with RLVR and GRPO algorithms gaining prominence in the development of LLMs [5][19] - DeepSeek R1's release showcased that reasoning behavior can be developed through reinforcement learning, enhancing the accuracy of model outputs [6][19] - The estimated training cost for the DeepSeek R1 model is significantly lower than previous assumptions, around $5.576 million, indicating a shift in cost expectations for advanced model training [10][12] Group 2: Focus Areas in LLM Development - Key focus areas for LLM development have evolved over the years, with 2025 emphasizing RLVR and GRPO, following previous years' focus on RLHF and LoRA techniques [20][22][24] - The trend of "Benchmaxxing" has emerged, highlighting the overemphasis on benchmark scores rather than real-world applicability of LLMs [60][63] - The integration of tools in LLM training has improved performance, allowing models to access external information and reduce hallucination rates [54][56] Group 3: Architectural Trends - The architecture of LLMs is converging towards using mixture of experts (MoE) layers and efficient attention mechanisms, indicating a shift towards more scalable and efficient models [43][53] - Despite advancements, traditional transformer architectures remain prevalent, with ongoing improvements in efficiency and engineering adjustments [43][53] Group 4: Future Directions - Future developments are expected to focus on expanding RLVR applications beyond mathematics and coding, incorporating reasoning evaluation into training signals [25][27] - Continuous learning is anticipated to gain traction, addressing challenges such as catastrophic forgetting while enhancing model adaptability [31][32] - The need for domain-specific data is highlighted as a critical factor for LLMs to establish a foothold in various industries, with proprietary data being a significant concern for companies [85][88]
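GRPO, named above as one of 2025's defining algorithms, drops the learned value-function baseline of PPO: for each prompt it samples a group of completions and uses each completion's reward, standardized against the group's own mean and standard deviation, as its advantage. A minimal sketch of that advantage computation (illustrative only; a full implementation also applies a clipped policy-ratio loss and typically a KL penalty):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize each completion's reward
    against the mean and std of its own group (completions for one prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 completions sampled for one prompt, scored by a verifier (1 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advs = grpo_advantages(rewards)
print([round(a, 2) for a in advs])  # correct answers get positive advantage
```

Because the baseline is computed per group, no separate critic network is needed, which is a large part of why the method pairs well with verifiable-reward (RLVR) setups.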