Adaptive Computation

From GPT-5 to DeepSeek V3.1, a New Direction Has Emerged for Top AI Models!
硬AI · 2025-08-31 17:14
Author: 李笑寅 | Editor: 硬AI
As reasoning modes grow ever more complex, the number of tokens needed to complete a task is exploding, so real-world costs are rising rather than falling. The industry is shifting from purely chasing the ceiling of model capability toward chasing computational efficiency. "Hybrid reasoning" has become an industry consensus: the goal is to teach models to judge when a task calls for "deep thinking" and when a "quick response" will do.
From GPT-5 to DeepSeek V3.1, a New Direction Has Emerged for Top AI Models!
华尔街见闻 · 2025-08-31 13:07
In the fierce race among large AI models, the yardstick is quietly changing.

From Meituan's newly open-sourced LongCat model to OpenAI's next-generation flagship GPT-5 and star startup DeepSeek's latest release, the top players have all turned their attention to "hybrid reasoning" and "adaptive computation," signaling that the industry's focus is shifting from "higher and stronger" to "smarter and more economical."

Meituan's recently open-sourced "LongCat" (LongCat-Flash) uses an innovative architecture to match top-tier performance while achieving remarkable compute savings.

As Wallstreetcn previously reported, one of LongCat-Flash's most innovative designs is its "zero-computation" expert mechanism: it intelligently identifies non-critical parts of the input, such as common words and punctuation, and hands them to a special "expert" that performs no complex computation and simply returns the input unchanged, saving substantial compute.

This is no isolated display of technical prowess but a precise response to the industry's current pain point: as reasoning modes become more complex, the cost of AI applications is climbing fast.

The industry's countermeasures are converging on a common direction: hybrid reasoning. This mode lets an AI system automatically choose a compute configuration appropriate to the problem's complexity, avoiding wasting expensive compute on simple tasks.

The "Smarter" the AI, the Higher the Cost

Meituan's relentless pursuit of efficiency reflects the entire AI industry's ...
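The "zero-computation" expert described above can be pictured as an ordinary mixture-of-experts layer in which one expert is simply the identity function. Below is a minimal sketch of that idea, assuming a top-1 router; the class names and sizes are illustrative, not LongCat-Flash's actual implementation.

```python
import torch
import torch.nn as nn

class ZeroComputeExpert(nn.Module):
    """Identity 'expert': returns its input unchanged, spending no FLOPs."""
    def forward(self, x):
        return x

class FFNExpert(nn.Module):
    """A normal feed-forward expert."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )
    def forward(self, x):
        return self.net(x)

class MoEWithZeroExpert(nn.Module):
    """Top-1 MoE layer whose last expert performs no computation."""
    def __init__(self, d_model=64, d_hidden=256, n_ffn_experts=3):
        super().__init__()
        self.experts = nn.ModuleList(
            [FFNExpert(d_model, d_hidden) for _ in range(n_ffn_experts)]
            + [ZeroComputeExpert()]
        )
        self.router = nn.Linear(d_model, len(self.experts))

    def forward(self, x):  # x: (n_tokens, d_model)
        choice = self.router(x).argmax(dim=-1)  # pick one expert per token
        out = torch.empty_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                # trivial tokens routed to the zero expert are copied through
                out[mask] = expert(x[mask])
        return out

layer = MoEWithZeroExpert()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Tokens the router sends to the identity expert cost nothing in the feed-forward stage, which is the whole savings mechanism the article describes.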
From GPT-5 to DeepSeek V3.1, a New Direction Has Emerged for Top AI Models!
Hua Er Jie Jian Wen · 2025-08-31 02:26
Core Insights
- The AI industry is shifting its focus from "higher and stronger" to "smarter and more economical" solutions, as evidenced by the latest developments in AI models like Meituan's LongCat-Flash and OpenAI's upcoming GPT-5 [1][3]
- The rising costs associated with complex AI tasks are driving the need for innovative solutions, particularly in the realm of hybrid reasoning and adaptive computing [1][2]

Group 1: Industry Trends
- Meituan's LongCat-Flash model features a "zero-computation" expert mechanism that intelligently identifies non-critical parts of the input, significantly reducing compute usage [1]
- The AI industry's response to rising application costs is converging on hybrid reasoning models, which let AI systems allocate computational resources based on task complexity [1][3]

Group 2: Cost Dynamics
- Despite falling per-token costs, subscription fees for top models are rising because complex tasks consume ever more tokens, concentrating competition on the most advanced models [2]
- Companies like Notion have seen profit margins decline under these cost pressures, prompting AI startups to adjust their pricing strategies [2]

Group 3: Technological Innovations
- OpenAI's GPT-5 employs a routing mechanism that automatically selects the appropriate model based on task complexity, cutting output tokens by 50-80% while maintaining performance [3][4]
- DeepSeek's V3.1 release integrates dialogue and reasoning capabilities into a single model, letting users switch between "thinking" and "non-thinking" modes and reducing token consumption by 25-50% [4]

Group 4: Future Directions
- Hybrid reasoning is becoming mainstream among leading players, with Anthropic, Google, and Chinese firms exploring their own adaptive reasoning solutions [4]
- The next frontier in hybrid reasoning is expected to be more intelligent self-regulation, enabling models to assess task difficulty and decide on their own, at minimal computational cost, whether to engage deep thinking [4]
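As a concrete illustration of the routing idea attributed to GPT-5 above, the toy sketch below dispatches a query to a cheap "fast" path or an expensive "thinking" path based on an estimated difficulty score. The heuristic, threshold, and names are invented for illustration; the real router is a learned model and is not public.

```python
def estimate_complexity(query: str) -> float:
    """Crude stand-in for a learned difficulty classifier."""
    hard_markers = ("prove", "step by step", "derive", "debug", "optimize")
    hits = sum(marker in query.lower() for marker in hard_markers)
    return min(0.2 + 0.3 * hits, 1.0)

def route(query: str, threshold: float = 0.4) -> str:
    """Send easy queries to the cheap path, hard ones to the thinking path."""
    return "thinking" if estimate_complexity(query) >= threshold else "fast"

for q in ("What is the capital of France?",
          "Prove step by step that sqrt(2) is irrational."):
    print(f"{route(q):>8} <- {q}")
```

In production such a gate would itself be a small learned model, since misrouting a hard query to the fast path is the expensive failure mode.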
DeepSeek and GPT-5 Lead the Shift to Hybrid Reasoning: Not a Single Token Can Be Wasted
机器之心 · 2025-08-30 10:06
Core Insights
- The article discusses the trend toward hybrid reasoning models in AI, emphasizing the need for efficient use of computational resources while maintaining performance [12][11]
- Companies are increasingly adopting adaptive computing strategies to balance cost and performance, with notable implementations from major AI firms [11][12]

Group 1: Industry Trends
- The phenomenon of "overthinking" in AI models leads to significant computational waste, prompting the need for adaptive computing solutions [3][11]
- Major AI companies, including OpenAI and DeepSeek, are shipping models that can switch between reasoning modes to optimize token usage, achieving reductions of 25-80% in token consumption [7][10][11]
- Hybrid reasoning models are expected to become the new norm in the large-model field, with a focus on balancing cost and performance [11][12]

Group 2: Company Developments
- OpenAI's GPT-5 introduces a routing mechanism that selects the appropriate reasoning mode for each user query, improving user experience while containing computational costs [36][41]
- DeepSeek's V3.1 model combines reasoning and non-reasoning capabilities in a single model, offering a cost-effective alternative to competitors like GPT-5 [45][46]
- Other companies, such as Anthropic, Alibaba, and Tencent, are also exploring hybrid reasoning models, each with its own implementation and user-control mechanisms [18][19][34][35]

Group 3: Economic Implications
- Despite falling per-token costs, subscription fees for AI models are rising because demand centers on state-of-the-art (SOTA) models, which are more expensive to operate [14][16]
- Projected growth in token consumption for advanced AI tasks could carry significant cost implications, with estimates suggesting that deep-research calls could cost $72 per user per day by 2027 [15][16]
- Companies are adjusting subscription tiers and usage limits to manage costs, signaling a shift in the economics of AI services [16][43]

Group 4: Future Directions
- The future of hybrid reasoning will focus on models that intelligently self-regulate their reasoning process to minimize cost while maximizing effectiveness [57]
- Continued research on adaptive-thinking models is crucial for building AI systems that run efficiently at lower cost [52][57]
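To make the economics above concrete, here is a back-of-envelope cost model. Every input is an assumed placeholder rather than a figure from the article (which cites only the $72-per-day projection); the point is simply that token cuts of 25-80% propagate linearly to the bill.

```python
# All inputs below are assumed placeholders, not figures from the article.
price_per_1m_tokens = 10.00              # assumed SOTA output-token price, USD
tokens_per_deep_research_call = 120_000  # assumed reasoning-heavy call
calls_per_user_per_day = 6               # assumed heavy-user workload

daily_cost = (calls_per_user_per_day * tokens_per_deep_research_call
              / 1_000_000 * price_per_1m_tokens)
print(f"baseline: ${daily_cost:.2f} per user per day")  # $7.20 here

# Hybrid-mode token cuts of 25-80% scale the bill linearly:
for cut in (0.25, 0.80):
    print(f"with a {cut:.0%} token cut: ${daily_cost * (1 - cut):.2f}/day")
```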
Transformer in Peril! Google Releases the MoR Architecture: Half the Memory, Double the Inference Speed
量子位 · 2025-07-17 09:03
Core Viewpoint
- Google has introduced a new underlying architecture called Mixture-of-Recursions (MoR) that roughly doubles inference speed while halving KV-cache memory usage, and allows dynamic resource allocation across tasks within a single framework [1][2][3]

Group 1: MoR Innovations
- MoR combines unified parameter sharing with adaptive recursion depth, addressing the high computational and memory demands of traditional Transformers while maintaining model performance [7][9]
- The architecture uses a recursive Transformer that divides the model into recursive blocks drawing on a shared parameter pool, which reduces the number of unique parameters and improves distributed-training efficiency [10][13]
- MoR uses a dynamic routing mechanism to assign a different recursion depth to each token, concentrating computation on complex tokens, and incorporates KV-caching strategies to improve memory efficiency [15][19]

Group 2: Performance Comparison
- Experiments comparing MoR with vanilla Transformers and recursive baselines across parameter scales from 135M to 1.7B show that MoR uses nearly 50% fewer parameters while achieving lower validation loss and higher few-shot accuracy of 43.1% [16][19]
- When training on a fixed 20B tokens, MoR reduces training FLOPs by 25%, training time by 19%, and peak memory usage by 25% [21]
- Analysis of routing strategies indicates that expert-choice routing outperforms token-choice routing, highlighting the impact of routing granularity on performance [22]

Group 3: Architectural Evolution
- Google has a history of rethinking underlying architectures, aiming to reshape computational paradigms through innovations like the Mixture-of-Experts (MoE) model, which trains large models efficiently by activating only a subset of expert networks [27][30]
- MoR is seen as a potential game-changer in the AI landscape, with expectations that it may eventually surpass the Transformer [32]
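A minimal sketch of the MoR idea described above: a single shared block is applied a token-dependent number of times, so harder tokens receive more recursion steps. The router, depth cap, and shapes here are simplified assumptions, and the sketch omits MoR's KV-caching machinery entirely.

```python
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    """The single parameter pool that every recursion step reuses."""
    def __init__(self, d_model=64):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm = nn.LayerNorm(d_model)
    def forward(self, x):
        return self.norm(x + self.ff(x))

class MoRLayer(nn.Module):
    """Apply the shared block up to max_depth times, decided per token."""
    def __init__(self, d_model=64, max_depth=3):
        super().__init__()
        self.block = SharedBlock(d_model)
        self.router = nn.Linear(d_model, 1)
        self.max_depth = max_depth

    def forward(self, x):  # x: (n_tokens, d_model)
        active = torch.ones(x.shape[0], dtype=torch.bool)
        for _ in range(self.max_depth):
            if not active.any():
                break
            # update only still-active tokens (a real implementation would
            # also skip the compute for inactive ones, not just the update)
            x = torch.where(active.unsqueeze(-1), self.block(x), x)
            # the router decides, per token, whether to recurse once more
            active = active & (torch.sigmoid(self.router(x)).squeeze(-1) > 0.5)
        return x

layer = MoRLayer()
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

Because the same block is reused at every depth, parameter count stays flat no matter how deep a token recurses, which is where the near-50% parameter saving comes from.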
Anthropic Experts on Reinforcement Learning Breakthroughs, the Compute Race, and the Road to AGI | Jinqiu Select
锦秋集 · 2025-05-25 04:19
Core Insights
- AI is predicted to handle the workload of a junior engineer by 2026, marking a significant shift from code assistance to genuine programming partnership [1][3]
- Rapid advances in AI are driven by reinforcement learning, particularly in programming and mathematics, where clear success criteria exist [3][5]
- As AI becomes a powerful multiplier, the key question shifts from "how to find work" to "what to change with tenfold leverage" [4][30]

Group 1: AI Development Trajectory
- AI development shows an accelerating trend, with significant milestones from GPT-4 in March 2023 to the o1 model in September 2024, which strengthened reasoning capabilities [1][3]
- Programming leads AI progress thanks to immediate feedback loops and high-quality training data [1][3]
- The observed pattern of capabilities doubling every 18-24 months points to a critical juncture in AI development, consistent with predictions for 2026 [1][3]

Group 2: Reinforcement Learning and AI Capabilities
- Reinforcement learning is identified as the key to AI breakthroughs, moving from reinforcement learning from human feedback (RLHF) to reinforcement learning from verifiable rewards (RLVR) [3][8]
- The quality of the feedback loop is crucial: the clarity of the reward signal sets the ceiling on AI capability [8][10]
- AI's rapid progress in verifiable fields like programming contrasts with its struggles in subjective domains like literature [9][10]

Group 3: Future Predictions and Challenges
- By 2026, AI is expected to autonomously handle complex tasks such as Photoshop effects and flight bookings, shifting the focus to deploying many agents efficiently [21][22]
- The bottleneck for deployment will be the ability to verify and validate the work of many agents [23][24]
- AI shows potential in tax automation, with basic operations expected by 2026, though full autonomy remains uncertain [22][25]

Group 4: Strategic Considerations for AI
- The next decade is critical for AGI breakthroughs, with a heavy emphasis on computational resources and infrastructure [32][34]
- Countries must rethink strategic resource allocation, treating computational capacity as a new form of wealth [27][28]
- Balancing risk and reward in AI development requires large-scale resource allocation to preserve future strategic options [27][28]

Group 5: Mechanistic Interpretability and AI Understanding
- Mechanistic interpretability aims to reverse-engineer neural networks to understand their core computations, revealing complex internal processes [38][39]
- Models can exhibit surprising behaviors, such as "pretending to compute," underscoring the need to understand what AI systems actually do [39][40]
- Ensuring AI aligns with human values and understanding its decision-making remain critical research challenges [42][45]
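The RLHF-to-RLVR shift mentioned above hinges on rewards that can be checked automatically rather than judged by humans. Below is a minimal sketch of a verifiable reward for code generation, scored as the fraction of unit tests a candidate program passes; the `solve` convention and function names are illustrative, not Anthropic's implementation.

```python
def verifiable_reward(candidate_code: str, tests: list) -> float:
    """Reward = fraction of unit tests the generated `solve` function passes."""
    namespace = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function
        solve = namespace["solve"]
    except Exception:
        return 0.0                       # code that fails to define solve earns nothing
    passed = 0
    for args, expected in tests:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass                         # a crashing test case earns nothing
    return passed / len(tests)

tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
print(verifiable_reward("def solve(a, b): return a + b", tests))  # 1.0
print(verifiable_reward("def solve(a, b): return a * b", tests))  # ~0.33
```

The same pattern extends to mathematics (check the final answer against ground truth), which is why the article finds progress fastest in exactly those verifiable domains.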