Scaling Law
The Tables Have Turned: Gemini Flash Beats Pro, "the Pareto Frontier Has Flipped"
36Kr · 2025-12-22 10:12
Core Insights
- Gemini 3 Flash outperforms the previous-generation Gemini 2.5 Pro and even the flagship Gemini 3 Pro on several metrics, scoring 78% on SWE-Bench Verified versus the Pro's 76.2% [1][5][6]
- The Flash version shows significant gains in programming ability and multimodal reasoning, reaching 99.7% on the AIME 2025 mathematics benchmark when code execution is enabled [5][6]
- Flash is competitive on the challenging Humanity's Last Exam, scoring 33.7% without tools, close behind the Pro's 37.5% [5][6]

Performance Metrics
- On SWE-Bench Verified, Gemini 3 Flash scored 78%, while Gemini 3 Pro scored 76.2% [5][6]
- On the AIME 2025 mathematics benchmark, Flash scored 99.7% with code execution, while Pro scored 100% [6]
- Flash scored 33.7% on Humanity's Last Exam, compared with Pro's 37.5% [5][6]

Cost and Efficiency
- Gemini 3 Flash is priced competitively at $0.50 per million input tokens and $3.00 per million output tokens, higher than Gemini 2.5 Flash but justified by its performance (see the cost sketch after this summary) [7]
- Flash's inference speed is three times that of Gemini 2.5 Pro, with a 30% reduction in token consumption [7]

Strategic Insights
- Google's core team views the Pro model as a vehicle for distilling capability into Flash, stressing that Flash's smaller size and efficiency are what matter most to users [11][12]
- The development team argues that the traditional scaling law is evolving, shifting from simply adding parameters to strengthening reasoning capability [12][14]
- Flash's emergence has reopened debate over the "parameter supremacy" thesis, suggesting that smaller, more efficient models can outperform larger ones [13][14]
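To make the quoted pricing concrete, here is a minimal cost sketch using only the $0.50 and $3.00 per-million-token rates reported above. The request sizes and the helper function are illustrative assumptions, not figures from the article.

```python
# Hedged sketch: per-request cost from the per-million-token rates quoted
# above ($0.50 input, $3.00 output for Gemini 3 Flash). The token counts
# below are made-up example values, not numbers from the article.

FLASH_INPUT_PER_M = 0.50   # USD per 1M input tokens (from the summary)
FLASH_OUTPUT_PER_M = 3.00  # USD per 1M output tokens (from the summary)

def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = FLASH_INPUT_PER_M,
                 out_rate: float = FLASH_OUTPUT_PER_M) -> float:
    """Cost in USD for one request at the given per-million-token rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example: a 20k-token prompt with a 2k-token reply (assumed sizes).
print(f"${request_cost(20_000, 2_000):.4f} per request")  # -> $0.0160
```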
Faith and Breakthrough: A Look Ahead at 2026 AI Trends
36Kr · 2025-12-22 09:32
No one could have imagined that ChatGPT's third anniversary would bring not celebration or commemoration but an internal "code red", once again sounding the drums of white-hot competition in artificial intelligence. Under threat from the stunning debut of Gemini 3, OpenAI accelerated the release of GPT 5.2, pouring in more resources and retaking the lead on a number of metrics. Yet three years in, the performance gaps and paradigm differences among the major models keep narrowing, and skeptical voices in the industry argue that large-model development is hitting a ceiling. Many others remain firm believers that AGI is coming, and the industry is filled with ever more debate and division.

Standing at the end of 2025 and looking back at the road traveled: from the DeepSeek frenzy, to the Ghibli-style image craze after GPT-4o, to Sora 2 putting users in the frame alongside Sam Altman, to the flood of Doraemon explainers generated with Google's Nano Banana. At times it feels like another lifetime; a technology from this year already seems like a fad from years past.

Looking ahead to 2026, we feel anxiety over the intelligence bottleneck of large models and the uncertainty of returns on investment, and we see more non-consensus; but we also see persistence and faith, along with likely breakthroughs in several directions. More expectation and exploration are rushing toward us.

Faith. 1. The Scaling Law drives continued evolution toward AGI. Since ChatGPT burst onto the scene, the industry mainstream has believed that as long as compute keeps increasing, data keeps expanding, and parameters keep stacking, machine intelligence will grow as if by physical law, all the way to AGI ...
Faith and Breakthrough: A Look Ahead at 2026 AI Trends
Tencent Research Institute · 2025-12-22 08:33
Faith. 1. The Scaling Law drives continued evolution toward AGI. Wang Qi'ang, independent technology observer.

No one could have imagined that ChatGPT's third anniversary would bring not celebration or commemoration but an internal "code red", once again sounding the drums of white-hot competition in artificial intelligence. Under threat from the stunning debut of Gemini 3, OpenAI accelerated the release of GPT 5.2, pouring in more resources and retaking the lead on a number of metrics. Yet three years in, the performance gaps and paradigm differences among the major models keep narrowing, and skeptical voices in the industry argue that large-model development is hitting a ceiling. Many others remain firm believers that AGI is coming, and the industry is filled with ever more debate and division.

Standing at the end of 2025 and looking back at the road traveled: from the DeepSeek frenzy, to the Ghibli-style image craze after GPT-4o, to Sora 2 putting users in the frame alongside Sam Altman, to the flood of Doraemon explainers generated with Google's Nano Banana. At times it feels like another lifetime; a technology from this year already seems like a fad from years past.

Looking ahead to 2026, we feel anxiety over the intelligence bottleneck of large models and the uncertainty of returns on investment, and we see more non-consensus; but we also see persistence and faith, along with likely breakthroughs in several directions. More expectation and exploration are rushing toward us.

Since ChatGPT burst onto the scene, the industry mainstream has believed that as long as compute keeps increasing, data keeps expanding, and parameters keep stacking, machine intelligence will ...
The Tables Have Turned! Gemini Flash Beats Pro: "The Pareto Frontier Has Flipped"
QbitAI · 2025-12-22 08:01
Core Insights
- Gemini 3 Flash outperforms the previous-generation Gemini 2.5 Pro and even the flagship Gemini 3 Pro across a range of benchmarks, scoring 78% on SWE-Bench Verified versus Gemini 3 Pro's 76.2% [1][6][9]
- Flash's showing on the AIME 2025 mathematics competition benchmark is notable: 99.7% with code execution, indicating advanced mathematical reasoning [7][8]
- The article argues for a shift in how flagship models are perceived, suggesting that smaller, optimized models like Flash can outperform larger ones and challenging the traditional belief that bigger is inherently better [19][20]

Benchmark Performance
- On Humanity's Last Exam, Flash scored 33.7% without tools, close behind Pro's 37.5% [7][8]
- Flash's results across other benchmarks include [8]:
  - 90.4% on GPQA Diamond (scientific knowledge)
  - 95.2% on AIME 2025 without tools (mathematics)
  - 81.2% on MMMU-Pro (multimodal understanding)
- Flash runs three times faster than Gemini 2.5 Pro with a 30% reduction in token consumption, and is cost-effective at $0.50 per million input tokens and $3.00 per million output tokens [9]

Strategic Insights
- Google's team indicates that the Pro model's role is to "distill" into Flash, with the focus on optimizing performance against cost (a generic distillation sketch follows this summary) [10][12][13]
- Scaling laws are evolving: the emphasis is shifting from simply adding parameters to strengthening reasoning through advanced training techniques [15][16]
- Post-training is highlighted as a major area for future development, with substantial room for improvement on open-ended tasks [17][18]

Paradigm Shift
- Flash's emergence has reopened debate over the "parameter supremacy" thesis by demonstrating that smaller, more efficient models can achieve superior performance [19][21]
- Advanced reinforcement learning techniques are cited as a key factor in Flash's success, showing that growing model size is not the only path to stronger capability [20][22]
- The article closes with a call to reconsider blind admiration for flagship models in favor of a more nuanced view of model performance [23]
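Because the summary leans on "distillation" (a larger Pro-class teacher supervising a smaller Flash-class student), here is a minimal, generic sketch of standard soft-label knowledge distillation. It is not Google's training recipe; all names are illustrative, and the technique shown is the textbook Hinton-style loss, included only to make the term concrete.

```python
# Generic knowledge-distillation loss (Hinton-style), shown only to make
# the "Pro distills into Flash" language above concrete. This is NOT
# Google's recipe; teacher/student here are stand-in toy tensors.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    """Blend hard-label cross-entropy with soft-label KL at temperature T."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard T^2 scaling keeps gradient magnitudes comparable
    return alpha * hard + (1 - alpha) * soft

# Toy usage with random logits over a 10-way vocabulary.
s = torch.randn(4, 10, requires_grad=True)   # "Flash-like" student outputs
t = torch.randn(4, 10)                       # "Pro-like" teacher outputs
y = torch.randint(0, 10, (4,))
loss = distillation_loss(s, t, y)
loss.backward()
```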
MiniMax's Hailuo Video Team Open-Sources for the First Time: Tokenizers Also Have a Clear Scaling Law
QbitAI · 2025-12-22 04:41
By Yishui | QbitAI

The MiniMax Hailuo video team is holding nothing back. Its very first open-source release answers a question that has long troubled the industry: why does pouring ever more compute into the first-stage visual tokenizer fail to improve second-stage generation quality?

In plain terms: image and video generation models keep growing in parameters and compute, yet users keep coming away with the same subtle impression that the enormous investment is not matched by the output, and the models always fall a step short of being truly usable.

So why? The problem most likely lies with the visual tokenizer. When compute is no longer the answer, what truly needs re-examination is the generative model's "starting point".

In today's mainstream two-stage generation framework (tokenizer + generative model; a generic sketch of this setup follows this entry), the industry has already invested heavy compute and data in pretraining visual tokenizers, yet an awkward fact remains: those costs have hardly translated linearly into better generation quality.

The MiniMax Hailuo video team has not only challenged this reality, demonstrating experimentally that scaling the tokenizer can improve model performance; more importantly, it has released an out-of-the-box, scalable visual tokenizer pretraining framework built for "next-generation generative models": Visual Tokenizer Pre-training (VTP) ...
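As a rough illustration of the two-stage framework described above, here is a toy VQ-style visual tokenizer interface. This is a generic sketch under stated assumptions, not MiniMax's VTP framework or API; every class and method name is invented for the example.

```python
# Generic sketch of the two-stage setup the article describes:
# stage 1 trains a visual tokenizer (encoder -> discrete codes -> decoder),
# stage 2 trains a generative model over the code sequences.
# This is NOT MiniMax's VTP API; all names here are illustrative.
import torch
import torch.nn as nn

class ToyVisualTokenizer(nn.Module):
    """VQ-style tokenizer: image -> discrete code indices -> reconstruction."""
    def __init__(self, codebook_size: int = 1024, dim: int = 64):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=8, stride=8)  # 8x8 patches
        self.codebook = nn.Embedding(codebook_size, dim)
        self.decoder = nn.ConvTranspose2d(dim, 3, kernel_size=8, stride=8)

    def encode(self, images: torch.Tensor) -> torch.Tensor:
        z = self.encoder(images)                              # (B, D, H', W')
        flat = z.permute(0, 2, 3, 1).reshape(-1, z.shape[1])
        dists = torch.cdist(flat, self.codebook.weight)       # nearest code
        return dists.argmin(dim=-1).view(z.shape[0], -1)      # (B, H'*W') ids

    def decode(self, ids: torch.Tensor, hw: int) -> torch.Tensor:
        z = self.codebook(ids).view(ids.shape[0], hw, hw, -1).permute(0, 3, 1, 2)
        return self.decoder(z)

# Stage-1 output feeds stage 2: a sequence model (e.g. a Transformer) is
# trained on these discrete ids; scaling stage 1 changes what ids exist.
tok = ToyVisualTokenizer()
ids = tok.encode(torch.randn(2, 3, 64, 64))   # -> (2, 64) token ids
recon = tok.decode(ids, hw=8)                  # -> (2, 3, 64, 64)
```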
The Scaling Law Isn't Dead: Gemini Core Lead Reveals Google Already Holds a Disruptive Key
36Kr · 2025-12-22 01:05
Is Google about to make another major breakthrough?

In a recent interview, Sebastian Borgeaud, Gemini pretraining lead at Google DeepMind, delivered a bombshell: over the coming year he expects major innovations in pretraining techniques along two axes, improving the efficiency of long-context processing and further extending model context length (the sketch after this entry illustrates why long context is costly).

Meanwhile, Google Gemini's three heavyweights, Jeff Dean, Oriol Vinyals, and Noam Shazeer, made a rare joint appearance, and their conversation lined up strikingly with Sebastian's remarks. The discussion was full of far-sighted, illuminating ideas that invite reflection. No wonder Google is still that giant.

A Google heavyweight's excited prediction: the core secret of large models has been cracked

Borgeaud also revealed that his team has recently made some very interesting discoveries about attention mechanisms, which could reshape their research direction over the coming months. He said he is extremely excited about this. And he offered one resounding line: the Scaling Law is not dead, it is merely evolving! Sebastian Borgeaud is Gemini ...
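For background on why "long-context processing efficiency" merits a headline: vanilla self-attention compares every token against every other, so its cost grows quadratically with context length. The sketch below just evaluates that textbook O(n²) relationship for a few context sizes; the specific lengths are illustrative, not figures from the interview.

```python
# Why long-context efficiency is hard: vanilla self-attention compares every
# token with every other, so score-matrix work grows quadratically in the
# context length n. The lengths below are illustrative, not from the article.
def attention_score_ops(n_tokens: int, head_dim: int = 128) -> float:
    """Rough multiply-accumulate count for one head's QK^T score matrix."""
    return n_tokens ** 2 * head_dim

base = attention_score_ops(8_000)
for n in (8_000, 128_000, 1_000_000):
    ratio = attention_score_ops(n) / base
    print(f"{n:>9,} tokens -> {ratio:>10,.0f}x the 8k-context attention work")
# 16x more tokens (8k -> 128k) costs 256x the attention compute, which is
# why sub-quadratic attention variants matter for context-length scaling.
```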
Wang Qian of 自变量 (X Square Robot): Embodied Intelligence Is an Independent Foundation Model for the Physical World | MEET2026
QbitAI · 2025-12-21 05:45
Core Viewpoint
- The embodied intelligence model is regarded as an independent foundation model parallel to language and multimodal models, designed specifically for the physical world [6][12][61]

Group 1: Differences Between Physical and Virtual Worlds
- The physical and virtual worlds differ fundamentally: the physical world is characterized by continuity, randomness, and processes involving force, contact, and timing [2][10]
- Existing models built on language and vision paradigms are structurally misaligned with the complexities of the physical world [3][21]

Group 2: Need for a Separate Foundation Model
- A separate foundation model is necessary because the physical world's significant randomness is something existing models struggle to represent accurately [10][17]
- Current reliance on multimodal models for embodied intelligence is seen as inadequate, requiring a complete rethinking of model architecture and training methods [9][21]

Group 3: Future of Multimodal Models
- Shifting perspectives on embodied intelligence will yield new insights into model architecture and data utilization [24][30]
- Learning in the physical world differs fundamentally from learning in the virtual world, so future multimodal models must adapt to these differences [25][28]

Group 4: Scaling Laws and Data Utilization
- The scaling law remains central to developing large models for robotics, where sourcing data is a major challenge [47][49]
- A phased approach to training and data collection is recommended, emphasizing real-world data for effective learning [52][53]

Group 5: Hardware and AI Integration
- A new learning paradigm requires redesigning hardware for the physical world, with AI defining the hardware rather than the reverse [54][55]
- Embodied intelligence could drive exponential growth in resources and capabilities, echoing historical industrial advances [60][61]
Tsinghua's Sun Maosong: In Industry, Big Players Can Keep Scaling; Other Players Should Focus on Vertical Applications | MEET2026
QbitAI · 2025-12-21 02:00
Compiled by the editorial team from MEET2026 | QbitAI

Emergence: the state every contender on today's AI battlefield hopes to reach.

Ever since the scaling law delivered astonishing capability gains, nearly every model vendor has been swept into an endless bout of FOMO; no one dares to stop.

"To me the most fascinating thing about large models is that they change nonlinearly, which means enormous uncertainty, but once performance emergence appears it will far exceed imagination."

So reflected Sun Maosong, executive deputy dean of Tsinghua University's Institute for Artificial Intelligence and foreign member of the Academia Europaea, at QbitAI's MEET2026 Intelligent Future Conference.

As long as compute can still be stacked and parameters can still grow, the cash burn cannot stop. Yet with the marginal cost of scaling climbing ever higher, what if it finally turns out to be a dead end and the investment is all wasted? (The sketch below illustrates the curve behind this worry.)

Sun Maosong's advice: one may "pursue the broad and great", but must above all "master the fine and subtle". For industry, a handful of exceptionally well-resourced teams can keep tracking the international frontier in the "broad and great" direction; but the vast majority of AI companies should put their main effort into the "fine and subtle".

To present Sun Maosong's thinking in full, QbitAI has edited the talk without altering its original meaning, in the hope of offering new perspectives and insights.

The MEET2026 Intelligent Future Conference is an industry summit hosted by QbitAI, with nearly 30 industry representatives in discussion, close to 1,500 attendees on site, and an online livestream audience of 35...
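Sun's worry about the rising marginal cost of scaling can be pictured with the standard power-law form of scaling curves, in which loss falls as a power of compute. The constants below are made-up placeholders chosen only to show the shape of the curve, not measurements.

```python
# Illustrative diminishing returns under a power-law scaling curve
# L(C) = a * C^(-alpha). The constants are made-up placeholders chosen
# only to show the shape Sun Maosong is worried about, not real data.
a, alpha = 10.0, 0.05   # assumed; real exponents are task/model dependent

def loss(compute: float) -> float:
    return a * compute ** -alpha

c = 1.0
for step in range(4):
    print(f"compute {c:>10,.0f}x  ->  loss {loss(c):.3f}")
    c *= 100                      # each row costs 100x the previous one
# Every 100x of extra compute buys the same fixed *ratio* of improvement
# (100**0.05 ~= 1.26, i.e. ~21% lower loss), so equal gains keep costing
# exponentially more money: the marginal cost of scaling keeps rising.
```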
Liu Yuhui: When AI Scaling Hits the Ceiling, Who Is Actually Cashing In the Technology Dividend?
Sina Finance · 2025-12-18 09:31
Source: 刘煜辉的高维宏观 (Liu Yuhui's High-Dimensional Macro)

China's capital markets should take on a new mission: pricing global assets for the era of "Eastern power" ("东大", internet shorthand for China) governance. That means gradually moving away from passively mirroring the "Western power" ("西大", i.e. the US) valuation system and building an independent asset-pricing framework of our own. The global industrial landscape is shifting fundamentally: the technology narratives and financial pricing long dominated by the West increasingly fail to reflect China's overwhelming advantages in industrial manufacturing and systems integration.

China's advantages in deployment capability and complete industrial ecosystems are unmatched. In AI, hardware manufacturing and supply-chain integration for the vast majority of the world's edge devices (phones, PCs, and so on) are concentrated in China; in new-energy vehicles, China has built closed-loop capacity from battery materials and cells to finished vehicles, holding more than 60% of the global market;

in green-energy infrastructure such as photovoltaics, wind power, and ultra-high-voltage grids, China also exports worldwide. From new energy to chemical recycling of waste plastics, all of this strikes at the energy foundations of the Western order and challenges the traditional petrochemical energy path. This is China's super industrial power, "super" in that its energy capacity, grids, and the like do not depend on resource endowments.

These industries, embodying craftsmanship and national strategic assets, deserve a global asset premium in the future. The West, by contrast, is increasingly positioned as a pure exporter of technology blueprints. And the Scaling Law underpinning its AI narrative ("model performance scales with compute, data, and parameter ...
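For reference, the "Scaling Law" the passage invokes is commonly formalized as a Chinchilla-style fit (Hoffmann et al., 2022); the form below is general background, not anything taken from Liu's note:

```latex
% Chinchilla-style scaling fit (Hoffmann et al., 2022), given as general
% background for the "Scaling Law" the passage invokes:
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% N: parameter count, D: training tokens, E: irreducible loss;
% A, B, alpha, beta are empirical constants fit per model family.
```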
Why Won't AGI Arrive? This Researcher Lays Out AI's "Physical Limits" in Full
36Kr · 2025-12-17 11:43
Group 1
- The article examines skepticism about whether Artificial General Intelligence (AGI) can be realized, arguing that current market optimism may be misplaced given the physical constraints on computation [1][4]
- Tim Dettmers argues that computation is fundamentally bound by physical law, so advances in intelligence are limited by energy, bandwidth, storage, manufacturing, and cost (rough numbers in the sketch after this summary) [3][4]
- Dettmers offers several key judgments: the Transformer's success is no coincidence but an optimal engineering choice under current physical constraints, and further improvements yield diminishing returns [4][6]

Group 2
- Discussions of AGI often ignore the physical realities of computation, leading to misconceptions about unlimited scaling of intelligence [5][9]
- As systems mature, linear improvements demand exponentially growing resource investments, producing diminishing returns [10][16]
- The GPU performance gains that historically drove AI progress are nearing their physical and engineering limits, suggesting a shift in focus is needed [18][22]

Group 3
- Dettmers suggests the current trajectory of AI development may be approaching stagnation, with Gemini 3 possibly signaling the limits of scaling's effectiveness [33][36]
- The cost structure of scaling has changed: costs that once grew linearly now grow exponentially, so further scaling may be unsustainable without new breakthroughs [35][36]
- True AGI must include the ability to perform economically meaningful tasks in the real world, which is heavily constrained by physics [49][50]

Group 4
- The idea of "superintelligence" may be flawed, since it assumes an unlimited capacity for self-improvement that physical resource constraints rule out [56][58]
- The future of AI will be shaped by economic viability and practical applications rather than the pursuit of an idealized AGI [59][60]
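Dettmers' premise that computation is bounded by physics can be given rough numbers. The sketch compares the thermodynamic floor on energy per bit operation (the Landauer limit, a textbook constant) with an order-of-magnitude energy-per-FLOP estimate for a current accelerator; the accelerator figures are approximate public numbers used purely for scale and should be treated as assumptions.

```python
# Order-of-magnitude check on "computation is bound by physical laws".
# Landauer limit: minimum energy to erase one bit at temperature T.
# The accelerator figures are rough public numbers (~1e15 FLOP/s at ~700 W)
# used only for scale; treat them as assumptions, not measurements.
import math

k_B = 1.380649e-23            # Boltzmann constant, J/K (exact, SI 2019)
T = 300.0                     # room temperature, K
landauer_j_per_bit = k_B * T * math.log(2)

gpu_watts = 700.0             # assumed accelerator board power
gpu_flops = 1e15              # assumed ~1 PFLOP/s low-precision throughput
gpu_j_per_flop = gpu_watts / gpu_flops

print(f"Landauer limit : {landauer_j_per_bit:.2e} J/bit")   # ~2.87e-21
print(f"GPU (assumed)  : {gpu_j_per_flop:.2e} J/FLOP")      # ~7.0e-13
print(f"Headroom ratio : {gpu_j_per_flop / landauer_j_per_bit:.1e}")
# ~1e8x headroom in raw thermodynamics, but Dettmers' point is that the
# binding constraints arrive much earlier: bandwidth, manufacturing, cost.
```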