Scaling Law

Tencent Research Institute AI Digest 20250812
腾讯研究院· 2025-08-11 16:01
Group 1
- xAI announced free global availability of Grok 4, limited to 5 uses every 12 hours, prompting dissatisfaction among paid subscribers who feel the subscription model has been undercut [1]
- Inspur released the "Yuan Nao SD200" super-node AI server, integrating 64 cards into a unified memory system capable of running multiple domestic open-source models simultaneously [2]
- Zhipu AI published the GLM-4.5 technical report, revealing pre-training and post-training details and describing native integration of reasoning, coding, and agent capabilities in a single model [3]

Group 2
- Kunlun Wanwei launched the SkyReels-A3 model, capable of generating high-quality digital-human videos up to one minute long, optimized for hand-motion interaction and camera control [4]
- Chuangxiang Sanwei partnered with Tencent Cloud to enhance 3D generation capabilities for its AI modeling platform MakeNow, building on Tencent's Hunyuan model [5][6]
- Alibaba's DAMO Academy open-sourced three core components for embodied intelligence, including a vision-language-action model and a robot context protocol [7]

Group 3
- Baichuan Intelligent released the 32B-parameter medical enhancement model Baichuan-M2, outperforming all open-source models on OpenAI's HealthBench evaluation and trailing only GPT-5 [8]
- Lingqiao Intelligent showcased the DexHand021 Pro, a dexterous robotic hand with 22 degrees of freedom designed to closely replicate human hand function [9]
- A report indicated that 45% of enterprises have deployed large models in production, with users averaging 4.7 different products, highlighting low brand loyalty in a competitive landscape [10][12]
A Deep Dive into the GPT-5 Launch: The Backlash from Overhyped Marketing and the Impasse in AI Technical Breakthroughs
硅谷101· 2025-08-11 04:26
GPT-5 Release & Technical Analysis
- GPT-5's release is considered a refinement rather than a revolutionary step beyond GPT-4, failing to deliver the expected "ChatGPT moment" [1]
- GPT-5 uses a real-time model router to combine different sub-models, which is not a novel technological breakthrough [1]
- The industry speculates that the end-to-end training of super-large models has peaked, leading OpenAI to rely on "tricky" engineering to solve product-level problems [1]
- OpenAI faces challenges in balancing system cost, development, and application, especially when handling high-frequency, simple user queries [1]
- Training of the model that became GPT-5 began in early 2024, but it was only officially named GPT-5 after reaching a major milestone [4]
- The Scaling Law has hit a wall due to a shortage of high-quality, diverse human-generated data, delaying OpenAI's Orion project [12]
- Model training frequently runs into failures, including "catastrophic forgetting" during reinforcement learning [15]

Market & Application
- OpenAI is targeting education, programming, and healthcare as its three main commercialization battlefields [2]
- The market is questioning how much of the education market ChatGPT will capture, affecting companies such as Duolingo [2]
- The global AI medical market is forecast to grow from US$2.669 billion in 2024 to US$18.838 billion in 2030, a compound annual growth rate of 38.62% [3]
- GPT-5 demonstrates a significant upgrade in coding capability, triggering a new round of competition in the coding market [3]

Future Development & Alternatives
- Reinforcement learning, multimodal capabilities, and exploration of alternative architectural paradigms are key to advancing frontier large models [20]
- Multimodality and world models will be crucial to the future of AI, with a focus on video and world models [27][31]
- Joint Embedding Predictive Architecture (JEPA) aims to overcome the limitations of large language models and push AI toward understanding the physical world [38][39]
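The summary mentions GPT-5's real-time model router but gives no technical detail, and OpenAI has not published how its router works. The sketch below is a minimal, hypothetical illustration of the general idea (send cheap, simple queries to a lightweight model and harder ones to a reasoning model); the heuristics, thresholds, and model names are all illustrative assumptions, not OpenAI's design.

```python
# Hypothetical sketch of a real-time model router: send each query either to a
# lightweight model or to a heavier reasoning model. The keyword heuristic, the
# 60-word threshold, and the model names are assumptions for illustration only.

REASONING_HINTS = ("prove", "step by step", "debug", "optimize", "derive")

def classify_query(query: str) -> str:
    """Crude complexity heuristic: long queries or queries containing
    reasoning-style keywords are treated as complex."""
    q = query.lower()
    if len(q.split()) > 60 or any(hint in q for hint in REASONING_HINTS):
        return "complex"
    return "simple"

def route(query: str) -> str:
    """Return the (hypothetical) model name that should handle the query."""
    if classify_query(query) == "complex":
        return "heavy-reasoning-model"
    return "fast-lightweight-model"

if __name__ == "__main__":
    print(route("What's the capital of France?"))                               # fast-lightweight-model
    print(route("Derive the gradient of softmax cross-entropy step by step."))  # heavy-reasoning-model
```

A production router would presumably use a learned classifier and account for latency and cost budgets; the point here is only that routing trades per-query quality against system cost, which is exactly the balancing problem the article attributes to OpenAI.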
OpenAI's Startling Admission: GPT-5 Really Did "Get Dumber"! But It Reproduced a "Divine Move" and Is Aiming for the Coding Throne
程序员的那些事· 2025-08-11 02:38
Core Insights
- The article discusses GPT-5's recent IQ-test performance: it scored 118 on the Mensa IQ test but only 70 on offline tests, the lowest in OpenAI's model family [4][6]
- The performance issues are attributed to routing problems within the model rather than a lack of intelligence [7][11]
- The article stresses the importance of effective prompting to unlock GPT-5's potential, suggesting that user interaction strongly influences output quality [15][19]

Group 1: Model Performance
- GPT-5's IQ-test results have drawn widespread criticism, but the underlying issue lies in its routing system [4][6][11]
- Despite the low scores, the article argues that GPT-5's intelligence continues to grow exponentially, in line with the Scaling Law [13][14]
- The model's performance improves markedly with proper prompts, demonstrating its capability when users provide clear, structured requests [15][18][25]

Group 2: Applications in Medicine
- GPT-5 has shown notable capability in medicine, helping researchers identify key findings in complex experiments [31][39]
- In one highlighted case, GPT-5 helped a biomedical researcher explain a previously unexplained result, showcasing its potential as a research partner [30][39]

Group 3: Competitive Landscape
- GPT-5 is positioned as a strong competitor to Anthropic's Claude models, particularly in programming [41][48]
- GPT-5's programming abilities are attracting more developers, signaling a shift in the competitive dynamics among AI models [42][46]

Group 4: Future Directions
- OpenAI aims to lead the transition to "agent-based reasoning" with GPT-5, focusing on reducing user intervention and integrating AI into daily tasks [66][71]
- The model's training emphasizes synthetic data, working around the scarcity of internet data and broadening knowledge coverage [68][71]
- Future goals include elevating LLM capability to the level of theoretical frameworks, aiding scientific innovation [77]
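As a concrete illustration of the "clear and structured requests" the article says unlock better answers, the sketch below sends a prompt with an explicit role, task, constraints, and output format through the OpenAI Python SDK. The model name "gpt-5" and the prompt wording are assumptions for illustration, not details taken from the article.

```python
# Illustrative only: a structured prompt (role, task, constraints, output
# format) of the kind the article credits with improving GPT-5's answers.
# The model name "gpt-5" is an assumption; substitute whichever model you use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

structured_prompt = """Role: senior Python reviewer.
Task: review the function below for correctness and performance.
Constraints: cite the exact line for each issue; do not rewrite the whole file.
Output format: a numbered list, one issue per item, each with a one-line fix.

def mean(xs):
    return sum(xs) / len(xs)
"""

response = client.chat.completions.create(
    model="gpt-5",  # assumed name, see note above
    messages=[{"role": "user", "content": structured_prompt}],
)
print(response.choices[0].message.content)
```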
Semiconductor Tariffs, Intel, and GPT-5
傅里叶的猫· 2025-08-08 11:30
Group 1: Semiconductor Tariffs
- The core view is that companies building factories in the U.S. can be exempt from the tariffs, benefiting firms such as Apple, Nvidia, and TSMC that have committed to expanding U.S. capacity [5][6]
- Apple emerges as a clear winner, as the exemptions relieve major supply-chain uncertainty despite its continued struggles to deliver AI breakthroughs [6]
- In analog chips, U.S. companies such as Texas Instruments and Microchip may benefit, while European firms such as Infineon and STMicroelectronics, with only about 15% of their business in the U.S., may face a competitive disadvantage [6]
- In foundry, TSMC and Samsung are expected to maintain growth momentum if they navigate the tariff impact strategically, while UMC, with a 15%-20% U.S. market share and no U.S. production, may come under pressure [6]
- In optical communications, U.S. firms such as Corning and Coherent are likely to gain share from Chinese competitors [7]
- Applied Materials, with significant domestic production and involvement in Apple-related projects, may benefit, while Lam Research's limited U.S. footprint puts it at a relative disadvantage [7]
- Current market sentiment favors semiconductor hardware companies over software companies, reflecting a shift in investment preferences [7]

Group 2: Intel and Leadership Concerns
- U.S. President Trump called for Intel CEO Lip-Bu Tan to resign, citing conflicts of interest arising from Tan's extensive ties to Chinese companies, which could pose national security risks [8][9]
- Tan's investments in China, reportedly exceeding $200 million, have raised concerns, especially given Intel's critical role in the U.S. semiconductor industry [9]
- Cadence's recent legal troubles, dating to Tan's earlier tenure as its CEO, could further complicate Intel's situation; if Tan were to step down, Cadence's business prospects could also be affected [9]

Group 3: AI Developments
- The release of GPT-5 has not met high expectations, with users reporting no significant improvement over the previous version in text processing and search [14]
- The perceived overhype around GPT-5 has prompted a reassessment of the limits of scaling laws in AI development [14]
The Long-Awaited GPT-5, and the 982 Days in Which It Changed the World
36Kr · 2025-08-08 04:15
Core Insights
- GPT-5 was officially released on August 8, 2025 and quickly topped the LMArena leaderboard, ranking first in every category [3][7]
- The release marks a significant advance in AI capability, particularly in reasoning and agentic AI, though it does not represent a leap in performance over its predecessor GPT-4 [8][34]
- OpenAI introduced four versions of GPT-5 for different user needs and scenarios, including a lightweight version and a chat-specific version [9][11]

Group 1: GPT-5 Release and Features
- GPT-5 integrates capabilities from both the GPT series and the o series, automatically selecting the optimal model for a given task [11][12]
- GPT-5's pricing is competitive, with API costs lower than GPT-4's, making it accessible for a wide range of applications [14][17]
- OpenAI aims to simplify the user experience by reducing the complexity of model selection, addressing the "choice paralysis" users previously faced [11][12]

Group 2: Market Context and Competitive Landscape
- The AI landscape is increasingly competitive, with numerous companies releasing open-source models and the gap between open- and closed-source models narrowing [54][55]
- OpenAI's revenue has surged, reaching an annualized $12 billion by July 2025, driven largely by consumer subscriptions [48][50]
- Major tech companies such as Microsoft, Google, and Meta have also seen significant growth in market value and revenue on the back of AI advances [52][53]

Group 3: User Engagement and Adoption
- ChatGPT has achieved remarkable engagement, with 700 million weekly active users, reflecting its deep integration into daily life [42][45]
- The application has maintained a strong growth trajectory, becoming the fastest app to reach 1 billion downloads and 500 million monthly active users [47]
- OpenAI's focus on user-friendly applications and real-world use cases has broadened GPT-5's appeal across sectors including education and healthcare [25][28]
The Long-Awaited GPT-5, and the 982 Days in Which It Changed the World
36氪· 2025-08-08 00:07
Core Viewpoint
- The article discusses OpenAI's release of GPT-5, highlighting its advances and their implications for the AI industry, particularly in the context of competition with open-source models and other AI companies [6][9][57]

Group 1: GPT-5 Release and Features
- GPT-5 was officially launched on August 8, 2025 and quickly topped the LMArena leaderboard, ranking first in every category [10][14]
- The model features a multi-layer architecture that integrates reasoning capabilities and strengthens agentic AI abilities [9][15]
- GPT-5 is available in four versions: standard, mini, nano, and chat, covering different user needs and scenarios [18][19]

Group 2: Competitive Landscape
- Ahead of GPT-5's release, competitors such as Anthropic and Google launched their own models, including Claude 4.1 and Genie 3 [14][15]
- Open-source models have gained significant traction, with many companies releasing competitive alternatives and crowding the market [54][99]

Group 3: Pricing and Accessibility
- GPT-5's API pricing is competitive, with costs lower than previous models, making it accessible to a wider range of users [24][25]
- OpenAI offers GPT-5 through multiple channels, including paid API access and the free tier of ChatGPT, though usage limits apply [28][30]

Group 4: User Engagement and Growth
- ChatGPT has grown explosively, reaching 700 million weekly active users, four times the figure a year earlier [75][76]
- The application has become part of daily life, surpassing traditional social media platforms in user engagement [78]

Group 5: Financial Performance
- OpenAI's annualized revenue reached $12 billion by July 2025, reflecting exponential growth since ChatGPT's launch [84]
- The revenue model skews heavily toward consumer subscriptions, with over 70% of income from direct user payments [85]

Group 6: Industry Trends and Future Outlook
- The AI industry is shifting from ever-larger models toward more efficient training paradigms as the limits of the Scaling Law become apparent [66][67]
- GPT-5's release is seen as a response to internal and external pressures, aiming to reaffirm OpenAI's leadership amid rising competition [57][60]
How Did This 100-Person "Workshop" Come to Earn 7 Billion a Year and Become OpenAI's "Designated Sparring Partner"?
36Kr · 2025-08-02 00:03
Core Insights
- Surge AI, a company with only 110 employees, achieved over $1 billion in revenue in 2024, surpassing industry leader Scale AI, which has more than a thousand employees and backing from Meta [1][21]
- Surge AI is launching its first financing round, aiming to raise $1 billion at a potential valuation of $15 billion [1][3]

Industry Overview
- Data annotation is likened to "feeding" AI models: raw data is transformed into a format that models can learn from [4]
- The traditional model, exemplified by Scale AI, relies on a large workforce to process massive amounts of data, which can lead to quality problems and inefficiency [5][6]

Surge AI's Unique Approach
- Surge AI focuses on high-quality data annotation rather than quantity, emphasizing human expertise over sheer headcount [3][10]
- The company hires selectively, recruiting the top 1% of annotators, including PhD and Master's degree holders, to ensure high-quality output [11][13]
- Surge AI targets high-value tasks in AI training, such as Reinforcement Learning from Human Feedback (RLHF), where data quality has an outsized impact on model performance [13]

Technological Integration
- Surge AI has built an advanced human-machine collaboration system that improves efficiency and quality, allowing a small team to process millions of high-quality data points weekly [15][17]
- The platform uses machine-learning algorithms to detect errors and streamline annotation, yielding per-capita productivity nearly nine times that of Scale AI [17]

Mission and Vision
- Founder Edwin Chen emphasizes a mission-driven approach, saying the company is not just about profit but about nurturing Artificial General Intelligence (AGI) [18][19]
- Surge AI positions its annotators as "parents" of AI, fostering a sense of purpose and commitment among its highly educated workforce [19]

Competitive Landscape
- Surge AI's 2024 revenue exceeded Scale AI's reported $870 million, underscoring its competitive edge [21]
- The company has carved out a distinctive position by reframing the data annotation problem around quality and human insight rather than labor-intensive throughput [25]
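The report credits Surge AI's throughput to a human-machine collaboration system but gives no implementation details. The sketch below shows one common quality-control pattern in annotation pipelines (majority vote plus automatic flagging of low-agreement items for expert review); the 0.8 agreement threshold, data layout, and label names are assumptions, not a description of Surge AI's actual system.

```python
# Illustrative quality-control step for an annotation pipeline: take several
# labels per item, keep the majority label when annotators agree strongly,
# and flag low-agreement items for expert review. The 0.8 threshold and the
# data layout are assumptions, not Surge AI's system.
from collections import Counter

def resolve(labels: list[str], min_agreement: float = 0.8) -> tuple[str | None, bool]:
    """Return (majority_label, needs_expert_review)."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(labels)
    if agreement >= min_agreement:
        return label, False   # confident consensus, accept automatically
    return None, True         # ambiguous item, escalate to an expert annotator

if __name__ == "__main__":
    print(resolve(["helpful", "helpful", "helpful", "unhelpful", "helpful"]))  # ('helpful', False)
    print(resolve(["helpful", "unhelpful", "helpful", "unhelpful"]))           # (None, True)
```

The design point this illustrates is that automation handles the high-agreement bulk while scarce expert time is spent only on the ambiguous minority, which is how a small, highly qualified team can sustain high volume.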
GPT-5's True Form Revealed: First Coding Tests Stun the Internet, a Single Sentence Generates a Game in Seconds, and OpenAI's Duo Gears Up for AGI
36Kr · 2025-08-01 10:25
Core Insights
- The appearance of the Horizon Alpha model is read as a strong precursor to the anticipated GPT-5 release, showing impressive performance metrics and capabilities [1][18][54]

Group 1: Model Performance
- Horizon Alpha offers a 256K context length and fast response times, and is particularly strong at creative writing [3][12]
- In programming, Horizon Alpha generates complex games and advertisements with ease and passes a range of simulation tests [5][24]
- The model achieved the highest writing score on the EQ-Bench benchmark, outperforming competitors such as o3 and Gemini 2.5 Pro [12][16]

Group 2: Technical Specifications
- Horizon Alpha generates roughly 120 tokens per second, significantly faster than Claude Sonnet 4's 60-80 tokens per second [22]
- The model built a fully functional webpage showcasing simple browser games in just 3 minutes 48 seconds, underscoring its speed and efficiency [28]

Group 3: User Experience and Design
- Industry designers tested Horizon Alpha's design capabilities and obtained high-quality outputs reflecting professional design aesthetics [40][41]
- Its ability to autonomously generate a bank website and other complex designs has drawn positive user feedback, indicating advanced functionality [32][39]

Group 4: Future Implications
- Horizon Alpha's trajectory suggests that the upcoming GPT-5 will be a highly advanced model, potentially setting new standards for AI capability [54][67]
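The two throughput figures above can be cross-checked with a quick calculation; the resulting token count is an estimate that assumes a single generation pass at the quoted rate, not a number reported in the article.

```latex
% Rough consistency check of the quoted speed and generation time
% (assumes one uninterrupted pass at 120 tokens/s; estimate only):
\[
  3\ \text{min}\ 48\ \text{s} = 228\ \text{s}, \qquad
  228\ \text{s} \times 120\ \tfrac{\text{tokens}}{\text{s}} \approx 27{,}400\ \text{tokens}
\]
```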
A "Dumb Question" Rewrote the Rules for Models. Anthropic Co-Founder Reveals: Build Breakout Apps Aimed at Claude 5; the Value of the Strongest Model Will Make People Overlook Its Cost
36Kr · 2025-07-30 10:42
Anthropic co-founder Jared Kaplan is a theoretical physicist with broad research interests spanning effective field theory, particle physics, cosmology, scattering amplitudes, and conformal field theory. Over the past few years he has also collaborated with physicists and computer scientists on machine learning research, including work on neural models and the Scaling Law of the GPT-3 language model.

Recently, at YC, he discussed how the Scaling Law will shape the development of large models going forward and what it means for models such as Claude. In the talk he revealed that the discovery of the Scaling Law grew out of a habit from his physics research: asking more fundamental, seemingly "dumb" questions.

In Kaplan's view, most of AI's value may still come from the strongest models. He sees AI development today as highly unbalanced: AI is advancing rapidly, things are changing fast, and model capabilities have not been fully unlocked, yet more and more functionality keeps being released. A balanced state, as he describes it, would be one in which AI progress slows and costs become extremely low; rapid evolution, by contrast, leads people to prioritize capability over cost.

I am also deeply interested in understanding the universe itself: how things work, and what large-scale laws lie behind the phenomena we see around us. Where did the universe come from? Is it deterministic? Do humans have free will? These questions fascinate me.

Fortunately, during my time in physics research I got to know many extremely smart, ...
A "Dumb Question" Rewrote the Rules for Models! Anthropic Co-Founder Reveals: Build Breakout Apps Aimed at Claude 5; the Value of the Strongest Model Will Make People Overlook Its Cost
AI前线· 2025-07-30 09:09
Core Insights
- Jared Kaplan's core argument centers on the significance of the Scaling Law for AI model development: most of AI's value comes from the most powerful models, and the current rapid evolution of AI is unbalanced, prioritizing capability over cost [1][6][50]

Group 1: Scaling Law and AI Development
- The Scaling Law grew out of fundamental questions about how much data size and model scale matter, revealing a consistent trend in which scaling up pre-training improves model performance [10][13]
- Both the pre-training and reinforcement learning phases exhibit clear scaling laws: as computational resources increase, model performance keeps improving [14][17]
- AI models can handle increasingly long tasks; research suggests the time span of tasks AI can complete autonomously doubles roughly every seven months [20][23]

Group 2: Future Implications and Recommendations
- Future models may complete complex tasks that currently require extensive human effort, potentially transforming fields such as theoretical physics [25]
- Companies are encouraged to build products that do not yet quite work, since rapid advances in AI capability may soon make them viable [29]
- Integrating AI into existing workflows and identifying new areas for large-scale application are crucial to realizing AI's potential [30][31]

Group 3: Claude 4 and Its Enhancements
- Claude 4 has improved on programming tasks and has stronger memory, allowing it to retain information across longer interactions [34][35]
- Its ability to interpret nuanced supervision signals has been refined, making it more responsive to user instructions and improving output quality [34][36]

Group 4: Challenges and Considerations
- The rapid pace of AI progress poses challenges, as the focus on capability can overshadow cost efficiency and balance in AI development [50][51]
- The prospect of AI replacing human tasks raises questions about future roles in the workforce, underscoring the importance of understanding how AI works and integrating it effectively into practical applications [52]
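For reference, the power-law form behind the Scaling Law that Kaplan's talk alludes to, together with the task-horizon doubling rule cited above, can be written compactly as follows. The exponent values are the approximate figures reported in Kaplan et al. (2020), not numbers taken from this article.

```latex
% Pre-training loss as a power law in model size N and dataset size D
% (approximate exponents from Kaplan et al., 2020 -- not from this article):
\[
  L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076,
  \qquad
  L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
\]
% Task-horizon doubling: if the length of tasks completable autonomously
% doubles every ~7 months, the horizon H at time t follows
\[
  H(t) = H_0 \cdot 2^{(t - t_0)/T}, \qquad T \approx 7\ \text{months}
\]
```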