MiniMax Hailuo (海螺) Video Team's First Open-Source Release: Tokenizers Also Follow a Clear Scaling Law
量子位· 2025-12-22 04:41
Core Viewpoint
- The MiniMax Hailuo video team has introduced a new scalable visual tokenizer pre-training framework (VTP) that addresses the limitations of traditional tokenizers in generating high-quality outputs from generative models, emphasizing the importance of understanding over mere pixel reconstruction [5][15][58]

Group 1: Traditional Tokenizer Limitations
- Traditional tokenizers focus on pixel-level reconstruction, which does not necessarily translate into improved generation quality, leading to a saturation point where additional computational resources yield diminishing returns [4][15]
- The "pre-training scaling problem" indicates that better reconstruction accuracy can paradoxically lead to poorer generation performance, as traditional methods often overlook high-level semantic understanding [12][15]

Group 2: VTP's Approach and Innovations
- VTP shifts the focus from pixel-level reconstruction to a more holistic understanding of visual semantics, integrating various representation-learning methods to enhance the tokenizer's capabilities [26][30]
- The framework employs a multi-task loss that combines understanding, reconstruction, and generation objectives, allowing the tokenizer to produce semantically rich latent representations that improve downstream model performance (see the sketch after this summary) [34][35]

Group 3: Empirical Findings and Performance Metrics
- VTP demonstrates that injecting "understanding" into the tokenizer significantly enhances generation quality, with empirical evidence showing a positive correlation between understanding capability and generation performance [40][41]
- The VTP model achieved 78.2% zero-shot classification accuracy on ImageNet, surpassing the original CLIP's 75.5%, and exhibited superior reconstruction and generation capabilities compared to existing models [44]

Group 4: Scaling Law and Industry Implications
- VTP reveals a scaling law for tokenizers: performance improves with increased compute, data, and parameters, challenging the traditional view that scaling benefits apply only to the main model [50][54]
- The findings suggest that investing in tokenizer development is crucial for overall generative-system performance, positioning the tokenizer as a core component worthy of long-term investment [58]
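The article does not give VTP's actual loss formulation. As a minimal sketch only, the following shows what a combined understanding/reconstruction/generation objective could look like in PyTorch; every module name, loss weight, and loss choice here is an illustrative assumption, not the published VTP design.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a multi-task tokenizer objective in the spirit of
# the article's description. All callables (tokenizer, decoder, text_encoder)
# and weights are assumed stand-ins, not VTP's published formulation.
def multitask_tokenizer_loss(tokenizer, decoder, text_encoder,
                             images, captions,
                             w_rec=1.0, w_und=0.5, w_gen=0.1):
    latents = tokenizer(images)                  # (B, N, D) latent tokens

    # 1) Reconstruction: decode latents back to pixels.
    recon = decoder(latents)
    loss_rec = F.mse_loss(recon, images)

    # 2) Understanding: align pooled latents with text embeddings
    #    via a CLIP-style contrastive objective.
    img_emb = F.normalize(latents.mean(dim=1), dim=-1)
    txt_emb = F.normalize(text_encoder(captions), dim=-1)
    logits = img_emb @ txt_emb.t() / 0.07        # temperature-scaled similarity
    targets = torch.arange(len(images), device=images.device)
    loss_und = F.cross_entropy(logits, targets)

    # 3) Generation proxy: keep the latent space well-conditioned for a
    #    downstream generator, e.g. by penalizing latent-norm drift.
    loss_gen = latents.pow(2).mean()

    return w_rec * loss_rec + w_und * loss_und + w_gen * loss_gen
```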
Scaling Laws Aren't Dead: A Core Gemini Figure Reveals Google Already Holds a Disruptive Key
36Kr· 2025-12-22 01:05
Core Insights
- Google DeepMind's Gemini pre-training lead, Sebastian Borgeaud, predicts significant innovations in long-context processing efficiency and context-length expansion within the next year [2][4][16]
- Recent discussions among key figures at Google, including Jeff Dean, Oriol Vinyals, and Noam Shazeer, indicate a consensus on the evolving nature of AI models and the importance of system architecture over mere model size [26][30][32]

Group 1: Innovations in AI
- Major advances are expected in long-context capabilities, transforming models into comprehensive digital workspaces capable of handling extensive data and complex tasks [16]
- Recent discoveries in attention mechanisms may lead to substantial improvements in model understanding and efficiency, indicating significant remaining headroom in this area [18]
- The return of retrieval-based learning, in which models dynamically access external knowledge rather than relying solely on memorized data, is seen as a promising direction for future AI development (see the sketch after this summary) [19]

Group 2: Shift in AI Development Paradigms
- The industry is transitioning from a "data abundance" mindset to a "data limited" one, necessitating more efficient use of available data and a focus on sophisticated systems engineering [12][30]
- The emphasis is shifting from merely achieving high performance to ensuring models are cost-effective and reliable for long-term deployment [22][30]
- The concept of "slow thinking" is introduced, highlighting the need for models to engage in continuous self-assessment and correction rather than just rapid output generation [30]

Group 3: System vs. Model
- The term "system" is frequently used to describe Gemini, emphasizing its role as long-term, iterative infrastructure rather than a one-off model achievement [31][32]
- Stability, scalability, and the ability to recover from errors are prioritized over immediate performance metrics, indicating a strategic shift in how AI systems are developed and evaluated [32][34]
- Google aims to create a sustainable, evolving intelligent system rather than a fleeting product, reflecting a commitment to long-term innovation in AI [34]
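The article describes retrieval-based learning only at a high level. As a minimal, generic sketch of the retrieve-then-generate pattern it alludes to (nothing here is Gemini-specific), the embed() function below is a toy stand-in for a real encoder, and the corpus is invented for illustration.

```python
import numpy as np

# Toy sketch of retrieval augmentation: score documents against a query
# embedding and prepend the top hits to the model's prompt, so knowledge
# comes from an external store rather than only from model weights.
def embed(text: str) -> np.ndarray:
    # Stand-in encoder: hash words into a fixed-size bag-of-words vector.
    vec = np.zeros(256)
    for word in text.lower().split():
        vec[hash(word) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

corpus = [
    "RETRO conditions a language model on retrieved text chunks.",
    "Long-context models keep entire documents in the attention window.",
    "Chinchilla studied compute-optimal scaling of data and parameters.",
]
passages = retrieve("how does retrieval-augmented generation work?", corpus)
prompt = "\n".join(passages) + "\nAnswer using the passages above."
```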
自变量's Wang Qian: Embodied Intelligence Is an Independent Foundation Model for the Physical World | MEET2026
量子位· 2025-12-21 05:45
Core Viewpoint
- The embodied-intelligence model is regarded as an independent foundation model, parallel to language and multimodal models, designed specifically for the physical world [6][12][61]

Group 1: Differences Between Physical and Virtual Worlds
- The physical and virtual worlds differ fundamentally: the physical world is characterized by continuity, randomness, and processes involving force, contact, and timing [2][10]
- Existing models built on language and vision paradigms are structurally misaligned with the complexities of the physical world [3][21]

Group 2: Need for a Separate Foundation Model
- A separate foundation model is necessary because of the significant randomness in the physical world, which existing models struggle to represent accurately [10][17]
- The current reliance on multimodal models for embodied intelligence is seen as inadequate, necessitating a complete rethinking of model architecture and training methods [9][21]

Group 3: Future of Multimodal Models
- Shifting perspectives on embodied intelligence will yield new insights into model architecture and data utilization [24][30]
- Learning processes in the physical world differ fundamentally from those in the virtual world, suggesting that future multimodal models must adapt to these differences [25][28]

Group 4: Scaling Laws and Data Utilization
- The Scaling Law remains crucial in the development of large models, particularly in robotics, where data sourcing is a significant challenge [47][49]
- A phased approach to training and data collection is recommended, emphasizing the importance of real-world data for effective learning [52][53]

Group 5: Hardware and AI Integration
- A new learning paradigm requires redesigning hardware for the physical world, with AI defining the hardware rather than the other way around [54][55]
- Embodied intelligence has the potential to drive exponential growth in resources and capabilities, drawing parallels to historical industrial advances [60][61]
Tsinghua's Sun Maosong: For Industry, Big Players Can Afford to Scale; Everyone Else Should Focus on Vertical Applications | MEET2026
量子位· 2025-12-21 02:00
Core Insights
- The rapid development of AI and large models has created a competitive landscape in which companies, driven by fear of missing out (FOMO), feel compelled to invest heavily in scaling their models and capabilities [2][6][40]
- The emergence of capabilities in large models is characterized by non-linear change, producing significant uncertainty but also the potential for breakthroughs that surpass expectations [3][19][15]
- The relationship between language, knowledge, and action remains a fundamental challenge for AI, with the goal of achieving a true integration of the three [15][38][37]

Group 1: Development of AI and Large Models
- The AI field has evolved significantly over the past eight years, entering the era of pre-trained and large models around 2017 [11][10]
- Key milestones include the release of models such as GPT-3 and ChatGPT, which demonstrated remarkable capability across a wide range of tasks [16][24]
- Large models' ability to handle complex tasks has increased dramatically, with benchmarks being surpassed in text, code, and multimodal models [20][26][25]

Group 2: Challenges and Risks
- The costs of scaling AI models are becoming increasingly high, raising concerns about the sustainability of such investment [42][43]
- There is a significant risk that the pursuit of scaling leads to diminishing returns, especially if performance begins to plateau [40][41]
- Uncertainty about the limits of Scaling Laws poses a challenge: companies must balance the need to invest in AI against the potential for wasted resources [7][68]

Group 3: Strategic Recommendations
- Companies with substantial resources may continue to pursue large-scale development, while the majority should focus on niche applications to minimize risk and maximize potential [60][74]
- The strategy of "致广大而尽精微" (strive for greatness while attending to the details) is recommended, emphasizing the importance of vertical applications in AI [63][69]
- New AI algorithms may well emerge from specific vertical applications, suggesting that focused, specialized work can also lead to broader advances [71][74]
Liu Yuhui: When AI Scaling Hits the Ceiling, Who Is Actually Cashing In the Technology Dividend?
Sina Finance· 2025-12-18 09:31
Core Viewpoint
- China's capital market should take on the new mission of global asset pricing for the "Great East governance era" (东大, the piece's colloquialism for China), moving away from passively mapping the "Great West" (西大, i.e., the US) valuation system and establishing an independent asset-pricing framework [1][5][10]

Group 1: Industrial Advantages
- The "Great East" has unparalleled advantages in implementation capability and a complete industrial ecosystem, particularly in AI, where most hardware manufacturing and supply-chain integration for end devices (such as smartphones and PCs) is concentrated in China [3][7]
- In new energy vehicles, China has formed a closed production loop accounting for over 60% of the global market, covering everything from battery materials to complete vehicles [3][7]
- China also leads in green-energy infrastructure, including solar, wind, and ultra-high-voltage power grids, challenging traditional fossil-energy paths [3][7]

Group 2: Future Asset Premiums
- Industries embodying craftsmanship and national strength should enjoy global asset premiums in the future, while the "Great West" is increasingly positioned as a mere provider of technology blueprints [5][9]
- The Scaling Law underpinning the AI narrative, which states that model performance improves with increased compute, data, and parameter scale (see the canonical form below), may hit physical limits around 2026-2027, potentially leading to a rapid decline in value [5][9]

Group 3: Market Pricing Power
- The capital market's pricing power must align with the shift in industrial reality, as future premiums will accrue to the most solid production capacity [10]
- The ability to turn technology into something affordable and indispensable for millions will determine who holds the anchor of value [10]
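The article states the scaling law only qualitatively. For reference, the canonical quantitative form is the Chinchilla parameterization from Hoffmann et al. (2022); this is the published formula, not anything specific to this article's analysis.

```latex
% Chinchilla parameterization of the scaling law (Hoffmann et al., 2022):
% expected loss as a function of parameter count N and training tokens D.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% Fitted values reported in the paper:
% E \approx 1.69, \quad A \approx 406.4, \quad B \approx 410.7,
% \quad \alpha \approx 0.34, \quad \beta \approx 0.28
```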
Why AGI Won't Arrive: This Researcher Lays Out AI's "Physical Limits"
36Kr· 2025-12-17 11:43
Group 1
- The article examines skepticism about the realization of Artificial General Intelligence (AGI), arguing that current market optimism may be misplaced given the physical constraints on computation [1][4]
- Tim Dettmers argues that computation is fundamentally bound by physical law, meaning advances in intelligence are limited by energy, bandwidth, storage, manufacturing, and cost [3][4]
- Dettmers offers several key judgments on AGI: the success of Transformer models is not coincidental but an optimal engineering choice under current physical constraints, and further improvements yield diminishing returns [4][6]

Group 2
- Discussions of AGI often overlook the physical realities of computation, leading to misconceptions about the potential for unlimited scaling of intelligence [5][9]
- As systems mature, linear improvements require exponentially increasing resource investment, producing diminishing returns (see the sketch after this summary) [10][16]
- The performance gains from GPUs, which have historically driven AI advances, are nearing their physical and engineering limits, suggesting a shift in focus is necessary [18][22]

Group 3
- The current trajectory of AI development may be approaching stagnation; Dettmers reads the arrival of Gemini 3 as a possible signal that the returns to scaling are topping out [33][36]
- The cost structure of scaling has changed, with once-linear costs now growing exponentially, indicating that further scaling may not be sustainable without new breakthroughs [35][36]
- True AGI must include the ability to perform economically meaningful tasks in the real world, which is heavily constrained by physical limitations [49][50]

Group 4
- The notion of "superintelligence" may be flawed, as it assumes an unlimited capacity for self-improvement that is infeasible given physical resource constraints [56][58]
- The future of AI will be shaped by economic viability and practical application rather than the pursuit of an idealized AGI [59][60]
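The diminishing-returns claim can be made concrete with a power-law assumption. In the sketch below, loss is assumed to fall as L = a * C^(-alpha) in compute C; the exponent and scale are round numbers chosen for illustration, not measured values from the article.

```python
# Illustrative arithmetic for the diminishing-returns argument: under an
# assumed power law L = a * C**(-alpha), each further fixed reduction in
# loss requires a multiplicative jump in compute.
alpha = 0.05   # assumed scaling exponent (illustrative)
a = 10.0       # arbitrary scale constant

def compute_for_loss(target_loss: float) -> float:
    # Invert L = a * C**(-alpha)  =>  C = (a / L)**(1 / alpha)
    return (a / target_loss) ** (1.0 / alpha)

base = compute_for_loss(2.0)
for loss in (2.0, 1.9, 1.8, 1.7):
    print(f"loss {loss:.1f}: {compute_for_loss(loss) / base:,.1f}x base compute")
# Each equal 0.1 step down in loss costs multiplicatively more compute:
# roughly 1.0x, 2.8x, 8.2x, 25.7x under these assumed constants.
```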
Embodied Intelligence's Data Dilemma? Jianzhi Is Pushing a Closed-Loop Flywheel Solution
具身智能之心· 2025-12-17 10:00
"Imitation learning (e.g., from video) is necessary, but real-robot data is the key to truly mastering a skill." This remark by Li Hongyang of the University of Hong Kong, made at several recent embodied-intelligence industry forums, pinpoints the core pain point of the field. The view is now broad industry consensus: Wang Zhongyuan, president of 智源研究院 (BAAI), has said plainly that "data, especially high-quality data, determines the ceiling of model capability," and the most acute dilemma in embodied intelligence today is the extreme scarcity of high-quality real-robot data. In 2025, funding for embodied intelligence surged and policy support kept ramping up, yet lagging data infrastructure became a stumbling block to large-scale deployment. Anyone who has done embodied-intelligence research knows the problems well: scarce real-robot data, inefficient collection, and long processing pipelines leave most companies trying to make bricks without straw.

In this blue-ocean market, 简智机器人 (Jianzhi Robotics) is gradually emerging. A technology company focused on full-pipeline embodied-intelligence solutions, its core philosophy is "embodied intelligence comes from humans and returns to humans." Through a fully self-developed dual-track "product + production line" strategy, it has built a complete closed loop of "human skill digitization - cloud-based AI data governance - robot application."

How can the industry's pain points be solved? Jianzhi offers its own answer. Wang Hao, CTO of 自变量机器人, has said plainly that the embodied-intelligence field faces a pronounced "data dilemma." Within the industry, Aloha rigs are already a common real-robot data-collection ...
The Evolutionary Direction of Large Models: Words to Worlds | A Conversation with SenseTime's Lin Dahua
量子位· 2025-12-17 09:07
Core Insights
- The article discusses the breakthrough of the SenseNova-SI model, developed by SenseTime, which has surpassed the Cambrian-S model in spatial-intelligence capability [2][5][50]
- It highlights a paradigm shift in AI, moving away from merely scaling models toward foundational research into multimodal and spatial intelligence [9][20][22]

Model Performance
- SenseNova-SI achieved state-of-the-art (SOTA) results across spatial-intelligence benchmarks, outperforming both open-source and proprietary models [4][5]
- Specific metrics show SenseNova-SI scoring higher than Cambrian-S in key areas such as spatial reasoning and hallucination suppression [50]

Paradigm Shift in AI
- The traditional model-scaling approach is reaching its limits, necessitating a return to fundamental research [9][15][20]
- SenseTime's approach centers on a new architecture called NEO, which integrates visual and language processing at the core, allowing better understanding of spatial relationships [39][42]

Technological Innovations
- The NEO architecture processes visual and textual tokens simultaneously, enhancing the model's ability to understand and interact with the physical world (see the sketch after this summary) [42][46]
- SenseNova-SI demonstrates a tenfold increase in data efficiency, requiring only 10% of the training data of comparable models to achieve SOTA performance [49]

Industrial Application
- Making AI technologies economically viable is essential: high costs and slow processing are barriers to widespread adoption [55][58]
- SenseTime's SekoTalk product exemplifies the successful application of AI in real-time video generation, cutting processing time from hours to real time [64][66]

Future Directions
- Young researchers and entrepreneurs are encouraged to explore fields beyond large language models, such as embodied intelligence and AI for science [68][70]
- The article closes with a vision of China's potential to develop AI that interacts deeply with the physical world, positioning it as a leader in this emerging landscape [72][73]
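The article describes NEO only as processing visual and textual tokens in one stream. The sketch below shows the general "one transformer, one token sequence" pattern that description matches; all dimensions, module names, and layer counts are illustrative assumptions, not SenseTime's actual design.

```python
import torch
import torch.nn as nn

# Generic unified-sequence multimodal block: image patches and text tokens
# are projected into a shared embedding space, concatenated into a single
# sequence, and attended over jointly by one transformer trunk.
class UnifiedMultimodalBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, vocab=32000, patch_dim=768):
        super().__init__()
        self.text_embed = nn.Embedding(vocab, d_model)
        self.patch_proj = nn.Linear(patch_dim, d_model)  # visual tokens -> shared space
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, text_ids, image_patches):
        txt = self.text_embed(text_ids)        # (B, T, d)
        img = self.patch_proj(image_patches)   # (B, P, d)
        seq = torch.cat([img, txt], dim=1)     # one interleaved token stream
        return self.trunk(seq)                 # joint attention over both modalities

model = UnifiedMultimodalBlock()
out = model(torch.randint(0, 32000, (1, 16)), torch.randn(1, 64, 768))
print(out.shape)  # torch.Size([1, 80, 512])
```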
Through the Lens of the "Densing Law": The Scaling-Law Wall, the Ceiling of Model Density, and On-Device Imagination After the Doubao Phone... | DeepTalk Recap
锦秋集· 2025-12-15 04:09
Core Insights
- The article discusses the transition from the "Scaling Law" to the "Densing Law," emphasizing the need for sustainable development of AI models as data growth slows and computational costs rise [2][3][15]
- The "Densing Law" holds that model capability density grows exponentially, doubling roughly every 3.5 months, while the parameter count and inference cost for a given capability fall significantly (see the arithmetic sketch after this summary) [11][28]

Group 1: Scaling Law and Its Limitations
- The "Scaling Law" faces bottlenecks in training data and computational resources, making it unsustainable to keep increasing model size [15][16]
- Available training data is limited to around 20 trillion tokens, insufficient for the expanding needs of model scaling [15]
- The computational requirements of larger models are becoming prohibitive: LLaMA 3's 405-billion-parameter model required 16,000 H100 GPUs [16]

Group 2: Introduction of the Densing Law
- The "Densing Law" proposes that as data, computation, and algorithms evolve together, the density of model capability grows exponentially, allowing more efficient models with fewer parameters [11][28]
- For instance, GPT-3 required over 175 billion parameters, while MiniCPM achieved comparable capability with only 2.4 billion [24]

Group 3: Implications of the Densing Law
- Achieving a given AI capability will require exponentially fewer parameters over time; a notable case is Mistral, which reached a reference intelligence level with only 35% of the parameters within four months [32][33]
- Inference costs are likewise expected to fall exponentially thanks to advances in hardware and algorithms, with the cost of a fixed capability dropping significantly over time [36][39]

Group 4: Future Directions and Challenges
- Future AI models will focus on raising capability density through a "four-dimensional preparation system" spanning efficient architecture, computation, data quality, and learning processes [49][50]
- High-quality training data and stable environments for post-training data are critical to model performance on complex tasks [68][70]

Group 5: End-User Applications and Market Trends
- By 2026, significant advances in edge intelligence are anticipated, driven by the need to process private data locally and by the development of high-capacity edge chips [11][45][76]
- The article predicts a surge in edge applications, emphasizing privacy and personalized experience in AI deployment [76][77]
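The doubling claim implies a simple exponential. The back-of-the-envelope sketch below turns "capability density doubles every ~3.5 months" into the parameter count needed for a fixed capability; the 175B starting point and 2.4B comparison come from the article, while the smooth exponential extrapolation itself is an assumption.

```python
# If capability density doubles every ~3.5 months, the parameters needed
# for a *fixed* capability halve on the same schedule.
DOUBLING_MONTHS = 3.5
START_PARAMS_B = 175.0  # GPT-3-scale parameters for the reference capability

def params_needed(months_elapsed: float) -> float:
    return START_PARAMS_B / (2 ** (months_elapsed / DOUBLING_MONTHS))

for months in (0, 7, 14, 21, 28):
    print(f"after {months:2d} months: ~{params_needed(months):7.1f}B params")
# after  0 months: ~  175.0B params
# after  7 months: ~   43.8B params
# after 14 months: ~   10.9B params
# after 21 months: ~    2.7B params  (near the article's 2.4B MiniCPM figure)
# after 28 months: ~    0.7B params
```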
After Missing the GPT Moment, Yan Junjie and China's "Grassroots" Are Out to Win It Back
Guancha (观察者网)· 2025-12-12 06:58
Core Insights
- Anthropic announced a complete ban on access for Chinese-capital entities, reflecting the ongoing US-China tech war [1]
- Anthropic's Dario Amodei and MiniMax's Yan Junjie share a common history as former Baidu interns, where both first encountered the concept of the Scaling Law [1][2]
- MiniMax, founded by Yan Junjie after he left SenseTime, aims to develop general-purpose large models, taking up the question of why a Chinese company had not yet produced a model like ChatGPT [4]

Group 1: Company Developments
- MiniMax and other Chinese open-source model companies now compete directly with US closed-source models from OpenAI and Anthropic, marking a significant shift in the AI landscape [5]
- MiniMax's M2 model found significant success on the OpenRouter platform, surpassing 50 billion tokens in consumption, indicating strong market acceptance [9]
- MiniMax's annual recurring revenue (ARR) reached $100 million, demonstrating positive cash flow while many competitors continue to run at a loss [14]

Group 2: Competitive Landscape
- The rise of DeepSeek, another Chinese company, shows that local teams can produce top-tier models without recruiting high-profile Silicon Valley talent [7]
- MiniMax's approach emphasizes imagination and effective organization over simply hiring expensive talent, challenging the notion that only "genius" individuals drive innovation [6]
- Competitive dynamics have shifted: Chinese companies are now seen as leaders in practical AI applications, in contrast with the US focus on high valuations and capital games [14]

Group 3: Strategic Insights
- Yan Junjie emphasizes a technology-driven approach over traditional mobile-internet playbooks, treating the model itself as the product [10]
- The company's stated principles of serving users directly, globalizing, and staying technology-driven have contributed to its success [10]
- MiniMax's efficiency shows in its low training costs relative to OpenAI, achieving high performance with far lower capital expenditure [12]

Group 4: Future Outlook
- The narrative suggests China is poised to seize a "second opportunity" in AI, moving from follower to leader in application and deployment [14]
- Confidence in Chinese AI development rests on a belief that local entrepreneurs can lead the global market in the coming years [15][18]
- The ongoing competition between Chinese and US AI firms is framed as efficiency versus capital, with Chinese companies demonstrating remarkable organizational effectiveness [10][12]