Scaling Law
Embodied AI's Data Dilemma? Jianzhi (简智) Is Driving a Solution with a Closed-Loop Flywheel
具身智能之心· 2025-12-17 10:00
"Imitation learning (e.g., learning from video) is necessary, but real-robot data is the key to truly mastering a skill." This remark by Li Hongyang of the University of Hong Kong, made at several recent embodied-AI industry forums, pinpoints the core pain point of the field. The view is now broad industry consensus: Wang Zhongyuan, president of the Beijing Academy of Artificial Intelligence (智源研究院), has said bluntly that "data, especially high-quality data, determines the upper bound of model capability", and the most acute dilemma in embodied AI today is precisely the extreme scarcity of high-quality real-robot data. In 2025, embodied-AI funding surged and policy support kept increasing, yet lagging data infrastructure became the stumbling block to large-scale deployment. Anyone who has done embodied-AI research knows the problems: real-robot data is scarce, collection is inefficient, and processing pipelines are long, enough to leave most companies unable to make bricks without straw. In this blue-ocean market, 简智机器人 (Jianzhi Robotics) is gradually standing out. A technology company focused on full-pipeline embodied-AI solutions, its core philosophy is "embodied intelligence originates from humans and returns to humans", and with a fully self-developed "product + production line" dual-track strategy it has built a complete closed loop of "human skill digitization - cloud AI data governance - robot application". How can the industry's pain points be solved? Jianzhi has given its own answer. Wang Hao, CTO of 自变量机器人, has said plainly that embodied AI faces a pronounced "data dilemma". Within the industry, Aloha rigs are already a common setup for real-robot data coll ...
The Evolution of Large Models: Words to Worlds | A Conversation with SenseTime's Lin Dahua
量子位· 2025-12-17 09:07
Jin Lei | QbitAI (公众号 QbitAI) Fei-Fei Li's team's latest spatial-intelligence model, Cambrian-S, has for the first time been surpassed by a Chinese open-source AI. In a radar chart of spatial-perception abilities, a model named SenseNova-SI encircles Cambrian-S on every scored dimension. And the concrete numbers show that, open-source or closed-source, 2B or 8B in size, SenseNova-SI takes SOTA results across the major spatial-intelligence benchmarks:

| Model | VSI | MMSI | MindCube-Tiny | ViewSpatial | SITE |
| --- | --- | --- | --- | --- | --- |
| Open-source Models (~2B) | | | | | |
| InternVL3-2B | 32.9 | 26.5 | 37.5 | 32.5 | 30.0 |
| Qwen3-VL-2B-Instruct | 50.3 | 28.9 | 34.5 | 36.9 | 35.6 |
| MindCube-3B-RawQA-SFT | 17.2 | 1.7 | 51.7 | 24.1 | 6. ... |
The "Densing Law" on Scaling Law Hitting a Wall, the Upper Bound of Model Density, and On-Device Potential After the Doubao Phone ... | DeepTalk Recap
锦秋集· 2025-12-15 04:09
Core Insights - The article discusses the transition from the "Scaling Law" to the "Densing Law," emphasizing the need for sustainable development in AI models as data growth slows and computational costs rise [2][3][15]. - The "Densing Law" holds that model capability density grows exponentially, doubling approximately every 3.5 months, while the parameters and inference cost needed for a given capability fall sharply [11][28]. Group 1: Scaling Law and Its Limitations - The "Scaling Law" has faced challenges due to bottlenecks in training data and computational resources, making it unsustainable to keep increasing model size [15][16]. - Available training data is capped at around 20 trillion tokens, which is insufficient for the expanding needs of model scaling [15]. - The computational requirements of larger models are becoming prohibitive, as seen with LLaMA 3, whose 405-billion-parameter model was trained on 16,000 H100 GPUs [16]. Group 2: Introduction of Densing Law - The "Densing Law" proposes that as data, computation, and algorithms evolve together, the density of model capabilities grows exponentially, allowing for more efficient models with fewer parameters [11][28]. - For instance, GPT-3 needed 175 billion parameters, while MiniCPM achieved similar capabilities with only 2.4 billion [24]. Group 3: Implications of Densing Law - The Densing Law implies that achieving a given AI capability will require exponentially fewer parameters over time; a notable case is Mistral, which reached the same capability level with only 35% of the parameters within four months [32][33]. - Inference costs are also expected to decrease exponentially thanks to advancements in hardware and algorithms, with the cost of a given capability dropping significantly over time [36][39].
Group 4: Future Directions and Challenges - The future of AI models will focus on enhancing capability density through a "four-dimensional preparation system," which includes efficient architecture, computation, data quality, and learning processes [49][50]. - The article highlights the importance of high-quality training data and stable environments for post-training data, which are critical for the performance of models in complex tasks [68][70]. Group 5: End-User Applications and Market Trends - By 2026, significant advancements in edge intelligence are anticipated, driven by the need for local processing of private data and the development of high-capacity edge chips [11][45][76]. - The article predicts a surge in edge applications, emphasizing the importance of privacy and personalized experiences in AI deployment [76][77].
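The figures above imply a simple exponential: if capability density doubles every 3.5 months, the parameter count needed to hold capability fixed halves on the same schedule. Below is a back-of-envelope sketch (my own illustration, not from the article; it assumes the article's 3.5-month doubling period and uses GPT-3's 175B figure as the baseline):

```python
# Back-of-envelope illustration of the "Densing Law" figures quoted above:
# capability density doubles roughly every 3.5 months, so the parameters
# needed to match a fixed capability level halve on the same schedule.

DOUBLING_MONTHS = 3.5  # density-doubling period claimed in the article

def params_needed(base_params_b: float, months_elapsed: float) -> float:
    """Parameters (in billions) needed to match a fixed capability level
    after `months_elapsed` months, if density doubles every DOUBLING_MONTHS."""
    return base_params_b / (2 ** (months_elapsed / DOUBLING_MONTHS))

# Example: a capability that takes 175B parameters at month zero.
for months in (0, 3.5, 7, 14):
    print(f"after {months:>4} months: {params_needed(175, months):.2f}B params")
```

On these assumptions, the quoted 175B-to-2.4B compression for MiniCPM corresponds to roughly six doublings, i.e. about 22 months on this idealized schedule.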
After Missing the GPT Moment, Yan Junjie and China's "Grassroots" Founders Prepare to Win It Back
Guan Cha Zhe Wang· 2025-12-12 06:58
Core Insights - Anthropic announced a complete ban on access for Chinese capital entities, reflecting the ongoing tech war between the US and China [1] - The founders of Anthropic and MiniMax, Dario Amodei and Yan Junjie, share a common history as former interns at Baidu, where they first encountered the concept of Scaling Law [1][2] - MiniMax, founded by Yan Junjie after leaving SenseTime, aims to develop general large models, addressing the question of why a Chinese company has not yet produced a model like ChatGPT [4] Group 1: Company Developments - MiniMax and other Chinese open-source model companies are now competing directly with US closed-source models like OpenAI and Anthropic, marking a significant shift in the AI landscape [5] - MiniMax's M2 model achieved significant success on the OpenRouter platform, surpassing 50 billion tokens in consumption, indicating strong market acceptance [9] - MiniMax's annual recurring revenue (ARR) reached $100 million, demonstrating its ability to achieve positive cash flow while many competitors continue to incur losses [14] Group 2: Competitive Landscape - The rise of DeepSeek, another Chinese company, showcases that local teams can produce top-tier models without relying on high-profile talent from Silicon Valley [7] - MiniMax's approach emphasizes the importance of imagination and effective organization over merely hiring expensive talent, challenging the notion that only "genius" individuals can drive innovation [6] - The competitive dynamics have shifted, with Chinese companies now seen as leaders in practical applications of AI, contrasting with the US focus on high valuations and capital games [14] Group 3: Strategic Insights - MiniMax's founder, Yan Junjie, emphasizes a technology-driven approach over traditional mobile internet strategies, focusing on the model itself as the product [10] - The company has established principles of direct user service, globalization, and a technology-driven focus, which have contributed to its success [10] - The efficiency of MiniMax is highlighted by its low training costs compared to OpenAI, achieving high performance with significantly lower capital expenditure [12] Group 4: Future Outlook - The narrative suggests that China is poised to seize a "second opportunity" in AI, moving from a follower to a leader in application and implementation [14] - The confidence in Chinese AI development is bolstered by a belief in the potential of local entrepreneurs to lead the global market in the coming years [15][18] - The ongoing competition between Chinese and US AI firms is framed as a battle of efficiency versus capital, with Chinese companies demonstrating remarkable organizational effectiveness [10][12]
First Principles of Large Models: (I) Statistical Physics
机器之心· 2025-12-11 10:00
Published by 机器之心. Author: Dr. Bai Bo (白铂), Director of the Theory Research Department and Chief Scientist of Information Theory at Huawei's 2012 Laboratories. At the end of 2022, ChatGPT burst onto the scene and its capabilities stunned the world. At the end of 2024, DeepSeek stunned the world again with extremely low training cost and extremely high performance. In just a few years, large models have iterated at a frantic pace and their capabilities keep climbing; in the United States alone, investment in AI already exceeds the annual GDP of many countries. At the end of 2025, Google launched Gemini 3 with a dramatic leap in model capability, and its TPU training paradigm mounted a disruptive challenge to NVIDIA's ecosystem. The industry widely regards Gemini 3 as a key breakthrough on the road to Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI), and an astonishing product of human-machine collaboration. Yet, as Ilya Sutskever noted in his November 26 interview, the large-model Scaling Law, like Moore's Law, will sooner or later fail due to physical limits. How to open the alchemical furnace of large-model training, see the basic principles behind the black box, and answer whether large models are already approaching their capability ceiling has therefore become an urgent question. But previous work on large mo ...
MiniMax's Yan Junjie in a Four-Hour Interview with Luo Yonghao: A Third Path for Chinese AI; the Mountain Is Not Insurmountable
36Ke· 2025-12-11 08:11
While the entire AI community frets over DAU (daily active users) and funding totals, MiniMax founder Yan Junjie shows an almost cold detachment. Sitting across from Luo Yonghao, Yan hardly looks like a tech grandee running an AI unicorn. He declines to talk about changing the world and instead admits to fear, a fear that comes not from commercial competition but from the technology itself: when a model's capabilities begin to surpass humans, its creators are the first to feel uneasy. Reaching AGI with 1/50 of the chips: In 2025, with giants closing in, compute in short supply, and hot money receding, MiniMax is undertaking a correction in its thinking: no longer following the mobile-internet logic of buying growth through mass user acquisition and retaining users by piling on features, but returning to essentials: treating the model as the most important product. In the large-model era, the real product is the model itself; the traditional product is more like a channel. If the model is not smart enough, no amount of product polish will help. In this conversation between Luo Yonghao and Yan Junjie, I found that MiniMax chose, from day one, a technical path destined to run against the mainstream. While everyone is looking for China's OpenAI and its Sam Altman, Yan Junjie is trying to prove the value of "non-geniuses". MiniMax's story is not about a genius's flash of inspiration, but about how, within the cracks of resource constraints, through extremely rational calculation ...
Veteran Tech Investor: Without the Scaling-Law Breakthrough, AI Would Have Collapsed in 2024
Hua Er Jie Jian Wen· 2025-12-10 08:26
On the pre-training Scaling Law, Baker stressed that the release of Gemini 3 is a milestone because it clearly confirmed the law still holds. Until then, no one could fully explain from first principles why the Scaling Law works; it was more of an "empirical observation", like the ancient Egyptians watching the sky: they could precisely measure the alignment of a pyramid's axis with the stars without understanding the orbital mechanics behind it. For investors, every confirmation of the Scaling Law matters enormously: if this empirical law failed, massive capital expenditure would no longer convert into stronger intelligence. Gemini 3 showed that even on existing hardware architectures, adding compute and data still improves the capability of base models. But Baker also pointed out that the pre-training Scaling Law alone cannot explain the market boom of the past six months. Gavin Baker noted that the release of Gemini 3 proves the Scaling Law of large models still holds. On Tuesday, in a recent podcast interview, the veteran tech investor said that Google's Gemini 3 validated that even in a window of constrained hardware compute, AI can still achieve capability leaps through new reasoning mechanisms. He stressed that had reasoning capabilities not emerged in time, the global AI industry would have ground to a complete halt between mid-2024 and the release of Gemini 3 ...
When a Hundred Billion Parameters Meet a 5-Millimeter Chip
Tai Mei Ti APP· 2025-12-10 03:19
Core Insights - The global tech industry is experiencing a shift from cloud-based AI to edge AI, driven by the limitations of cloud dependency and the need for real-time processing in critical applications [1][4][18] - The current trend emphasizes the development of smaller, more efficient AI models that can operate independently on edge devices, rather than relying on large cloud models [16][18] Group 1: Challenges of Cloud Dependency - Cloud-based AI systems face significant latency issues, which can be detrimental in time-sensitive applications like autonomous driving [2][4] - Privacy concerns arise from the need to transmit sensitive data to cloud servers, making edge computing a more attractive option for users [2][4] Group 2: The Shift to Edge AI - The industry is moving towards a "cloud-edge-end" architecture, where complex tasks are handled by cloud models while real-time tasks are managed by edge devices [7][18] - Edge AI must overcome the "impossible triangle" of high intelligence, low latency, and low power consumption, necessitating innovative solutions [7][8] Group 3: Techniques for Edge AI Implementation - Knowledge distillation is a key technique that allows smaller models to retain the intelligence of larger models by learning essential features and reasoning paths [8][10] - Extreme quantization reduces model size and increases speed by compressing model weights, allowing for efficient processing on edge devices [10][11] - Structural pruning eliminates redundant connections in neural networks, further optimizing performance for edge applications [10][11] Group 4: Hardware Innovations - The "memory wall" issue in traditional architectures leads to inefficiencies, prompting the development of specialized architectures that integrate storage and computation [11][13] - Companies are exploring dedicated chip designs that optimize performance for specific AI tasks, enhancing efficiency in edge computing [13][14] Group 5: Industry Evolution - The focus is shifting from general-purpose AI models to specialized models that excel in specific applications, improving reliability and performance [15][16] - The Chinese AI industry is collectively recognizing the importance of practical applications over sheer model size, leading to a more grounded approach to AI development [16][18]
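The quantization technique from Group 3 can be made concrete with a few lines of code. Below is a minimal symmetric per-tensor int8 scheme in plain Python (an illustrative sketch, not any specific vendor's method; production edge toolchains add per-channel scales, calibration data, and fused integer kernels):

```python
# Minimal symmetric per-tensor int8 quantization: map float weights into
# [-127, 127] using a single scale factor, then dequantize to inspect the
# round-trip error that quantization introduces.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Return (int8 values, scale) under a symmetric per-tensor scheme."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.0031, 0.9, -0.63]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q, "scale:", round(scale, 6), "max error:", round(max_err, 6))
```

The worst-case round-trip error is bounded by half the scale, which is why outlier weights (here the -1.27) dominate the achievable precision and why per-channel scales and pruning are typically combined with quantization in practice.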
Moonshot AI (月之暗面) Gets a Female President
Hua Er Jie Jian Wen· 2025-12-09 13:01
Author | Zhou Zhiyu; Editor | Zhang Xiaoling. Zhang Yutong, the once-controversial former managing partner of GSR Ventures (金沙江创投), is stepping into the spotlight in a brand-new role. At a recent exchange event hosted by ZhenFund (真格基金) at Tsinghua University, Zhang Yutong appeared publicly for the first time as "President of Kimi", in charge of Kimi's overall strategy and commercialization. She also used the talk to respond to outside doubts that the unicorn is "short of funding and short of compute", stressing Kimi's efficiency advantage. In a sense, it was also an unconventional roadshow. Looking across the field, the once-allied "six little tigers" of large models have drifted apart at the crossroads: the urgency of racing to IPO, the compromise of folding trillion-parameter ambitions, and the panic of having price floors pierced by ruthless price-cutters together form a harsh panorama. Squeezed between the giants' encirclement and capital's exit, every technical conviction must ultimately be converted into numbers on a financial statement. Zhang Yutong's move to the front of the stage is Moonshot AI's last push to cross this commercial "no man's land", and signals that this life-or-death midgame battle has reached a decisive point. Stepping to the front: Yang Zhilin needs Zhang Yutong. Or, more precisely, Moonshot AI, caught in its "midgame battle", urgently needs an operator who understands capital, understands strategy, and above all understands how to convert technology into commercial value. It is a reunion spanning a decade, and a complete reinvention of roles. A Tsinghua "senior schoolmate", Zhang Yutong was once the early backer of Yang Zhilin's previous startup, Recurrent AI (循环智能). Now she has formally become this ...
The Scaling Law Still Holds: How Can Enterprises Avoid Pitfalls in Search, Ads, and Recommendation?
AI前线· 2025-12-09 06:26
Author | AICon Global AI Development and Application Conference; Planning | Luo Yanshan; Editor | Yu Qi. As large models move from general-purpose exploration deep into industry scenarios, search, advertising, and recommendation systems, the core links connecting user demand to business value, are undergoing a full-pipeline intelligent rebuild. So what are the key challenges once generative recommendation actually lands, and how should they be addressed? Recently, InfoQ's "Geek Meetup" (极客有约) X AICon livestream invited Yan Lin, head of content-recommendation architecture at JD.com, as host, joined by Honor AI algorithm expert Feng Xiaodong, JD.com algorithm director Zhang Zehua, and Wang Hao, associate professor at the School of Computer Science, USTC, to discuss lessons from deploying generative recommendation, ahead of AICon Beijing 2025. Selected highlights follow. Full replay: https://www.infoq.cn/video/0ViWrdqyQwNvO7TdQpyD The following is based on the livestream transcript, edited by InfoQ. - The industry is still far from a truly end-to-end unified pipeline; most work combines large models with single points in the pipeline. - The scaling law still holds in search, advertising, and recommendation, and is still climbing fast. - Cover low-value scenarios with small models; use large models in high-value scenarios to capture extra gains. - Don't cling to any particular techni ...
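The panel's "small models for low-value scenarios, large models for high-value ones" guideline amounts to a cost-aware routing rule. A toy sketch (all scenario names, value scores, and the 0.8 threshold below are hypothetical, chosen purely for illustration, not taken from the discussion):

```python
# Toy cost-aware router: send high-value traffic to a large model and
# everything else to a cheap small model, per the panel's guideline.
# The threshold, tier names, and scenario values are illustrative assumptions.

VALUE_THRESHOLD = 0.8  # hypothetical cutoff on estimated per-request value

def route(request_value: float) -> str:
    """Pick a model tier from an estimated business value in [0, 1]."""
    return "large-model" if request_value >= VALUE_THRESHOLD else "small-model"

traffic = {"homepage-feed": 0.3, "ad-auction": 0.95, "cold-start-search": 0.85}
for scenario, value in traffic.items():
    print(scenario, "->", route(value))
```

In a real system the value estimate would come from business metrics (expected revenue, conversion probability), and the threshold would be tuned against the inference-cost gap between the two tiers.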