Densing Law
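Chinese large-model team lands on the cover of Nature; Liu Zhiyuan makes a startling remark: looking forward to "using AI to build AI" next year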
36Kr · 2025-12-25 01:24
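For the past half century, capital spending and the pace of innovation across the global technology industry have been closely tied to one rule: Moore's Law, under which chip performance doubles every 18 months. Alongside Moore's Law there is also "Andy and Bill's Law," which holds that the gains in hardware performance driven by Moore's Law are quickly offset by growing software complexity. Andy refers to former Intel CEO Andy Grove, and Bill to Microsoft founder Bill Gates. This upward spiral of "hardware supplies, software consumes" drove the industry's evolution through the PC and internet eras. Times have changed, and both Andy and Bill have left the industry's front line, but the underlying logic of the rule has not changed; a new "Andy and Bill" are pushing it to a greater extreme. The explosion of ChatGPT opened the era of generative AI. Under the Scaling Law, model parameters have ballooned exponentially, software's demand for compute has far outpaced the supply rate of Moore's Law, and the marginal cost of AI development has risen sharply. As hardware supply hits ceilings in energy, data, and other resources, the old "Andy and Bill" growth paradigm is starting to fail. The industry needs a reverse revolution. Large models, as the "software" of the AI era, need to deliver stronger capabilities on existing hardware through extreme restructuring of algorithms and engineering. In 2025, Chinese large-model companies became the most committed practitioners of this path. From DeepSeek V3, which used a fine-grained mixture-of-experts (MoE) architecture to rival top models at 1/10 of the compute cost, to ...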
2025: Do Chinese large models no longer believe that "brute force works miracles"?
36Kr · 2025-12-19 11:06
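In December 2025, at Tencent Technology's HiTechDay, a roundtable themed "Models Evolve Again: 2025, Intelligence Redefines the World" centered on three threads of large-model evolution: depth, dimension, and efficiency. Xiong Yuxuan, assistant professor at the Faculty of Artificial Intelligence in Education, Central China Normal University, served as guest host. The three panelists were Wang Zhongyuan, president of the Beijing Academy of Artificial Intelligence (BAAI); Liu Zhiyuan, co-founder and chief scientist of ModelBest; and Chen Shi, investment partner at FreeS Fund. Each shared in-depth observations on large-model evolution in 2025 from his own field. Wang Zhongyuan noted that large-model evolution is undergoing a qualitative shift "from Learning from Text to Learning from Video." Video data contains rich spatiotemporal information and cues about dynamic interaction, giving models a key data source for learning how the physical world evolves; it is also the type of multimodal data that is currently easiest to obtain at scale. It is a key bridge for AI "moving from the digital world to the physical world," and it provides the foundation for building "world models" ahead of a boom in embodied AI. The "Densing Law" proposed by Liu Zhiyuan holds that, as with Moore's Law for chips, the future of AI lies in continually raising the "intelligence density" per unit of parameters. He boldly predicted that in the future compute landscape "the cloud handles planning and the edge handles doing (execution)," and that by 2030 edge devices may even be able to host GPT-5-level ...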
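A conversation with Liu Zhiyuan and Xiao Chaojun: the Densing Law, the Scaling Law of RL, and the distributed future of intelligence | LatePost Podcast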
LatePost · 2025-12-12 03:09
Core Insights
- The article discusses the emergence of the "Densing Law" for large models, which states that model capability density doubles every 3.5 months, and it emphasizes achieving intelligence with fewer computational resources [4][11][19].

Group 1: Evolution of Large Models
- The evolution of large models has been driven by the Scaling Law, producing leaps in capability that surpass human level on various tasks [8][12].
- The release of ChatGPT marked a steep increase in capability density, indicating a shift in the model performance landscape [7][10].
- The industry is trending toward distributed intelligence, in which individuals have personal models that learn from their own data, in contrast to the view that only a few large models will dominate [10][36].

Group 2: Densing Law and Efficiency
- The Densing Law aims to maximize intelligence per unit of computation and argues for a focus on efficiency rather than simply scaling model size [19][35].
- Key ways to raise capability density include optimizing model architecture, improving data quality, and refining learning algorithms [19][23].
- The industry is exploring architectural improvements such as sparse attention mechanisms and mixture-of-experts (MoE) designs to improve efficiency [20][24].

Group 3: Future of AI and AGI
- AI is expected to move toward self-learning models that adapt and grow through user interactions, leading to personal AI assistants [10][35].
- "AI creating AI" is highlighted as a possible future direction, in which models are capable of self-improvement and collaboration [35][36].
- Significant advances in personal AI capabilities are projected for around 2027, with models expected to run efficiently on mobile devices [33][32].
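The doubling claim in this summary can be turned into a back-of-the-envelope projection. The sketch below is illustrative only: the function names are hypothetical, and it assumes the 3.5-month doubling period holds as a clean exponential, which the article presents as an empirical trend rather than a guarantee.

```python
def density_multiplier(months_elapsed: float, doubling_months: float = 3.5) -> float:
    """Factor by which capability density grows after `months_elapsed` months,
    assuming it doubles every `doubling_months` months (idealized trend)."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months_elapsed / doubling_months)


def equivalent_params(params_now: float, months_elapsed: float,
                      doubling_months: float = 3.5) -> float:
    """Parameter count that would match today's capability later on, if density
    keeps doubling (hypothetical helper, not from the article)."""
    return params_now / density_multiplier(months_elapsed, doubling_months)


if __name__ == "__main__":
    # Density growth after one doubling period, two periods, and one year.
    for months in (3.5, 7, 12):
        print(f"{months:>4} months: x{density_multiplier(months):.2f}")
    # Parameters needed after 7 months to match an 8B-parameter model today.
    print(f"{equivalent_params(8e9, 7):.2e} parameters")
```

Under these assumptions, the parameter count needed for a fixed capability level halves once per doubling period, which is one way to read the "fewer computational resources" claim.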
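From ChatGPT's 800 million weekly active users in 3 years to Higgsfield's $100 million ARR in 5 months: academia and capital see a "Moore's Law for large models" | DeepTalk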
Jinqiu Ji (锦秋集) · 2025-12-01 10:00
Core Insights
- The article emphasizes a shift from "scaling up" large language models (LLMs) to "increasing capability density," highlighting the limits of simply adding more compute and data to ever-larger models [2][3].
- It introduces the "Densing Law," under which the capability density of LLMs grows exponentially, doubling approximately every 3.5 months [18][19].

Group 1: Transition from Scaling Law to Densing Law
- The article traces the path from the Scaling Law, which led to large models such as GPT-3 and Llama-3.1, to the need for better inference efficiency [10].
- It raises two core questions: whether the quality of LLMs of different scales can be assessed quantitatively, and whether a law exists that reflects the trend in LLM efficiency [10].
- To handle the non-linear relationship between capability and parameter size, it proposes a quantitative evaluation method based on a reference model [11][12].

Group 2: Capability Density and Its Implications
- Capability density is defined as the ratio of effective parameter size to actual parameter size, which allows fair comparison across model architectures [13].
- If the density (ρ) equals 1, the model is as efficient as the reference model; if it is greater than 1, the model is more efficient [15].
- An evaluation of 51 mainstream open-source foundation models shows that capability density has increased exponentially over time, which is the basis for the Densing Law [17].

Group 3: Insights from Densing Law
The article identifies three key insights:
1. Data quality is a core driver of the Densing Law; the trend is attributed to the rapid growth in the volume and quality of pre-training data [19].
2. Large models do not necessarily have high density, because training costs and resource limits can keep them from being trained to optimal performance [19].
3. The Densing Law reflects a pursuit of computational efficiency akin to Moore's Law in integrated circuits [19].

Group 4: Predictions and Implications
- The actual parameter size required to reach a given performance level is predicted to decrease exponentially over time; a case study comparing MiniCPM and Mistral models illustrates the trend [21].
- Inference costs are also expected to fall exponentially, helped by recent advances in infrastructure [22][23].
- Combining the Densing Law with Moore's Law suggests significant potential for edge-side intelligence: the effective parameter scale that runs on fixed-price hardware is expected to double approximately every 88 days [24].

Group 5: Acceleration of Density Growth Post-ChatGPT
- After the release of ChatGPT, model density grew faster, with a visibly steeper slope in the density trend [25].
- Contributing factors include increased investment in LLM research, a thriving open-source ecosystem, and the proliferation of high-quality small models [28].

Group 6: Challenges in Model Compression
- Compression techniques such as pruning, distillation, and quantization do not always raise density; many compressed models have lower density than their original versions [30].
- Compressed models need sufficient training to maintain or improve capability density [30].

Group 7: Future Directions in Model Training
- The Densing Law suggests a fundamental shift in training paradigms, from a focus on size to efficiency per parameter [32].
- Key dimensions for raising density include efficient architecture, advanced data engineering, and the co-evolution of large and small models [33][34][35].
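The definition of capability density and the way two exponential trends combine can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, "effective parameter size" is read here as the size a reference model would need to match the evaluated model's performance, and the 24-month hardware doubling period is an assumed placeholder, since the summary does not state which periods produce its 88-day figure.

```python
DAYS_PER_MONTH = 365.25 / 12  # average month length, used only for unit conversion


def capability_density(effective_params: float, actual_params: float) -> float:
    """rho = effective parameter size / actual parameter size.
    rho == 1 means as efficient as the reference model; rho > 1 means more efficient."""
    if actual_params <= 0:
        raise ValueError("actual_params must be positive")
    return effective_params / actual_params


def combined_doubling_days(density_doubling_months: float,
                           hardware_doubling_months: float) -> float:
    """Doubling time (in days) of the product of two exponential trends.
    Growth rates add, so 1/T = 1/T_density + 1/T_hardware."""
    if density_doubling_months <= 0 or hardware_doubling_months <= 0:
        raise ValueError("doubling periods must be positive")
    rate_per_month = 1.0 / density_doubling_months + 1.0 / hardware_doubling_months
    return DAYS_PER_MONTH / rate_per_month


if __name__ == "__main__":
    # A 2B-parameter model that performs like a 4B-parameter reference model.
    print(capability_density(4e9, 2e9))
    # Density doubling every 3.5 months, hardware every 24 months (assumed value).
    print(round(combined_doubling_days(3.5, 24.0), 1), "days")
```

The combined period is always shorter than either input period, which is why the edge-side projection comes out faster than the Densing Law alone; the exact number depends on the doubling periods plugged in.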
Large models no longer compete on "bulk": the maximum capability density of large language models grows exponentially over time
Science and Technology Daily (Keji Ribao) · 2025-11-25 00:13
Core Insights
- A Tsinghua University research team has proposed a "density law" for large language models: the maximum capability density of these models grows exponentially over time, doubling approximately every 3.5 months from February 2023 to April 2025 [1][2].

Group 1: Density Law and Its Implications
- The density law shifts the focus from the size (parameter count) of large models to their "capability density," which measures intelligence per unit of parameters [2].
- The research analyzed 51 open-source large models and found that maximum capability density has increased exponentially, with a notable acceleration after the release of ChatGPT: density doubled every 3.2 months afterward, compared with every 4.8 months before [2].

Group 2: Cost and Efficiency
- Higher capability density means large models become smarter while requiring less compute and lower cost [3].
- Ongoing gains in both capability density and chip circuit density suggest that large models, previously limited to cloud deployment, can now run on terminal chips, improving responsiveness and user privacy [3].

Group 3: Application in Industry
- Applied to industry, the density law indicates that AI is becoming increasingly accessible, enabling more proactive services in smart vehicles that move from passive responses to active decision-making [3].
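To compare the doubling periods quoted in this summary, it can help to convert them to annual growth factors and to count how many doublings fit in the observation window. The sketch below is illustrative: the helper names are hypothetical, the dates are taken as the first day of each quoted month, and a clean exponential trend is assumed.

```python
from datetime import date


def annual_growth_factor(doubling_months: float) -> float:
    """Growth over 12 months for a quantity that doubles every `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (12.0 / doubling_months)


def months_between(start: date, end: date) -> int:
    """Whole calendar months from `start` to `end` (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def doublings_in_window(start: date, end: date, doubling_months: float) -> float:
    """Number of doublings that fit between two dates under the idealized trend."""
    return months_between(start, end) / doubling_months


if __name__ == "__main__":
    # Doubling periods quoted in the summary: before and after the ChatGPT release.
    for label, period in (("before ChatGPT", 4.8), ("after ChatGPT", 3.2)):
        print(f"{label}: x{annual_growth_factor(period):.1f} per year")
    # Observation window quoted in the summary: February 2023 to April 2025.
    window = doublings_in_window(date(2023, 2, 1), date(2025, 4, 1), 3.5)
    print(f"{window:.1f} doublings in the window")
```

Expressed this way, the shorter doubling period after ChatGPT corresponds to a much larger yearly multiplier, which is the acceleration the summary describes.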