Large model performance doubles roughly every hundred days: Tsinghua team's "Density Law" published in a Nature sub-journal
36Kr · 2025-11-20 08:48

Core Insights
- The article discusses the challenges and new perspectives in the development of large models, focusing on the "Density Law" proposed by a Tsinghua University team, which finds that the maximum capability density of large language models grew exponentially from February 2023 to April 2025, doubling approximately every 3.5 months [1][8].

Group 1: Scaling Law and Density Law
- Since 2020, OpenAI's Scaling Law has driven the rapid development of large models, but by 2025 the sustainability of this path was in question due to rising training costs and the near exhaustion of publicly available internet data [1].
- The Density Law offers a new perspective on model development: just as the semiconductor industry advanced by increasing chip density, large models can develop efficiently by increasing capability density [3][4].

Group 2: Implications of the Density Law
- The research team hypothesizes that different-sized models, when adequately trained, have the same capability density, which establishes a baseline for measuring other models [4].
- The Density Law indicates that the inference cost of models at a given capability level decreases exponentially over time; empirically, the API price for GPT-3.5-level models fell 266.7-fold over 20 months, roughly halving every 2.5 months [7][8].

Group 3: Acceleration of Capability Density
- An analysis of 51 recent open-source large models showed that maximum capability density has been increasing exponentially, with a doubling time of approximately 3.5 months since 2023 [8][9].
- After the release of ChatGPT, capability density increased at a faster rate, doubling every 3.2 months versus every 4.8 months before, a roughly 50% acceleration in density growth [9][10].
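The exponential figures above are internally consistent and easy to check; a minimal sketch in Python (the 266.7x, 20-month, and 3.5/3.2/4.8-month numbers come from the article, while the doubling/halving-time formulas are standard exponential arithmetic):

```python
import math

def halving_time(total_factor: float, months: float) -> float:
    """Months per 2x change, given the total change factor over a span."""
    return months / math.log2(total_factor)

# API price for GPT-3.5-level capability fell 266.7x over 20 months
price_halving = halving_time(266.7, 20)  # ~2.5 months, matching the article

# Density doubling every 3.5 months implies roughly 11x growth per year
yearly_density_growth = 2 ** (12 / 3.5)

# Doubling time shrank from 4.8 months (pre-ChatGPT) to 3.2 months after:
acceleration = 4.8 / 3.2 - 1  # 0.5, i.e. the article's "50% acceleration"
```

The harmonic relationship in `halving_time` is why a 266.7-fold drop over 20 months and a 2.5-month halving period describe the same trend: 20 / 2.5 = 8 halvings, and 2^8 = 256 ≈ 266.7.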
Group 4: Limitations of Model Compression
- The research found that model compression algorithms do not always enhance capability density: many compressed models performed worse than their original counterparts due to insufficient training [11][13].

Group 5: Future Prospects
- The intersection of chip circuit density (Moore's Law) and model capability density (Density Law) suggests that edge devices will be able to run increasingly capable large models, driving explosive growth in edge computing and terminal intelligence [14].
- Tsinghua University and the Mianbi Intelligence team are advancing high-density model development; models such as MiniCPM and VoxCPM have gained global recognition and substantial download numbers, pointing to a trend toward efficient, low-cost models [16].
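The compounding of Moore's Law with the Density Law can be illustrated with a toy model (the 24-month chip doubling period and the assumption that edge-runnable capability is the product of chip compute and capability density are illustrative assumptions, not figures from the article):

```python
def combined_doubling_time(t_chip: float = 24.0, t_density: float = 3.5) -> float:
    """Doubling time (months) of the capability runnable on a fixed-cost
    edge device, assuming chip compute doubles every t_chip months
    (Moore's Law, assumed 24 here) and model capability density doubles
    every t_density months (Density Law, 3.5 per the article).
    Exponential growth rates add on a log scale, so the combined
    doubling time is the harmonic combination of the two."""
    return 1.0 / (1.0 / t_chip + 1.0 / t_density)

# Under these assumptions, edge-runnable capability doubles roughly
# every 3 months -- faster than either trend alone.
```

This is why the article expects explosive growth on edge devices: the two exponentials multiply, so the combined doubling time is shorter than either individual one.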