Moore's Law for Large Models
From ChatGPT's 800 Million Weekly Active Users in Three Years to Higgsfield's $100 Million ARR in Five Months: Academia and Capital See the "Moore's Law for Large Models" | DeepTalk
锦秋集 · 2025-12-01 10:00
Core Insights
- The article emphasizes the shift from "scaling up" large language models (LLMs) to "increasing capability density," highlighting the limits of simply adding more compute and data to ever-larger models [2][3]
- A new concept called the "Densing Law" is introduced, which holds that the capability density of LLMs is increasing exponentially, doubling approximately every 3.5 months [18][19]

Group 1: Transition from Scaling Law to Densing Law
- The article traces the evolution from the Scaling Law, which drove large models such as GPT-3 and Llama-3.1, to the growing need for better inference efficiency [10]
- Two core questions are raised: whether the quality of LLMs of different scales can be assessed quantitatively, and whether a law exists that captures the trend in LLM efficiency [10]
- A quantitative evaluation method based on a reference model is proposed to handle the non-linear relationship between capability and parameter size [11][12]

Group 2: Capability Density and Its Implications
- Capability density is defined as the ratio of effective parameter size to actual parameter size, allowing fair comparisons across different model architectures [13]; a computational sketch follows this group
- If the density (ρ) equals 1, the model is exactly as efficient as the reference model; if it is greater than 1, the model is more efficient [15]
- A comprehensive evaluation of 51 mainstream open-source foundation models shows that capability density has been increasing exponentially over time, which establishes the Densing Law [17]
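To make the effective-parameter idea concrete, here is a minimal sketch of how capability density could be computed against a reference model family. Only the definition (density = effective parameter size / actual parameter size) and its interpretation come from the article; the sigmoid-shaped score-versus-size curve, its coefficients, the function names, and the example sizes (2.4B matching a ~7B reference) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical fitted curve for a reference model family: benchmark score as a
# function of parameter count N (in billions). The sigmoid-over-log(N) shape
# and the coefficients 1.5 / 0.8 are assumptions for illustration only.
def reference_score(n_params_b: float) -> float:
    return 1.0 / (1.0 + np.exp(-(np.log(n_params_b) - 1.5) / 0.8))

def effective_params_b(target_score: float) -> float:
    """Invert the reference curve: the reference-family size (in billions)
    needed to match the target model's benchmark score."""
    return brentq(lambda n: reference_score(n) - target_score, 0.01, 1000.0)

def capability_density(target_score: float, actual_params_b: float) -> float:
    """Capability density: effective parameter size / actual parameter size."""
    return effective_params_b(target_score) / actual_params_b

# Example: a 2.4B-parameter model that scores as well as a ~7B reference model.
rho = capability_density(target_score=reference_score(7.0), actual_params_b=2.4)
print(f"capability density = {rho:.2f}")  # > 1 means more efficient than the reference
```

Inverting a fitted reference curve, rather than comparing raw parameter counts, is what lets the density ratio absorb the non-linear relationship between capability and model size noted in Group 1.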
Group 3: Insights from the Densing Law
- The article identifies three key insights:
  1. Data quality is a core driver of the Densing Law, owing to the explosive growth in the volume and quality of pre-training data [19]
  2. Large models do not necessarily have high density, since training costs and resource limits can keep them from reaching optimal performance [19]
  3. The Densing Law reflects a pursuit of computational efficiency akin to Moore's Law in integrated circuits [19]

Group 4: Predictions and Implications
- The article predicts that the actual parameter size required to reach a given performance level will shrink exponentially over time, with a case study comparing the MiniCPM and Mistral models illustrating the trend [21]
- It also notes that inference costs will fall exponentially, helped by recent advances in inference infrastructure [22][23]
- Combining the Densing Law with Moore's Law points to significant potential for edge-side intelligence: the effective parameter scale runnable on fixed-price hardware is expected to double approximately every 88 days [24] (see the worked equations at the end of this summary)

Group 5: Acceleration of Density Growth Post-ChatGPT
- Following the release of ChatGPT, the growth rate of model density has accelerated, with a clear increase in the slope of the density trend [25]
- Factors behind the acceleration include heavier investment in LLM research, a thriving open-source ecosystem, and the proliferation of high-quality small models [28]

Group 6: Challenges in Model Compression
- The article cautions that compression techniques such as pruning, distillation, and quantization do not automatically raise density, as many compressed models show lower density than their originals [30]
- It stresses that compressed models must receive sufficient training to maintain or improve capability density [30]

Group 7: Future Directions in Model Training
- The discovery of the Densing Law suggests a fundamental shift in training paradigms, from a focus on sheer size to efficiency per parameter [32]
- Key dimensions for raising density include efficient architectures, advanced data engineering, and the collaborative evolution of large and small models [33][34][35]
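As a worked illustration of the Group 4 predictions, the two exponential claims can be written out explicitly. The 3.5-month density-doubling time and the ~88-day figure come from the article; the notation N(t), N_0, T_d, T_m, T_eff is introduced here, and the roughly 18-month Moore's-Law price-performance doubling time is an assumption chosen to show how the two exponentials combine to the reported figure.

```latex
% Densing Law: the parameter count needed for a fixed capability level halves
% every density-doubling period T_d (about 3.5 months, per the article):
N(t) = N_0 \cdot 2^{-t / T_d}

% Combining with Moore's Law: if compute per unit price doubles every T_m
% (assumed here to be roughly 18 months), the effective parameter scale
% affordable on fixed-price hardware doubles every T_eff, where
\frac{1}{T_{\mathrm{eff}}} = \frac{1}{T_d} + \frac{1}{T_m}
\quad\Rightarrow\quad
T_{\mathrm{eff}} \approx \Big(\tfrac{1}{105~\text{days}} + \tfrac{1}{540~\text{days}}\Big)^{-1} \approx 88~\text{days}
```

Read this way, a device at a constant price point gains effective model capacity faster than either trend would deliver on its own, which is the basis of the article's optimism about edge-side intelligence.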