Information Theory
New finding! A language model can memorize at most 3.6 bits per parameter
机器之心 · 2025-06-04 04:41
Core Insights
- The memory capacity of GPT-series models is approximately 3.6 bits per parameter, a limit beyond which models stop memorizing and begin to generalize [1][4][27]

Group 1: Memory and Generalization
- The research distinguishes two kinds of memory: unintended memorization (information about specific data points in the training set) and generalization (knowledge of the true data-generating process) [5][7]
- A new method estimates how much a model knows about specific data points, which in turn makes it possible to measure the capacity of modern language models [2][8]

Group 2: Model Capacity and Measurement
- The study defines model capacity as the total amount of memory that can be stored across all parameters of a given language model [17][18]
- Maximum capacity is reached when memorization stops growing with larger datasets, i.e. the model saturates [19][28]
- Experiments show that memory capacity scales with parameter count, holding steady at 3.5 to 3.6 bits per parameter (see the back-of-envelope sketch after this list) [27][28]

Group 3: Experimental Findings
- Hundreds of transformer language models were trained, ranging from 500,000 to 1.5 billion parameters, yielding scaling laws that relate model capacity to dataset size [6][25]
- Across different dataset sizes the number of memorized bits stayed consistent, reinforcing the link between capacity and parameter count [28][29]
- Raising training precision from bfloat16 to float32 slightly improved capacity, with the average rising from 3.51 to 3.83 bits per parameter [31][32]
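To make the headline number concrete, here is a minimal back-of-envelope sketch in Python. Only the constants (3.51 and 3.83 bits per parameter, and the 500K-1.5B parameter range) come from the findings above; the function names and the per-token figure used for the saturation estimate are illustrative assumptions, not the paper's actual measurement method.

```python
# Back-of-envelope capacity estimates based on the reported ~3.6 bits/parameter.
# The bits-per-parameter constants come from the article; everything else
# (function names, the 2.0 bits/token figure) is an illustrative assumption.

BITS_PER_PARAM = {
    "bfloat16": 3.51,  # average capacity reported for bfloat16 training
    "float32": 3.83,   # slightly higher capacity reported for float32
}

def total_capacity_bits(num_params: int, precision: str = "bfloat16") -> float:
    """Estimate total memorization capacity: parameters x bits-per-parameter."""
    return num_params * BITS_PER_PARAM[precision]

def saturation_tokens(num_params: int, bits_per_token: float = 2.0,
                      precision: str = "bfloat16") -> float:
    """Rough dataset size (in tokens) at which memorization would saturate,
    assuming each token could contribute ~bits_per_token bits of unintended
    memorization. The 2.0 bits/token value is a placeholder, not from the paper."""
    return total_capacity_bits(num_params, precision) / bits_per_token

if __name__ == "__main__":
    # The parameter range studied in the experiments: 500K to 1.5B.
    for n in (500_000, 1_500_000_000):
        cap = total_capacity_bits(n)
        print(f"{n:>13,} params -> ~{cap / 8 / 1e6:.2f} MB memorized, "
              f"saturating around {saturation_tokens(n):,.0f} tokens")
```

On these assumptions, even a 1.5B-parameter model tops out at roughly 660 MB of raw memorized content, which is why ever-larger training sets push it toward generalization rather than more memorization.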
When answers become cheap, good questions are the new scarce commodity
36Kr · 2025-05-04 00:03
Group 1
- The core argument of the article is that in an era where answers are easily accessible, the value lies in asking the right questions, which can reshape understanding and drive creativity [1][4][19]
- The invention of photography in the 1830s challenged traditional artistic standards, pushing artists toward subjective experience rather than mere replication of reality [3][10][11]
- The rise of large language models (LLMs) has made obtaining answers cheap, but the quality of inquiry has declined while the cost of asking good questions has risen [15][17][26]

Group 2
- The article emphasizes that the value of information is proportional to the uncertainty it eliminates, as Claude Shannon's information theory makes precise (a worked example follows below) [21][22][23]
- In a world of information overload, the challenge is not a shortage of facts but the misallocation of attention, which favors quantity of answers over quality [31][32][46]
- Good questions redefine problems and frameworks, helping navigate structural uncertainty and expanding the boundaries of understanding [37][38][39]
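Shannon's point that information is the uncertainty it eliminates can be made concrete in a few lines of Python. This worked example is not from the article: it computes the entropy of a distribution over candidate answers and shows how many bits a well-posed yes/no question removes compared with a poorly posed one.

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p)) over nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Suppose 8 equally likely candidate answers: 3 bits of initial uncertainty.
before = [1 / 8] * 8
print(entropy(before))  # 3.0

# A good yes/no question splits the candidates evenly, eliminating half of them.
after_good = [1 / 4] * 4
print(entropy(before) - entropy(after_good))  # 1.0 bit of information gained

# A poor question that rules out only one candidate eliminates far less uncertainty.
after_poor = [1 / 7] * 7
print(entropy(before) - entropy(after_poor))  # ~0.19 bits gained
```

In Shannon's terms, the best question is the one whose answer is maximally unpredictable beforehand, which is exactly why, as the article argues, good questions stay scarce even when answers are cheap.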