Scroll Enough Junk and Even AI Gets Dumber: "The Most Unsettling Paper of the Year"
36Kr·2025-11-17 00:36

Core Insights

- The article discusses the phenomenon of "brain rot" in AI: a new study indicates that exposure to low-quality data can cause irreversible cognitive decline in large language models (LLMs) [1][4][11].

Group 1: Research Findings

- The study found that continually training LLMs on low-value Twitter data produced a 23% drop in reasoning ability and a 30% drop in long-context memory [4][11].
- It introduced the "LLM Brain Rot Hypothesis," asking whether LLMs, like humans, deteriorate cognitively when fed low-quality content [5][11].
- The researchers defined "garbage data" as non-malicious but low-quality content, such as short, highly popular tweets, and classified it along two dimensions: engagement and semantic quality [5][11]. A sketch of an engagement-based filter appears after this summary.

Group 2: Methodology

- The researchers trained four different LLMs on either garbage or control data, holding token counts equal so that data volume could not confound the comparison [7][11]; see the token-budget sketch below.
- A battery of cognitive benchmarks assessed the models' capabilities: ARC for reasoning, RULER for long-context memory and multitasking, and TRAIT for personality traits [9][10][11].

Group 3: Implications for the Industry

- The study stresses data quality at the pre-training stage, urging the industry to treat data selection as a safety issue in its own right rather than relying only on post-training alignment [23].
- It recommends routine cognitive assessments for LLMs to catch capability degradation caused by exposure to low-quality data [23]; a minimal monitoring harness is sketched at the end.
- The findings indicate that metrics such as popularity may signal low data quality better than text length does, arguing for excluding short, highly viral content from future training datasets [23].
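The engagement dimension described above can be made concrete with a small filter. This is a minimal sketch, not the paper's actual pipeline: the Tweet fields, both cutoff values, and the whitespace tokenization are all assumptions, since the article reports only that junk tweets are short yet highly popular.

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    text: str
    likes: int
    retweets: int

# Hypothetical cutoffs: the article gives no exact thresholds, only that
# junk data on the engagement dimension is short yet highly popular.
MAX_JUNK_TOKENS = 30       # assumed cutoff for "short"
MIN_JUNK_ENGAGEMENT = 500  # assumed cutoff for "highly popular"

def is_junk_by_engagement(tweet: Tweet) -> bool:
    """Flag short tweets that nevertheless attracted high engagement."""
    n_tokens = len(tweet.text.split())  # crude whitespace tokenization
    engagement = tweet.likes + tweet.retweets
    return n_tokens < MAX_JUNK_TOKENS and engagement > MIN_JUNK_ENGAGEMENT

# A short viral one-liner lands in the junk bucket; a long, low-engagement
# thread does not.
viral = Tweet("you won't BELIEVE what this model just did", likes=12_000, retweets=4_300)
thread = Tweet(" ".join(["analysis"] * 80), likes=40, retweets=3)
assert is_junk_by_engagement(viral) and not is_junk_by_engagement(thread)
```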
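The token-count control in Group 2 amounts to sampling each corpus up to a fixed token budget. The sketch below shows one way to do that; the function name, the whitespace tokenizer, and the 1,000-token budget are illustrative assumptions, not details taken from the paper.

```python
import random

def sample_matched_corpus(docs, tokenize, token_budget, seed=0):
    """Sample documents until a fixed token budget is reached, so the junk
    and control corpora hold the same number of training tokens."""
    rng = random.Random(seed)
    pool = list(docs)
    rng.shuffle(pool)
    sampled, total = [], 0
    for doc in pool:
        n = len(tokenize(doc))
        if total + n > token_budget:
            continue  # skip docs that would overshoot the budget
        sampled.append(doc)
        total += n
    return sampled, total

# Illustrative data: whitespace tokens stand in for a real tokenizer.
junk = ["lol so true"] * 800
control = ["a longer, information-dense passage about model training"] * 400
_, junk_tokens = sample_matched_corpus(junk, str.split, 1_000)
_, control_tokens = sample_matched_corpus(control, str.split, 1_000)
print(junk_tokens, control_tokens)  # both at or just under the shared budget
```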
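Finally, the recommended cognitive assessments can be wired up as a regression gate around whatever eval code a team already runs. This harness is hypothetical throughout: the health_check signature, the max_drop tolerance, and the placeholder benchmark runners are assumptions; only the benchmark names (ARC, RULER, TRAIT) come from the article.

```python
from typing import Callable, Dict

def health_check(
    model: object,
    benchmarks: Dict[str, Callable[[object], float]],
    baseline: Dict[str, float],
    max_drop: float = 0.05,
) -> Dict[str, float]:
    """Score the model on each benchmark and warn on regressions
    against a recorded baseline."""
    scores = {}
    for name, run in benchmarks.items():
        scores[name] = run(model)
        if baseline[name] - scores[name] > max_drop:
            print(f"WARNING: {name} regressed {baseline[name]:.2f} -> {scores[name]:.2f}")
    return scores

# Placeholder runners with made-up scores; in practice these would call
# real ARC/RULER/TRAIT evaluation code.
suite = {
    "ARC": lambda m: 0.57,
    "RULER": lambda m: 0.52,
    "TRAIT": lambda m: 0.80,
}
baseline = {"ARC": 0.75, "RULER": 0.84, "TRAIT": 0.81}
health_check(model=None, benchmarks=suite, baseline=baseline)
```

Run after each continual-training stage, a gate like this would surface a brain-rot-style drop in reasoning or long-context scores before the model ships.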