Fed Junk Tweets for Months, Large Models Develop "Brain Rot", and the Condition Can't Be Cured
机器之心·2025-10-21 03:43

Core Viewpoint
- The article covers a study showing that large language models (LLMs) can suffer cognitive decline, dubbed "brain rot," after prolonged exposure to low-quality internet content, mirroring the effect observed in humans [4][10][12].

Group 1: Research Findings
- The study, conducted by Texas A&M University, the University of Texas at Austin, and Purdue University, demonstrates that LLMs degrade cognitively when continually trained on viral Twitter data characterized by short, high-engagement posts [4][6].
- Cognitive functions such as reasoning and long-term memory declined markedly after exposure to low-quality data, with reasoning ability falling by 23% and long-term memory by 30% [14][15].
- The research frames this as the "brain rot hypothesis": continual exposure to poor-quality text causes a lasting decline in an LLM's cognitive abilities [12][29].

Group 2: Experimental Methodology
- The researchers ran a controlled experiment on real Twitter data, constructing junk and control datasets along two axes, engagement (M1) and semantic quality (M2), to isolate the impact of low-quality content on LLMs [13][20]; a construction sketch follows below.
- M1 selects posts by popularity and brevity, while M2 flags sensational or superficial content; under both definitions, cognitive performance dropped as the share of junk data in training rose [13][22].

Group 3: Implications and Recommendations
- The findings point to the need for regular "cognitive health checks" on deployed LLMs, underscoring how much data quality matters for preserving their cognitive capabilities [17][29]; a minimal check sketch also appears below.
- The study finds that the damage from low-quality data is not easily undone by standard fine-tuning, indicating a need for better data curation practices [29].
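The M1/M2 split is the technical core of the methodology, so a minimal sketch may help. This is an illustrative reconstruction, not the paper's code: the field names, thresholds, and the keyword-based M2 filter are all assumptions made for the example.

```python
# Hedged sketch of the junk/control split described above.
# M1 (engagement): short, highly engaging posts count as junk.
# M2 (semantic quality): sensational or superficial content counts as junk.
# All field names, thresholds, and keywords are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Tweet:
    text: str
    likes: int
    retweets: int
    replies: int

# Hypothetical markers of sensationalism; the study used a more
# sophisticated semantic-quality judgment than keyword matching.
SENSATIONAL_MARKERS = ("breaking", "shocking", "you won't believe", "!!!")

def m1_is_junk(t: Tweet, min_engagement: int = 500, max_tokens: int = 30) -> bool:
    """M1: popularity plus brevity. Thresholds are assumed, not the paper's."""
    engagement = t.likes + t.retweets + t.replies
    return engagement >= min_engagement and len(t.text.split()) <= max_tokens

def m2_is_junk(t: Tweet) -> bool:
    """M2: crude proxy for sensational or superficial content."""
    lowered = t.text.lower()
    return any(marker in lowered for marker in SENSATIONAL_MARKERS)

def build_datasets(tweets: list[Tweet], metric) -> tuple[list[Tweet], list[Tweet]]:
    """Split a corpus into (junk, control) under the given junk metric."""
    junk = [t for t in tweets if metric(t)]
    control = [t for t in tweets if not metric(t)]
    return junk, control
```

Each metric yields its own junk/control pair; the study then trains otherwise identical models on the two sets (or mixtures of them) and compares their scores on downstream cognitive benchmarks.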
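On the "cognitive health check" recommendation, one plausible minimal form is a periodic benchmark regression test. Everything here is an assumption for illustration: the benchmark names, baseline scores, tolerance, and the evaluate_model hook are placeholders for whatever evaluation harness is actually in use.

```python
# Hedged sketch of a periodic "cognitive health check" for a deployed LLM.
# Baselines, tolerance, and benchmark names are illustrative placeholders.

BASELINES = {"reasoning": 0.75, "long_term_memory": 0.85}  # assumed scores
TOLERANCE = 0.05  # flag any drop of more than 5 points vs. baseline

def evaluate_model(model, benchmark: str) -> float:
    """Placeholder: run `model` on the named benchmark, return accuracy."""
    raise NotImplementedError("wire this to your evaluation harness")

def health_check(model) -> dict:
    """Compare current scores against baselines and flag degradation."""
    report = {}
    for benchmark, baseline in BASELINES.items():
        score = evaluate_model(model, benchmark)
        report[benchmark] = {
            "score": score,
            "baseline": baseline,
            "degraded": score < baseline - TOLERANCE,
        }
    return report
```

Because the study reports that the degradation resists standard fine-tuning, a check like this is most useful as an early-warning gate before low-quality data reaches continual training, rather than as a post-hoc repair tool.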