A large model with a "slightly worse memory" is actually smarter: Goldfish Loss randomly drops tokens so AI no longer learns by rote
36Kr · 2025-09-03 23:54
Core Idea
- The article discusses a new method called "Goldfish Loss" that allows large language models to avoid memorizing training data while still learning language patterns [1][2].

Group 1: Methodology
- Goldfish Loss randomly excludes a small fraction of tokens from the loss calculation, preventing the model from memorizing training data verbatim [2][3].
- A hashing-based masking strategy keeps the excluded tokens consistent across training passes, so at inference the model must "guess" those positions rather than reproduce the training data [3][7].
- This contrasts with traditional regularization techniques like Dropout: because Dropout removes different tokens on each pass, the model eventually sees every token and can still memorize the data [5][7].

Group 2: Experimental Results
- Experiments covered two scenarios: an extreme scenario with repeated training on a small sample, and a standard scenario simulating typical batch training [8][10].
- In the extreme scenario, standard training led the model to memorize 84 of 100 articles verbatim, while Goldfish Loss produced no verbatim memorization [8][10].
- Models trained with Goldfish Loss performed comparably to those trained with the standard loss, indicating that text-generation ability was not significantly affected [12].

Group 3: Implications
- The core of Goldfish Loss is ignoring the gradients of certain tokens, so the model may need to process more data to compensate for the discarded supervision, potentially reducing computational efficiency [13].
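The mechanism described in Group 1 can be sketched as follows: a deterministic hash of each token's preceding context decides whether that token is excluded from the loss, so the same positions are dropped every time the sequence is seen. This is a minimal NumPy illustration, not the paper's implementation; the drop rate `1/k`, context width `h`, and the SHA-256 hashing scheme are assumptions chosen for clarity.

```python
import hashlib

import numpy as np


def goldfish_mask(input_ids: np.ndarray, k: int = 4, h: int = 13) -> np.ndarray:
    """Hash-based token mask: False means "exclude from the loss".

    Whether position t is dropped depends only on a hash of the h
    preceding token ids, so the decision is identical every time the
    same text is trained on (unlike Dropout's fresh randomness).
    k and h are illustrative hyperparameters, not the paper's values.
    """
    B, T = input_ids.shape
    mask = np.ones((B, T), dtype=bool)
    for b in range(B):
        for t in range(h, T):
            ctx = input_ids[b, t - h:t].tobytes()
            # Drop roughly 1/k of tokens, chosen deterministically.
            if hashlib.sha256(ctx).digest()[0] % k == 0:
                mask[b, t] = False
    return mask


def goldfish_loss(logits: np.ndarray, targets: np.ndarray, mask: np.ndarray) -> float:
    """Cross-entropy averaged only over the unmasked token positions."""
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    B, T = targets.shape
    # Negative log-likelihood of each target token, shape (B, T).
    nll = -logp[np.arange(B)[:, None], np.arange(T)[None, :], targets]
    keep = mask.astype(float)
    # Masked tokens contribute no gradient signal at all.
    return float((nll * keep).sum() / max(keep.sum(), 1.0))
```

Because the mask is a pure function of the context, repeated epochs over the same document always hide the same tokens, which is what prevents the model from ever receiving a complete verbatim training signal for that document.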