Core Insights
- Meta has recently made significant moves, including mass layoffs alongside high-intensity research output, exemplified by the release of a new paper, "The Free Transformer," by François Fleuret, a researcher at the University of Geneva [1][4].

Summary by Sections

Introduction
- The paper introduces a new architecture, the Free Transformer, which extends the standard decoder-only Transformer with unsupervised latent random variables to improve performance on downstream tasks [4].

Key Innovations
- The Free Transformer breaks the purely autoregressive recipe that has governed GPT-style models since the original Transformer in 2017: it lets the model make internal latent decisions before generating content, which helps address issues such as hallucination [4][6].

Model Architecture
- The decoder keeps the standard causal structure but has a sampled latent variable (pure noise at inference time) injected midway through the stack; the encoder that infers this latent during training shares its lower Transformer blocks with the decoder, which keeps the extra computational cost small [9][14]. A sketch of this shared-trunk design follows the summary below.

Training and Performance
- Experimental results show that the Free Transformer outperforms comparable baselines on code generation, mathematical word problems, and multiple-choice tasks, at both 1.5-billion- and 8-billion-parameter scales [6][27][28].

Results Overview
- Performance metrics show substantial gains on benchmarks including HumanEval+, MBPP, and GSM8K, with notable improvements in reasoning capability [27][31].
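To make the shared-trunk design concrete, here is a minimal PyTorch sketch of a conditional-VAE-style decoder in the spirit of the Free Transformer. It is an illustration under simplifying assumptions, not the paper's implementation: the class and names (TinyFreeTransformer, to_z, from_z, enc_block) are invented, the latent is a single per-sequence bit vector rather than the paper's per-token latent, and the non-differentiable Bernoulli sample would need a straight-through estimator for actual training.

```python
# Minimal sketch, assuming standard PyTorch; simplified relative to the paper.
import torch
import torch.nn as nn

class TinyFreeTransformer(nn.Module):
    def __init__(self, vocab=32000, d=512, n_heads=8, n_lower=6, n_upper=6, z_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        layer = lambda: nn.TransformerEncoderLayer(d, n_heads, 4 * d, batch_first=True)
        self.lower = nn.ModuleList([layer() for _ in range(n_lower)])  # shared with encoder
        self.upper = nn.ModuleList([layer() for _ in range(n_upper)])
        self.enc_block = layer()              # one extra non-causal encoder block
        self.to_z = nn.Linear(d, z_dim)       # encoder head -> latent logits
        self.from_z = nn.Linear(z_dim, d)     # project latent back into the stream
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens):
        T = tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.embed(tokens)
        for blk in self.lower:                # causal lower half (compute shared with encoder)
            h = blk(h, src_mask=causal)
        # Encoder path: one non-causal block over the shared states, then a binary latent.
        # At inference the encoder is skipped and z is drawn from the uniform prior (noise).
        e = self.enc_block(h)                 # attends over the whole sequence
        logits_z = self.to_z(e.mean(dim=1))   # one latent vector per sequence (simplification)
        z = torch.bernoulli(torch.sigmoid(logits_z))  # needs straight-through for training
        # Inject the latent at the decoder midpoint, then finish the causal stack.
        h = h + self.from_z(z).unsqueeze(1)
        for blk in self.upper:
            h = blk(h, src_mask=causal)
        return self.head(h), logits_z         # logits_z would feed a KL penalty in training
```

The design choice the sketch mirrors is compute sharing: the encoder reuses the decoder's lower causal blocks and adds only a single non-causal block, so inferring the latent costs roughly one extra layer. At generation time the encoder is bypassed and the latent is pure noise from the prior, so the model commits to an internal "decision" before emitting any tokens.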
Eight years later, Meta has taught the Transformer "explicit thinking"