Free Transformer
Meta shatters the Transformer's eight-year iron rule and rewrites AI's lowest-level mechanism: for the first time, a model develops a "subconscious"
36Kr · 2025-10-24 11:47
Core Insights
- Meta has introduced a new model called the "Free Transformer," which challenges the foundational rules of existing GPT models by letting the model settle on a plan before it writes, rather than guessing token by token [1][3][31]

Technical Innovations
- The Free Transformer incorporates latent random variables (Z) in the decoder, enabling the model to perform internal sampling and planning before generating outputs, akin to a "subconscious" layer; a minimal sketch of this idea follows this summary [3][4][27]
- The change adds roughly 3% of computational overhead while significantly improving reasoning and structured-generation performance, outperforming larger models on benchmarks such as GSM8K, MMLU, and HumanEval [3][19][24]
- The architecture allows early global decisions, resulting in more consistent and stable outputs without doubling computational cost [10][12][19]

Performance Metrics
- The Free Transformer shows substantial improvements across benchmarks:
  - HumanEval+ scores increased by 44%
  - MBPP test scores improved by 35%
  - GSM8K math-problem scores rose by 30% [28][31]
- For the 1.5B model, gains were observed across multiple tasks, with notable increases in pass rates on human evaluation and other reasoning tasks [26][30]

Research and Development
- The model was developed by researchers at Meta's FAIR lab, led by François Fleuret, who is focused on advancing AI beyond current LLM technology [39][41]
- The Free Transformer represents a significant shift in the approach to AI model architecture, moving from mere prediction to a more deliberate generation process [31][43]
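To make the "decide before writing" mechanism described above concrete, here is a minimal, illustrative PyTorch decoder that samples a discrete latent plan Z once per sequence and injects it into the middle of the layer stack, so every subsequent token is conditioned on that early global decision. The class and parameter names (`FreeDecoderSketch`, `latent_dim`, `inject_at`) and the uniform categorical prior are assumptions made for illustration, not details taken from the paper.

```python
# Illustrative sketch only: a decoder-only Transformer that conditions
# generation on a latent plan Z sampled once per sequence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FreeDecoderSketch(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=8,
                 latent_dim=64, inject_at=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(n_layers)
        )
        self.inject_at = inject_at          # layer at which Z enters the stack
        self.latent_dim = latent_dim
        self.latent_proj = nn.Linear(latent_dim, d_model)  # maps Z into the residual stream
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, z=None):
        b, t = tokens.shape
        # Standard causal mask: -inf above the diagonal, 0 elsewhere.
        causal = torch.full((t, t), float("-inf"), device=tokens.device).triu(1)
        h = self.embed(tokens)
        if z is None:
            # At inference Z comes from a fixed prior (here: uniform categorical),
            # sampled once per sequence -- the "decision before writing".
            idx = torch.randint(self.latent_dim, (b,), device=tokens.device)
            z = F.one_hot(idx, self.latent_dim).float()
        for i, block in enumerate(self.blocks):
            if i == self.inject_at:
                # Z conditions every later layer, hence every generated token.
                h = h + self.latent_proj(z).unsqueeze(1)
            h = block(h, src_mask=causal)
        return self.lm_head(h)
```

Because the latent is sampled once and added to the residual stream at a single layer, the extra work amounts to one linear projection plus the sampling step, which is consistent with the low single-digit overhead figure quoted above.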
Eight years on, Meta has taught the Transformer to "think explicitly"
机器之心 · 2025-10-24 03:40
Core Insights
- Meta has recently made major moves, including mass layoffs alongside a burst of high-intensity research output, exemplified by the release of a new paper titled "The Free Transformer" by François Fleuret, a researcher from the University of Geneva [1][4]

Summary by Sections

Introduction
- The paper introduces a new architecture called the Free Transformer, which extends the traditional Transformer by incorporating unsupervised latent variables to improve performance on downstream tasks [4]

Key Innovations
- The Free Transformer breaks the core rule that has governed GPT models since 2017 by letting the model make internal decisions before generating content, which helps address issues such as hallucination [4][6]

Model Architecture
- The architecture keeps a standard decoder structure with noise injection and shares Transformer modules between the encoder and the decoder, significantly reducing computational cost; a rough training sketch follows below [9][14]

Training and Performance
- Experimental results show that the Free Transformer outperforms traditional models on tasks such as code generation, math word problems, and multiple-choice questions, particularly at the 1.5-billion and 8-billion parameter scales [6][27][28]

Results Overview
- Performance metrics indicate substantial improvements across tasks, including HumanEval+, MBPP, and GSM8K, with notable gains in reasoning ability [27][31]
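To illustrate how such an unsupervised latent could be trained, the sketch below shows a generic conditional-VAE style training step: a non-causal encoder head reads the full target sequence to infer a posterior over Z, a differentiable sample of Z conditions the decoder from the previous sketch, and the loss adds a KL term keeping that posterior close to the uniform prior used at inference. The `encoder_head`, the Gumbel-softmax relaxation, and the KL weight are illustrative assumptions, not the paper's exact recipe (which, per the summary above, shares Transformer blocks between encoder and decoder to keep the overhead low).

```python
# Generic conditional-VAE training step for a latent-conditioned decoder.
# Assumes `decoder` behaves like FreeDecoderSketch above and `encoder_head`
# maps the full token sequence to posterior logits over Z (hypothetical names).
import math
import torch
import torch.nn.functional as F

def free_transformer_training_step(decoder, encoder_head, tokens,
                                    latent_dim=64, kl_weight=0.1):
    # q(Z | whole sequence): the encoder head sees the target non-causally.
    post_logits = encoder_head(tokens)                        # (batch, latent_dim)
    # Differentiable (straight-through) sample of the discrete plan Z.
    z = F.gumbel_softmax(post_logits, tau=1.0, hard=True)
    # Condition next-token prediction on Z.
    logits = decoder(tokens[:, :-1], z=z)
    rec = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          tokens[:, 1:].reshape(-1))
    # KL(q || uniform prior): keeps Z usable when sampled blindly at inference.
    log_q = F.log_softmax(post_logits, dim=-1)
    kl = (log_q.exp() * (log_q + math.log(latent_dim))).sum(-1).mean()
    return rec + kl_weight * kl
```

The reconstruction term trains the decoder to exploit Z, while the KL term prevents the posterior from drifting so far from the prior that blind sampling at generation time would produce plans the decoder has never seen.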