Core Insights
- Meta has introduced a new model called "Free Transformer," which challenges the foundational rules of existing GPT models by letting the model plan before generating rather than guessing token by token [1][3][31]

Technical Innovations
- The Free Transformer incorporates latent random variables (Z) in the decoder, enabling the model to perform internal sampling and planning before generating outputs, akin to a "subconscious" layer (a minimal sketch follows this section) [3][4][27]
- This adds roughly 3% computational overhead while significantly improving performance on reasoning and structured-generation tasks, outperforming larger models on benchmarks such as GSM8K, MMLU, and HumanEval [3][19][24]
- The architecture lets global decisions be made early, yielding more consistent and stable outputs without doubling computational cost [10][12][19]

Performance Metrics
- The Free Transformer shows substantial improvements across benchmarks:
  - HumanEval+ scores increased by 44%
  - MBPP test scores improved by 35%
  - GSM8K math-problem scores rose by 30% [28][31]
- For the 1.5B model, gains were observed across multiple tasks, with notable increases in pass rates on HumanEval and other reasoning tasks [26][30]

Research and Development
- The model was developed by researchers at Meta's FAIR lab, led by François Fleuret, whose focus is on advancing AI beyond current LLM technology [39][41]
- The Free Transformer represents a significant shift in model architecture, moving from pure next-token prediction to a more deliberate generation process [31][43]
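To make the "latent random variables (Z) in the decoder" idea above concrete, here is a minimal sketch of one way such a decoder could be wired: the lower half of the stack runs as usual, a discrete latent code Z is sampled and added to the hidden states, and the upper half generates conditioned on that early decision. Everything specific here (the name LatentPlanDecoder, the layer counts, the flat prior over a 256-way discrete Z, and the omission of the training-time encoder that would infer Z in VAE fashion) is an assumption for illustration, not Meta's released implementation.

```python
# Minimal sketch (not Meta's code): a decoder that samples a latent "plan" Z
# once, then conditions the rest of the stack on it, in the spirit of the
# Free Transformer description above. Names, sizes, and the uniform prior
# over a discrete Z are illustrative assumptions.
import torch
import torch.nn as nn


class LatentPlanDecoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512,
                 n_lower=8, n_upper=8, n_latent=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, nhead=8, batch_first=True)
        # Lower half of the stack runs before Z is injected; the upper half
        # runs after, so Z can steer the whole continuation.
        self.lower = nn.ModuleList([make_layer() for _ in range(n_lower)])
        self.upper = nn.ModuleList([make_layer() for _ in range(n_upper)])
        # Project the sampled latent code into the hidden stream.
        self.latent_proj = nn.Embedding(n_latent, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.n_latent = n_latent

    def forward(self, tokens, z=None):
        # tokens: (batch, seq) of token ids
        seq_len = tokens.size(1)
        causal = torch.full((seq_len, seq_len), float("-inf"),
                            device=tokens.device).triu(1)
        h = self.embed(tokens)
        for blk in self.lower:
            h = blk(h, src_mask=causal)
        if z is None:
            # Inference-time behaviour described in the digest: sample the
            # latent "decision" up front (here from a flat prior, an assumption).
            z = torch.randint(0, self.n_latent, (tokens.size(0),),
                              device=tokens.device)
        # Add the latent plan to every position before the upper layers run.
        h = h + self.latent_proj(z).unsqueeze(1)
        for blk in self.upper:
            h = blk(h, src_mask=causal)
        return self.lm_head(h)  # (batch, seq, vocab) next-token logits
```

In this sketch, Z is drawn once and held fixed for the whole continuation, which is what the digest describes as making a global decision early instead of re-deciding it token by token; during training, Z would instead come from a learned encoder that sees the target sequence, VAE-style, which is where the quoted ~3% overhead would arise.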
Meta shatters the Transformer's eight-year iron rule, rewriting AI's most fundamental mechanics: for the first time, a model exhibits a "subconscious"