Lilian Weng's latest long-form essay: Why We Think
QbitAI · 2025-05-18 05:20

## Core Insights

The article discusses "test-time compute" and "chain-of-thought" (CoT) reasoning as methods for significantly enhancing model performance in artificial intelligence [1][2][6]

### Group 1: Motivation and Theoretical Background

- Letting models think longer before answering, which can be achieved in several ways, makes them more capable and helps overcome current limitations [2][8]
- The core idea parallels human cognition: humans need time to analyze complex problems, echoing Daniel Kahneman's dual-system theory from "Thinking, Fast and Slow" [10][11]
- By deliberately slowing down and reflecting, models can make more rational decisions, akin to human System 2 thinking [11][12]

### Group 2: Computational Resources and Model Architecture

- Deep learning treats a neural network as a pool of compute and storage resources whose use is optimized by gradient descent [13]
- In Transformer models, the compute (FLOPs) per generated token is roughly twice the parameter count; sparse models such as Mixture-of-Experts (MoE) activate only a fraction of the parameters in each forward pass [13]
- CoT lets a model spend a variable amount of computation per token, scaled to the difficulty of the problem [13][18]

### Group 3: CoT and Learning Techniques

- Early CoT work generated intermediate steps for mathematical problems; subsequent research showed that reinforcement learning can substantially strengthen CoT reasoning [19][20]
- Supervised learning on human-written reasoning traces, combined with suitable prompts, greatly improves the mathematical ability of instruction-tuned models [21][23]
- The gain in success rate from CoT prompting on mathematical problems is more pronounced in larger models [23]

### Group 4: Sampling and Revision Techniques

- The fundamental goal of test-time computation is to adaptively modify the model's output distribution during reasoning [24]
- Parallel sampling is straightforward but bounded by the model's ability to produce a correct solution in one pass, while sequential revision must be executed carefully to avoid introducing new errors [24][25]
- Combining both approaches yields the best results: simpler problems benefit from sequential revision alone, while harder problems do best with a mix of the two [24][25]

### Group 5: Advanced Techniques and Future Directions

- Advanced search algorithms such as Best-of-N and beam search are used to find high-scoring samples [29][30]
- The RATIONALYST system synthesizes rationales from large amounts of unannotated data, providing implicit and explicit guidance for generating reasoning steps [32][33]
- Open challenges include improving computational efficiency, integrating self-correction mechanisms, and ensuring the reliability of reasoning outputs [47][50]
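The "roughly twice the parameter count" estimate for per-token FLOPs follows from each parameter participating in one multiply-accumulate (~2 FLOPs) per forward pass, and the MoE point follows from only a fraction of parameters being active per token. A minimal numerical sketch (the model sizes and active fraction below are illustrative assumptions, not figures from the article):

```python
def dense_flops_per_token(n_params: float) -> float:
    """Forward-pass FLOPs per generated token for a dense model:
    each parameter is used in roughly one multiply-accumulate (~2 FLOPs)."""
    return 2 * n_params

def moe_flops_per_token(n_total_params: float, active_fraction: float) -> float:
    """An MoE model routes each token through only a fraction of its
    parameters, so per-token compute scales with the active subset."""
    return 2 * n_total_params * active_fraction

# Illustrative sizes: a 7B dense model vs. a hypothetical 47B-total MoE
# with ~13B parameters active per token.
dense = dense_flops_per_token(7e9)           # ~1.4e10 FLOPs/token
sparse = moe_flops_per_token(47e9, 13 / 47)  # ~2.6e10 FLOPs/token
```

Note that on this accounting the MoE's per-token cost tracks its active parameters, not its total size, which is why sparsity decouples capacity from inference compute.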
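As a concrete illustration of the parallel-sampling family, here is a minimal Best-of-N sketch in Python. The `generate` and `score` functions are hypothetical toy stand-ins: a real system would sample N completions from an LLM and rank them with a reward model or answer verifier.

```python
import random

def generate(prompt: str) -> str:
    """Toy stand-in for sampling one completion from a model.
    A real system would call an LLM with nonzero temperature here."""
    return f"{prompt} -> candidate {random.randint(0, 9)}"

def score(candidate: str) -> float:
    """Toy stand-in for a reward model / verifier: here, just the
    trailing digit, so higher digits count as 'better' answers."""
    return float(candidate.rsplit(" ", 1)[-1])

def best_of_n(prompt: str, n: int = 8) -> str:
    """Draw n independent candidates in parallel and keep the one
    the scorer ranks highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

result = best_of_n("2 + 2 = ?", n=8)
```

This makes the stated limitation visible: Best-of-N can only select among what the sampler produced, so if no candidate is correct, no scorer can rescue it; sequential revision instead edits a draft, trading that limitation for the risk of introducing new errors.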