Core Insights
- The article discusses the current challenges faced by AI agents, particularly in handling long-horizon tasks without reliable support systems [1]
- It identifies two main failure modes: the "contextual black hole," where models lose track of complex context, and the "collapse of long-term planning," where models grow confused as tasks stretch on [1][3]
- A significant paper by Anthropic titled "The Hot Mess of AI" provides empirical evidence for these claims, showing that as models grow stronger they do not necessarily become less chaotic [3][6]

Group 1: Model Limitations
- The paper highlights an illusion of capability: models may appear to improve steadily, yet their performance on complex tasks does not follow a linear trajectory [3][6]
- The research introduces the concept of incoherence, which measures the proportion of error caused by variance rather than bias, and finds that longer tasks produce greater incoherence [13][14]
- It concludes that larger models tend to exhibit higher incoherence on difficult tasks, contradicting the assumption that scale brings stability [15][17]

Group 2: Theoretical Framework
- The authors use the bias-variance decomposition to analyze model performance, quantifying the gap between average predictions and actual results (a minimal numerical sketch appears after this summary) [8][9]
- They argue that the autoregressive nature of these models inherently limits their ability to function as optimizers, a capability they consider essential for achieving AGI [20][23]
- The paper suggests that task complexity grows exponentially with horizon length, making it difficult for models to keep pace with the demands of long-term planning (see the compounding illustration below) [21][24]

Group 3: Potential Solutions
- The article proposes several avenues for improvement, including ensembling methods that reduce incoherence by averaging multiple outputs (sketched below) [33][34]
- It also discusses structured reasoning processes as a way to mitigate variance during complex tasks (see the step-validation sketch below) [36]
- Lastly, it suggests exploring paradigms beyond token-level autoregression, such as large concept models that operate on high-level goals rather than discrete tokens [39][40]
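The bias-variance and incoherence claims above are easier to see numerically. Below is a minimal sketch assuming squared-error loss; the `incoherence` ratio shown (the variance share of total error) is an illustrative proxy for the paper's metric, whose exact definition the summary does not give.

```python
import numpy as np

# Hypothetical runs: one model sampled several times on the same task.
y_true = 1.0                                  # correct answer
preds = np.array([0.7, 1.4, 0.2, 1.6, 0.6])   # illustrative outputs only

mean_pred = preds.mean()
bias_sq = (mean_pred - y_true) ** 2           # systematic error: wrong on average
variance = preds.var()                        # scatter of runs around their own mean

# Expected squared error decomposes exactly into bias^2 + variance.
total_error = ((preds - y_true) ** 2).mean()
assert np.isclose(total_error, bias_sq + variance)

# Illustrative "incoherence": the share of error driven by variance.
incoherence = variance / (bias_sq + variance)
print(f"bias^2={bias_sq:.3f}  variance={variance:.3f}  incoherence={incoherence:.2f}")
```

On these toy numbers the error is almost entirely variance-driven (incoherence ≈ 0.96): the runs disagree with each other far more than their average disagrees with the truth, which is the "hot mess" pattern the paper reportedly measures.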
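The exponential-difficulty point in Group 2 follows from a standard compounding argument, shown here as an assumption rather than the paper's own derivation: if each step of a plan succeeds independently with probability p, a T-step task succeeds with probability p**T.

```python
# Standard compounding illustration (assumed, not taken from the paper):
# per-step success probability p compounds to p**T over a T-step task.
for p in (0.99, 0.999):
    for T in (10, 100, 1000):
        print(f"p={p}, T={T}: overall success ~ {p**T:.4f}")
```

Even a 99%-reliable step leaves only about a 37% chance of finishing a 100-step task, which is why variance that looks negligible per token can dominate at the horizon lengths agents face.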
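A minimal sketch of the ensembling idea in Group 3: majority-vote over independent samples, which cancels variance-driven errors while leaving bias untouched. `sample_fn` is a hypothetical stand-in for any model call, and the noisy toy model below is illustrative only.

```python
import random
from collections import Counter

def ensemble_answer(sample_fn, prompt, n=25):
    """Majority vote over n independent samples of the same model.

    sample_fn is a hypothetical stand-in for one model call that
    returns a single answer string.
    """
    votes = Counter(sample_fn(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

# Toy model that answers correctly only 2/3 of the time.
noisy_model = lambda _prompt: random.choices(["42", "41"], weights=[2, 1])[0]
print(ensemble_answer(noisy_model, "What is 6 * 7?"))  # almost always "42"
```

Note the limitation this makes visible: voting only helps when errors are uncorrelated and variance-driven; a systematically biased model votes for the same wrong answer every time.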
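The structured-reasoning suggestion can be read as constraining generation to small, individually checkable steps. A minimal sketch under that reading follows; `generate_step` and `validate` are hypothetical stand-ins for a model call and a domain-specific checker.

```python
def structured_solve(task_steps, generate_step, validate, max_retries=3):
    """Execute a plan one validated step at a time.

    Checking each intermediate result bounds how far a single
    high-variance generation can drift before it is caught.
    """
    results = []
    for step in task_steps:
        for _ in range(max_retries):
            candidate = generate_step(step, results)  # model proposes one step
            if validate(step, candidate):             # checker accepts or rejects it
                results.append(candidate)
                break
        else:
            raise RuntimeError(f"Step failed after {max_retries} attempts: {step!r}")
    return results
```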
Knowing All the Principles, AI Still Goes Mad
36Kr·2026-02-09 06:50