Don't train it directly: give the main model a Mistake Log, and 6B easily surpasses 8B
36Kr · 2025-12-25 07:05

Core Insights
- The article introduces the "Mistake Log," a record of the internal thought process a large model goes through when it makes an error, intended to enhance learning through structured reflection [3][4][17]
- This contrasts with traditional training, which scores only whether the model's output is correct and leaves no room for the kind of deep reflection humans apply when learning from their mistakes [2][4]

Group 1: Concept of the Mistake Log
- The Mistake Log has three layers: Question (the problem the model is addressing), Rationale (the model's internal reasoning state), and Mistakes (token-level error analysis) [5][8]
- The Rationale layer captures the model's hidden states at the moment of the error, a snapshot of its cognitive state when the mistake occurred [7][10]
- Together these layers form a structured record of each error, showing where and how mistakes arise during training; a minimal sketch of such a record follows this summary [6][10]

Group 2: Implementation and Benefits
- An auxiliary model, referred to as the Copilot, learns from the main model's Mistake Log so that it can predict and correct errors in real time [10][11]
- At inference, the Copilot dynamically adjusts the main model's reasoning trajectory based on its historical errors, improving overall performance; see the fusion sketch after this summary [13][14]
- Experiments show that pairing a smaller Copilot with a larger main model can outperform simply scaling up the main model, indicating that error-correction capability matters as much as raw size [15][16]

Group 3: Future Directions
- The article stresses that exploration of the Mistake Log mechanism is only a starting point, with room to optimize both its representation and the Copilot's design [17]
- It asks whether self-reflection grounded in internal states is more effective than external correction methods, a question left for future research [17]
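To make the three-layer structure concrete, here is a minimal Python sketch of what a Mistake Log entry and its container might look like. All class and field names, the tensor shape for the Rationale, and the token-level representation of Mistakes are illustrative assumptions, not the article's exact schema.

```python
# Illustrative sketch of a Mistake Log record; names and shapes are assumptions.
from dataclasses import dataclass
from typing import List
import torch

@dataclass
class MistakeRecord:
    """One entry in the Mistake Log, written whenever the main model errs."""
    question: str                 # Question layer: the problem being solved
    rationale: torch.Tensor       # Rationale layer: hidden states at the error, e.g. (seq_len, d_model)
    mistake_positions: List[int]  # Mistakes layer: token indices where prediction != target
    predicted_tokens: List[int]   # what the model emitted at those positions
    target_tokens: List[int]      # what it should have emitted

class MistakeLog:
    """Append-only store of MistakeRecord entries collected during training."""
    def __init__(self) -> None:
        self.records: List[MistakeRecord] = []

    def add(self, record: MistakeRecord) -> None:
        self.records.append(record)

    def __len__(self) -> int:
        return len(self.records)
```

In this reading, the log is simply a growing dataset of (question, internal state, error) triples that a second model can later train on.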

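Likewise, the Copilot's real-time correction can be pictured as a small network that reads the main model's hidden state and nudges its next-token distribution. The additive logit-fusion rule, the mixing weight `alpha`, and every module and parameter name below are assumptions made for illustration; the article does not specify this exact mechanism.

```python
# A minimal sketch, assuming the Copilot outputs a logit correction that is
# added to the main model's logits before decoding. Not the paper's exact design.
import torch
import torch.nn as nn

class Copilot(nn.Module):
    """Small network trained on the Mistake Log to map the main model's
    hidden state to a correction over the vocabulary."""
    def __init__(self, d_model: int, vocab_size: int, d_hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, vocab_size),
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: (batch, d_model) -> correction: (batch, vocab_size)
        return self.net(hidden_state)

def corrected_next_token(main_logits: torch.Tensor,
                         hidden_state: torch.Tensor,
                         copilot: Copilot,
                         alpha: float = 0.5) -> torch.Tensor:
    """Fuse the main model's logits with the Copilot's correction and decode
    greedily; alpha controls how strongly the Copilot can steer."""
    fused = main_logits + alpha * copilot(hidden_state)
    return fused.argmax(dim=-1)
```

The point of the sketch is the division of labor: the large main model proposes, while a much smaller error-aware module, trained only on past mistakes, adjusts the trajectory at each step.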