Automatically adjusting reasoning chain length: SCoT is here! A new architecture is also proposed to unlock reasoning ability
量子位 (QbitAI) · 2025-03-13 03:28
Core Insights
- The article introduces SCoT (Self-structured Chain of Thought), a new reasoning paradigm that dynamically adjusts the length of the reasoning chain to the complexity of the problem, addressing the limited reasoning diversity and efficiency of existing methods [1][2][4].

Group 1: SCoT and AtomThink Framework
- SCoT breaks the reasoning process into minimal semantic atomic steps, so that a reasoning structure suited to each problem's complexity can be generated dynamically [2][10].
- AtomThink is a comprehensive framework covering data construction, training, reasoning, and evaluation, aimed at improving the performance of multimodal large models on complex reasoning tasks [3][13].
- The framework consists of four key modules: a data engine for generating high-quality multi-step reasoning paths, atomic step fine-tuning, strategy-guided multi-round reasoning, and atomic capability assessment [14].

Group 2: Experimental Results
- The research team used two baseline models of different scales, LLaVA1.5-7B and Llama3.2-Vision-11B, fine-tuned them on the AMATH-SFT dataset, and evaluated them on several benchmark datasets [15].
- AtomThink significantly improved the accuracy of the baseline models, with increases of 10.9%, 10.2%, and 7.2% on the MathVista, MathVerse, and MathVision datasets, respectively [17].
- Compared with existing structured CoT methods, AtomThink showed clear advantages in accuracy, data utilization efficiency, and reasoning efficiency, achieving a fivefold increase in data utilization efficiency and an 85.3% improvement in reasoning efficiency [18].

Group 3: Reasoning Behavior and Evaluation
- The study introduced a novel evaluation method to assess how the model uses different intermediate steps, revealing that reasoning errors accumulate, particularly in the early stages of CoT [20][21].
- The atomic capability assessment highlighted the need for future work to focus on quality control during the initial phases of reasoning to mitigate high error rates in data extraction and image description tasks [21].
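
The idea from Group 1, atomic steps with a chain length that adapts to the problem, can be illustrated with a small toy sketch. This is not the AtomThink implementation: `AtomicStep`, `run_chain`, `toy_next_step`, and the `max_steps` budget are hypothetical names introduced here, and the "model" is a stand-in that sums integers one at a time. It only shows how a chain can stop as soon as a step declares a final answer, so that simpler questions yield shorter chains.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AtomicStep:
    text: str       # one minimal reasoning step
    is_final: bool  # True when the step states the final answer


# Hypothetical interface: given the question and the steps so far, return the next step.
StepFn = Callable[[str, List[AtomicStep]], AtomicStep]


def run_chain(question: str, next_step: StepFn, max_steps: int = 8) -> List[AtomicStep]:
    """Generate atomic steps until one is marked final or the step budget runs out."""
    steps: List[AtomicStep] = []
    for _ in range(max_steps):
        step = next_step(question, steps)
        steps.append(step)
        if step.is_final:
            break
    return steps


def toy_next_step(question: str, steps: List[AtomicStep]) -> AtomicStep:
    """Stand-in for a model call (assumption): adds the integers in the question one per step."""
    numbers = [int(tok) for tok in question.split() if tok.isdigit()]
    done = len(steps)
    if done < len(numbers):
        partial = sum(numbers[: done + 1])
        return AtomicStep(f"add {numbers[done]} -> running total {partial}", is_final=False)
    return AtomicStep(f"answer: {sum(numbers)}", is_final=True)


short_chain = run_chain("sum 2 3", toy_next_step)
long_chain = run_chain("sum 1 2 3 4 5", toy_next_step)
print(len(short_chain), len(long_chain))
```

In this toy setting the chain length follows the number of operands rather than a fixed template. In a real system the step generator would be a model call, and the stopping signal would come from the model's own output.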