Symbolic Reasoning

Chain of Draft replaces Chain of Thought: reasoning tokens cut by 80%, significantly lowering compute cost and latency
量子位 · 2025-03-10 03:29
Core Viewpoint
- The article introduces "Chain of Draft" (CoD), a prompting method inspired by how humans jot down terse drafts while solving problems; it significantly reduces token usage and inference cost while maintaining accuracy on reasoning tasks [1][2][4].

Cost Efficiency
- CoD cuts token usage by 70-90% compared with the traditional Chain of Thought (CoT) method, lowering inference cost accordingly. For an enterprise processing 1 million reasoning queries per month, the bill can drop from $3,800 (CoT) to $760, a saving of over $3,000 per month [6][7].

Experimental Validation
- Experiments covered three types of reasoning tasks: arithmetic reasoning, common-sense reasoning, and symbolic reasoning. On arithmetic reasoning, models such as GPT-4o and Claude 3.5 Sonnet reached around 91% accuracy with CoD, close to the over-95% achieved with CoT [8][9].
- CoT generated roughly 200 tokens per response, while CoD required only about 40, an 80% reduction [9].
- CoD also reduced average latency by 76.2% for GPT-4o and 48.4% for Claude 3.5 Sonnet [10].

Task-Specific Results
- On common-sense reasoning tasks, CoD maintained high accuracy, and Claude 3.5 Sonnet's accuracy actually increased under CoD [12].
- On symbolic reasoning tasks, CoD reached 100% accuracy while sharply reducing both token usage and latency [14].

Limitations
- CoD's effectiveness drops significantly in zero-shot settings, pointing to a limitation of the method [16].
- For smaller models with fewer than 3 billion parameters, CoD still reduces token usage and improves accuracy over direct answering, but the performance gap relative to CoT is more pronounced [18].
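To make the method concrete, here is a minimal sketch of how a Chain-of-Draft request could be assembled next to a classic Chain-of-Thought one. The system-prompt wording follows the terse-draft instruction reported for CoD (at most ~5 words per step, final answer after a `####` separator); the message format assumes a generic chat-completion-style API, and the helper names are illustrative, not from the article.

```python
# Illustrative CoT vs. CoD prompt construction (helper names are
# hypothetical; the message format assumes a generic chat API).

COT_SYSTEM = (
    "Think step by step to answer the following question. "
    "Return the answer at the end of the response after a separator ####."
)

COD_SYSTEM = (
    "Think step by step, but only keep a minimum draft for each "
    "thinking step, with 5 words at most. Return the answer at the "
    "end of the response after a separator ####."
)

def build_messages(question: str, style: str = "cod") -> list:
    """Build chat messages for a CoT or CoD query."""
    system = COD_SYSTEM if style == "cod" else COT_SYSTEM
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

def extract_answer(response_text: str) -> str:
    """Pull the final answer out after the '####' separator."""
    return response_text.split("####")[-1].strip()
```

Under CoD, a model's reply to a grade-school arithmetic problem would ideally look like a few terse equations ("20 - x = 12; x = 8 #### 8") rather than several sentences of narrated reasoning, which is where the token savings come from.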
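The cost figures above can be sanity-checked with back-of-the-envelope arithmetic, under the simplifying assumption that cost scales linearly with output tokens. The per-response token counts (200 vs. 40) and the $3,800 monthly CoT bill are the article's numbers; everything else follows from them.

```python
# Back-of-the-envelope check of the article's savings figures,
# assuming cost is proportional to output tokens.

cot_tokens = 200            # avg output tokens per response with CoT (article)
cod_tokens = 40             # avg output tokens per response with CoD (article)
queries_per_month = 1_000_000
cot_monthly_cost = 3800.0   # USD for 1M queries with CoT (article)

token_reduction = 1 - cod_tokens / cot_tokens
cod_monthly_cost = cot_monthly_cost * (cod_tokens / cot_tokens)
monthly_savings = cot_monthly_cost - cod_monthly_cost

print(f"token reduction:  {token_reduction:.0%}")     # 80%
print(f"CoD monthly cost: ${cod_monthly_cost:.0f}")   # $760
print(f"monthly savings:  ${monthly_savings:.0f}")    # $3040
```

The 80% token reduction reproduces the $3,800 → $760 drop and a saving of just over $3,000 per month, consistent with the figures quoted in the article.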