DeepSeek-V3.2 Devours Tokens, Backstabbed by GRPO
36Kr · 2025-12-04 10:38

Core Insights
- The release of DeepSeek-V3.2 has drawn significant industry attention, highlighting both its capabilities and the areas that still need improvement, particularly token efficiency and output verbosity [1][2][5].

Token Efficiency
- DeepSeek-V3.2 Speciale shows poor token efficiency, requiring 77,000 tokens for complex tasks versus Gemini's 20,000, more than three times the token usage for outputs of similar quality [1][5].
- Users have noted that if the generation speed of DeepSeek-V3.2 Speciale rose from roughly 30 tokens per second to around 100 tokens per second, its usability and overall experience would improve significantly [5].

Output Quality
- The Speciale version has been criticized for producing lengthy, verbose outputs that often end in incorrect answers, a problem attributed to inherent flaws in the GRPO algorithm [2][14].
- DeepSeek's technical report acknowledges the increased token consumption during inference: the Speciale version consumed 86 million tokens in benchmark tests, up from 62 million in the previous version [7][14].

Algorithmic Issues
- The GRPO algorithm, which has become a standard in reinforcement learning, is identified as a source of bias toward longer and incorrect responses. Its length bias means shorter correct responses receive larger per-token updates while longer incorrect responses face weaker penalties; see the sketch after this list [18][21].
- While the difficulty bias has been addressed in DeepSeek-V3.2, the length bias remains and may contribute to the excessive token consumption observed in the Speciale version [18][21].
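To make the two biases concrete, here is a minimal numerical sketch of how a GRPO-style loss weights each sampled response. It assumes the commonly cited GRPO formulation (group-mean baseline, optional division by the group's reward standard deviation, and averaging the token loss over each response's own length); the reward and length values are hypothetical, and this is not DeepSeek-V3.2's actual training code.

```python
import numpy as np

def grpo_token_scale(rewards, lengths, normalize_by_std=True):
    """Per-token update magnitude for one group of sampled responses,
    following the commonly cited GRPO objective: group-relative advantage,
    optional division by the group's reward std, and 1/length averaging."""
    rewards = np.asarray(rewards, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    adv = rewards - rewards.mean()            # group-relative advantage
    if normalize_by_std:                      # source of the "difficulty bias":
        adv = adv / (rewards.std() + 1e-6)    # low-variance (very easy/hard) groups get inflated updates
    return adv / lengths                      # source of the "length bias": per-response 1/|o_i| scaling

# Hypothetical group: two correct (reward 1) and two incorrect (reward 0) answers,
# one short (200 tokens) and one long (2,000 tokens) of each.
print(grpo_token_scale(rewards=[1, 1, 0, 0], lengths=[200, 2000, 200, 2000]))
# ≈ [ 0.005  0.0005 -0.005 -0.0005]
# The short correct answer gets a 10x larger per-token update than the long correct one,
# and the long incorrect answer is penalized 10x more weakly than the short incorrect one,
# so verbose wrong answers are under-penalized, which is the length bias described above.
```

The `normalize_by_std` flag toggles the per-group std normalization that is usually blamed for the difficulty bias the article says DeepSeek-V3.2 has addressed; the 1/length scaling, and with it the length bias, is unaffected by that change.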
