Core Insights
- The article covers the release of DeepSeek-V3.2, focusing on its token consumption and output verbosity, both of which have raised concerns among users and researchers [1][2][6].

Token Consumption and Efficiency
- DeepSeek-V3.2 Speciale uses tokens inefficiently, consuming 77,000 tokens on tasks where Gemini needs only 20,000, more than three times the expenditure for results of similar quality [1][6].
- Users report a generation speed of roughly 30 tokens per second and note that raising it to around 100 tokens per second would substantially improve usability [6].

Output Quality and Verbosity
- The Speciale version tends to produce long, verbose outputs that are often incorrect, a behavior attributed to inherent flaws in the GRPO algorithm [2][15].
- In benchmark tests, the model posts a median score of 76.38 with a median difference of 11.07% against other models, pointing to a notable efficiency gap [7].

Comparison with Other Models
- In benchmark comparisons, DeepSeek-V3.2 Speciale consumed 86 million tokens during inference, up sharply from the 62 million consumed by its predecessor [7][10].
- Its metrics also trail competitors such as Gemini-3.0 Pro on output-token latency and overall efficiency [10][12].

Algorithmic Limitations
- The GRPO algorithm underpinning DeepSeek has been criticized for introducing biases that produce longer and often incorrect responses, a problem that persists in the latest model [16][20]; the objective and a numerical sketch of this effect appear after this summary.
- Length bias, a major flaw in GRPO, pushes the model to generate longer responses even when they are wrong, and has been identified as the primary cause of DeepSeek-V3.2 Speciale's high token consumption [20][23].

Future Directions
- The developers acknowledge improved token efficiency as a critical area for future research, aiming to balance performance and cost in subsequent model iterations [14][23].
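Since the summary attributes the verbosity to GRPO's structure, it helps to see where response length enters the objective. The formula below is the standard GRPO objective from the DeepSeekMath paper, reproduced here as background rather than taken from this article; the notation (group size $G$, clip range $\varepsilon$, KL weight $\beta$) follows that paper.

$$
\mathcal{J}_{\mathrm{GRPO}}(\theta)=\mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}\min\!\Big(\rho_{i,t}\hat{A}_i,\;\mathrm{clip}\big(\rho_{i,t},1-\varepsilon,1+\varepsilon\big)\hat{A}_i\Big)\right]-\beta\,\mathbb{D}_{\mathrm{KL}}\big[\pi_\theta\,\|\,\pi_{\mathrm{ref}}\big],\qquad
\hat{A}_i=\frac{r_i-\operatorname{mean}\!\big(\{r_j\}_{j=1}^{G}\big)}{\operatorname{std}\!\big(\{r_j\}_{j=1}^{G}\big)},
$$

where $\rho_{i,t}=\pi_\theta(o_{i,t}\mid q,o_{i,<t})/\pi_{\mathrm{old}}(o_{i,t}\mid q,o_{i,<t})$. Every token of response $o_i$ is weighted by $\hat{A}_i/|o_i|$, so a long incorrect response (negative $\hat{A}_i$) is penalized less per token than a short incorrect one; accumulated over many updates, this dilution of the penalty is the length bias the article describes.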
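Below is a minimal numerical sketch of that dilution, assuming a group of four sampled responses scored with a binary correctness reward; the group size, response lengths, and the MAX_LEN constant are illustrative assumptions, not values from the article. It contrasts GRPO's per-response 1/|o_i| weighting with a fixed-constant normalization of the kind proposed in follow-up work such as Dr. GRPO:

```python
import numpy as np

# Hypothetical group of G = 4 responses to one prompt (illustrative values,
# not from the article): two short, two long, only the first one correct.
lengths = np.array([20, 20, 200, 200])    # tokens per response
rewards = np.array([1.0, 0.0, 0.0, 0.0])  # binary correctness reward

# GRPO group-relative advantage: standardize rewards within the group.
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# GRPO averages the loss over each response's own tokens (the 1/|o_i| term),
# so every token of response i carries weight adv_i / len_i.
grpo_token_weight = adv / lengths

# Fixed-constant normalization (a Dr. GRPO-style fix): divide by one shared
# constant so the per-token weight no longer depends on response length.
MAX_LEN = 256  # assumed generation budget
fixed_token_weight = adv / MAX_LEN

for i, (n, a, g, f) in enumerate(zip(lengths, adv, grpo_token_weight,
                                     fixed_token_weight)):
    print(f"resp {i}: len={n:3d}  adv={a:+.2f}  "
          f"GRPO per-token={g:+.5f}  fixed per-token={f:+.5f}")
```

Under GRPO weighting, the 200-token wrong answer is penalized about a tenth as hard per token as the 20-token wrong answer, so lengthening a wrong answer shrinks its per-token penalty; with the shared constant, the per-token penalty is identical regardless of length.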
DeepSeek-V3.2 Guzzles Tokens, and It Turns Out GRPO Is the Backstabber
机器之心·2025-12-04 08:18