DeepSeek-V3.2 found to have a bug: it burns through tokens and the answers can still be wrong; researchers say an old GRPO problem remains unsolved
36Kr · 2025-12-04 02:21

Core Insights
- DeepSeek-V3.2 has gained significant attention but still exhibits bugs, particularly in token efficiency, which has been a longstanding issue [1][4].

Group 1: Performance Issues
- The Speciale version of DeepSeek-V3.2 consumes far more tokens on complex tasks, requiring 77,000 tokens on a problem that Gemini solved with 20,000 [4].
- The model has a "length bias": longer incorrect answers are penalized less, which leads it to generate verbose but incorrect responses [8][11].

Group 2: Algorithmic Biases
- The GRPO algorithm has two hidden biases: a length bias and a difficulty bias. The length bias results in longer incorrect answers being favored, while the difficulty bias causes the model to focus excessively on overly simple or overly difficult questions, neglecting the medium-difficulty ones that matter most for skill improvement [8][9].
- Zichen Liu, the core author of the research, noted that while the new advantage calculation has corrected the difficulty bias, the length bias remains unaddressed [10][11].

Group 3: Token Efficiency and Cost
- DeepSeek's official report acknowledges that token efficiency is still a challenge for V3.2, as the new models must generate longer trajectories to match the output quality of Gemini-3.0-Pro [14].
- Despite the high token consumption, DeepSeek-V3.2 is priced at only 1/24th of GPT-5, making it relatively acceptable in terms of cost [14].
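The two GRPO biases described above both come from normalization choices in its objective: advantages are divided by the group's reward standard deviation (difficulty bias), and each response's loss is divided by its own length (length bias). The following is a minimal numeric sketch of both effects using toy binary rewards; the numbers and function names are illustrative, not taken from the paper or from DeepSeek's implementation.

```python
import statistics

def grpo_advantages(rewards):
    """Original GRPO-style advantage: center rewards by the group mean,
    then divide by the group std. The std division is the source of the
    difficulty bias: low-variance groups (nearly always right, or nearly
    always wrong) get their advantages inflated."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1e-6  # guard against zero std
    return [(r - mu) / sigma for r in rewards]

def centered_advantages(rewards):
    """Corrected advantage (the fix referenced in the article): center
    only, no std division, so every question contributes on the same
    scale regardless of its difficulty."""
    mu = statistics.mean(rewards)
    return [r - mu for r in rewards]

def per_token_weight(advantage, length):
    """Length bias: dividing a response's loss by its own length means a
    long wrong answer (negative advantage) is penalized less per token
    than a short wrong one -- verbosity on failures is under-punished."""
    return advantage / length

# Difficulty bias: the lone wrong answer on a nearly solved (easy)
# question gets a 3x larger advantage than a wrong answer on a 50/50
# (medium) question, so easy/hard prompts dominate the gradient.
easy   = [1] * 9 + [0]      # 9 correct, 1 wrong
medium = [1] * 5 + [0] * 5  # 5 correct, 5 wrong
a_easy = grpo_advantages(easy)[-1]    # ≈ -3.0
a_med  = grpo_advantages(medium)[-1]  # ≈ -1.0

# Length bias: same wrong answer, 50x longer -> 50x weaker per-token
# penalty, so the model is barely pushed away from long wrong outputs.
w_short = per_token_weight(-1.0, length=100)   # -0.01
w_long  = per_token_weight(-1.0, length=5000)  # -0.0002
```

Under this toy model, dropping the std division (as in `centered_advantages`) equalizes the per-question scale, which matches the article's claim that the new advantage calculation fixes the difficulty bias while the per-length normalization, and hence the length bias, is left in place.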