Core Viewpoint
- DeepSeek-V3.2 has attracted significant attention but has been found to have issues, particularly excessive token consumption on complex tasks, leading to longer and potentially incorrect answers [1][4][5].

Group 1: Token Consumption Issues
- The Speciale version of DeepSeek-V3.2 consumes far more tokens than competitors, using 77,000 tokens on certain tasks where Gemini uses only 20,000 [5].
- The model's reliance on the GRPO algorithm has introduced a "length bias": longer incorrect answers are penalized less, so the model tends to generate "long and wrong" responses [10][11].

Group 2: Hidden Biases in the GRPO Algorithm
- GRPO carries two hidden biases: a length bias and a difficulty bias. The length bias favors longer incorrect answers, while the difficulty bias makes the model focus excessively on overly easy or overly hard questions, neglecting the medium-difficulty questions that matter most for skill improvement [10][12].
- Despite attempts to address these biases, the length bias remains a challenge, as acknowledged in DeepSeek's technical report [15][13].

Group 3: Cost and Resource Considerations
- DeepSeek-V3.2's output cost is only 1/24 the price of GPT-5's, which may make its token inefficiency more acceptable [17].
- The model's 128K context length has not been extended for a long time, which may be related to limited GPU resources [18].
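The two GRPO biases described above can be illustrated numerically. The sketch below is not DeepSeek's implementation; `grpo_advantages` and `per_token_penalty` are hypothetical helpers that mimic two commonly cited features of GRPO-style training: advantages normalized by the group's standard deviation, and sequence-level advantages averaged over response tokens.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: (r - mean) / std.
    When nearly all rewards in a group agree (question too easy or
    too hard), std is small and the few differing samples get
    amplified gradients -- the "difficulty bias"."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

def per_token_penalty(advantage, length):
    """Token-level averaging spreads one sequence-level advantage
    over all tokens, so a long wrong answer is penalized less per
    token than a short wrong one -- the "length bias"."""
    return advantage / length

# Length bias: two wrong answers with the same advantage (-1.0)
# but different lengths.
short_wrong = per_token_penalty(-1.0, length=100)
long_wrong = per_token_penalty(-1.0, length=2000)
# |short_wrong| is 20x |long_wrong|: rambling dilutes the penalty.

# Difficulty bias: a near-solved group vs a balanced group of 8 samples.
easy_group = grpo_advantages([1, 1, 1, 1, 1, 1, 1, 0])   # 7/8 correct
mixed_group = grpo_advantages([1, 1, 1, 1, 0, 0, 0, 0])  # 4/8 correct
# The lone failure in the easy group receives a much larger
# |advantage| than any sample in the balanced (medium-difficulty) group.
```

Under these assumptions, the medium-difficulty group yields advantages of only ±1, while the near-solved group's single failure is pushed past -2.6, consistent with the article's claim that easy and hard questions dominate training at the expense of medium-difficulty ones.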
A bug has been found in DeepSeek-V3.2: it burns through tokens wildly, and the answers can still be wrong. Researchers: the old GRPO problem was never solved
量子位·2025-12-03 09:05