DeepSeek Open-Sources a New Model: New Architecture Debuts, Domestic AI Chip Makers Celebrate
36Kr · 2025-09-30 01:15

Core Insights
- DeepSeek has announced the open-source release of the DeepSeek-V3.2-Exp experimental model, which introduces the DeepSeek Sparse Attention (DSA) mechanism and significantly improves long-text training and inference efficiency without compromising output quality [1][9]
- The new model cuts API service prices by over 50%, with the price of generating 1 million output tokens dropping to 3 yuan, roughly one-fourth of the previous model's price (a quick arithmetic check appears after this summary) [3][5]
- Major cloud platforms and AI chip manufacturers have quickly adapted to the new model, indicating strong industry support and interest [5][10]

Model Performance
- DeepSeek-V3.2-Exp performs on par with its predecessor, DeepSeek-V3.1-Terminus, across various benchmarks, while using significantly fewer tokens to complete tasks [5][6]
- In specific benchmarks, DeepSeek-V3.2-Exp held its MMLU-Pro score at 85.0 and improved its BrowseComp accuracy to 40.1, while a few scores, such as Humanity's Last Exam, declined slightly [6][39]
- The new attention design reduces attention complexity from quadratic to near-linear in sequence length, lowering training and inference costs [36][42]

Technical Innovations
- The model follows a "continued pre-training + post-training" approach, integrating a Lightning Indexer with a fine-grained token selection mechanism to preserve quality while sparsifying attention (a code sketch of this idea appears after this summary) [36][38]
- The DSA mechanism is still described as a prototype, leaving room for further development and refinement [36][44]
- DeepSeek has also released the accompanying technical report and code to facilitate research and experimentation [7][9]

Industry Impact
- The rapid adaptation of DeepSeek-V3.2-Exp by companies such as Huawei and Cambricon demonstrates the model's significance for the domestic AI hardware ecosystem [10][15][17]
- The launch has sparked discussions in developer communities, with some calling it a notable moment in AI development [21][22]
- User feedback mixes excitement and skepticism: some note improvements in speed, while others raise concerns about capability [19][31][32]
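
As a sanity check on the pricing claim above, the short snippet below works through the arithmetic. The previous output price of 12 yuan per million tokens is inferred from the article's "one-fourth" figure, and the monthly volume is a made-up illustration; output pricing alone drops by 75%, which is consistent with the reported overall reduction of "over 50%" once input-token pricing is included.

```python
# Rough cost comparison implied by the reported figures.
# old_output_price is inferred from "one-fourth of the previous model's cost";
# tokens_out is a purely illustrative monthly volume.
old_output_price = 12.0  # yuan per 1M output tokens (inferred)
new_output_price = 3.0   # yuan per 1M output tokens (as reported)

tokens_out = 5_000_000   # hypothetical monthly output volume

old_cost = old_output_price * tokens_out / 1_000_000
new_cost = new_output_price * tokens_out / 1_000_000
print(f"old: {old_cost} yuan, new: {new_cost} yuan, saving {1 - new_cost / old_cost:.0%}")
# -> old: 60.0 yuan, new: 15.0 yuan, saving 75%
```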
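
To make the "Lightning Indexer + fine-grained token selection" idea concrete, here is a minimal, illustrative sketch of indexer-guided top-k sparse attention: a cheap low-dimensional indexer scores every key position, and full attention then runs only over the selected tokens, so per-query cost scales with the selection size rather than the full sequence length. This is not DeepSeek's actual DSA implementation; the function name, tensor shapes, indexer dimension, and top_k value are assumptions, and causal masking is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def sparse_attention_topk(q, k, v, indexer_q, indexer_k, top_k=256):
    """Hypothetical indexer-guided sparse attention (illustrative only).

    Shapes: q (B, H, Tq, D), k/v (B, H, Tk, D),
            indexer_q (B, Tq, Di), indexer_k (B, Tk, Di), with Di << D.
    """
    B, H, Tq, D = q.shape
    Tk = k.shape[2]

    # 1. Lightweight indexer scores every (query, key) pair in a tiny
    #    dimension Di, far cheaper than full multi-head attention.
    index_scores = torch.einsum("bqd,bkd->bqk", indexer_q, indexer_k)  # (B, Tq, Tk)

    # 2. Fine-grained token selection: keep only the top-k key positions per query.
    k_eff = min(top_k, Tk)
    topk_idx = index_scores.topk(k_eff, dim=-1).indices  # (B, Tq, k_eff)

    # 3. Gather the selected keys/values and run ordinary attention over them,
    #    so per-query cost is O(k_eff * D) instead of O(Tk * D).
    idx = topk_idx[:, None, :, :, None].expand(B, H, Tq, k_eff, D)
    k_sel = k[:, :, None].expand(B, H, Tq, Tk, D).gather(3, idx)  # (B, H, Tq, k_eff, D)
    v_sel = v[:, :, None].expand(B, H, Tq, Tk, D).gather(3, idx)

    attn = torch.einsum("bhqd,bhqkd->bhqk", q, k_sel) / D ** 0.5
    weights = F.softmax(attn, dim=-1)
    return torch.einsum("bhqk,bhqkd->bhqd", weights, v_sel)  # (B, H, Tq, D)

# Toy usage with made-up sizes: 1024 keys, but each query attends to only 128.
B, H, Tq, Tk, D, Di = 1, 4, 8, 1024, 64, 32
out = sparse_attention_topk(
    torch.randn(B, H, Tq, D), torch.randn(B, H, Tk, D), torch.randn(B, H, Tk, D),
    torch.randn(B, Tq, Di), torch.randn(B, Tk, Di), top_k=128,
)
print(out.shape)  # torch.Size([1, 4, 8, 64])
```

The design point the sketch tries to convey is the one the summary attributes to DSA: the expensive quadratic term is pushed into a very cheap scoring pass, while the full-precision attention only touches a fixed number of selected tokens per query.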