Core Insights
- DeepSeek has released an experimental version, DeepSeek-V3.2-Exp, which introduces Sparse Attention to improve training and inference efficiency on long texts [1][2]
- The new model brings a significant reduction in service costs, with API prices for developers dropping by over 50% [1]
- Cambricon quickly adapted to the new model and open-sourced the vLLM-MLU inference engine, allowing developers to try the new features immediately [1][2]
- Huawei Ascend has likewise achieved day-0 support for DeepSeek-V3.2-Exp, optimizing deployment on the CANN platform while keeping inference latency low [3]

Group 1
- DeepSeek-V3.2-Exp introduces Sparse Attention for enhanced long-text efficiency (a general sketch of the idea follows these lists) [1]
- API costs for developers have been reduced by over 50% [1]
- Cambricon achieved day-0 adaptation of the new model [2]

Group 2
- Huawei Ascend has completed adaptation and optimization for DeepSeek-V3.2-Exp [3]
- The deployment uses DeepSeek's large-scale expert-parallel (EP) scheme [3]
- On long sequences, inference latency stays below 2 seconds for TTFT (time to first token) and 30 milliseconds for TPOT (time per output token); how these metrics are computed is sketched below [3]
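The report does not describe the attention mechanism itself, so as a rough illustration only, here is a minimal sketch of top-k sparse attention in Python, where each query attends to its k highest-scoring keys rather than the full sequence. All names and the selection rule here are assumptions for illustration, not DeepSeek's actual design:

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=8):
    """Toy single-head attention where each query attends only to its
    top_k highest-scoring keys. Illustrative sketch only -- not
    DeepSeek's actual sparse-attention kernel."""
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (n, m) scores
    # Per-query threshold: the top_k-th largest score in each row.
    thresh = np.partition(scores, -top_k, axis=-1)[:, [-top_k]]
    masked = np.where(scores >= thresh, scores, -np.inf)  # drop the rest
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over survivors
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16)), rng.normal(size=(64, 16)),
           rng.normal(size=(64, 16)))
print(topk_sparse_attention(q, k, v).shape)  # (8, 16)
```

Note that this toy version still computes the full score matrix and merely masks it; a production sparse-attention kernel gains its efficiency by selecting keys first and never materializing the dense scores.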
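TTFT and TPOT in Group 2 are standard serving-latency metrics: the time until the first streamed token arrives, and the average gap between subsequent tokens. Below is a minimal sketch of how they are typically measured, assuming a hypothetical token stream in place of a real streaming API response:

```python
import time

def measure_ttft_tpot(stream):
    """Compute TTFT (time to first token) and mean TPOT (time per output
    token) from any iterable that yields tokens as they stream in.
    `stream` is a hypothetical stand-in for a real streaming response."""
    start = time.perf_counter()
    stamps = [time.perf_counter() for _ in stream]  # one timestamp per token
    if not stamps:
        return None, None
    ttft = stamps[0] - start
    tpot = (stamps[-1] - stamps[0]) / max(len(stamps) - 1, 1)
    return ttft, tpot

def fake_stream(n=10, delay=0.02):
    """Emit n tokens roughly 20 ms apart, mimicking a streaming server."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft, tpot = measure_ttft_tpot(fake_stream())
print(f"TTFT = {ttft * 1000:.1f} ms, TPOT = {tpot * 1000:.1f} ms")
```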