Group 1
- DeepSeek has introduced a new architecture, mHC (manifold-constrained hyper-connections), to address the training instability of conventional hyper-connections in large-scale model training while preserving their performance gains [1][3]
- The research notes that although hyper-connections improve performance by diversifying connection patterns, they weaken the identity-mapping property inherent to residual connections, causing training instability and limiting scalability [3]
- Empirical results show that mHC supports large-scale training with only 6.7% additional time overhead at an expansion rate of 4, demonstrating its efficiency (a minimal sketch of the hyper-connection idea follows this digest) [3][5]

Group 2
- DeepSeek has released two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale; V3.2 achieves performance comparable to GPT-5 on reasoning benchmarks and is suited to everyday tasks [6][7]
- The V3.2-Speciale variant strengthens long-form reasoning and adds theorem-proving capability, performing on par with Gemini-3.0-Pro on mainstream reasoning benchmarks [7]
- DeepSeek has also cut API prices by more than 50%, making the models more accessible to developers [7]

Group 3
- DeepSeek's research paper on the R1 reasoning model was featured on the cover of the journal Nature, a significant milestone for Chinese AI technology in the international scientific community [8]
- The publication is notable as the first mainstream large-language-model research to undergo full peer review and appear in a leading journal, filling a gap in the field [8]
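The digest does not specify mHC's formulation, so the following is only a minimal, hypothetical sketch of the general hyper-connection idea it refers to: several parallel residual streams (expansion rate n = 4) mixed by learnable weights, contrasted with a plain residual block whose identity path is fixed. The class names, the mixing scheme, and the near-identity initialization are illustrative assumptions, not DeepSeek's implementation.

```python
# Illustrative sketch only (not DeepSeek's actual mHC): contrasts a standard
# residual block with a simplified hyper-connection block that keeps n parallel
# residual streams (expansion rate n) mixed by learnable weights.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Plain residual connection: y = x + f(x), with a fixed identity path."""

    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)


class HyperConnectionBlock(nn.Module):
    """Simplified hyper-connection: n residual streams with learnable mixing.

    The residual update is routed across streams by learnable weights instead
    of a single fixed identity path; this added flexibility is what the digest
    describes as weakening identity mapping and what mHC is said to constrain.
    """

    def __init__(self, dim: int, n: int = 4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # Learnable mixing initialized near the identity so training starts
        # close to an ordinary residual network (an assumption for this sketch).
        self.stream_mix = nn.Parameter(torch.eye(n))          # stream-to-stream routing
        self.read = nn.Parameter(torch.full((n,), 1.0 / n))   # how streams feed f
        self.write = nn.Parameter(torch.full((n,), 1.0 / n))  # how f's output is written back

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n, dim) -- n parallel hidden streams
        x = torch.einsum("bnd,n->bd", h, self.read)           # aggregate streams into f's input
        update = self.f(x)                                     # shared sublayer computation
        h = torch.einsum("bnd,nm->bmd", h, self.stream_mix)    # route residual streams
        return h + self.write.view(1, -1, 1) * update.unsqueeze(1)  # write update to each stream


if __name__ == "__main__":
    x = torch.randn(2, 64)
    print(ResidualBlock(64)(x).shape)              # torch.Size([2, 64])
    h = x.unsqueeze(1).repeat(1, 4, 1)             # expand input to 4 streams
    print(HyperConnectionBlock(64, n=4)(h).shape)  # torch.Size([2, 4, 64])
```

The near-identity initialization is the design choice that keeps the hyper-connection block behaving like an ordinary residual network at the start of training; the learnable routing that then drifts away from the identity is, per the digest, the source of instability that mHC's manifold constraint is meant to control at scale.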
DeepSeek: Latest Release!