Model Architecture
DeepSeek's New Model MODEL1 Revealed
Jin Rong Jie· 2026-01-20 23:59
Core Insights
- DeepSeek has unveiled a new model, "MODEL1", on the first anniversary of DeepSeek-R1, marking a significant development in its product line [1]
- The company updated its FlashMLA code on GitHub, where MODEL1 is mentioned 28 times across 114 files. This suggests that MODEL1 is an architecture distinct from V32, which is identified as DeepSeek-V3.2 [1]
- Key differences in the code involve the KV cache layout, sparsity handling, and FP8 decoding, pointing to several memory-usage optimizations [1]
- According to reports, DeepSeek plans to release its next-generation flagship model around mid-February, coinciding with the Chinese New Year [1]
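Why KV cache layout and FP8 decoding matter for memory can be illustrated with simple arithmetic. The sketch below assumes a standard per-head KV cache, not the compressed latent cache that Multi-head Latent Attention (the attention variant FlashMLA targets) actually stores. Its dimensions are hypothetical and do not describe MODEL1 or DeepSeek-V3.2, it ignores any scale factors an FP8 cache may need, and `kv_cache_bytes` is a hypothetical helper.

```python
# Illustrative only: hypothetical dimensions, NOT MODEL1's or DeepSeek-V3.2's
# configuration, and a standard (non-latent) KV cache rather than MLA's.
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: float) -> int:
    """Size in bytes of a standard KV cache for one sequence.

    The factor 2 accounts for storing both keys and values at every layer.
    Quantization metadata (e.g. FP8 scale factors) is ignored.
    """
    return int(2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem)


if __name__ == "__main__":
    cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768)
    bf16 = kv_cache_bytes(**cfg, bytes_per_elem=2)  # 16-bit cache entries
    fp8 = kv_cache_bytes(**cfg, bytes_per_elem=1)   # 8-bit cache entries
    print(f"BF16: {bf16 / 2**30:.2f} GiB, FP8: {fp8 / 2**30:.2f} GiB")
    # prints "BF16: 4.00 GiB, FP8: 2.00 GiB"
```

With these illustrative numbers, storing cache entries in 8 bits instead of 16 halves the footprint of a 32K-token sequence from 4 GiB to 2 GiB; actual savings depend on the real cache layout and any quantization metadata.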
X @Polyhedra
Polyhedra· 2025-09-25 12:00
6/ Currently working on Gemma3 quantization, focusing on:
- Learning the new model architecture
- Adding KV cache support (which accelerates inference)
- Implementing quantization support for some new operators
  - Full operator support will require 1+ additional day, plus more time for accuracy testing
Stay tuned for more updates 🔥 ...
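Why a KV cache accelerates autoregressive inference can be shown with a minimal single-head sketch, assuming queries, keys, and values have already been projected. Each decoding step appends only the new token's key and value and reuses the stored ones, rather than recomputing them for the whole prefix. `KVCache`, `attend`, and `dot` are hypothetical names for illustration, not Polyhedra's or Gemma3's implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class KVCache:
    """Append-only store of past keys/values for one attention head (hypothetical)."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def step(self, q: Vector, k: Vector, v: Vector) -> Vector:
        # In a full model, k and v come from projecting the newest token only;
        # caching them avoids re-projecting every earlier token at each step.
        self.keys.append(k)
        self.values.append(v)
        # Causal decoding: the new query attends to all cached positions.
        return attend(q, self.keys, self.values)
```

Each `step` call returns the attention output for the newest token. Comparing it against `attend` run over the full prefix is a quick way to confirm that caching changes the cost of decoding but not its result.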
X @xAI
xAI· 2025-08-28 18:12
We built Grok Code Fast 1 from scratch, starting with a brand-new lightweight model architecture. Combined with novel improvements to accelerate serving efficiency, Grok Code Fast 1 sets a new standard for both speed and affordability. https://t.co/p04xX7uf8w ...