Model Architecture
DeepSeek's New Model "MODEL1" Revealed
Jin Rong Jie· 2026-01-20 23:59
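On the first anniversary of DeepSeek-R1's release, a new model called "MODEL1" has surfaced. DeepSeek updated its FlashMLA code on GitHub, and across 114 files MODEL1 is mentioned in 28 places, appearing as a model distinct from V32. V32 is known to be DeepSeek-V3.2, so MODEL1 is likely a new architecture. The concrete differences in the code involve KV cache layout, sparsity handling, and FP8 decoding, with several differences in memory optimization. Earlier reports said DeepSeek would release its next-generation flagship model around the Spring Festival in mid-February. ...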
X @Polyhedra
Polyhedra· 2025-09-25 12:00
6/ Currently working on Gemma3 quantization, focusing on:
- Learning the new model architecture
- Adding KV cache support (which accelerates inference)
- Implementing quantization support for some new operators
  - Full operator support will require 1+ additional day, plus more time for accuracy testing
Stay tuned for more updates 🔥 ...
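The thread notes that KV cache support accelerates inference. The toy sketch below illustrates why, assuming a single attention head with hypothetical 2×2 identity projection weights; `ToyAttention` and `KVCache` are illustrative names, not Polyhedra's or Gemma3's actual code. Without a cache, every decode step re-projects keys and values for the entire prefix, while with a cache only the newest token is projected and earlier keys and values are reused.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Vector) -> Vector:
    # Multiply matrix w (list of rows) by vector x.
    return [dot(row, x) for row in w]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention of one query over all given positions.
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class ToyAttention:
    """Single attention head with hypothetical fixed projection weights."""

    def __init__(self, wq: Matrix, wk: Matrix, wv: Matrix) -> None:
        self.wq, self.wk, self.wv = wq, wk, wv
        self.projections = 0  # counts how many tokens had K/V computed

    def project_kv(self, x: Vector) -> Tuple[Vector, Vector]:
        self.projections += 1
        return matvec(self.wk, x), matvec(self.wv, x)

    def step_no_cache(self, prefix: List[Vector]) -> Vector:
        # Recompute K and V for every token in the prefix at each step.
        kv = [self.project_kv(x) for x in prefix]
        q = matvec(self.wq, prefix[-1])
        return attend(q, [k for k, _ in kv], [v for _, v in kv])


class KVCache:
    """Stores past keys/values so each step projects only the new token."""

    def __init__(self, layer: ToyAttention) -> None:
        self.layer = layer
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def step(self, x: Vector) -> Vector:
        k, v = self.layer.project_kv(x)  # only the newest token is projected
        self.keys.append(k)
        self.values.append(v)
        q = matvec(self.layer.wq, x)
        return attend(q, self.keys, self.values)


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

naive = ToyAttention(identity, identity, identity)
cached_layer = ToyAttention(identity, identity, identity)
cache = KVCache(cached_layer)

for t in range(1, len(tokens) + 1):
    out_naive = naive.step_no_cache(tokens[:t])
    out_cached = cache.step(tokens[t - 1])
    # Both paths produce the same attention output at every step.
    assert all(abs(a - b) < 1e-12 for a, b in zip(out_naive, out_cached))

print(naive.projections, cached_layer.projections)  # prints: 10 4
```

Over four decode steps the uncached path performs 1 + 2 + 3 + 4 = 10 key/value projections versus 4 with the cache, for identical outputs; the cost is the memory needed to keep every stored key and value.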
X @xAI
xAI· 2025-08-28 18:12
We built Grok Code Fast 1 from scratch, starting with a brand-new lightweight model architecture. Combined with novel improvements to accelerate serving efficiency, Grok Code Fast 1 sets a new standard for both speed and affordability. https://t.co/p04xX7uf8w ...