Model Architecture - filings, earnings calls, financial reports, news

Jin Rong Jie· 2026-01-20 23:59

Core Insights
- DeepSeek has unveiled a new model, "MODEL1", on the first anniversary of DeepSeek-R1, signaling a significant development in its product line [1].
- The company updated its FlashMLA code on GitHub, with 28 mentions of MODEL1 across 114 files, suggesting that MODEL1 uses an architecture distinct from V32, the identifier for DeepSeek-V3.2 [1].
- Key differences in the code include the KV cache layout, sparsity handling, and FP8 decoding, pointing to several optimizations in memory usage [1].
- DeepSeek reportedly plans to release its next-generation flagship model around mid-February, coinciding with the Chinese New Year [1].
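Why KV cache layout and FP8 decoding matter for memory can be shown with simple arithmetic. The sketch below is not derived from the FlashMLA code: every dimension is a hypothetical value chosen for illustration, and the "latent" configuration only loosely imitates a compressed, MLA-style cache that stores one shared vector per token instead of separate keys and values for every head. The helper `kv_cache_bytes` is likewise a hypothetical name.

```python
# Bytes used to store one cached element in each number format.
BYTES_PER_ELEMENT = {"fp32": 4, "bf16": 2, "fp8": 1}


def kv_cache_bytes(num_layers: int, seq_len: int, per_token_dims: int, dtype: str) -> int:
    """Bytes needed to cache `per_token_dims` values per token in every layer."""
    return num_layers * seq_len * per_token_dims * BYTES_PER_ELEMENT[dtype]


# Hypothetical model and context size (not DeepSeek's real configuration).
num_layers, seq_len = 60, 32_768

# Standard multi-head attention: a key and a value vector for every KV head.
num_kv_heads, head_dim = 32, 128
mha_dims = 2 * num_kv_heads * head_dim  # 8192 values per token per layer

# Hypothetical compressed layout: one latent vector plus a small positional part.
latent_dims = 512 + 64  # 576 values per token per layer

for name, dims in [("MHA", mha_dims), ("latent", latent_dims)]:
    for dtype in ("bf16", "fp8"):
        gib = kv_cache_bytes(num_layers, seq_len, dims, dtype) / 2**30
        print(f"{name:6s} {dtype}: {gib:.2f} GiB")
```

With these made-up dimensions, switching the cache from BF16 to FP8 halves its size, and the compact per-token layout shrinks it by a further factor of about 14 (8192 / 576).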

Polyhedra· 2025-09-25 12:00

6/ Currently working on Gemma3 quantization, focusing on:
- Learning the new model architecture
- Adding KV cache support (which accelerates inference)
- Implementing quantization support for some new operators
  - Full operator support will take at least one more day, plus more time for accuracy testing
Stay tuned for more updates 🔥 ...
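The post notes that KV cache support accelerates inference. The toy single-head attention example below sketches why: during token-by-token decoding, keys and values of earlier tokens are stored once and reused, so each step projects only the new token instead of re-projecting the whole prefix. It is not Polyhedra's or Gemma3's implementation; the weights, dimensions, and function names are hypothetical, and it uses NumPy for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16
# Hypothetical single-head attention projection weights.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))


def attend(q, K, V):
    """Scaled dot-product attention of one query over all cached positions."""
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V


def decode_without_cache(xs):
    outputs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        # Keys and values for every past token are recomputed at every step.
        K, V = prefix @ W_k, prefix @ W_v
        outputs.append(attend(xs[t] @ W_q, K, V))
    return np.stack(outputs)


def decode_with_cache(xs):
    k_cache, v_cache, outputs = [], [], []
    for x in xs:
        # Only the newest token is projected; earlier keys/values come from the cache.
        k_cache.append(x @ W_k)
        v_cache.append(x @ W_v)
        outputs.append(attend(x @ W_q, np.stack(k_cache), np.stack(v_cache)))
    return np.stack(outputs)


tokens = rng.standard_normal((8, d_model))
# Both decoders produce the same outputs; the cached one does far less projection work.
assert np.allclose(decode_without_cache(tokens), decode_with_cache(tokens))
```

The trade-off is memory: the cache grows with sequence length, which is why quantizing or compacting it is a common follow-up optimization.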

xAI· 2025-08-28 18:12

We built Grok Code Fast 1 from scratch, starting with a brand-new lightweight model architecture. Combined with novel improvements to accelerate serving efficiency, Grok Code Fast 1 sets a new standard for both speed and affordability. https://t.co/p04xX7uf8w ...