Sparse Attention

X @Bloomberg
Bloomberg· 2025-09-30 05:30
RT Saritha Rai (@SarithaRai): DeepSeek debuts "DeepSeek Sparse Attention" next-gen architecture in experimental version of model. (Native Sparse Attention paper by DeepSeek founder Liang Wenfeng & others won the ACL 2025 Best Paper award.) Gift link (free to read until Oct 7): https://t.co/EeZFpsm8bA #AI https://t.co/7ye5aImbcg ...
DeepSeek Releases New Model V3.2-Exp and Cuts Prices Again
Xin Jing Bao· 2025-09-29 13:28
Core Insights
- DeepSeek has released an experimental version of its model, DeepSeek-V3.2-Exp, which introduces Sparse Attention for improved training and inference efficiency on long texts [1]

Group 1: Model Development
- The new version, V3.2-Exp, is a step towards a next-generation architecture, building on the previous V3.1-Terminus [1]
- The Sparse Attention mechanism is aimed at optimizing the model's performance for long text processing [1] (a conceptual sketch of the general idea follows this summary)

Group 2: Pricing and Accessibility
- The API pricing has been significantly reduced, with costs now at 0.2 yuan per million tokens for cache hits, 2 yuan for cache misses, and 3 yuan for output [1]
- This pricing represents a reduction of over 50% compared to previous costs for developers using the DeepSeek API [1]
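The coverage above describes DeepSeek Sparse Attention (DSA) only at a high level and does not give the mechanism itself. As a rough, hypothetical illustration of what "sparse attention" means in general, that each query attends to a selected subset of keys rather than to all of them, which is what makes long-context attention cheaper, here is a minimal NumPy sketch of top-k sparse attention. The top-k selection rule, the shapes, and the function names are assumptions for illustration only, not DeepSeek's actual DSA design.

```python
# Illustrative top-k sparse attention in NumPy. A generic sketch of the idea
# behind sparse attention (each query attends to only a subset of keys),
# NOT DeepSeek's actual DSA mechanism; shapes and selection rule are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    # Standard scaled dot-product attention: forms a full (n_q, n_k) score matrix.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

def topk_sparse_attention(q, k, v, top_k=64):
    # Keep only each query's top_k highest-scoring keys; mask the rest before softmax.
    # (A real sparse kernel would avoid materializing the dense score matrix at all.)
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    n_k = scores.shape[-1]
    if top_k < n_k:
        kth = np.partition(scores, n_k - top_k, axis=-1)[:, n_k - top_k][:, None]
        scores = np.where(scores >= kth, scores, -np.inf)
    return softmax(scores) @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 512, 64
    q, k, v = rng.standard_normal((3, n, d))
    dense = dense_attention(q, k, v)
    sparse = topk_sparse_attention(q, k, v, top_k=64)
    # Mean absolute difference between the dense and the top-k outputs.
    print("mean |dense - sparse|:", float(np.abs(dense - sparse).mean()))
```

The sketch builds the full score matrix only to keep the example short; the whole point of a production sparse-attention kernel is to skip the keys that are never selected and so avoid the full quadratic computation.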
"Price Butcher" DeepSeek Goes Live with New Model, Costs Down by Over 50%
Di Yi Cai Jing· 2025-09-29 11:50
Core Insights
- DeepSeek, known as the "price butcher," has significantly reduced its pricing for the newly released DeepSeek-V3.2-Exp model, with output prices dropping by 75% and overall API costs for developers decreasing by over 50% [1][3].

Pricing Changes
- Input pricing for DeepSeek-V3.2-Exp has been adjusted:
  - Cache hit price decreased from 0.5 yuan per million tokens to 0.2 yuan per million tokens
  - Cache miss price reduced from 4 yuan per million tokens to 2 yuan per million tokens
- Output pricing has been slashed from 12 yuan per million tokens to 3 yuan per million tokens [3] (a worked cost comparison follows this summary).

Model Performance and Features
- The V3.2-Exp model is an experimental version that introduces DeepSeek Sparse Attention, enhancing training and inference efficiency for long texts without compromising output quality [3][6].
- Performance evaluations show that DeepSeek-V3.2-Exp maintains comparable results to the previous V3.1-Terminus model across various public benchmark datasets [3][4][5].

Community Support and Open Source
- DeepSeek has open-sourced GPU operators designed for the new model, including TileLang and CUDA versions, encouraging community research and experimentation [6].
- The model is now available on platforms like Huggingface and has been updated across official applications and APIs [5][6].

Industry Context
- Following the recent release of DeepSeek-V3.1-Terminus, there is speculation about the future of the V4 and R2 versions, with industry voices expressing anticipation for major updates [6].
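To make the "over 50% overall" and "75% on output" figures concrete, here is a small worked example using the per-million-token prices quoted above. The workload mix (how many cached-input, uncached-input, and output tokens) is hypothetical and chosen only for illustration; a real developer's saving depends on their own cache-hit ratio and output share.

```python
# Worked example: API cost under the old vs. new DeepSeek prices quoted in the article
# (per million tokens: cache hit 0.5 -> 0.2 yuan, cache miss 4 -> 2 yuan, output 12 -> 3 yuan).
# The workload below is hypothetical, used only to show how the headline percentages arise.

OLD = {"cache_hit": 0.5, "cache_miss": 4.0, "output": 12.0}   # yuan per 1M tokens
NEW = {"cache_hit": 0.2, "cache_miss": 2.0, "output": 3.0}

def workload_cost(prices, hit_tokens_m, miss_tokens_m, output_tokens_m):
    """Cost in yuan for a workload expressed in millions of tokens."""
    return (prices["cache_hit"] * hit_tokens_m
            + prices["cache_miss"] * miss_tokens_m
            + prices["output"] * output_tokens_m)

if __name__ == "__main__":
    # Hypothetical workload: 60M cached input tokens, 40M uncached, 20M output tokens.
    workload = dict(hit_tokens_m=60, miss_tokens_m=40, output_tokens_m=20)
    old_cost = workload_cost(OLD, **workload)
    new_cost = workload_cost(NEW, **workload)
    print(f"old: {old_cost:.0f} yuan, new: {new_cost:.0f} yuan, "
          f"saving: {100 * (1 - new_cost / old_cost):.0f}%")
    # Output price alone drops from 12 to 3 yuan, the 75% cut cited in the article.
```

With this particular mix the bill falls from 430 yuan to 152 yuan, roughly a 65% reduction, consistent with the article's "over 50%" claim.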
A "Brief History of Attention" in Large Models: A Conversation with Two AI Researchers, Starting from the Latest Improvements by DeepSeek and Kimi
晚点LatePost· 2025-03-02 06:10
Guests | Xiao Chaojun, Fu Tianyu. Compiled by | Cheng Manqi.

Last week, DeepSeek and Kimi each released new results on large-model architecture improvements and optimizations: NSA and MoBA, respectively. Both focus on improving the "attention mechanism" in large models.

The emergence of reasoning models such as o1 and R1 has given long context a new set of problems.

The attention mechanism is the core mechanism of today's large language models (LLMs). The June 2017 paper by the eight Transformer authors that kicked off the large language model revolution was titled exactly that: "Attention Is All You Need."

Improving the computational efficiency and effectiveness of attention, in turn, helps address a problem that both AI academia and industry care deeply about: long context. Whether it is feeding in an entire book at once so the model can distill and understand it for us, generating the long chains of thought that models like o1 and R1 now require, or giving models ever longer "memory" in the future, all of this depends on long-context capability.

For this episode we invited two AI researchers who have worked on improvements to the attention mechanism. One is Xiao Chaojun, a PhD student in the Natural Language Processing Lab of Tsinghua University's Department of Computer Science and first author of the InfLLM attention improvement; his advisor is an associate professor in Tsinghua's Department of Computer Science ...
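For context on why attention is the bottleneck for long context: the scaled dot-product attention defined in "Attention Is All You Need", which NSA, MoBA, and InfLLM each modify in their own way, is written below. Its score matrix grows quadratically with sequence length, and that quadratic cost is precisely what sparse-attention approaches try to cut.

```latex
% Scaled dot-product attention (Vaswani et al., 2017) for a length-n sequence:
\[
  \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
  \qquad Q, K, V \in \mathbb{R}^{n \times d_k}.
\]
% The score matrix QK^T is n x n, so compute scales as O(n^2 d_k) and the attention
% map alone needs O(n^2) memory -- the reason very long contexts are expensive.
```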