Sparse Attention Mechanisms
On this open-source tier list ranking models from "awesome" to "awful", I finally understand how Chinese AI staged its comeback
Sina Finance · 2025-12-17 14:25
In recent days, a tier list of open-source models has been shared all over X.

▲ Image source: https://www.interconnects.ai/p/2025-open-models-year-in-review

Ranked from best to worst, Chinese open-source models take the top spots: DeepSeek, Qwen, Kimi, Zhipu, and MiniMax are the top five open-source model makers in the world. OpenAI lands in the fourth tier, and for Meta, whose Llama project Zuckerberg staffed by poaching half of Silicon Valley, the result stings even more: only an honorable mention.

This ranking is not paid promotion by Chinese model makers, nor is it self-congratulation. Well-known AI researcher Nathan Lambert and Florian Brand, a PhD student at the German Research Center for AI, laid out a complete ranking of global open-source models in an article on Interconnects.

▲ Nathan Lambert previously worked at Meta, DeepMind, and Hugging Face

The article reviews in detail the past year of global open-source model development: Chinese open-source models, led by DeepSeek and Qwen, are using open source to change the operating rules of the entire AI industry. And the facts bear this out. For global open source, 2024 may still have been Llam ...
DeepSeek V3.2 released! Real-world tests impress, and low cost is its biggest advantage
36Kr · 2025-12-03 03:57
Core Insights
- DeepSeek has launched its V3.2 version, which reportedly matches the inference capabilities of OpenAI's GPT-5 while being significantly cheaper [1][22]
- The V3.2 release includes two variants: a free version for users and a Speciale version that supports API access and offers enhanced reasoning capabilities [2][22]

Performance Enhancements
- DeepSeek V3.2-Speciale has demonstrated superior performance in various competitions, achieving gold-medal results in IMO 2025, CMO 2025, the ICPC World Finals 2025, and IOI 2025, outperforming GPT-5 High in all tests [4][22]
- The introduction of the DeepSeek Sparse Attention (DSA) mechanism has fundamentally improved the efficiency of attention in AI models, reducing computational costs by over 60% and increasing inference speed by approximately 3.5 times [6][12]

Cost Efficiency
- The DSA mechanism allows for a significant reduction in the cost of processing long sequences, with costs dropping from $0.7 to $0.2 per million tokens during the pre-fill phase and from $2.4 to $0.8 during the decoding phase [12][22]
- This cost reduction positions DeepSeek V3.2 as one of the most affordable models in its category for long-text inference [12][22]

Tool Utilization
- DeepSeek V3.2 allows the model to call tools during its reasoning process without requiring additional training, enhancing its general performance and compatibility with user-created tools [13][22]
- The model demonstrates the ability to break down complex tasks and utilize different tools effectively, showcasing its decision-making capabilities [20][22]

Market Impact
- The release of DeepSeek V3.2 challenges the notion that open-source models lag behind closed-source counterparts, as it offers competitive performance at a fraction of the cost [22][23]
- The DSA mechanism's cost revolution is expected to significantly impact the commercialization of AI models, making advanced AI applications more accessible to smaller enterprises and consumers [22][23]
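The per-token prices quoted above imply large savings for long-context workloads. A back-of-the-envelope sketch in Python (the 100k-in/2k-out request size is a hypothetical example; only the four per-million-token prices come from the article):

```python
# Per-million-token prices quoted in the article (USD), before and after DSA.
PREFILL_OLD, PREFILL_NEW = 0.7, 0.2
DECODE_OLD, DECODE_NEW = 2.4, 0.8

def request_cost(prefill_tokens: int, decode_tokens: int,
                 prefill_price: float, decode_price: float) -> float:
    """Cost in USD for one request at the given per-million-token prices."""
    return (prefill_tokens * prefill_price + decode_tokens * decode_price) / 1_000_000

# Hypothetical long-context request: 100k tokens in, 2k tokens out.
old = request_cost(100_000, 2_000, PREFILL_OLD, DECODE_OLD)
new = request_cost(100_000, 2_000, PREFILL_NEW, DECODE_NEW)

print(f"before DSA: ${old:.4f}")  # $0.0748
print(f"after DSA:  ${new:.4f}")  # $0.0216
print(f"saving:     {1 - new / old:.0%}")  # 71%
```

On these numbers a prefill-heavy request gets roughly 70% cheaper, consistent with the article's claim of a >60% computational cost reduction.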
DeepSeek ships again! New model goes head-to-head with Google
Yicai · 2025-12-01 14:05
By Liu Xiaojie, Yicai

On the evening of December 1, DeepSeek released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, with globally leading reasoning capabilities.

The two models are positioned differently. DeepSeek-V3.2 aims to balance reasoning capability against output length, making it suitable for everyday use such as question answering and general agent tasks. DeepSeek released the experimental V3.2-Exp at the end of September; this release is the official version. In public reasoning benchmarks, V3.2 reaches GPT-5's level, only slightly below Google's Gemini 3 Pro.

DeepSeek-V3.2-Speciale is the centerpiece of this release. Its goal is to "push the reasoning capability of open-source models to the extreme and explore the boundaries of model capability." According to data published by DeepSeek, Speciale surpasses Google's state-of-the-art Gemini 3 Pro on multiple reasoning benchmarks: in the American Invitational Mathematics Examination, the Harvard-MIT Mathematics Tournament, the International Mathematical Olympiad, and other tests, V3.2-Speciale beats Gemini 3 Pro, though it trails Google slightly on coding and PhD-level STEM tests. Speciale is described as a long-thinking enhanced version of V3.2, combining DeepSee ...
A new technical route to AGI: next-generation sparse attention mechanism Monte Carlo Attention goes open source
AI科技大本营 · 2025-11-10 01:03
Core Viewpoint
- The article discusses the innovative Monte Carlo Attention mechanism used in the BigBang-Proton framework, which allows for efficient modeling of extremely long contexts by leveraging a unique inter-patch delegation mechanism, achieving linear complexity while overcoming the limitations of traditional attention methods [1][4][32]

Context Length in Material World Modeling
- Monte Carlo Attention was developed to meet the theoretical demands of the BigBang-Proton framework, addressing the need for extremely long context lengths due to the integration of diverse scientific data [2][3]
- The estimated total sequence length required for comprehensive virtual-cell integration is approximately 10¹⁵ tokens, necessitating a context length far exceeding current large language models [2][3]

Monte Carlo Attention Mechanism
- Monte Carlo Attention reduces computational complexity from O(L²) to O(L), significantly improving training efficiency and convergence rates [4]
- This mechanism allows for the training of sequences that are multiple orders of magnitude longer than the device memory capacity, promoting the development of next-generation hardware architectures [4][32]

BigBang-Proton Architecture Components
- The BigBang-Proton architecture consists of three core components: Binary Patch Encoding, Monte Carlo Attention, and a Temporal Convolutional Network (TCN) [7][8]
- The inter-patch delegation mechanism enables local and global information exchange, allowing context length to grow exponentially with the number of layers while maintaining linear computational complexity [8][9]

Delegate Operation Process
- The delegate operation is a hierarchical process involving the decomposition of input sequences into blocks, generating delegate tokens, distributing them, and enhancing local representations with global context [17][20][22]
- The complexity of attention calculations within each block is O(P²), while global information-flow complexity is determined by the number of blocks [28][30]

Comparison with Existing Attention Mechanisms
- Monte Carlo Attention differs fundamentally from sparse attention methods by utilizing a reorganization-based mechanism for indirect information propagation, avoiding selection bias and information loss [40][42]
- The method allows for exponential context-length expansion, surpassing the limitations of structured state-space models and traditional linear attention models [43][44]

Temporal Convolutional Network (TCN)
- TCN replaces traditional feedforward networks, enhancing the model's ability to capture local and global patterns through stacked convolutional layers [35][37]
- The architecture allows for direct learning of spatial and positional information from input sequences, eliminating the need for explicit positional embeddings [37]

Future Directions
- The article indicates that further insights into the core technologies, cutting-edge applications, and future plans of the BigBang-Proton framework will be shared in subsequent publications [46]
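The delegate operation described above can be illustrated with a deliberately simplified, single-level NumPy sketch. This is not the BigBang-Proton implementation (which is hierarchical and uses learned delegate tokens); it only shows the core idea: attention runs densely inside each block of P tokens, each block is summarized by one delegate token (here, a plain mean), and every block also attends to the delegates of all blocks, so global information flows without any query attending to all L tokens. The block size, mean pooling, and single-head dot-product attention are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def block_delegate_attention(x: np.ndarray, P: int) -> np.ndarray:
    """Single-level sketch of inter-block delegation.

    x: (L, d) token embeddings, with L divisible by P.
    Each query attends to the P tokens of its own block plus one
    delegate (mean-pooled) token per block, instead of all L tokens.
    """
    L, d = x.shape
    blocks = x.reshape(L // P, P, d)             # (B, P, d)
    delegates = blocks.mean(axis=1)              # (B, d): one delegate per block
    out = np.empty_like(x)
    for b, blk in enumerate(blocks):
        keys = np.concatenate([blk, delegates])  # local tokens + all delegates
        scores = blk @ keys.T / np.sqrt(d)       # (P, P + B), not (P, L)
        out[b * P:(b + 1) * P] = softmax(scores) @ keys
    return out

x = np.random.default_rng(0).normal(size=(64, 16))
y = block_delegate_attention(x, P=8)
print(y.shape)  # (64, 16)
```

With block size P and B = L/P blocks, each query attends to P + B keys, so total work is O(L·(P + L/P)) rather than O(L²); this matches the article's description of O(P²) attention within each block plus a global flow governed by the number of blocks.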
Domestic chip makers race to claim support for the new DeepSeek model
21st Century Business Herald · 2025-10-01 15:00
Core Viewpoint
- The release of the DeepSeek-V3.2-Exp model by DeepSeek marks a significant advance for the domestic AI chip ecosystem, showcasing a collaborative effort among various domestic chip manufacturers [1][4][7]

Group 1: Model Release and Features
- DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention, which significantly reduces computational resource consumption and enhances inference efficiency [1][7]
- The new model has led to a price reduction of 50% to 75% for API services across DeepSeek's platforms [1]
- The model's release prompted immediate recognition and adaptation from several domestic chip manufacturers, including Cambricon, Huawei, and Haiguang [2][4]

Group 2: Industry Response and Ecosystem Development
- Cambricon was the first to announce compatibility with DeepSeek-V3.2-Exp, followed by Huawei and Haiguang, indicating a rapid response from the industry [2][4]
- The consensus within the domestic AI industry regarding DeepSeek's models has enabled the company to take the lead in defining standards for domestic chips [4][7]
- The rapid adaptation of DeepSeek's models by various manufacturers suggests a growing synergy within the domestic AI hardware and software ecosystem [9]

Group 3: Future Implications
- Experts attribute the swift development of domestic chips expected by 2025 to the emergence of DeepSeek as a key player in the industry [4][5]
- The collaborative efforts among domestic companies to adapt to DeepSeek's standards may accelerate the growth of the AI chip ecosystem in China [4][9]
- The advancements made by DeepSeek in a short time frame highlight the potential for rapid evolution in the domestic AI landscape, contrasting with the decades-long ecosystem building of companies like NVIDIA [9]
DeepSeek and domestic chips begin a "two-way embrace"
Core Insights
- DeepSeek has released the DeepSeek-V3.2-Exp model, introducing a sparse attention mechanism that significantly reduces computational resource consumption and enhances inference efficiency [1]
- The new model has led to a price reduction of 50% to 75% for API services [1]
- The release has prompted immediate recognition and adaptation from several domestic chip manufacturers, indicating a growing synergy within the domestic AI hardware and software ecosystem [1][2]

Group 1: Model Release and Features
- The DeepSeek-V3.2-Exp model incorporates the DeepSeek Sparse Attention mechanism, optimizing training and inference efficiency for long texts [5]
- The model is compatible with CUDA and utilizes TileLang for rapid prototyping, aiming for higher efficiency through lower-level language implementations [5][6]
- The release of V3.2-Exp marks a significant shift from the previous version, V3.1, whose "UE8M0 floating-point format" received no proactive recognition from other companies [4][5]

Group 2: Industry Response and Ecosystem Development
- Within four minutes of the model's release, Cambricon announced its adaptation of the DeepSeek-V3.2-Exp model and open-sourced its large-model inference engine [2]
- Huawei and Haiguang quickly followed suit, demonstrating the rapid response of the domestic chip industry to the new model [2]
- The consensus within the domestic AI industry regarding the DeepSeek model has empowered the company to take the lead in defining standards for domestic chips [3][4]

Group 3: Competitive Landscape
- The rapid development of the domestic chip ecosystem is highlighted by the swift adaptation of major players like Tencent and Alibaba, who are actively integrating domestic chips into their cloud computing services [6]
- Experts believe the emergence of DeepSeek has accelerated the pace of domestic chip development, with expectations for significant advancements by 2025 [3]
AI Daily | Another $40 million-plus cash-out! Jensen Huang keeps trimming his NVIDIA stake while touting OpenAI as potentially the next trillion-dollar giant
美股研究社 · 2025-09-30 12:06
Core Insights
- The article discusses the rapid advancements in artificial intelligence (AI) technology and their implications for investment opportunities in AI-related companies and market trends [3]

Group 1: AI Model Developments
- The latest GLM-4.6 model by Zhipu AI has been launched, showing a 27% improvement in coding capabilities compared to its predecessor GLM-4.5 and excelling in real programming tasks [5]
- DeepSeek introduced a "sparse attention" mechanism in its experimental AI model, DeepSeek-V3.2-Exp, aimed at enhancing training and inference efficiency in long contexts [5]
- Anthropic released its new AI model, Claude Sonnet 4.5, claiming it to be the "best coding model globally," with significant improvements in reliability and performance across various professional fields [6]

Group 2: Market Trends and Predictions
- OpenAI has launched an "Instant Checkout" feature in ChatGPT, allowing users to purchase items directly through the platform, initially supporting single-item purchases [7]
- NVIDIA CEO Jensen Huang sold 225,000 shares of NVIDIA stock for over $40 million, while expressing confidence in AI's future, particularly in OpenAI's potential to become a trillion-dollar company [7][8]
- Huang predicts that OpenAI could achieve unprecedented growth, similar to other tech giants like Meta and Google, by offering both consumer and enterprise services [8]

Group 3: Copyright and Content Usage
- OpenAI's Sora AI video generator will default to using copyrighted content, with an option for studios to opt out, indicating a shift in content-usage policies [12]
- The company has been in discussions with talent agencies and studios regarding the opt-out mechanism, ensuring that opted-out copyrighted characters do not appear in its AI tools [13]
Huawei Ascend and Cambricon announce support for DeepSeek's latest model
21st Century Business Herald · 2025-09-30 10:13
Core Viewpoint
- DeepSeek has officially released the V3.2-Exp model, introducing the DeepSeek Sparse Attention (DSA) mechanism, which optimizes training and inference efficiency for long texts and has reduced DeepSeek API service costs by over 50% [1][5]

Group 1: Model Development
- The V3.2-Exp model builds on the V3.1-Terminus version and incorporates the DSA mechanism, a sparse attention approach that reduces computational complexity when processing long texts [1][4]
- DSA allows for adaptive selection of key attention heads and local context windows, improving efficiency and lowering costs compared to traditional dense attention mechanisms [3][4]

Group 2: Cost and Accessibility
- The introduction of the new model has led to a significant reduction in the cost of accessing the DeepSeek API, with prices dropping by more than 50% [5]
- DeepSeek has temporarily retained additional API access for the previous V3.1-Terminus model until October 15, allowing users to conduct comparative testing [2]

Group 3: Open Source and Community Engagement
- DeepSeek has fully open-sourced the V3.2-Exp model on platforms like HuggingFace and ModelScope, along with related research papers [2]
- The company has also open-sourced the TileLang version of the operators, which has garnered significant attention in the industry [1][6]

Group 4: Hardware Compatibility
- Following the release of V3.2-Exp, major domestic hardware companies like Huawei, Cambricon, and Haiguang have announced compatibility with the new model, indicating collaborative development within the domestic AI ecosystem [6][10]
- TileLang, a programming language developed to simplify GPU operator development, has been recommended for use in research experiments, enhancing the efficiency of AI operator development [7][10]
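As a rough illustration of the general idea behind sparse attention (a generic top-k selection sketch; DSA's actual indexer and selection rule are described in DeepSeek's technical report and differ from this toy version, which still materializes the full score matrix for simplicity), each query can keep only its k highest-scoring keys and run softmax attention over those:

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k: int):
    """Each query attends only to its top_k highest-scoring keys.

    q, k, v: (L, d) arrays. Dense attention would softmax over all L
    keys per query; here the softmax and weighted sum use top_k keys.
    """
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)                              # (L, L) selection scores
    idx = np.argpartition(scores, -top_k, axis=1)[:, -top_k:]  # (L, top_k) kept keys
    sel = np.take_along_axis(scores, idx, axis=1)              # scores of kept keys
    w = np.exp(sel - sel.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                          # softmax over kept keys only
    return np.einsum('lk,lkd->ld', w, v[idx])                  # weighted sum of kept values

rng = np.random.default_rng(1)
q = rng.normal(size=(32, 8))
k = rng.normal(size=(32, 8))
v = rng.normal(size=(32, 8))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (32, 8)
```

In a real long-context implementation the selection scores themselves must be computed cheaply (DSA reportedly uses a lightweight indexer for this step), otherwise the O(L²) scoring pass would dominate and erase the savings.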