Sparse Attention Mechanisms
Tencent Research Institute AI Digest 20260114
Tencent Research Institute · 2026-01-13 16:29
Group 1
- Anthropic has launched an AI office tool called Cowork, designed to automate daily tasks such as document creation, planning, data analysis, and file organization [1]
- Cowork features proactive and autonomous capabilities, allowing it to create plans and sync progress in real time, and integrates with external information sources and Chrome [1]
- The development of Cowork took only a week and a half, with 100% of the code written by Claude Code, while preserving user control and the ability to halt operations at any time [1]

Group 2
- Apple has announced a partnership with Google to develop the next generation of its foundational model based on Gemini, which will also overhaul Siri [2]
- The Apple AI team has experienced significant talent loss, with dozens of core members leaving, making collaboration with Google a necessary choice given Gemini's 1.2 trillion parameters compared to Apple's 150 billion [2]
- Google processes 13 trillion tokens monthly, and Gemini has captured over 20% of the global market share, while Elon Musk criticized the concentration of power in this partnership [2]

Group 3
- DeepSeek has introduced a new paper proposing a conditional memory module called Engram, which complements MoE conditional computation and addresses the lack of native knowledge retrieval in Transformers [3]
- Engram significantly outperforms pure MoE baselines, improving MMLU by 3.4, BBH by 5.0, and HumanEval by 3.0 points, while increasing long-context retrieval accuracy from 84.2% to 97.0% [3]
- The shape of the upcoming DeepSeek V4 is becoming clearer, with conditional memory expected to be a core modeling primitive for the next generation of sparse large models [3]

Group 4
- OpenAI has acquired AI healthcare startup Torch for approximately $100 million, with $60 million paid upfront and the remainder reserved for employee retention incentives [4]
- Torch integrates with healthcare systems like Kaiser Permanente and Apple Health, allowing unified access to lab results, prescriptions, and medical records, while using AI for classification and health insights [4]
- The founding team of Torch has joined OpenAI to develop the ChatGPT Health module, following their previous experience with an online clinic platform [4]

Group 5
- Anthropic has launched HIPAA-compliant AI services for healthcare, enabling institutions and individuals to process protected health data while referencing authoritative databases [6]
- Claude can export personal health data from applications like Apple Health for aggregation and understanding, with a commitment not to use any medical user data for model training [6]
- Over 22,000 clinical service providers from Banner Health are using Claude, with 85% reporting increased work efficiency, and collaborations with major healthcare institutions are underway [6]

Group 6
- Baichuan has released the open-source medical model M3, achieving a top score of 65.1 on HealthBench and winning the Hard category with a score of 44.4, surpassing GPT-5.2 [7]
- M3 introduces native end-to-end serious-inquiry capabilities, following the SCAN principles, and demonstrates inquiry abilities superior to those of the average human doctor [7]
- M3 employs a dynamic Verifier System and a new SPAR algorithm to address long-dialogue training issues, with applications already integrated for doctors and patients [7]

Group 7
- OpenAI is set to produce a special audio product called "Sweetpea," designed to replace AirPods, with mass production planned at Foxconn by Q4 2028 [8]
- The device, designed by Jony Ive's team, features a metallic, pebble-like design and includes two capsule-like units for behind-the-ear wear, with a focus on local AI processing [8]
- The product is expected to launch in September 2026, with an estimated first-year shipment of 40-50 million units, allowing users to control functions via commands instead of an iPhone [8]

Group 8
- Meituan has introduced a new sparse attention mechanism called LoZA, replacing 50% of low-performance MLA modules with a streaming sparse attention structure [9]
- The new mechanism improves decoding speed for 128K contexts by 10x and prefill speed for 256K contexts by 50%, while reducing computational complexity to linear O(L·S) [9]
- LoZA can be adopted without retraining from scratch, featuring a design that balances local detail and overall logic within sparse windows [9]

Group 9
- MIT Technology Review has released its list of the top ten breakthrough technologies for 2026, including large-scale AI data centers, sodium-ion batteries, base editing, and advanced nuclear reactors [10][11]
- The report highlights the significant energy consumption of large-scale data centers and the successful application of sodium-ion batteries in specific vehicle models [11]
- It emphasizes the shift in AI development focus from "what can be done" to "what should be done," with ethical considerations becoming a central theme in the life sciences [11]

Group 10
- The CEO of the Fal platform revealed that generating a 5-second, 24-fps video consumes 12,000 times the compute of generating 200 tokens of text, with 4K resolution requiring ten times more [12]
- The platform supports over 600 generative media models, with top clients using an average of 14 different models simultaneously, indicating a trend toward scaling AI-generated content [12]
- The discussion suggests that as content generation becomes limitless, finite intellectual property will gain more value, with education and personalized advertising identified as promising application areas [12]
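The linear O(L·S) complexity claimed for LoZA in Group 8 is the hallmark of streaming (sliding-window) sparse attention: each query attends to a fixed window of S recent keys rather than all L positions. The numpy sketch below illustrates that general pattern only; it is not Meituan's implementation, and the function name and window size are illustrative.

```python
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    """Each query attends only to its last `window` keys (causal),
    so total work is O(L * window) instead of O(L^2)."""
    L, d = q.shape
    out = np.zeros_like(v)
    for t in range(L):
        lo = max(0, t - window + 1)          # start of this query's window
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()             # softmax over the window only
        out[t] = weights @ v[lo:t + 1]
    return out

rng = np.random.default_rng(0)
L, d = 8, 16
q, k, v = rng.normal(size=(3, L, d))
out = sliding_window_attention(q, k, v, window=4)
print(out.shape)  # (8, 16)
```

Because the per-query cost no longer depends on L, doubling the context length only doubles the work, which is consistent with the decode speedups the article reports for 128K and 256K contexts.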
On This Open-Source "Best-to-Worst" Leaderboard, I Finally Understand Why Chinese AI Could Stage a Comeback
Sina Finance · 2025-12-17 14:25
Core Insights
- The recent ranking of open-source AI models highlights the dominance of Chinese models, with DeepSeek, Qwen, Kimi, Zhipu, and MiniMax leading the global landscape, while OpenAI's and Meta's models lag behind [3][5][25].

Group 1: Performance and Market Position
- Chinese open-source models are rapidly closing the performance gap with closed-source giants, excelling in dimensions such as performance, pricing, ecosystem, and usability [5][25].
- Kimi's K2 Thinking model, featuring a trillion parameters, has outperformed OpenAI's GPT-5 and Anthropic's Claude 4.5 on various benchmarks [11][14].
- MiniMax M2 has also shown strong performance, ranking fifth on comprehensive leaderboards and surpassing competitors like Gemini 2.5 Pro and Claude Opus 4.1 [14][79].

Group 2: Technological Advancements
- The introduction of interleaved thinking in models like MiniMax M2 and Kimi K2 Thinking allows for more efficient task execution by alternating between action and reflection [34][36].
- MiniMax M2 employs a full attention mechanism, which, despite increasing training and inference demands, has proven to deliver better performance than sparse attention models [75][78].

Group 3: Cost and Accessibility
- MiniMax's API offers competitive pricing at $0.3/$1.2 per million input/output tokens, although the model's verbosity leads to high token usage, which can offset the cost advantage [79].
- The open-source movement in China is gaining momentum, with MiniMax's release reinforcing the leadership established by DeepSeek and other Chinese AI labs in the open-source domain [80][84].

Group 4: Community and Developer Adoption
- There is growing recognition among developers of the practicality and affordability of Chinese open-source models, with many citing them as preferable alternatives to established closed-source options like OpenAI's [25][84].
- The rapid updates and releases from various Chinese companies indicate a robust and collaborative open-source ecosystem that is continuously evolving [11][14].
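The pricing point above, that cheap per-token rates can be offset by verbosity, is simple arithmetic worth making concrete. The sketch below uses the $0.3/$1.2 per-million-token rates quoted for MiniMax M2; the token counts are invented purely for illustration.

```python
def request_cost(input_tokens, output_tokens,
                 in_price=0.30, out_price=1.20):
    """Cost in USD at per-million-token prices
    ($0.30 input / $1.20 output, the MiniMax M2 rates quoted above)."""
    return (input_tokens * in_price + output_tokens * out_price) / 1e6

# A model that emits 3x the output tokens for the same answer
# costs nearly twice as much per request (token counts illustrative).
concise = request_cost(10_000, 2_000)
verbose = request_cost(10_000, 6_000)
print(round(concise, 4), round(verbose, 4))  # 0.0054 0.0102
```

This is why per-token price alone is a poor basis for comparing models: effective cost is price multiplied by the tokens a model actually consumes to finish a task.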
DeepSeek V3.2 Released! Impressive Real-World Results, and Low Cost Is Its Biggest Advantage
36Kr · 2025-12-03 03:57
Core Insights
- DeepSeek has launched its V3.2 version, which reportedly matches the inference capabilities of OpenAI's GPT-5 while being significantly cheaper [1][22]
- The V3.2 release includes two variants: a free version for users and a Speciale version that supports API access and boasts enhanced reasoning capabilities [2][22]

Performance Enhancements
- DeepSeek V3.2-Speciale has demonstrated superior performance in various competitions, achieving gold-medal results in IMO 2025, CMO 2025, the ICPC World Finals 2025, and IOI 2025, outperforming GPT-5 High in all tests [4][22]
- The introduction of the DeepSeek Sparse Attention (DSA) mechanism has fundamentally improved the efficiency of attention in AI models, reducing computational costs by over 60% and increasing inference speed by approximately 3.5 times [6][12]

Cost Efficiency
- The DSA mechanism allows a significant reduction in the cost of processing long sequences, with prices dropping from $0.7 to $0.2 per million tokens in the pre-fill phase and from $2.4 to $0.8 in the decoding phase [12][22]
- This cost reduction positions DeepSeek V3.2 as one of the most affordable models for long-text inference in its category [12][22]

Tool Utilization
- DeepSeek V3.2 allows the model to call tools during its reasoning process without requiring additional training, enhancing its general performance and compatibility with user-created tools [13][22]
- The model demonstrates the ability to break down complex tasks and utilize different tools effectively, showcasing its decision-making capabilities [20][22]

Market Impact
- The release of DeepSeek V3.2 challenges the notion that open-source models lag behind closed-source counterparts, as it offers competitive performance at a fraction of the cost [22][23]
- The DSA mechanism's cost revolution is expected to significantly impact the commercialization of AI models, making advanced AI applications more accessible to smaller enterprises and consumers [22][23]
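The savings described above come from making attention sparse: instead of every query scoring every preceding key, each query keeps only a small top-k subset of keys. The numpy sketch below shows generic top-k sparse attention to convey the idea; DSA's actual indexer and fused kernels are far more involved, and nothing here should be read as DeepSeek's implementation.

```python
import numpy as np

def topk_sparse_attention(q, k, v, topk=4):
    """For each query, keep only the `topk` highest-scoring causal keys;
    all other positions are masked out before the softmax."""
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # (L, L) dense scores
    causal = np.tril(np.ones((L, L), dtype=bool))
    scores = np.where(causal, scores, -np.inf)          # causal mask
    # k-th largest score per row; anything below it is dropped
    kth = -np.sort(-scores, axis=1)[:, topk - 1:topk]
    scores = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(1)
L, d = 8, 16
q, k, v = rng.normal(size=(3, L, d))
out = topk_sparse_attention(q, k, v, topk=4)
print(out.shape)  # (8, 16)
```

This toy version still builds the dense score matrix before masking; a production kernel avoids that by selecting keys with a cheap index first, which is where the real decode-time savings come from.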
DeepSeek Ships Another Update! New Models Take On Google Head-On
Yicai · 2025-12-01 14:05
By Liu Xiaojie, Yicai

On the evening of December 1, DeepSeek released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which are world-leading in reasoning ability.

The two models are positioned differently. DeepSeek-V3.2 aims to balance reasoning ability with output length and is suited to everyday use, such as Q&A and general agent tasks. In late September DeepSeek released the experimental V3.2-Exp; this release is the official version. On public reasoning benchmarks, V3.2 reaches GPT-5's level, only slightly below Google's Gemini 3 Pro.

According to data published by DeepSeek, Speciale surpasses Google's most advanced Gemini 3 Pro on multiple reasoning benchmarks. Specifically, on the AIME, the Harvard-MIT Mathematics Tournament, and the International Mathematical Olympiad, V3.2-Speciale beats Gemini 3 Pro, but it falls slightly behind Google on coding and PhD-level STEM tests.

DeepSeek-V3.2-Speciale is the headline release this time; its stated goal is to "push the reasoning ability of open-source models to the limit and explore the boundaries of model capability." Speciale is described as a long-thinking enhanced version of V3.2 that also combines DeepSee ...
A New Technical Route to AGI: Next-Generation Sparse Attention Mechanism Monte Carlo Attention Goes Open Source
AI科技大本营 · 2025-11-10 01:03
Core Viewpoint
- The article discusses the innovative Monte Carlo Attention mechanism used in the BigBang-Proton framework, which enables efficient modeling of extremely long contexts through a unique inter-patch delegation mechanism, achieving linear complexity while overcoming the limitations of traditional attention methods [1][4][32].

Context Length in Material World Modeling
- Monte Carlo Attention was developed to meet the theoretical demands of the BigBang-Proton framework, addressing the need for extremely long context lengths arising from the integration of diverse scientific data [2][3].
- The estimated total sequence length required for comprehensive virtual cell integration is approximately 10¹⁵ tokens, necessitating a context length far exceeding that of current large language models [2][3].

Monte Carlo Attention Mechanism
- Monte Carlo Attention reduces computational complexity from O(L²) to O(L), significantly improving training efficiency and convergence rates [4].
- The mechanism allows training on sequences multiple orders of magnitude longer than device memory capacity, promoting the development of next-generation hardware architectures [4][32].

BigBang-Proton Architecture Components
- The BigBang-Proton architecture consists of three core components: Binary Patch Encoding, Monte Carlo Attention, and a Temporal Convolutional Network (TCN) [7][8].
- The inter-patch delegation mechanism enables local and global information exchange, allowing context length to grow exponentially with the number of layers while maintaining linear computational complexity [8][9].

Delegate Operation Process
- The delegate operation is a hierarchical process: the input sequence is decomposed into blocks, delegate tokens are generated and distributed, and local representations are enhanced with global context [17][20][22].
- The complexity of attention within each block is O(P²), while the complexity of global information flow is determined by the number of blocks [28][30].

Comparison with Existing Attention Mechanisms
- Monte Carlo Attention differs fundamentally from sparse attention methods by using a reorganization-based mechanism for indirect information propagation, avoiding selection bias and information loss [40][42].
- The method allows exponential context-length expansion, surpassing the limitations of structured state-space models and traditional linear attention models [43][44].

Temporal Convolutional Network (TCN)
- The TCN replaces traditional feedforward networks, enhancing the model's ability to capture local and global patterns through stacked convolutional layers [35][37].
- The architecture learns spatial and positional information directly from input sequences, eliminating the need for explicit positional embeddings [37].

Future Directions
- The article indicates that further insights into the core technologies, cutting-edge applications, and future plans of the BigBang-Proton framework will be shared in subsequent publications [46].
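The delegate operation described above (O(P²) attention inside each block, plus delegate tokens that carry information between blocks) can be caricatured in a few lines of numpy. This is a toy sketch of the general block-plus-delegate idea only; the pooling choice and the way delegates are folded back in are assumptions for illustration, not BigBang-Proton's actual operator.

```python
import numpy as np

def softmax_attn(q, k, v):
    """Plain scaled dot-product attention on 2-D arrays."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def delegate_attention(x, block=4):
    """Toy delegation: O(P^2) attention inside each block of P tokens,
    plus one mean-pooled delegate per block that attends over all
    delegates and is broadcast back into its block (global mixing)."""
    L, d = x.shape
    blocks = x.reshape(L // block, block, d)
    # 1) local attention within each block: (L/P) * O(P^2) = O(L*P) total
    local = np.stack([softmax_attn(b, b, b) for b in blocks])
    # 2) one delegate token per block, mixed globally across blocks
    delegates = blocks.mean(axis=1)                       # (L/P, d)
    mixed = softmax_attn(delegates, delegates, delegates)
    # 3) add each block's mixed delegate back to every token in the block
    out = local + mixed[:, None, :]
    return out.reshape(L, d)

rng = np.random.default_rng(2)
x = rng.normal(size=(16, 32))
y = delegate_attention(x, block=4)
print(y.shape)  # (16, 32)
```

Stacking such layers lets information hop block-to-block indirectly through delegates, which is the intuition behind the claim that reachable context grows with depth while per-layer cost stays linear.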
Domestic Chip Makers Rush to Claim Support for DeepSeek's Latest Model
21st Century Business Herald · 2025-10-01 15:00
Core Viewpoint
- The release of the DeepSeek-V3.2-Exp model marks a significant advancement for the domestic AI chip ecosystem, showcasing a collaborative effort among domestic chip manufacturers [1][4][7].

Group 1: Model Release and Features
- DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention, which significantly reduces computational resource consumption and enhances inference efficiency [1][7].
- The new model has led to a 50% to 75% price reduction for API services across DeepSeek's platforms [1].
- The model's release prompted immediate recognition and adaptation from several domestic chip manufacturers, including Cambricon, Huawei, and Haiguang [2][4].

Group 2: Industry Response and Ecosystem Development
- Cambricon was the first to announce compatibility with DeepSeek-V3.2-Exp, followed by Huawei and Haiguang, indicating a rapid response from the industry [2][4].
- The consensus within the domestic AI industry around DeepSeek's models has enabled the company to take the lead in defining standards for domestic chips [4][7].
- The rapid adaptation of DeepSeek's models by various manufacturers suggests a growing synergy within the domestic AI hardware and software ecosystem [9].

Group 3: Future Implications
- Experts attribute the expected swift development of domestic chips by 2025 to the emergence of DeepSeek as a key player in the industry [4][5].
- The collaborative efforts among domestic companies to adapt to DeepSeek's standards may accelerate the growth of the AI chip ecosystem in China [4][9].
- The advances DeepSeek has made in a short time frame highlight the potential for rapid evolution in the domestic AI landscape, in contrast to the decades-long ecosystem building of companies like NVIDIA [9].
DeepSeek and Domestic Chips Begin a "Two-Way Embrace"
Core Insights
- DeepSeek has released the DeepSeek-V3.2-Exp model, introducing a sparse attention mechanism that significantly reduces computational resource consumption and enhances inference efficiency [1]
- The new model has led to a 50% to 75% price reduction for API services [1]
- The release has prompted immediate recognition and adaptation from several domestic chip manufacturers, indicating a growing synergy within the domestic AI hardware and software ecosystem [1][2]

Group 1: Model Release and Features
- The DeepSeek-V3.2-Exp model incorporates the DeepSeek Sparse Attention mechanism, optimizing training and inference efficiency for long texts [5]
- The model is compatible with CUDA and uses TileLang for rapid prototyping, aiming for higher efficiency through lower-level language implementations [5][6]
- The release of V3.2-Exp marks a significant shift from the previous version, V3.1, which received no proactive recognition from companies regarding its "UE8M0 floating-point format" [4][5]

Group 2: Industry Response and Ecosystem Development
- Within four minutes of the model's release, Cambricon announced its adaptation of DeepSeek-V3.2-Exp and open-sourced its large-model inference engine [2]
- Huawei and Haiguang quickly followed suit, demonstrating the rapid response of the domestic chip industry to the new model [2]
- The consensus within the domestic AI industry around the DeepSeek model has empowered the company to take the lead in defining standards for domestic chips [3][4]

Group 3: Competitive Landscape
- The rapid development of the domestic chip ecosystem is highlighted by the swift adaptation of major players like Tencent and Alibaba, which are actively integrating domestic chips into their cloud computing services [6]
- Experts believe the emergence of DeepSeek has accelerated the pace of domestic chip development, with significant advances expected by 2025 [3]
AI Daily | Cashing Out Another $40 Million+: Jensen Huang Keeps Trimming His NVIDIA Stake, Bullish on OpenAI as a Potential Next Trillion-Dollar Giant
美股研究社 · 2025-09-30 12:06
Core Insights
- The article discusses the rapid advancements in artificial intelligence technology and their implications for investment opportunities in AI-related companies and market trends [3].

Group 1: AI Model Developments
- Zhipu has launched its latest GLM-4.6 model, showing a 27% improvement in coding capabilities over its predecessor GLM-4.5 and excelling in real programming tasks [5].
- DeepSeek introduced a "sparse attention" mechanism in its experimental AI model, DeepSeek-V3.2-Exp, aimed at enhancing training and inference efficiency in long contexts [5].
- Anthropic released its new AI model, Claude Sonnet 4.5, claiming it to be the "best coding model globally," with significant improvements in reliability and performance across professional fields [6].

Group 2: Market Trends and Predictions
- OpenAI has launched an "Instant Checkout" feature in ChatGPT, allowing users to purchase items directly through the platform, initially supporting single-item purchases [7].
- NVIDIA CEO Jensen Huang sold 225,000 shares of NVIDIA stock for over $40 million, while expressing confidence in AI's future, particularly in OpenAI's potential to become a trillion-dollar company [7][8].
- Huang predicts that OpenAI could achieve unprecedented growth, similar to other tech giants like Meta and Google, by offering both consumer and enterprise services [8].

Group 3: Copyright and Content Usage
- OpenAI's Sora AI video generator will default to using copyrighted content, with an opt-out option for studios, indicating a shift in content-usage policies [12].
- The company has been in discussions with talent agencies and studios regarding the opt-out mechanism, ensuring that copyrighted characters do not appear in its AI tools [13].
DeepSeek and Domestic Chips Begin a "Two-Way Embrace"
Core Insights
- DeepSeek has released the DeepSeek-V3.2-Exp model, introducing a sparse attention mechanism that significantly reduces computational resource consumption and enhances inference efficiency [1][6]
- The new model has led to a 50% to 75% price reduction for API services [1]
- The release has prompted immediate recognition and adaptation from several domestic chip manufacturers, including Huawei, Cambricon, and Haiguang, indicating a growing synergy within the domestic AI hardware and software ecosystem [2][4]

Summary by Sections

Model Release and Features
- The DeepSeek-V3.2-Exp model incorporates the DeepSeek Sparse Attention mechanism, optimizing training and inference efficiency for long texts [6]
- The model is compatible with CUDA and uses TileLang, a language designed specifically for AI operator development, for rapid prototyping [6]

Industry Response
- Cambricon was the first to claim adaptation of the new model, followed by Huawei and Haiguang, showcasing a collaborative effort among domestic manufacturers [2]
- The rapid response from these companies indicates a consensus within the domestic AI industry on the significance of the DeepSeek model [6]

Ecosystem Development
- DeepSeek is emerging as a key player in building a new ecosystem for domestic AI, with its model becoming a benchmark for open-source models in China [2][4]
- Collaboration among major internet companies like Tencent and Alibaba in adapting domestic chips further accelerates the establishment of this ecosystem [7]

Historical Context
- The previous version, DeepSeek-V3.1, received no proactive claims from companies regarding its adaptation, highlighting the significant shift in industry dynamics with the latest release [5]
- Experts attribute the expected rapid development of domestic chips by 2025 to the emergence of DeepSeek as a standard-setting entity [3]
Huawei Ascend and Cambricon Announce Support for DeepSeek's Latest Model
Core Insights
- DeepSeek officially launched the DeepSeek-V3.2-Exp model on September 29, introducing the self-developed DeepSeek Sparse Attention (DSA) mechanism, which optimizes training and inference efficiency for long texts [1][7]
- The release has led to a significant reduction in service costs, with DeepSeek API prices dropping by over 50% [2][10]
- The open-sourcing of the TileLang version of the operators has attracted considerable attention within the industry [3]

Technical Innovations
- The DSA mechanism is an optimization of the Transformer architecture, addressing the computational cost of traditional dense attention, which grows quadratically with text length [6][7]
- The V3.2-Exp model achieves substantial improvements in training and inference efficiency for long texts while maintaining performance comparable to the previous V3.1-Terminus model [7]

Market Impact
- DeepSeek has made the V3.2-Exp model fully open source on platforms such as HuggingFace and ModelScope, with the related research papers also published [5]
- Collaboration with domestic hardware providers such as Huawei, Cambricon, and Haiguang demonstrates the synergy between China's AI software and hardware ecosystems [11][12]
- The adoption of TileLang, a programming language designed to simplify GPU operator development, is expected to significantly improve the efficiency of AI operator development [12]