DeepSeek V3
DeepSeek Update Panned as Colder and Dumber: "More Cringeworthy Than the Sentimental Youth Literature of 20 Years Ago"
Mei Ri Jing Ji Xin Wen· 2026-02-12 22:23
Core Insights
- DeepSeek has begun gray-testing (a staged rollout of) its flagship model, extending the context length to as much as 1 million tokens, up from the 128K supported by version V3.1, released in August last year [1]
- Users have reported mixed reactions to the recent updates, with some expressing dissatisfaction over the model's changed tone and interaction style, which sparked a trending social-media topic about its perceived coldness [1][4]

Group 1: Model Updates and Features
- The latest version of DeepSeek can process extremely long texts, as demonstrated by its handling of a document of over 240,000 tokens [1]
- The upcoming DeepSeek V4 model is expected in mid-February 2026; the current version is a speed-optimized variant that trades some quality for performance testing [6]
- DeepSeek's V series models are designed for all-around performance, with V3 marking a significant milestone thanks to its efficient MoE architecture [6]

Group 2: User Feedback and Reactions
- Users have criticized the new version for its impersonal manner, addressing users as "user" instead of with personalized nicknames, which has made the model feel less engaging [4]
- Some users describe the updated model's output as overly simplistic and lacking emotional depth, comparing it unfavorably to dated literary styles [4]
- Conversely, a segment of users appreciates the model's newfound objectivity and rationality, noting that it seems more attuned to the questioner's psychological state [5]

Group 3: Technical Innovations
- DeepSeek has introduced two new architectures: mHC, which optimizes information flow in deep Transformers to improve stability and scalability without adding computational load, and Engram, which decouples static knowledge from dynamic computation [7]
- Together these innovations aim to significantly reduce the cost of long-context reasoning while maintaining performance [7]
First to Surpass NVIDIA in a Core AI Scenario: A "Breakthrough Narrative" for Domestic Compute | 甲子光年
Xin Lang Cai Jing· 2026-01-29 12:12
A new growth paradigm for China's compute. Editor | Lizi. In the deep sea of AI compute, silence often heralds a fiercer eruption. On January 26, just 18 days after domestic AI-compute company Tianshu Zhixin (天数智芯, 09903.HK) listed on the Hong Kong Stock Exchange, this outwardly low-key firm dropped a "bombshell": a four-generation architecture roadmap bold enough to write into the calendar specific dates for surpassing the international giant's Hopper, Blackwell, and even Rubin. Image source: Tianshu Zhixin. More important, this is not pie in the sky. The Tianshu (天枢) architecture that Tianshu Zhixin launched in 2025 has already posted measured performance about 20% ahead of NVIDIA's Hopper in the key large-model scenario of DeepSeek V3, making it the first domestic solution to substantively surpass a mainstream international architecture. Over the past seven years, Tianshu Zhixin chose the slowest and hardest road: full-stack in-house development, deep industry engagement, and dogged real-world deployment. On the eve of the physical-AI boom, this painstaking groundwork has finally converged into a signal of qualitative breakthrough. As compute competition shifts from piling up quantity to contesting quality, Tianshu Zhixin's record of more than 300 customers and over 1,000 deployments shows that domestic compute is no longer a lab-only backup plan: riding the wave of a hundred-billion-yuan market, it is backing "future tense" promises with "present perfect" results, reshaping a new growth paradigm for China's compute. 1. The "Chinese path" of compute evolution. The underlying architecture of general-purpose GPUs is compute ...
China's AI "Big Three" Strike on the Same Day: The Ticket to Summon a Hundred Agents Finally Reaches Everyone
Guan Cha Zhe Wang· 2026-01-28 09:37
Core Insights
- China's AI industry saw a landmark day on January 27, with major updates from leading open-source players DeepSeek, Tongyi Qianwen, and Yuezhianmian (Moonshot AI); Kimi K2.5 drew the most attention, surpassing 17,000 online mentions and even outpacing OpenAI's Prism [1][3]

Group 1: Kimi K2.5 Features
- Kimi K2.5 introduces native multimodal capabilities, letting the model understand visual inputs directly alongside its language and coding abilities, fundamentally changing product-development workflows [11][14]
- The model can generate complete HTML, CSS, and JS code from simple sketches or even rough doodles, significantly reducing the time and effort web development requires [11][14]
- Kimi K2.5's dynamic-understanding capability allows it to replicate complex interactive features from competitor websites, extending its utility well beyond simple image recognition [13][14]

Group 2: Efficiency and Productivity
- The Agent Swarm architecture lets Kimi act as a project manager, coordinating multiple AI agents on complex tasks simultaneously and drastically improving efficiency [17][19]
- In large-scale search scenarios, the Agent Swarm can cut the number of key steps needed to reach a goal by 3 to 4.5 times, with actual processing time shortened by up to 4.5 times [19][20]
- Kimi's capabilities can be integrated into existing workflows such as Excel and Word, yielding significant time savings in data-processing tasks [20][21]

Group 3: Business Model Transformation
- The release of Kimi K2.5 marks a shift from selling software to delivering services, positioning companies like Yuezhianmian to provide direct solutions rather than just tools [22][23]
- The cost of deploying a large AI-agent team in-house is high, making cloud services more appealing to businesses than self-deployment and creating a profitable business model for Yuezhianmian [23]
- Kimi's subscription model offers companies significant cost savings, performing the work of a junior engineer at a fraction of the cost and potentially shifting budget allocations [23]

Group 4: Future Implications
- The evolution of AI from tool to coworker signals a fundamental change in how businesses will operate, with the potential to redefine productivity and organizational structures [24][26]
- Kimi's advances suggest that technology's ultimate value lies in empowering individuals, expanding their capabilities and imagination [26][27]
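The "project manager" pattern described above — one coordinator fanning a task out to many worker agents and merging their results — can be sketched with plain asyncio. This is a toy illustration of the fan-out/gather pattern only; the worker function and task names are stand-ins, not Kimi's actual Agent Swarm API:

```python
import asyncio


async def worker_agent(subtask: str) -> str:
    """Stand-in for a real agent call (search, code, browse, etc.)."""
    await asyncio.sleep(0.01)  # simulate I/O-bound agent work
    return f"result for {subtask!r}"


async def coordinator(task: str, n_workers: int = 5) -> list[str]:
    """Split a task, run all worker agents concurrently, merge results in order."""
    subtasks = [f"{task} / part {i}" for i in range(n_workers)]
    return await asyncio.gather(*(worker_agent(s) for s in subtasks))


results = asyncio.run(coordinator("market survey"))
```

Because the workers run concurrently rather than sequentially, wall-clock time scales with the slowest worker instead of the sum of all workers, which is the source of the speedups the article reports.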
This Domestic GPU Maker Squatted for Seven Years, Then Delivered a Roadmap Bold Enough to Name Dates
Shi Shuo Xin Yu· 2026-01-27 23:31
This roadmap may be the first clearly audible sound of that thick springboard gathering force. While most peers still use "benchmarking against" as marketing copy, this company wrote the dates for surpassing Hopper, Blackwell, and Rubin directly into its 2025-2027 calendar. Where does the confidence come from? The answer: seven years without shortcuts. In an environment where "everyone benchmarks against NVIDIA" at AI-chip launch events, an architecture roadmap bold enough to put surpass-by dates in its headline is news in itself. On January 26, Tianshu Zhixin published its four-generation architecture roadmap for 2025-2027. Unlike the usual vague promise of a bright future, it is as precise as a product manual: in 2025 the Tianshu (天枢) architecture surpasses Hopper; in 2026 the Tianxuan (天璇) architecture benchmarks against Blackwell; also in 2026, the Tianji (天玑) architecture surpasses Blackwell; in 2027 the Tianquan (天权) architecture surpasses Rubin; and after 2027 the company will turn to breakthrough compute-chip architecture design. Over the next three years, Tianshu Zhixin will release multiple products based on these four architectures, continuously raising compute performance. More unusual still, the first line of this declaration about the future has already been verified: the latest generation on the roadmap, the Tianshu architecture launched in 2025, already averages about 20% higher performance than NVIDIA's Hopper architecture in the key large-model scenario of DeepSeek V3. "They are not predicting the future; they are reporting progress ...
Surpassing NVIDIA's Rubin Next Year? Big News From a 40-Billion Domestic GPU Maker
Zhong Guo Ji Jin Bao· 2026-01-26 15:20
Group 1
- The core viewpoint of the article is that Tianshu Zhixin has released a roadmap for four generations of architecture, aiming to surpass NVIDIA's corresponding products [2][3]
- The roadmap calls for the Tianshu (天枢) architecture to surpass Hopper in 2025, the Tianxuan (天璇) architecture to benchmark against Blackwell in 2026, the Tianji (天玑) architecture to surpass Blackwell also in 2026, and the Tianquan (天权) architecture to surpass Rubin in 2027 [3]
- The company plans to launch multiple products over the next three years based on this roadmap, continuously enhancing computing performance [3]

Group 2
- Key details of the architectures: Tianshu achieves over 90% effective utilization efficiency in AI computation; Tianxuan introduces ixFP4 precision support; Tianji covers all AI and accelerated-computing scenarios; and Tianquan adds further precision support and innovative designs [3]
- Thanks to core technology innovations, the company has reportedly achieved roughly 20% higher performance than the Hopper architecture in the DeepSeek V3 scenario [3]
- Tianshu Zhixin listed on the Hong Kong Stock Exchange on January 8, 2026; its stock closed at 188.2 HKD per share on January 26, 2026, down 7.65%, for a market capitalization of 47.8 billion HKD [4]
Surpassing Rubin by 2027: This Domestic GPU Maker Squatted for Seven Years, Then Delivered a Roadmap Bold Enough to Name Dates
36Kr· 2026-01-26 11:16
"They are not predicting the future; they are reporting progress," one attendee said. This roadmap may be the first clearly audible sound of that thick springboard gathering force. While most peers still use "benchmarking against" as marketing copy, this company wrote the dates for surpassing Hopper, Blackwell, and Rubin directly into its 2025-2027 calendar. Where does the confidence come from? The answer: seven years without shortcuts. In an environment where "everyone benchmarks against NVIDIA" at AI-chip launch events, an architecture roadmap bold enough to put surpass-by dates in its headline is news in itself. On January 26, Tianshu Zhixin published its four-generation architecture roadmap for 2025-2027. Unlike the usual vague promise of a bright future, it is as precise as a product manual: in 2025 the Tianshu (天枢) architecture surpasses Hopper; in 2026 the Tianxuan (天璇) architecture benchmarks against Blackwell; also in 2026, the Tianji (天玑) architecture surpasses Blackwell; in 2027 the Tianquan (天权) architecture surpasses Rubin; and after 2027 the company will turn to breakthrough compute-chip architecture design. Over the next three years, Tianshu Zhixin will release multiple products based on these four architectures, continuously raising compute performance. More unusual still, the first line of this declaration about the future has already been verified: the latest generation on the roadmap, the Tianshu architecture launched in 2025, already averages performance in the key large-model scenario of DeepSeek V3 that exceeds NVIDIA's Hopper architecture ...
An Overlooked Prompt Trick That Turns Out to Be Copy + Paste
Shu Zi Sheng Ming Ka Zi Ke· 2026-01-22 03:09
Core Viewpoint
- The article discusses a technique from a Google paper showing that repeating a prompt can raise the accuracy of non-reasoning large language models (LLMs) from 21.33% to 97.33% [1][7]

Group 1: Experiment Overview
- Google ran experiments on seven popular non-reasoning models, including Gemini 2.0 Flash, GPT-4o, and Claude 3, to test the effectiveness of prompt repetition [13]
- The simple technique won 47 of 70 tests with no failures, demonstrating a clear performance improvement across every model tested [25]

Group 2: Mechanism of Improvement
- The improvement stems from the nature of causal language models, which predict tokens left to right; with the prompt repeated, the model can "look back" at a complete earlier copy of the context while reading the second copy, enhancing its understanding [28][30]
- In effect the model gets a second chance to process the information, leading to more accurate responses [39][40]

Group 3: Implications for Prompt Engineering
- For many straightforward Q&A scenarios, simply repeating the question can be a powerful optimization on its own, without resorting to complex prompt structures [50]
- Among future directions, the paper mentions folding this repetition technique into model training, which could further improve performance [52]
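The technique described above requires no API or model changes: the prompt is simply concatenated with itself before being sent. A minimal sketch follows; the helper name and the blank-line separator are illustrative assumptions, not details from the paper:

```python
def build_repeated_prompt(prompt: str, repeats: int = 2, sep: str = "\n\n") -> str:
    """Duplicate the user prompt so a causal LM can attend back to a complete
    first copy of the question while generating after the second copy."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    return sep.join([prompt] * repeats)


# The doubled prompt is sent as an ordinary user message; nothing else changes.
question = "Which of the following numbers is prime: 51, 57, 61, 63?"
doubled = build_repeated_prompt(question)
```

The doubled string is then passed to whatever chat or completion endpoint is already in use, which is why the article frames this as a drop-in optimization for straightforward Q&A.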
Rumor: DeepSeek's New Model Leaked. Another "Blockbuster" From Liang Wenfeng?
Xin Lang Cai Jing· 2026-01-21 07:55
Core Insights
- DeepSeek has generated significant buzz in the AI community after a new model named Model1 was unexpectedly exposed in a code update, hinting at a potential technological path distinct from the existing V3 series [1][6][8]
- Speculation is rife that DeepSeek will launch its next-generation model, V4, around mid-February, after a year of iterative improvements to V3 [3][8]

Model Development Timeline
- On March 25, 2025, DeepSeek released V3-0324, improving code-generation usability and surpassing GPT-4.5 in mathematical and coding capability [4]
- On May 29, 2025, the R1 model received a minor upgrade, improving mathematics, programming, and general logic, with hallucination rates cut by 45-50% [4]
- On August 21, 2025, DeepSeek V3.1 launched with faster response times, stronger agent capabilities, and support for Anthropic's API [4]
- On September 22, 2025, the V3.1-Terminus version was released, fixing mixed-language input issues and improving the Code and Search Agents [4]
- On September 29, 2025, the V3.2-Exp version introduced a new attention mechanism along with an updated API pricing structure [4]
- On December 1, 2025, the official V3.2 release achieved inference capabilities comparable to GPT-5 and integrated thinking modes for tool use [4][9]

Research Contributions
- Two papers authored by Liang Wenfeng were published between late December 2025 and early January 2026, addressing training stability and knowledge-retrieval efficiency in large-model architectures [5][10]
- The first paper proposes a manifold-constrained hyper-connections framework that improves training stability by constraining residual connections to a specific manifold [10][11]
- The second paper introduces a conditional memory module that improves inference and knowledge-task performance by decoupling knowledge storage from neural computation [10][11]

Market Expectations
- The AI community is eagerly watching whether DeepSeek will unveil the new Model1 or V4 around the upcoming Spring Festival, with expectations of a significant impact on the global AI landscape [6][8]
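As a rough illustration of what "constraining residual connections to a manifold" can mean, the toy update below projects each layer's residual-mixing weights onto the probability simplex, so every update is a convex combination of stream values and activations cannot blow up with depth. The simplex choice and every name here are illustrative assumptions for intuition only, not the paper's actual construction:

```python
import numpy as np


def project_to_simplex_rows(W: np.ndarray) -> np.ndarray:
    """Clamp negatives and renormalize each row to sum to 1, so each output
    of the residual mix is a convex combination of its inputs."""
    W = np.clip(W, 0.0, None)
    rowsum = W.sum(axis=1, keepdims=True)
    rowsum[rowsum == 0.0] = 1.0  # avoid division by zero for all-zero rows
    return W / rowsum


rng = np.random.default_rng(1)
h = rng.standard_normal(8)      # toy residual stream of width 8
for _ in range(100):            # 100 "layers" of constrained mixing
    mix = project_to_simplex_rows(rng.standard_normal((8, 8)))
    h = mix @ h                 # convex mixing cannot increase the max-norm
```

The point of the sketch: an unconstrained random 8x8 mix applied 100 times would typically explode or vanish, whereas the constrained version keeps the stream bounded, which is the flavor of stability benefit the paper is described as targeting.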
DeepSeek's New Model Revealed
Cai Lian She· 2026-01-21 06:34
Core Viewpoint
- DeepSeek is advancing its AI model capabilities with MODEL1, an architecture designed for efficient inference and optimized across several GPU generations, signaling a strategic focus on raising performance while reducing memory usage in AI applications [4][5][6]

Group 1: MODEL1 and FlashMLA
- MODEL1 is a newly revealed model architecture inside DeepSeek's FlashMLA, a software tool optimized for NVIDIA Hopper-architecture GPUs that accelerates large-model inference generation [4]
- FlashMLA implements multi-head latent attention (MLA) to minimize memory usage and maximize GPU hardware efficiency, which is crucial to the performance of DeepSeek's models [4][5]
- MODEL1 is expected to be a low-memory model suited to edge devices and cost-sensitive scenarios, with optimizations for long-sequence tasks such as document understanding and code analysis [5]

Group 2: DeepSeek's Model Development
- DeepSeek's existing models follow two technical routes: the V series, focused on all-around performance, and the R series, targeting complex reasoning tasks [6]
- The V3 model, launched in December 2024, marked a significant milestone with its efficient MoE architecture, followed by rapid iterations to V3.1 and V3.2 that strengthened reasoning and agent capabilities [6]
- The R1 model, released in January 2025, excels at complex reasoning tasks through reinforcement learning and introduced a "deep thinking" mode, underscoring DeepSeek's commitment to advancing AI capabilities [7]

Group 3: Future Developments
- DeepSeek is expected to launch its next flagship model, DeepSeek V4, around mid-February 2026, anticipated to bring stronger coding capabilities [7]
- Recent technical papers from DeepSeek discuss new training methods and a biologically inspired AI memory module, suggesting these innovations may feed into upcoming models [7]
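The memory saving behind MLA comes from caching one low-rank latent vector per token instead of full per-head keys and values, then reconstructing K and V on the fly. The sketch below shows only that compression idea with toy dimensions; all names and sizes are illustrative assumptions, not FlashMLA's actual kernels (which also handle details like decoupled positional encoding):

```python
import numpy as np

d_model, d_latent, seq_len = 1024, 64, 2048
rng = np.random.default_rng(0)

W_down = rng.standard_normal((d_model, d_latent))  # compress hidden state
W_up_k = rng.standard_normal((d_latent, d_model))  # reconstruct keys on the fly
W_up_v = rng.standard_normal((d_latent, d_model))  # reconstruct values on the fly

hidden = rng.standard_normal((seq_len, d_model))

# Only the latent vectors are cached between decoding steps.
kv_cache = hidden @ W_down        # (seq_len, d_latent)
keys = kv_cache @ W_up_k          # materialized per attention call
values = kv_cache @ W_up_v

full_cache_floats = 2 * seq_len * d_model  # separate K and V caches
latent_cache_floats = kv_cache.size
cache_ratio = full_cache_floats // latent_cache_floats  # 32x smaller here
```

Shrinking the per-token cache is exactly what makes very long sequences (document understanding, code analysis) affordable in memory, which matches the long-sequence positioning the article describes for MODEL1.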
One Year After R1's Release, DeepSeek's New Model "MODEL1" Is Revealed
Xin Lang Cai Jing· 2026-01-21 04:05
Core Insights
- DeepSeek has unveiled a new model architecture named "MODEL1" within its FlashMLA software, which is designed to optimize large-model inference generation on NVIDIA GPUs [1][2]
- MODEL1 is expected to be a highly efficient inference model with lower memory usage than the existing V3.2, making it suitable for edge devices and cost-sensitive applications [2]
- The company is set to launch its next flagship model, DeepSeek V4, in mid-February 2026, which is anticipated to enhance coding capabilities [3]

Group 1
- An analysis of FlashMLA's 114 code files found the MODEL1 architecture referenced 31 times [1]
- MODEL1 supports multiple GPU architectures, with dedicated implementations for NVIDIA H100/H200 and B200, indicating tailored optimization for the latest GPU technology [2]
- DeepSeek's existing models follow two technical routes: the V series, focused on all-around performance, and the R series, targeting complex reasoning tasks [2]

Group 2
- The V3 model, launched in December 2024, established a strong performance foundation with its efficient MoE architecture, followed by rapid iterations to V3.2 [3]
- The R1 model, released in January 2025, excels at complex reasoning tasks through reinforcement learning and introduced a "deep thinking" mode [3]
- Recent technical papers from DeepSeek point to ongoing development of new models that may integrate novel training methods and AI memory modules [3]