DeepSeek V3.1
DeepSeek's update draws complaints of turning cold and dumb: "more cringeworthy than the sentimental youth literature of 20 years ago"
Mei Ri Jing Ji Xin Wen· 2026-02-12 22:23
Core Insights
- DeepSeek has begun a gray-release (limited rollout) test of its flagship model, extending the context length to up to 1 million tokens, a significant expansion from the 128K tokens of version V3.1 released in August last year [1]
- Users have reported mixed reactions to the recent updates, with some expressing dissatisfaction over the model's change in tone and interaction style, leading to a trending topic on social media about its perceived coldness [1][4]

Group 1: Model Updates and Features
- The latest version of DeepSeek supports the processing of extremely long texts, as demonstrated by its ability to handle a document of over 240,000 tokens [1]
- The upcoming DeepSeek V4 model is expected to be released in mid-February 2026; the current version is a speed-optimized variant that trades some quality for performance testing [6]
- DeepSeek's V series models are designed for comprehensive performance, with V3 marking a significant milestone due to its efficient MoE architecture [6]

Group 2: User Feedback and Reactions
- Users have criticized the new version for its impersonal approach, addressing users as "users" instead of with personalized nicknames, which has led to a perception of the model being less engaging [4]
- Some users have described the updated model as overly simplistic and lacking emotional depth, comparing its output unfavorably to older literary styles [4]
- Conversely, a segment of users appreciates the model's newfound objectivity and rationality, noting that it appears more attuned to the psychological state of the questioner [5]

Group 3: Technical Innovations
- DeepSeek has introduced two innovative architectures: mHC, which optimizes information flow in deep Transformers, enhancing stability and scalability without increasing computational load, and Engram, which decouples static knowledge from dynamic computation [7]
- These innovations aim to significantly reduce the cost of long-context reasoning while maintaining performance [7]
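The jump from 128K to 1 million tokens matters mostly because the KV cache grows linearly with context length. A back-of-the-envelope sketch of that growth; the layer count, head count, and head dimension below are illustrative assumptions, not DeepSeek's published configuration:

```python
# Back-of-the-envelope KV-cache sizing for a long-context model.
# All model dimensions here are illustrative assumptions.

def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Memory for the K and V caches across all layers (fp16/bf16 by default)."""
    return 2 * context_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

old = kv_cache_bytes(128_000, n_layers=60, n_kv_heads=8, head_dim=128)
new = kv_cache_bytes(1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
print(f"128K context: {old / 2**30:.1f} GiB")
print(f"1M context:   {new / 2**30:.1f} GiB  ({new / old:.1f}x)")
```

Whatever the real dimensions, the ratio is fixed by the context lengths alone (1,000,000 / 128,000 ≈ 7.8x), which is why architectures that shrink or sparsify the cache are the enabling work behind the expansion.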
DeepSeek's new model revealed
Cai Lian She· 2026-01-21 06:34
Core Viewpoint
- DeepSeek is advancing its AI model capabilities with the introduction of MODEL1, which is designed for efficient inference and optimized for various GPU architectures, indicating a strategic focus on enhancing performance and reducing memory usage in AI applications [4][5][6]

Group 1: MODEL1 and FlashMLA
- MODEL1 is a newly revealed model architecture within DeepSeek's FlashMLA, a software tool optimized for NVIDIA Hopper-architecture GPUs and aimed at accelerating inference generation for large models [4]
- FlashMLA uses a multi-head latent attention (MLA) mechanism to minimize memory usage and maximize GPU hardware efficiency, which is crucial to the performance of DeepSeek's models [4][5]
- MODEL1 is expected to be a low-memory-consumption model suitable for edge devices and cost-sensitive scenarios, with optimizations for long-sequence tasks such as document understanding and code analysis [5]

Group 2: DeepSeek's Model Development
- DeepSeek's existing models represent two technical routes: the V series focusing on comprehensive performance and the R series targeting complex reasoning tasks [6]
- The V3 model, launched in December 2024, marked a significant milestone with its efficient MoE architecture, followed by rapid iterations leading to V3.1 and V3.2, which enhance reasoning and agent capabilities [6]
- The R1 model, released in January 2025, excels at complex reasoning tasks through reinforcement learning and introduces a "deep thinking" mode, showcasing DeepSeek's commitment to advancing AI capabilities [7]

Group 3: Future Developments
- DeepSeek is expected to launch its next flagship AI model, DeepSeek V4, around mid-February 2026, with enhanced coding capabilities anticipated [7]
- Recent technical papers from DeepSeek discuss new training methods and a biologically inspired AI memory module, suggesting that these innovations may be integrated into upcoming models [7]
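The memory saving behind MLA comes from caching one small latent vector per token instead of full per-head keys and values, then re-expanding the latent at attention time. A minimal numpy sketch of that idea; the dimensions and projection names (`W_down`, `W_up_k`, `W_up_v`) are illustrative assumptions, not FlashMLA's actual implementation:

```python
import numpy as np

# Multi-head latent attention (MLA), simplified: instead of caching
# full K/V for every head, cache one low-rank latent per token and
# re-expand it when attention is computed. Dimensions are illustrative.
d_model, n_heads, head_dim, d_latent = 512, 8, 64, 64
rng = np.random.default_rng(0)

W_down = rng.normal(size=(d_model, d_latent)) * 0.02          # compress to latent
W_up_k = rng.normal(size=(d_latent, n_heads * head_dim)) * 0.02  # expand to K
W_up_v = rng.normal(size=(d_latent, n_heads * head_dim)) * 0.02  # expand to V

seq_len = 1000
h = rng.normal(size=(seq_len, d_model))   # hidden states for cached tokens

# The cache holds only the latent: seq_len x d_latent values.
latent_cache = h @ W_down

# At attention time, expand the latent back into per-head K and V.
K = (latent_cache @ W_up_k).reshape(seq_len, n_heads, head_dim)
V = (latent_cache @ W_up_v).reshape(seq_len, n_heads, head_dim)

full_kv = 2 * seq_len * n_heads * head_dim   # what standard MHA would cache
mla = seq_len * d_latent                     # what MLA caches instead
print(f"cache reduction: {full_kv / mla:.0f}x")  # 16x with these numbers
```

The trade-off is extra matrix multiplies at attention time in exchange for a much smaller cache, which is exactly the kind of bargain that favors memory-bound inference on edge devices.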
One year after the R1 model's release, DeepSeek's new model "MODEL1" revealed
Xin Lang Cai Jing· 2026-01-21 04:05
Core Insights
- DeepSeek has unveiled a new model architecture named "MODEL1" as part of its FlashMLA software, which is designed to optimize large-model inference generation on NVIDIA GPUs [1][2]
- MODEL1 is expected to be a highly efficient inference model with lower memory usage than the existing V3.2 model, making it suitable for edge devices and cost-sensitive applications [2]
- The company is set to launch its next flagship AI model, DeepSeek V4, in mid-February 2026, which is anticipated to enhance coding capabilities [3]

Group 1
- Analysis of the FlashMLA tool covers a total of 114 code files, in which the MODEL1 architecture is mentioned 31 times [1]
- MODEL1 supports multiple GPU architectures, including specific implementations for NVIDIA H100/H200 and B200, indicating tailored optimization for the latest GPU technology [2]
- DeepSeek's existing models represent two technical routes: the V series focusing on comprehensive performance and the R series targeting complex reasoning tasks [2]

Group 2
- The V3 model, launched in December 2024, established a strong performance foundation with its efficient MoE architecture, followed by rapid iterations leading to V3.2 [3]
- The R1 model, released in January 2025, excels at complex reasoning tasks through reinforcement learning and introduces a "deep thinking" mode [3]
- Recent technical papers from DeepSeek suggest ongoing development of new models that may integrate innovative training methods and AI memory modules [3]
The great divide in AI large models: from technical fervor to a return to commercial value
Xin Lang Cai Jing· 2025-12-25 12:40
Core Insights
- The Chinese large model market in 2025 has undergone a significant "value return," with diminishing marginal effects of technological breakthroughs and a shift towards sustainable business models and deep industry integration [2][11]
- The emergence of DeepSeek disrupted the existing large model market, temporarily dethroning ChatGPT and becoming a global phenomenon [3][12]
- The competitive landscape is evolving from a binary narrative of "giants" versus "small tigers" to a more complex, multidimensional competitive stage [3][12]

Company Developments
- DeepSeek experienced a surge in popularity at the beginning of 2025 but faced a decline in attention by the second half of the year, with updates failing to generate significant market interest [4][13]
- The "AI Six Tigers," including Zero One Everything and Baichuan Intelligence, have shifted focus from training large models to practical commercial applications [5][14]
- Zero One Everything reported significant revenue growth in 2025, achieving multiple times its 2024 revenue, and successfully launched international projects [6][15]
- Baichuan Intelligence has refocused its business on the medical sector, indicating a strategic shift in resource allocation [6][15]

Market Trends
- The investment landscape has shifted from funding foundational model companies to prioritizing AI applications and infrastructure, reflecting broader market demand for practical solutions [8][17]
- Companies like Zhipu and MiniMax are moving towards IPOs, becoming the first independent large model firms to list in Hong Kong, which is expected to attract significant investor interest [18]
- A focus on sustainable revenue growth and narrowing losses will be critical for long-term success in the capital markets [18]

Technological Insights
- The current Transformer architecture may not support the next generation of agents, with research pointing to a potential shift towards Non-Linear RNNs for improved performance in long-context environments [19]
The great divide in AI large models: from technical fervor to a return to commercial value | 2025 China Economy Annual Report
Hua Xia Shi Bao· 2025-12-25 08:16
Core Insights
- The Chinese large model market in 2025 is undergoing a significant "value return," with a shift towards sustainable business models and real demand, marking it as a year of entrepreneurial opportunities in global AI applications [2]
- DeepSeek emerged as a major player in early 2025, temporarily dethroning ChatGPT in app downloads and gaining widespread attention, but faced a decline in visibility later in the year [3][4]
- The competitive landscape is evolving, with the "AI Six Tigers" diversifying their strategies and focusing on practical applications rather than large model training [5][6]

Company Strategies
- Zero One Everything and Baichuan Intelligence have shifted away from training large models, focusing instead on industry applications and commercial viability, and achieved significant revenue growth in 2025 [6]
- Zhipu and MiniMax are maintaining their focus on large model training while emphasizing commercialization, with Zhipu reporting extensive partnerships in key sectors [6][7]
- Yuezhi Anmian (Moonshot AI) is transitioning towards a market-driven approach, appointing a former investor as president to enhance its commercial strategy [7]

Market Dynamics
- The investment landscape is becoming more cautious, with investors preferring applications and infrastructure over foundational model companies, reflecting a shift of capital towards sectors that provide tangible value [8]
- The trend is moving from financing to public offerings, with Zhipu and MiniMax preparing for IPOs, which could attract significant market attention given the lack of pure AI listings in Hong Kong [9]
- The future of AI is expected to see the emergence of "new species" with full-loop capabilities across industries, potentially disrupting traditional business models [9][10]

Technical Developments
- Current Transformer architectures may not support the next generation of agents, with research indicating a potential evolution towards Non-Linear RNNs to address limitations in handling long-context environments [10]
The evolution from DeepSeek V3 to V3.2, all in one article
Ji Qi Zhi Xin· 2025-12-08 04:27
Core Insights
- DeepSeek has released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which have generated significant interest and discussion in the AI community [2][5][11]
- The evolution from DeepSeek V3 to V3.2 includes various architectural improvements and the introduction of new mechanisms aimed at enhancing performance and efficiency [10][131]

Release Timeline
- The initial release of DeepSeek V3 in December 2024 did not create immediate buzz, but the subsequent release of the DeepSeek R1 model changed the landscape, making DeepSeek a popular alternative to proprietary models from companies like OpenAI and Google [11][14]
- The release of DeepSeek V3.2-Exp in September 2025 was a preparatory step for the V3.2 model, focused on establishing the infrastructure needed for deployment [17][49]

Model Types
- DeepSeek V3 was initially launched as a base model, while DeepSeek R1 was developed as a specialized reasoning model through additional training [19][20]
- The industry trend has been a shift from hybrid reasoning models to specialized models; DeepSeek appears to be reversing this trend by moving from a specialized model (R1) to hybrid models (V3.1 and V3.2) [25]

Evolution from V3 to V3.1
- DeepSeek V3 used a mixture-of-experts (MoE) model and multi-head latent attention (MLA) to optimize memory usage during inference [29][30]
- DeepSeek R1 focused on Reinforcement Learning with Verifiable Rewards (RLVR) to enhance reasoning capabilities, particularly in tasks that allow symbolic verification [37][38]

Sparse Attention Mechanism
- DeepSeek V3.2-Exp introduced a non-standard sparse attention mechanism that significantly improves efficiency in training and inference, especially in long-context scenarios [49][68]
- The DeepSeek Sparse Attention (DSA) mechanism lets the model selectively attend to relevant past tokens, reducing computational complexity from quadratic to linear [68]

Self-Verification and Self-Correction
- DeepSeekMath V2, released shortly before V3.2, introduced self-verification and self-correction techniques to improve the accuracy of mathematical reasoning tasks [71][72]
- The self-verification process uses a verifier model to assess the quality of generated proofs, while self-correction allows the model to iteratively improve its outputs based on feedback [78][92]

DeepSeek V3.2 Architecture
- DeepSeek V3.2 retains the architecture of its predecessor, V3.2-Exp, while incorporating improvements aimed at enhancing overall model performance across tasks including mathematics and coding [107][110]
- The model's training process has been refined with updates to the RLVR framework, integrating new reward mechanisms for different task types [115][116]

Performance Benchmarks
- DeepSeek V3.2 has shown competitive performance across benchmarks, achieving notable results on mathematical tasks and outperforming several proprietary models [127]
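The efficiency claim for sparse attention rests on each query attending to only a fixed number of selected past tokens rather than all of them. A toy numpy sketch of top-k sparse attention; the selection here uses raw dot-product scores for simplicity, whereas DeepSeek's DSA uses a separate learned indexer, so this is an illustration of the principle, not their method:

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=64):
    """One query attends only to its k highest-scoring cached tokens,
    so the softmax and weighted sum cost O(k) instead of O(seq_len).
    Toy illustration; DSA selects tokens with a learned indexer."""
    scores = K @ q / np.sqrt(q.shape[-1])   # (seq_len,) similarity scores
    k = min(k, scores.shape[0])
    idx = np.argpartition(scores, -k)[-k:]  # indices of the top-k tokens
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                            # softmax over the selected subset
    return w @ V[idx]                       # weighted sum over only k tokens

rng = np.random.default_rng(0)
seq_len, d = 4096, 64
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))
q = rng.normal(size=(d,))
out = topk_sparse_attention(q, K, V, k=64)
print(out.shape)  # (64,)
```

With `k` fixed, total cost across a sequence grows linearly in its length once selection is cheap, which matches the quadratic-to-linear framing above; the open question any such design must answer is whether the selector reliably keeps the tokens that matter.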
ModelHub XC | Over 1,000 model adaptations within two months of launch, forging a new foundation for integrating domestic AI compute and applications
Ge Long Hui· 2025-11-27 03:12
Core Insights
- The launch of "ModelHub XC" by Paradigm Intelligence has achieved over 1,000 model adaptations within two months, four months ahead of schedule, marking significant progress in the domestic AI ecosystem [1][11]
- The platform supports a diverse range of models, including general large language models, specialized vertical models, and cutting-edge innovative models, providing a solid foundation for the coordinated development of domestic AI software and hardware [1][12]

Development Timeline
- **Launch Date**: The platform was officially launched on September 22, 2025, addressing the compatibility issues between deployed models and underlying chip architectures that have been a barrier to the large-scale implementation of AI [2][12]
- **Vertical Model Adaptation**: On October 17, 2025, the platform completed the adaptation and deep optimization of a wind-tunnel computation model on the domestic chip Xiwang S2, achieving commercial-grade performance [4]
- **Frontier Model Adaptation**: On November 1, 2025, the innovative model DeepSeek-OCR was successfully adapted for testing on various domestic computing cards, showcasing significant technical innovation [6]
- **Agent Model Deployment**: On November 17, 2025, the efficient agent model MiniMax-M2 was adapted for deployment on the Ascend 910B4 chip, demonstrating globally leading model capabilities [7]
- **Batch Adaptation Achievement**: On November 25, 2025, the platform achieved large-scale adaptation of 108 models on Moore Threads chips, highlighting its strong ecosystem expansion capabilities [9]

Platform Capabilities
- The platform is driven by the EngineX engine, enabling "plug-and-play" deployment of models on domestic chips, significantly shortening deployment cycles and resolving compatibility issues [12]
- The model ecosystem is rich and diverse, covering a wide range of models and supporting major domestic computing platforms [12]
- The platform offers professional model-adaptation services, backed by a team of hundreds of engineers, to ensure successful adaptation and stable operation in domestic environments [12]

Future Outlook
- The platform aims to accelerate towards a "ten thousand model" ecosystem within a year, continuing to expand model scale and chip support [14]
- The company plans to maintain a rapid update pace to build a more complete and efficient domestic AI infrastructure [14]
Kimi's Yang Zhilin says "training costs are hard to quantify," will stick with open-source strategy
Di Yi Cai Jing· 2025-11-11 12:04
Core Viewpoint
- Kimi, an AI startup, is focusing on open-source model development; its recent release, Kimi K2 Thinking, has a reported training cost of $4.6 million, significantly lower than competitors such as DeepSeek V3 and OpenAI's GPT-3 [3][4][6]

Summary by Sections

Model Development and Costs
- Kimi has invested heavily in open-source model research and updates over the past six months, releasing Kimi K2 Thinking on November 6 with a reported training cost of $4.6 million, lower than DeepSeek V3's $5.6 million and far below the billions OpenAI reportedly spent on GPT-3 [3][4]
- CEO Yang Zhilin clarified that the $4.6 million figure is not official; because most expenses go to research and experimentation, training costs are difficult to quantify [4][6]

Model Performance and Challenges
- Users raised concerns about the reasoning length of Kimi K2 Thinking and discrepancies between leaderboard scores and actual performance; Yang stated that the model currently prioritizes absolute performance, with plans to improve token efficiency in the future [4][7]
- The gap between leaderboard performance and real-world experience is expected to narrow as the model's general capabilities improve [7]

Market Position and Strategy
- Chinese open-source models are increasingly used in the international market, with five Chinese models appearing in the top twenty of the OpenRouter model usage rankings [7]
- Kimi can currently be accessed only via API due to interface issues with the OpenRouter platform [7]
- Kimi plans to maintain its open-source strategy, focusing on the application and optimization of Kimi K2 Thinking while balancing text and multimodal model development, and avoiding direct competition with leading firms like OpenAI [6][8]
Kimi's Yang Zhilin says "training costs are hard to quantify," will stick with open-source strategy
Di Yi Cai Jing· 2025-11-11 10:35
Core Insights
- Kimi, an AI startup, has released its latest open-source model, Kimi K2 Thinking, with a reported training cost of $4.6 million, significantly lower than DeepSeek V3's $5.6 million and far below the billions OpenAI reportedly spent training GPT-3 [1][2]
- The company emphasizes ongoing model updates and improvements, focusing on absolute performance while addressing user concerns about inference length and performance discrepancies [1]
- Kimi's strategy includes maintaining an open-source approach and advancing the Kimi K2 Thinking model while avoiding direct competition with major players like OpenAI through innovative architecture and cost control [2][4]

Model Performance and Market Position
- In the latest OpenRouter model usage rankings, five Chinese open-source models, including Kimi's, rank among the top twenty, indicating a growing presence in the international market [2]
- Kimi's current model can be accessed only via API due to platform limitations; the team trains on H800 GPUs with InfiniBand, despite having fewer resources than the high-end GPUs available in the U.S. [2]
- The company plans to balance text model development with multimodal advancements, aiming to establish a differentiated advantage in the AI landscape [4]
2026 Investment Summit Express: A New Paradigm for the AI Industry
HTSC· 2025-11-10 12:07
Investment Rating
- The report maintains an "Overweight" rating for the technology and computer sectors [7]

Core Insights
- The AI industry is entering a new paradigm characterized by Scaling Law 2.0, in which synthetic data raises the ceiling on training data and the Mid Training paradigm reshapes model evolution paths [2][3]
- The commercial application of AI is transitioning into a scaling phase, with the integration of agent capabilities and transaction loops accelerating industry implementation [2][6]

Summary by Sections

Models
- Computing power expansion remains the core growth engine: training compute for representative models grew at an annual rate of 4-5x from 2010 to 2024, with leading models reaching up to 9x [3][13]
- The cost of fully training a frontier model is projected to reach the billion-dollar level by 2027 [3][13]

Training
- The Mid Training paradigm expands training boundaries by integrating reinforcement learning (RL) into the middle stage, enhancing data generation and optimal allocation [4][16]
- This approach significantly increases data utilization efficiency and is expected to break traditional performance limits [4][16]

Agents
- GPT-5 establishes a "unified system" direction, promoting standardization of agent architecture through adaptive collaboration between fast and deep thinking [5][19]
- A real-time router dynamically allocates computing resources based on task complexity, improving response efficiency and stability in complex scenarios [5][19]

Applications
- The integration of agent capabilities into commercial transactions marks a new phase of AI applications, with OpenAI's Agentic Commerce Protocol enabling AI agents to execute purchases directly [6][22]
- The global AI application landscape is evolving through three stages: productization in 2023, commercialization trials in 2024, and scaled implementation in 2025 [25][26]
- Domestic AI applications are accelerating, with significant advances in commercial capabilities following the release of models like DeepSeek-R1 [26]
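The 4-5x annual growth figure cited for training compute compounds very quickly. A quick sanity check of what those rates imply over the 2010-2024 window (illustrative arithmetic only, using the report's headline rates):

```python
# Compound-growth sanity check for the training-compute figures cited
# above: 4-5x per year for representative models, up to 9x for leaders.
years = 2024 - 2010  # 14-year window

for rate in (4, 5, 9):
    total = rate ** years
    print(f"{rate}x/year over {years} years -> {total:.1e}x total compute")
```

Even the conservative 4x rate implies hundreds of millions of times more compute over the window, which is the arithmetic behind the projection that fully training a frontier model reaches the billion-dollar level by 2027.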