Context Window
DeepSeek Has Turned Cold
36氪· 2026-02-13 10:20
Core Viewpoint
- DeepSeek has significantly upgraded its flagship model, increasing the context window from 128K tokens to 1M tokens, substantially enhancing processing capacity and information retention during interactions [5][6]

Group 1: Model Upgrade Details
- The new 1M-token context window lets DeepSeek process approximately 750,000 to 900,000 English words, or roughly 80,000 to 150,000 lines of code, in a single interaction [6]
- The upgrade allows DeepSeek to read and understand the entire "Three-Body Problem" trilogy (approximately 900,000 words) and perform macro analysis or detailed retrieval within minutes [6]
- DeepSeek's knowledge cutoff is being advanced from mid-2024 to May 2025, although the current version does not support visual understanding or multimodal input, focusing solely on text and voice interactions [6]

Group 2: User Feedback and Reactions
- Users report that the model's writing style has changed post-update, becoming more formal and less personal; some are dissatisfied, feeling it has lost its previous empathetic touch [7][8]
- A growing number of users are calling for DeepSeek to preserve its depth of thought and emotional understanding while enhancing technical capabilities, with some reverting to older versions of the application [8]
- The current gray-release (staged-rollout) version is not officially labeled "DeepSeek-V4" and is perceived as a test build that prioritizes speed over quality, paving the way for the anticipated V4 release in February 2026 [9]
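The capacity figures above follow the common rule of thumb that one token corresponds to roughly 0.75 English words (so 1M tokens is about 750,000 words). A minimal sketch of that estimate, with the ratio treated as an assumption that varies by tokenizer and language:

```python
# Rough capacity check for a 1M-token context window.
# Heuristic: ~0.75 English words per token; the exact ratio depends on
# the tokenizer, so treat results as order-of-magnitude estimates.

def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Estimate token count from whitespace-delimited word count."""
    return int(len(text.split()) / words_per_token)

def fits_in_window(text: str, window: int = 1_000_000) -> bool:
    """Check whether a document plausibly fits in the context window."""
    return estimate_tokens(text) <= window
```

By this heuristic a 900,000-word trilogy lands at about 1.2M tokens, which is why "within minutes" summarization of a work that size sits right at the edge of a 1M-token window.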
DeepSeek Has Turned Cold
Jing Ji Guan Cha Wang· 2026-02-12 04:57
Core Insights
- DeepSeek has run a gray-release (staged-rollout) test of its flagship model, increasing its context window from 128K tokens to 1M tokens, a nearly 8-fold capacity increase [1]
- The upgraded model can process approximately 750,000 to 900,000 English words, or roughly 80,000 to 150,000 lines of code, in a single interaction [1]
- DeepSeek claims the model can read and understand the entire "Three-Body" trilogy (approximately 900,000 words) and perform macro analysis or detail retrieval within minutes [1]

Model Features
- The gray-release version does not yet support visual understanding or multimodal input, focusing solely on text and voice interactions [2]
- DeepSeek allows file uploads in formats such as PDF and TXT, but currently processes them by converting the contents to text tokens rather than through native multimodal understanding [2]
- Compared with models such as Gemini 3 Pro, which handles contexts of over 2M tokens and complex media tasks, DeepSeek offers 1M-token text context processing at about one-tenth the price [2]

User Experience
- Users have noted changes in the model's writing style post-update, describing it as more formal and less personal, leading to dissatisfaction among some users [2][3]
- User feedback indicates a desire for DeepSeek to retain its depth of thought and emotional understanding rather than sacrifice them for enhanced technical capabilities [3]
- Users report difficulty restoring the previous writing style and describe the change as losing a "close friend" [3]

Company Response
- As of February 12, DeepSeek had not responded to inquiries regarding the gray-release test [4]
Qwen's New Model Closes In on Claude 4: Context Window Extendable to One Million Tokens, Runs Locally in 33GB
量子位· 2025-08-01 00:46
Core Viewpoint
- Qwen3-Coder is positioned as a groundbreaking open-source programming model that challenges existing models with high performance and local usability [1][2][3]

Group 1: Model Performance
- Qwen3-Coder-Flash has been released as a lightweight version with performance comparable to GPT-4.1 [2][3]
- It surpasses top open-source models on multiple programming tasks, trailing only slightly behind proprietary models such as Claude Sonnet-4 and GPT-4.1 [5]
- The model supports a native context window of 256K tokens, extendable to 1 million tokens, making it suitable for large codebases and complex multi-file projects [16]

Group 2: Technical Specifications
- Qwen3-Coder uses a mixture-of-experts (MoE) architecture with roughly 30 billion total parameters, of which about 3.3 billion are activated per token, consistent with the 33GB local-deployment footprint in the headline [16]
- It is optimized for platforms including Qwen Code, Cline, Roo Code, and Kilo Code, and supports seamless function calls and agent workflows [16]

Group 3: User Experience and Applications
- Users report successful game-coding tasks, demonstrating the model's ability to generate working code from minimal prompts [12][14]
- The model has been tested on memory-constrained devices such as an M2 MacBook Pro, showcasing its versatility and efficiency [12][18]
- Qwen3-Coder-Flash is highlighted as an excellent choice for local programming, emphasizing its user-friendly nature [10]

Group 4: Community and Ecosystem
- Qwen's rapid pace of updates and open-source releases has intensified competition among domestic models [18]
- Various platforms and community resources, including QwenChat and ModelScope, let users try Qwen3-Coder [19]
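The 256K-to-1M extension mentioned above is the kind of long-context scaling Qwen has documented via YaRN rope scaling. A sketch of what the `config.json` addition could look like, assuming a 4x factor over a 262,144-token native window; the field names follow the pattern of Qwen's published model cards and should be verified against the actual card for this release:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

Static YaRN scaling of this kind applies the factor even to short inputs, so it is typically enabled only when contexts beyond the native window are actually needed.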
OpenAI's Latest Release!
第一财经· 2025-04-15 00:06
OpenAI has launched three GPT-4.1-series models: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all available only through the API. GPT-4.1 is positioned as a comprehensive upgrade over GPT-4o, with stronger multimodal processing, a larger context window (all three models can handle 1 million tokens), and a 26% reduction in cost. ...
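Since the GPT-4.1 series is API-only, working with a long document amounts to packing its text into a chat request under the 1M-token budget. A minimal sketch of that preparation step (`build_request` is a hypothetical helper; the payload shape follows the widely used chat-completions format, and the 4-characters-per-token ratio is a rough assumption, not an OpenAI guarantee):

```python
# Hedged sketch: assembling a long-document chat request for an
# API-only, 1M-token-window model such as GPT-4.1.

def build_request(document: str, question: str, model: str = "gpt-4.1",
                  max_input_tokens: int = 1_000_000) -> dict:
    """Assemble a chat-completions-style payload, truncating the document
    to a rough character budget (~4 characters per token heuristic)."""
    char_budget = max_input_tokens * 4
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only the provided document."},
            {"role": "user",
             "content": document[:char_budget] + "\n\n" + question},
        ],
    }
```

The returned dictionary would then be sent via an API client; in production, replace the character heuristic with a real tokenizer count before trusting the budget.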