DeepSeek

DeepSeek's "极你太美" bug (a pun on the "鸡你太美" meme): the company responds
QbitAI · 2025-08-27 02:24
Core Viewpoint - The article discusses a significant bug in the DeepSeek V3.1 model that has alarmed developers: the character "极" (jí, "extreme") unexpectedly appears in generated code, which can cause compilation failures and problems in high-precision tasks [1][2][11].
Summary by Sections
Bug Discovery and Impact
- Developers report that when calling the API for code development, the output occasionally contains the character "极", disrupting the coding process [2][5].
- The issue was first identified on platforms such as Volcano Engine and Chutes, but has since appeared on others, including Tencent's CodeBuddy and DeepSeek's official channels [5].
Community Response and Solutions
- The community has attributed the bug to the DeepSeek V3.1 model, and CodeBuddy has asked DeepSeek to fix it in an upcoming version [12].
- Users have begun sharing workarounds for the "极" bug, such as specific prompt patterns that avoid triggering it [14][18].
Analysis of the Bug's Origin
- Zhihu user Huang Zhewai suggested that the bug is not an isolated incident and may relate to a "malicious pattern" in how large models write code [21].
- Huang observed similar behavior in earlier models, whose output would unexpectedly include terms like "极长" ("extremely long") after a run of repetitions, pointing to a possible flaw in the model's reasoning process [21][22].
- He hypothesized that the root cause may be inadequate data cleaning during the supervised fine-tuning (SFT) phase, which led the model to learn "极" as a termination marker [22].
Future Outlook
- Resolving the "极" bug depends on DeepSeek releasing a new version that addresses the underlying issue [24].
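Because the stray character can silently break builds, one defensive option while waiting for a fixed model is to scan generated code before using it. The following is a minimal sketch, not a fix published by DeepSeek or CodeBuddy; the function names `find_suspect_chars` and `check_generated_code` are hypothetical. It flags every occurrence, including legitimate uses of 极 in Chinese comments or strings, so it is meant as a review step rather than an automatic deletion.

```python
from typing import List, Tuple

# The stray character reported in DeepSeek V3.1 code output.
SUSPECT_CHAR = "极"


def find_suspect_chars(code: str, suspect: str = SUSPECT_CHAR) -> List[Tuple[int, int]]:
    """Return 1-based (line, column) positions of every occurrence of `suspect` in `code`."""
    hits: List[Tuple[int, int]] = []
    for line_no, line in enumerate(code.splitlines(), start=1):
        col = line.find(suspect)
        while col != -1:
            hits.append((line_no, col + 1))
            col = line.find(suspect, col + 1)  # keep searching after this hit
    return hits


def check_generated_code(code: str) -> str:
    """Raise ValueError if the code contains the suspect character; otherwise return it unchanged."""
    hits = find_suspect_chars(code)
    if hits:
        locations = ", ".join(f"line {ln}, col {col}" for ln, col in hits)
        raise ValueError(f"suspect character {SUSPECT_CHAR!r} found at {locations}")
    return code
```

For example, running `find_suspect_chars` on `"def add(a, b):\n    return a +极 b\n"` reports a single hit at line 2, column 15, which a build step could surface before attempting compilation.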
Opinions on the "AI+" initiative issued; China Southern ChiNext AI ETF (159382) rises nearly 3%, latest fund size hits a record high since launch
Xin Lang Cai Jing· 2025-08-27 02:19
Group 1: Market Performance
- The ChiNext AI ETF (159382) rose 2.71% as of August 27, 2025, with a turnover rate of 9.36% and trading volume of 24.07 million yuan [1]
- Over the past week, the ChiNext AI ETF has gained a cumulative 8.15% [1]
- The ETF's latest size reached 251 million yuan, a new high since its launch [1]
Group 2: Fund Flows
- The ChiNext AI ETF recently recorded a net inflow of 27.42 million yuan, with a total net inflow of 48.07 million yuan over the last five trading days [1]
Group 3: Government Policy
- The State Council issued an opinion on implementing the "AI+" initiative, focusing on six key actions to deepen the integration of AI with various sectors by 2027 [2]
- The initiative targets over 70% application penetration for new intelligent terminals and significant growth in the core industries of the intelligent economy [2]
Group 4: Industry Trends
- Tianfeng Securities noted a positive trend in China's AI sector, highlighting advances in domestic model capabilities and a marked acceleration in the commercialization of AI applications [3]
- The report emphasizes that the synergy of "model + chip + application" is forming a collaborative optimization paradigm in the industry [3]
- The top ten weighted stocks in the ChiNext AI Index include Zhongji Xuchuang, Xinyi Sheng, and Tianfu Communication [3]
The state takes action, society-wide restructuring: "AI+" has really arrived
36Kr · 2025-08-27 01:00
Core Viewpoint - The article emphasizes the launch of China's "Artificial Intelligence+" initiative, which is expected to fundamentally transform various sectors and society as a whole, much as "Internet+" did in the past [2][3][7].
Group 1: Overview of "Artificial Intelligence+"
- "Artificial Intelligence+" is not merely a continuation of "Internet+" but a new paradigm shift that focuses on "empowerment" rather than just "connection" [10][25].
- The initiative aims to integrate AI deeply into processes, products, and services, fundamentally altering how industries operate [10][14][24].
- The document outlines a clear three-phase roadmap for integrating AI into society and the economy, with specific timelines for achieving widespread adoption [29][48].
Group 2: Phased Roadmap
- The first phase targets a 70% adoption rate of AI applications by 2027, making AI tools commonplace in daily life [30][31].
- The second phase aims for over 90% adoption by 2030, positioning AI as critical infrastructure akin to water and electricity [34][37].
- By 2035, the goal is a full transition to an "intelligent economy and society," with AI deeply embedded in all aspects of life [40][41].
Group 3: Key Areas of Focus
- The initiative identifies six key areas for AI application: scientific research, industrial development, consumer enhancement, public welfare, governance, and global cooperation [53][80].
- In scientific research, AI is expected to accelerate breakthroughs and enhance productivity [54][55].
- In industrial development, AI will drive a complete overhaul of traditional business models and operational efficiency [57][58].
Group 4: Societal Impact
- The initiative aims to make AI services accessible to all, promoting equity and improving quality of life [64][67].
- In education, every student will have a personalized AI tutor, enhancing learning experiences [65][66].
- Healthcare will benefit from AI-driven health management systems that provide continuous monitoring and support [67][68].
Group 5: Global Strategy
- The strategy emphasizes global collaboration in AI development, advocating open-source models and participation in international governance [74][75].
- This approach aims to position China as a leader in the global AI ecosystem, enhancing its influence and competitiveness [76][79].
Consumer electronics in-depth report (with a list of industry-chain leaders)
Sou Hu Cai Jing· 2025-08-26 17:54
Group 1
- The global consumer electronics industry is entering a new innovation cycle in Q3 2025, driven by AI applications and by major tech companies such as Google, Meta, and Apple advancing their self-developed chips [1][3][4]
- Google's Pixel 10 series features the new Tensor G5 chip, which enhances AI capabilities with a 60% increase in TPU performance and a 34% boost in CPU speed, enabling advanced features such as real-time voice translation and AI-driven photography [1][9]
- Meta is restructuring its AI department into four groups focused on large model development, AI product applications, infrastructure, and foundational research, while also launching new AI-powered wearable devices [2][10][12]
Group 2
- Apple is starting a three-year innovation plan with the iPhone 17 series, aiming to introduce a new product each year and to strengthen its AI capabilities by integrating Google's Gemini AI into Siri [3][15][18]
- Apple's Q3 FY25 revenue reached $94 billion, a 10% year-over-year increase, with significant growth in iPhone, Mac, and services, particularly in the Chinese market, where revenue grew by 4% [4][24][23]
- The panel industry is stabilizing, with prices holding steady in August and leading manufacturers maintaining market share through cost control and technological upgrades [5][28][29]
Group 3
- The AI cloud sector is advancing with DeepSeek's launch of a hybrid inference model, which significantly enhances multi-tasking and tool-use capabilities [4][26]
- The adoption of liquid cooling in AI data centers is expected to rise to 33% by 2025, driven by the need for efficient thermal management in high-density AI chip deployments [4][27]
- The A-share consumer electronics index rose 8.26% in the week of August 15-22, outperforming major indices and indicating strong market performance [4][32][41]
Tencent Research Institute AI Express 20250827
Tencent Research Institute · 2025-08-26 16:01
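I. NVIDIA launches the Jet-Nemotron small-model series (2B/4B)
1. Jet-Nemotron is NVIDIA's newest small-model series, built by an all-Chinese team; it introduces Post Neural Architecture Search (PostNAS) and a new linear attention module, JetBlock;
2. The models perform strongly on math, code, commonsense, retrieval, and long-context tasks, outperforming mainstream open-source full-attention language models such as Qwen3, Gemma3, and Llama3.2;
3. Inference throughput on H100 GPUs is up to 53.6× higher, with the advantage especially pronounced in long-context scenarios; the series marks an important NVIDIA move into small models.
https://mp.weixin.qq.com/s/8ZbWGnogg40sHknVBWHH1Q
II. ModelBest's new multimodal flagship MiniCPM-V 4.5: 8B outperforms 72B
1. ModelBest's "Little Steel Cannon" MiniCPM-V 4.5 is the first multimodal model with "high-refresh-rate" video understanding; with only 8B parameters, it outperforms the Qwen2.5-VL 72B model;
2. It reaches state-of-the-art results for its size on the MotionBench and FavorBench leaderboards, accepts up to 6× as many video frames, and achieves a 96× visual compression rate;
3. It uses 3D-Resampler high-density video compression and unified OCR and knowledge-reasoning learn ...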
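The summary does not describe how JetBlock works, but the long-context throughput advantage it mentions is characteristic of linear attention in general. The sketch below shows generic causal linear attention with an elu(x)+1 feature map (the formulation of Katharopoulos et al., 2020), not NVIDIA's JetBlock; all names are illustrative. It keeps a fixed-size running state, so the per-token cost does not grow with context length, and a quadratic reference implementation is included to check that both give the same result.

```python
import math
from typing import List

Matrix = List[List[float]]


def feature_map(x: List[float]) -> List[float]:
    """elu(x) + 1: a positive feature map, so attention weights stay positive."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_linear_attention(qs: Matrix, ks: Matrix, vs: Matrix) -> Matrix:
    """Causal linear attention with a fixed-size running state (O(n) in sequence length)."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k_s) v_s^T
    norm = [0.0] * d_k                         # running sum of phi(k_s)
    outputs: Matrix = []
    for q, k, v in zip(qs, ks, vs):
        fk = feature_map(k)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        fq = feature_map(q)
        denom = dot(fq, norm)
        outputs.append([sum(fq[i] * state[i][j] for i in range(d_k)) / denom
                        for j in range(d_v)])
    return outputs


def quadratic_reference(qs: Matrix, ks: Matrix, vs: Matrix) -> Matrix:
    """The same attention computed the O(n^2) way, attending over all earlier tokens."""
    outputs: Matrix = []
    for t, q in enumerate(qs):
        fq = feature_map(q)
        weights = [dot(fq, feature_map(ks[s])) for s in range(t + 1)]
        total = sum(weights)
        outputs.append([sum(weights[s] * vs[s][j] for s in range(t + 1)) / total
                        for j in range(len(vs[0]))])
    return outputs
```

The design point is that `state` and `norm` have a size set by the head dimensions alone, whereas softmax attention must keep and revisit a key/value cache that grows with every token.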
Cambricon delivers its half-year report, up 4,300% year on year
Zheng Quan Shi Bao· 2025-08-26 14:10
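On August 21, in its post announcing DeepSeek-V3.1, DeepSeek said the model uses UE8M0 FP8 scale parameter precision, and that V3.1 substantially revises the tokenizer and chat template, making it clearly different from DeepSeek-V3. In a pinned comment on its official WeChat account, DeepSeek said UE8M0 FP8 is designed for the next generation of domestic chips to be released soon. China Southern Asset Management (南方基金) believes this shows that domestic chip design has taken another strong step toward self-reliance and domestic substitution.
In addition, OpenAI's CEO said last week that he hopes to invest trillions of dollars in the infrastructure needed to develop and run AI services. China Southern Asset Management said OpenAI's expansion again confirms that the accelerating commercialization of large models will directly drive demand for ultra-large training clusters, and that domestic vendors are bound to claim a place in that market.
Since the start of the year, the success of home-grown innovators such as DeepSeek has fully validated the rise of China's technological strength and the success of its education system. Nuoan Fund (诺安基金) expects more hard-core innovations with the potential to "change the world" to spring up in China, saying "this will change global market expectations for Chinese tech assets, and their valuations may undergo a systematic re-rating."
Zheshang Securities (浙商证券) pointed out that export controls are forcing the rise of domestic innovation, and that in the medium to long term, self-reliance remains the main theme ...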
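UE8M0 is generally read as an unsigned format with 8 exponent bits and 0 mantissa bits, so every representable value is a power of two; this matches the E8M0 scale type in the OCP Microscaling (MX) specification. The sketch below assumes the MX conventions (exponent bias 127, code 0xFF reserved for NaN) and FP8 E4M3 elements with a maximum finite value of 448. The article gives none of these details, and the per-block scale rule shown (smallest power of two that avoids overflow) is one simple choice, not necessarily DeepSeek's. Because the scale is a pure power of two, applying it typically only shifts exponents, which is cheap in hardware.

```python
import math
from typing import Sequence

UE8M0_BIAS = 127      # assumed exponent bias (as in FP32 and OCP MX E8M0)
UE8M0_NAN = 0xFF      # all-ones code reserved for NaN in OCP MX E8M0
FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 value


def decode_ue8m0(code: int) -> float:
    """UE8M0 stores only an unsigned 8-bit exponent: value = 2**(code - bias)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("UE8M0 codes are 8-bit")
    if code == UE8M0_NAN:
        return math.nan
    return 2.0 ** (code - UE8M0_BIAS)


def encode_ue8m0(scale: float) -> int:
    """Encode a positive power of two; anything else is not representable."""
    if not scale > 0 or math.isinf(scale):
        raise ValueError("scale must be a positive, finite power of two")
    mantissa, exp = math.frexp(scale)  # scale == mantissa * 2**exp, 0.5 <= mantissa < 1
    if mantissa != 0.5:
        raise ValueError("UE8M0 has no mantissa bits: scale must be a power of two")
    code = (exp - 1) + UE8M0_BIAS
    if not 0 <= code < UE8M0_NAN:
        raise OverflowError("exponent out of UE8M0 range")
    return code


def block_scale_code(block: Sequence[float]) -> int:
    """Smallest power-of-two scale such that every |value| / scale fits in FP8 E4M3."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return UE8M0_BIAS  # scale 1.0
    mantissa, exp = math.frexp(amax / FP8_E4M3_MAX)
    k = exp - 1 if mantissa == 0.5 else exp  # smallest k with 2**k >= amax / 448
    k = max(-UE8M0_BIAS, min(k, UE8M0_NAN - 1 - UE8M0_BIAS))  # clamp to encodable range
    return k + UE8M0_BIAS
```

For a block such as `[0.5, -3.0, 1200.0]`, the largest magnitude 1200 needs a scale of 4 (1200 / 4 = 300 ≤ 448), so the block stores the one-byte code 129 alongside its FP8 elements.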
Cambricon delivers its half-year report! Up 4,300% year on year
Zheng Quan Shi Bao Wang· 2025-08-26 13:57
Group 1
- Cambricon reported revenue of 2.881 billion yuan for the first half of the year, a year-on-year increase of 4347.82% [1]
- Net profit attributable to the parent company was 1.038 billion yuan, a turnaround from a net loss of 530 million yuan in the same period last year [1]
- As of the latest close, Cambricon's stock price was 1329 yuan per share, giving a total market capitalization of 556 billion yuan [1]
Group 2
- Cambricon, established in 2016, focuses on the research and development of artificial intelligence chip products and aims to create core processor chips for the AI field [3]
- The recent announcement of DeepSeek-V3.1 indicates significant advances in domestic chip design, reinforcing the industry trend toward self-reliance and domestic substitution [3]
- OpenAI's CEO expressed the intention to invest trillions of dollars in AI infrastructure, highlighting the accelerating commercialization of large models and the growing demand for large-scale training clusters [3]
Group 3
- The success of innovative companies like DeepSeek this year validates the rise of China's technological strength and the effectiveness of its education system [4]
- Export controls are driving local innovation, with a long-term focus on self-reliance that benefits domestic manufacturers [4]
- The semiconductor cycle is currently on an upward trend, with AI as the primary growth driver, and domestic semiconductor companies are expected to benefit significantly from the ongoing development of the AI industry [4]
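The 4347.82% growth rate implies a very small base. Backing out the prior-year revenue from the stated revenue and growth rate, and the implied share count from the stated price and market capitalization, gives the following (derived estimates, not numbers given in the report):

$$
R_{\text{H1 2024}} = \frac{R_{\text{H1 2025}}}{1 + g} = \frac{2.881 \text{ billion yuan}}{1 + 43.4782} \approx 64.8 \text{ million yuan}
$$

$$
N_{\text{shares}} \approx \frac{556 \text{ billion yuan}}{1329 \text{ yuan/share}} \approx 418 \text{ million shares}
$$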
Cambricon delivers its half-year report! Up 4,300% year on year
Zheng Quan Shi Bao (Securities Times) · 2025-08-26 13:46
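According to company information, Cambricon was founded in 2016 and focuses on the R&D and technological innovation of AI chip products, aiming to build core processor chips for the AI field.
On August 21, in its post announcing DeepSeek-V3.1, DeepSeek said the model uses UE8M0 FP8 scale parameter precision, and that V3.1 substantially revises the tokenizer and chat template, making it clearly different from DeepSeek-V3. In a pinned comment on its official WeChat account, DeepSeek said UE8M0 FP8 is designed for the next generation of domestic chips to be released soon. China Southern Asset Management (南方基金) believes this shows that domestic chip design has taken another strong step toward self-reliance and domestic substitution.
In addition, OpenAI's CEO said last week that he hopes to invest trillions of dollars in the infrastructure needed to develop and run AI services. China Southern Asset Management said OpenAI's expansion again confirms that the accelerating commercialization of large models will directly drive demand for ultra-large training clusters, and that domestic vendors are bound to claim a place in that market.
Since the start of the year, the success of home-grown innovators such as DeepSeek has fully validated the rise of China's technological strength and the success of its education system. Nuoan Fund (诺安基金) expects more hard-core innovations with the potential to "change the world" to spring up in China, saying "this will change global market expectations for Chinese tech assets, and their valuations ...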
AI roundup: DeepSeek's online model upgraded to V3.1; ByteDance open-sources the 36-billion-parameter Seed-OSS model series
China Post Securities· 2025-08-26 13:00
- DeepSeek-V3.1 is an upgraded version of DeepSeek's language model, featuring a hybrid inference architecture that supports both a "thinking mode" and a "non-thinking mode" for tasks of different complexity [12][13][14]
- Its construction involves dynamic activation of different attention heads and chain-of-thought compression training to reduce redundant token output during inference [13]
- The context window has been expanded from 64K to 128K, allowing the model to handle longer documents and complex dialogues [15]
- Benchmark results show significant improvements, such as 71.2 on xbench-DeepSearch and 93.4 on SimpleQA [17]
- The evaluation highlights advances in hybrid inference, long-context processing, and tool use, although the model still faces challenges in complex reasoning tasks [21]
- ByteDance's Seed-OSS model has 36 billion parameters and a native 512K context window, emphasizing research friendliness and commercial practicality [22][23]
- It uses a dense 64-layer architecture with grouped-query attention (GQA) and rotary position embedding (RoPE) to balance computational efficiency and inference accuracy [23]
- A "thinking budget" mechanism allows dynamic control of inference depth; the model scores highly on benchmarks such as 91.7% accuracy on the AIME24 math competition [24]
- The evaluation notes strong performance in long-context and reasoning tasks, though the large parameter count makes deployment on edge devices challenging [25]
- Alibaba's WebWatcher is a multimodal research agent that parses image and text information together and autonomously uses various toolchains for multi-step tasks [26][27]
- It is trained with a four-stage framework, including data synthesis and reinforcement learning, to optimize long-horizon reasoning [27]
- WebWatcher performs well on BrowseComp-VL and MMSearch, scoring 13.6% and 55.3% respectively and surpassing top closed-source models such as GPT-4o [28]
- The evaluation calls it a breakthrough in multimodal AI research that enables complex task handling and pushes the boundaries of open-source AI capabilities [29]
- Zhipu AI's AutoGLM 2.0 is the first general-purpose mobile agent, using a cloud-based architecture to decouple task execution from local device capabilities [32][33]
- It employs GLM-4.5 and GLM-4.5V for task planning and visual execution, with an asynchronous reinforcement learning framework for end-to-end task completion [34]
- AutoGLM 2.0 achieves a 75.8% success rate on AndroidWorld and 87.7% on WebVoyager [35]
- The evaluation notes significant advances in mobile agent technology, though cross-application stability and scenario generalization still need optimization [37]
- Tencent's WeChat-YATT is a large-model training library designed to address scalability and efficiency bottlenecks in multimodal and reinforcement learning tasks [39][40]
- The library introduces a parallel controller mechanism and a partial colocation strategy to improve scalability and resource utilization [40][42]
- WeChat-YATT reduces overall training time by 60% compared with the VeRL framework, with each training stage more than 50% faster [45]
- The evaluation highlights the library's effectiveness in large-scale RLHF tasks and its potential to drive innovation in multimodal and reinforcement learning [46]
- Qwen-Image-Edit, from Alibaba's Tongyi Qianwen team, is an image-editing model that combines a dual encoding mechanism with a multimodal diffusion Transformer architecture for both semantic and appearance editing [47][48]
- It uses a dual-path input design and chained editing to maintain high visual fidelity and support iterative interaction [48][49]
- Qwen-Image-Edit achieves SOTA results on multiple benchmarks, with overall scores of 7.56 in English and 7.52 in Chinese scenarios [50]
- The evaluation notes its transformative impact on design workflows, automating rule-based editing tasks and lowering the barrier to visual creation [52]
Benchmark Results
- DeepSeek-V3.1: BrowseComp 30.0, BrowseComp_zh 49.2, HLE 29.8, xbench-DeepSearch 71.2, Frames 83.7, SimpleQA 93.4, Seal0 42.6 [17]
- Seed-OSS: AIME24 91.7%, LiveCodeBench v6 67.4, RULER (128K) 94.6, MATH 81.7 [24]
- WebWatcher: BrowseComp-VL 13.6%, MMSearch 55.3%, Humanity's Last Exam-VL 13.6% [28]
- AutoGLM 2.0: AndroidWorld 75.8%, WebVoyager 87.7% [35]
- Qwen-Image-Edit: English scenario 7.56, Chinese scenario 7.52 [50]
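Seed-OSS is described as using rotary position embedding (RoPE). As an illustration of the general technique only, the sketch below implements RoPE in its original interleaved-pair form (from the RoFormer paper) with the common base of 10000; the exact variant and base frequency Seed-OSS uses are assumptions not stated in the report. Each pair of query/key dimensions is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative distance.

```python
import math
from typing import List


def apply_rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle position * base**(-2i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)  # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s      # standard 2D rotation of the pair
        out[2 * i + 1] = a * s + b * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))
```

For instance, the attention score between a query at position 5 and a key at position 3 equals, up to floating-point rounding, the score at positions 12 and 10, because both pairs are two tokens apart; rotations also leave vector norms unchanged.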
Breaking the blockade! Chinese chips break through, rattling US stocks as Nvidia loses over a trillion yuan overnight
Sou Hu Cai Jing· 2025-08-26 12:12
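Written by JXY | Edited by 青橘罐头
Preface
Recently the US stock market has seen rare volatility. Chip giant Nvidia in particular saw its share price plunge last week, at one point wiping out nearly 1.1 trillion yuan of market value.
Behind this is the rise of China's chip industry. First, Huawei released a new generation of AI chips, gradually breaking the West's monopoly on high-end chips.
Then the emerging Chinese AI company DeepSeek launched its new-generation large model, DeepSeek-V3.1, which not only delivers strong performance but was also the first to be announced as specifically adapted to the next generation of domestic chips, marking a key breakthrough in building China's AI industry ecosystem.
At the same time, a reshuffle of the chip industry that is set to sweep the globe is under way.
US stocks wobble as China's chip industry reaches a new level
On August 19, Nvidia's share price fell 3.5%, its largest drop since April 21, erasing about US$150 billion of market value in a single day.
The damage was not limited to Nvidia: the whole chip sector was hit hard, with Intel falling more than 7% and many other chip companies also declining to varying degrees.
In sharp contrast to the slump in US chip stocks, on August 21 the Chinese AI company DeepSeek officially released its new-generation large language model, DeepSeek-V3.1.
By adopting a mixture-of-experts architecture, the new model balances efficiency and performance and supports two thinking modes, letting users switch flexibly between them for different scenarios.
Beyond that, this ...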
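The article gives Nvidia's loss both as about US$150 billion and as nearly 1.1 trillion yuan. The two figures are consistent under an assumed exchange rate of roughly 7.2 yuan per dollar (an approximation, not a figure from the article), and the 3.5% drop implies a pre-fall market value of about US$4.3 trillion:

$$
150 \text{ billion USD} \times 7.2 \approx 1.08 \text{ trillion CNY}, \qquad \frac{150 \text{ billion USD}}{0.035} \approx 4.29 \text{ trillion USD}
$$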