Sources: DeepSeek will release its latest flagship AI model in February
Xin Lang Cai Jing· 2026-01-09 13:23
Core Insights
- DeepSeek is expected to launch its next-generation flagship AI model, V4, in the coming weeks, with a focus on strong code-generation capabilities [2][6]
- The V4 model iterates on the V3 model released in December 2024, and initial tests indicate it outperforms mainstream models such as Anthropic's Claude and OpenAI's GPT series in code generation [2][6]
- The anticipated launch date for V4 is around mid-February, coinciding with the Lunar New Year, although this may change [2][6]

Model Performance and Features
- The V4 model achieves a breakthrough in handling and parsing long code prompts, a significant advantage for engineers working on complex software projects [4][7]
- Understanding of data patterns throughout the training process has improved, with no performance degradation observed [4][7]
- Users can expect more logically coherent and clearer outputs from V4, reflecting stronger reasoning and greater reliability on complex tasks [4][7]

Previous Models and Market Impact
- The V3.2 version released in December 2024 outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in certain benchmark tests, but no major model iteration has shipped since, heightening anticipation for V4 [3][7]
- DeepSeek's R1, an open-source reasoning model, drew significant attention for training costs far below those of leading U.S.-developed models while still delivering impressive performance [2][6]

Research and Development Innovations
- A new training architecture proposed in a recent research paper co-authored by DeepSeek's CEO allows larger AI models to be developed without proportionally increasing chip investment [8][9]
- This series of advances indicates that DeepSeek continues to make strides in AI innovation [8][9]
Report: DeepSeek will release a next-generation flagship AI model in February with strong coding capabilities
Hua Er Jie Jian Wen· 2026-01-09 13:19
According to the report, DeepSeek will release a next-generation flagship AI model in February with strong coding capabilities.
Out of nowhere, DeepSeek drops an 86-page update to the R1 paper, and this is what "Open" really means
36Kr· 2026-01-09 03:12
The R1 paper has ballooned to 86 pages! DeepSeek is showing the world that open source can not only catch up with closed source, it can teach closed source how it's done. The whole internet is stunned. Two days ago, DeepSeek quietly updated the R1 paper, "swelling" it from the original 22 pages to 86. The new version demonstrates that reinforcement learning alone is enough to improve AI reasoning ability. DeepSeek appears to be holding something bigger in reserve; some netizens even speculate that a pure reinforcement-learning approach may show up in R2. This update effectively upgrades the original paper into a technical report that the open-source community can fully reproduce. Paper: https://arxiv.org/abs/2501.12948 The new material in the DeepSeek-R1 paper is packed with substance, including the benchmark comparison below:

| Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o-0513 | DeepSeek-V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek-R1 |
| --- | --- | --- | --- | --- | --- | --- |
| Architecture | - | - | MoE | - | - | MoE |
| # ... | | | | | | |
Clearing out the backlog: DeepSeek abruptly completes the R1 technical report, detailing the training pipeline publicly for the first time
36Kr· 2026-01-09 03:12
Core Insights
- DeepSeek has released an updated version of its research paper on the R1 model, adding 64 pages of technical detail and significantly expanding the original content [4][25]
- The new version emphasizes the implementation details of the R1 model, laying out its training process systematically [4][6]

Summary by Sections

Paper Update
- The updated paper has expanded from 22 pages to 86 pages, giving a comprehensive view of the R1 model's training and operational details [4][25]
- The new version breaks the training process into four main steps: cold start, reasoning-oriented reinforcement learning (RL), rejection sampling and fine-tuning, and alignment-oriented RL (see the pipeline sketch after this summary) [6][9]

Training Process
- The cold-start phase uses thousands of Chain-of-Thought (CoT) examples for supervised fine-tuning (SFT) [6]
- The reasoning-oriented RL phase strengthens model capabilities and introduces a language-consistency reward to address mixed-language outputs [6]
- The rejection-sampling and fine-tuning phase mixes reasoning data with general data to improve the model's writing and reasoning abilities [6]
- The alignment-oriented RL phase refines the model's helpfulness and safety so that it aligns more closely with human preferences [6]

Safety Measures
- DeepSeek has built a risk-control system to improve the safety of the R1 model, including a dataset of 106,000 prompts used to evaluate model responses against predefined safety criteria [9][10]
- The safety reward model is trained point-wise to distinguish safe from unsafe responses, with training hyperparameters matching those of the helpfulness reward model [9]
- The risk-control system operates through two main processes: filtering of potentially risky dialogues and model-based risk review [9][10]

Performance Metrics
- The introduction of the risk-control system has significantly improved the model's safety performance, with R1 achieving benchmark scores comparable to leading models [14]
- DeepSeek has built an internal safety-evaluation dataset organized into four main categories and 28 subcategories, totaling 1,120 questions [19]

Team Stability
- The core contributors to the DeepSeek team have largely stayed: only five of the more than 100 authors have left, indicating strong retention in a competitive AI industry [21][24]
- Notably, one previously departed author has returned to the team, a positive dynamic compared with other companies in the sector [24]
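To make the data flow between the four stages easier to follow, here is a minimal Python sketch of the pipeline as summarized above. It is an illustration only: the function names, the list-of-strings "checkpoint", and all counts are hypothetical placeholders, not DeepSeek's code or training data.

```python
# Hypothetical sketch of the four post-training stages summarized above.
# Every function, variable, and count is an illustrative placeholder; it only
# shows the order of the stages and what each one consumes.

def cold_start_sft(model, cot_examples):
    """Stage 1: supervised fine-tuning on a small set of long chain-of-thought examples."""
    return model + [f"SFT on {len(cot_examples)} cold-start CoT examples"]

def reasoning_rl(model):
    """Stage 2: reasoning-oriented RL, plus a language-consistency reward for mixed-language outputs."""
    return model + ["reasoning-oriented RL + language-consistency reward"]

def rejection_sampling_sft(model, reasoning_data, general_data):
    """Stage 3: rejection-sample the RL checkpoint, mix in general data, fine-tune again."""
    return model + [f"SFT on {len(reasoning_data) + len(general_data)} mixed examples"]

def alignment_rl(model):
    """Stage 4: RL on helpfulness and safety rewards to align with human preferences."""
    return model + ["alignment-oriented RL (helpfulness + safety)"]

if __name__ == "__main__":
    checkpoint = ["base model"]                                  # stand-in for the pretrained base
    checkpoint = cold_start_sft(checkpoint, cot_examples=["..."] * 1000)  # "thousands" per the report
    checkpoint = reasoning_rl(checkpoint)
    checkpoint = rejection_sampling_sft(checkpoint,
                                        reasoning_data=["..."] * 10,      # arbitrary placeholder counts
                                        general_data=["..."] * 10)
    checkpoint = alignment_rl(checkpoint)
    print("\n".join(checkpoint))
```

The point to notice is simply the ordering: two supervised passes (stages 1 and 3) bracket the reasoning-oriented RL stage, and alignment-oriented RL comes last.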
DeepSeek has reached a deal with Italy, but...
Guan Cha Zhe Wang· 2026-01-08 06:57
Core Insights
- DeepSeek, a Chinese AI startup, has reached an agreement with Italy's antitrust authority (AGCM) to launch a country-specific version of its chatbot for Italian users and address the "hallucination" issues in its AI model [1][2]
- The AGCM concluded its investigation after DeepSeek committed to improving transparency around hallucination risks and implementing technical fixes [2][5]
- DeepSeek's measures include providing hallucination-risk warnings in Italian and organizing workshops to help employees better understand local consumer law [2][5]

Company Developments
- DeepSeek submitted multiple remediation plans to AGCM and gradually met the regulator's requirements, which led to the termination of the investigation [1][2]
- The company reported more than 80 million weekly active users, ranking second among domestic AI applications, and cumulative token usage of 14.37 trillion, topping global open-source model rankings [6]

Industry Context
- The "hallucination" issue is a common challenge across the generative AI industry, and AGCM acknowledged it is a global problem that cannot be completely eliminated [5]
- Despite the challenges, DeepSeek's proactive approach may ease its expansion into the European market [5]
- Whether DeepSeek will be classified under the EU's Digital Services Act (DSA) remains uncertain, which could subject the company to stricter scrutiny [6]
Optical-module and CPO leaders rebound, ChiNext AI index hits another record high! DeepSeek's flagship system R2 to debut around the Lunar New Year; is a banner year for AI applications getting under way?
Xin Lang Cai Jing· 2026-01-07 11:42
Group 1
- The core viewpoint is that the AI sector, particularly the ChiNext AI index, is seeing significant growth, driven by advances in computing hardware and AI applications [1][5][7]
- The ChiNext AI index reached a new high, with a cumulative gain of over 114% from January 1, 2025 to January 7, 2026, outperforming other AI-themed indices [3][7]
- Key AI stocks such as Zhishang Technology and Changxin Bochuang posted substantial gains, with Zhishang Technology leading at more than 7% [1][5]

Group 2
- The upcoming launch of DeepSeek's next-generation flagship system R2 is expected to catalyze further growth in AI applications [7]
- Meta's multi-billion-dollar acquisition of Manus is seen as a strategic move to strengthen its AI capabilities and accelerate the commercialization of AI technologies [3][7]
- Demand for computing power is projected to remain strong, with heavy investment in computing infrastructure both at home and abroad, benefiting companies involved in optical-interconnect solutions [3][7]

Group 3
- The ChiNext AI ETF (159363) has shown strong liquidity, with daily turnover exceeding 600 million yuan and a recent price gain of 0.79% [1][5]
- The ETF tracks the ChiNext AI index, whose annual performance varied from 2018 to 2025, including a notable gain of 106.35% in 2025 [4][8]
- The ETF's portfolio is heavily weighted toward computing hardware, with over 70% allocated to that sector and more than 20% to AI applications, positioning it to capture AI market trends [8]
First bombshell of the new year: DeepSeek proposes the mHC architecture to crack large-model training problems
Sou Hu Cai Jing· 2026-01-07 09:13
Core Insights
- DeepSeek has introduced a new architecture called mHC, aimed at addressing stability issues in large-scale model training while preserving performance gains [1][11]

Group 1: Problem Identification
- Large models face a training-stability dilemma: traditional single-channel connections lead to information congestion as model size increases [3][5]
- Earlier solutions, such as the hyper-connection approach, improved efficiency but introduced new problems, including uncontrolled amplification or suppression of information that leads to gradient explosion and training failures [5][7][9]

Group 2: mHC Architecture
- The mHC architecture adds an intelligent scheduling mechanism for multi-channel connections, using the Sinkhorn-Knopp algorithm to keep signal energy conserved as information passes between layers (see the sketch after this summary) [11][13]
- Additional design choices include non-negative constraints on the input-output mappings, so that useful signal is not cancelled out by coefficients of opposite sign [15]

Group 3: Infrastructure Optimization
- DeepSeek fused multiple computation steps into a single operator, cutting memory read/write cycles, and uses recomputation strategies to lower memory usage [16][18]
- These optimizations deliver significant stability improvements with minimal increases in training time, even at an expansion factor of 4 [18]

Group 4: Performance Validation
- Tests on models of various sizes, notably a 27-billion-parameter model, showed that mHC resolves the training-instability issues and achieves lower loss than traditional baselines [21][22]
- The performance advantages of mHC held across model sizes, indicating practical value for both small and large models [24]

Group 5: Industry Implications
- The introduction of mHC suggests a shift toward refined architectural design rather than simply scaling parameters and compute, potentially lowering entry barriers for smaller companies working on large models [26][29]
- This pragmatic style of innovation is expected to ease the deployment of AI technologies, making large-model development accessible to more enterprises [29]
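To illustrate the energy-conservation idea, here is a minimal numpy sketch of Sinkhorn-Knopp normalization: raw mixing weights are made non-negative and then iteratively rescaled so that every row and column sums to 1, meaning the multi-channel mixing neither amplifies nor suppresses the total signal. This is a generic illustration of the algorithm named in the article, not DeepSeek's mHC implementation; the channel count and iteration count are arbitrary.

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=20):
    """Project raw mixing logits onto (approximately) doubly stochastic weights.

    Non-negativity comes from the exponential; alternating row/column
    normalization drives every row sum and column sum toward 1, so the
    mixing step conserves total signal "energy" across channels.
    """
    weights = np.exp(logits)                            # non-negative mixing weights
    for _ in range(n_iters):
        weights /= weights.sum(axis=1, keepdims=True)   # normalize rows
        weights /= weights.sum(axis=0, keepdims=True)   # normalize columns
    return weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(size=(4, 4))                       # 4 hypothetical channels
    mix = sinkhorn_knopp(raw)
    print("row sums:", mix.sum(axis=1).round(4))        # each close to 1.0
    print("col sums:", mix.sum(axis=0).round(4))        # each close to 1.0
```

In the hyper-connection setting the article describes, constraining the inter-channel mixing weights this way is what rules out the unbounded amplification or suppression blamed for gradient explosions in the earlier approach.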
Jensen Huang's year-opening keynote is off the charts in Chinese-AI content, using DeepSeek and Kimi to put the next-generation chips to the test
36Kr· 2026-01-07 01:35
Core Insights
- The presentation at CES 2026 highlighted the significant advancements of Chinese AI models, particularly Kimi K2 and DeepSeek, which now compete closely with closed-source models on performance [1][8]
- The MoE (Mixture of Experts) architecture has become a mainstream choice, adopted by over 60% of open-source AI models since 2025 and driving a substantial increase in intelligence levels [16][31]

Group 1: Model Performance and Advancements
- Kimi K2 Thinking's inference throughput increased tenfold, with token costs dropping to one-tenth of previous levels, signaling a shift toward a "price-parity era" for AI inference [4][6]
- DeepSeek-R1 and Kimi K2 represent top-tier attempts under the MoE architecture, significantly reducing computational load and memory-bandwidth requirements; a minimal routing sketch follows this summary [2][12]
- Kimi K2 Thinking's performance was validated in tests, showing a tenfold increase on the GB200 NVL72 platform [9][19]

Group 2: Global Recognition and Impact
- DeepSeek and Kimi K2 were recognized in a rigorous benchmark test, with Kimi K2 Thinking named the "best-performing non-U.S. model" thanks to its low misguidance rate [21][24]
- The rapid development of Chinese open-source models is closing the gap with the strongest closed-source models, providing a significant first-mover advantage [31]
- Growing international acceptance of Chinese AI models is evidenced by endorsements from prominent figures in the tech industry, indicating rising influence in the global market [24][33]

Group 3: Trends and Future Directions
- The transition from high benchmark scores to practical usability is evident, with models like Qwen evolving from being known for high scores to being recognized for quality [32]
- Features such as "interleaved thinking" in Kimi K2 Thinking reflect a trend toward more sophisticated model capabilities and better applicability in real-world scenarios [34]
- The rise of open-source models is putting pressure on U.S. closed-source giants, as the value proposition of paid models becomes harder to justify against strong open-source alternatives [35]
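As context for why MoE reduces computation, here is a minimal numpy sketch of top-k expert routing: each token's gate scores select only k of the available experts, so only those experts' weights are applied to that token. This is a generic illustration of the Mixture-of-Experts idea mentioned above, not the routing used by DeepSeek or Kimi; all sizes are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    tokens:    (n_tokens, d_model)
    gate_w:    (d_model, n_experts) gating projection
    expert_ws: list of (d_model, d_model) expert weight matrices
    Only k experts run per token, which is why activated compute stays
    far below the total parameter count.
    """
    scores = softmax(tokens @ gate_w)                     # (n_tokens, n_experts)
    topk = np.argsort(scores, axis=1)[:, -k:]             # indices of the top-k experts per token
    out = np.zeros_like(tokens)
    for t in range(tokens.shape[0]):
        sel = topk[t]
        weights = scores[t, sel] / scores[t, sel].sum()   # renormalized gate weights
        for w, e in zip(weights, sel):
            out[t] += w * (tokens[t] @ expert_ws[e])      # run only the selected experts
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_experts, n_tokens = 8, 4, 3
    tokens = rng.normal(size=(n_tokens, d))
    gate_w = rng.normal(size=(d, n_experts))
    expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
    print(moe_layer(tokens, gate_w, expert_ws).shape)     # (3, 8)
```

With E experts and top-k routing, roughly k/E of the expert parameters are active per token, which is the sense in which MoE models cut computational load and memory-bandwidth needs relative to dense models of the same total size.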
Lei Jun responds to fine-print marketing: an industry bad habit, but we'll change it / DeepSeek opens the year with a bombshell paper bearing Liang Wenfeng's name / Musk's new-year resolution: mass-produce brain-computer interfaces
Sou Hu Cai Jing· 2026-01-06 13:46
Lei Jun responds to fine-print marketing: a bad habit, we'll use bigger type from now on
✒️ New leak on OpenAI's mysterious AI hardware: possibly more than one device
Multiple automakers report 2025 sales figures
Musk: mass production of brain-computer interfaces in 2026
BMW China responds to price cuts of up to 300,000 yuan
Report: Apple's A20 chip costs 80% more than the A19
DeepSeek publishes a new paper with Liang Wenfeng as a contributor
Sony's 2026 product lineup leaks, including the FX3 II
Jensen Huang talks with Lenovo's chairman; the two will jointly launch a "revolutionary server"
Microsoft CEO: 2026 is a pivotal year for AI
⌚️ Pebble releases the Round 2 circular smartwatch
Did you travel over New Year's Day? Cross-regional trips topped 200 million on the first day of the holiday
Zootopia 2 tops the New Year's Day box office

Lei Jun responds to fine-print marketing: a bad habit, we'll use bigger type from now on
Last night, Xiaomi founder Lei Jun held his first livestream of 2026, inviting engineers to tear down the Xiaomi YU7 on camera and responding to several trending topics. On the much-discussed "fine-print marketing" controversy, Lei Jun said that annotating claims in small print is indeed a common industry practice, mainly for legal compliance. He stressed, however, that it is also a bad industry habit that needs to change immediately. He acknowledged that, in complying with advertising law, some of the fine print disregarded how consumers feel, and that it admittedly looks a bit like deliberate bragging. Lei Jun also noted that the earlier Xiaomi 17 Pro ...
Italy ends its DeepSeek investigation over disclosure of hallucination risks
21 Shi Ji Jing Ji Bao Dao· 2026-01-06 12:15
Reported by Chen Guici, 21st Century Business Herald
The two companies that own and operate DeepSeek, Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. and Beijing DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., made commitments that include a series of measures to improve disclosure of hallucination risks.
A "hallucination" occurs when an AI model, given a user's input, generates output containing inaccurate, misleading, or fabricated information.
Italy closed its investigation after DeepSeek committed to warning users about the risk of AI "hallucinations".
On January 5 local time, Italy's antitrust authority AGCM said in its regular weekly bulletin that it had concluded its investigation into DeepSeek and agreed to close the case on the condition of binding commitments. AGCM opened the investigation in June 2025 on the grounds that DeepSeek allegedly failed to warn users that it might generate false information.
AGCM said in the announcement that DeepSeek's commitments make disclosures about hallucination risks easier to access, more transparent, more understandable, and more timely.
Since its debut at the beginning of 2025, DeepSeek has quickly gained worldwide popularity on the strength of its model capability, very high cost-effectiveness, and open-source strategy. According to Quest Mobile, DeepSeek had 145 million monthly active users in China in the third quarter of 2025, ranking second among domestic AI applications. According to "The State of AI Development: Based on the OpenRouter Pla...", jointly released by OpenRouter and a16z ...