After raising 120 billion yuan, Kimi plays another trump card: a new architecture overhauls an aging Transformer component, and it's even cheaper than DeepSeek's equivalent
AI前线· 2026-03-17 07:53
Author | Yun Yi

Even Elon Musk and Andrej Karpathy have given it a thumbs-up. What exactly is the "residual connection" that DeepSeek and Kimi have both zeroed in on, one right after the other? Recently, Kimi released a major new paper targeting a foundational piece of the Transformer that has gone almost untouched for a decade: the residual connection. Proposed by Kaiming He in the 2015 ResNet paper, the residual connection has since become a standard fixture of deep learning.

Put simply, imagine a large model's Transformer architecture as a "telephone game" team of a few dozen workers standing in a long line. The residual connection is a rule: each worker listens to everything said by those before them, adds one sentence of their own, and passes the whole message on unchanged.

The rule looks like this: x_{l+1} = x_l + F(x_l)

But this creates a problem: what the worker at the end of the line receives is everything the dozens of workers before them said, piled together. The further down the line, the longer and messier the message gets. The key points from early workers get buried, the additions from later workers can't be heard clearly either, and the AI gets dumber. This is called the "dilution problem."

So Kimi brought in the attention mechanism to solve it, proposing a new rule: Attention Residuals. It's as if the workers were given a "smart filter", so they no longer have to swallow the accumulated hodgepodge in front of them wholesale, ...
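The excerpt cuts off before the paper's actual formulation, so the following is only an illustrative sketch of the contrast being described, not Kimi's published design: the `score` function, the block shapes, and the history re-weighting step are all assumptions. A standard residual stack piles every block's output onto one running sum, while an attention-style residual lets each stage re-weight the history of earlier outputs before adding its own contribution.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def standard_residual_stack(x, blocks):
    """Classic ResNet-style rule: x_{l+1} = x_l + f_l(x_l).
    Every block's output is added onto one undifferentiated running sum."""
    for f in blocks:
        x = x + f(x)
    return x

def attention_residual_stack(x, blocks, score):
    """Illustrative 'attention residual' (assumption, not the paper's method):
    before each block runs, the history of all earlier stage outputs is
    re-weighted with softmax scores, so later stages can emphasize useful
    earlier outputs instead of receiving one diluted pile."""
    history = [x]                       # outputs of all earlier stages
    for f in blocks:
        H = np.stack(history)           # (num_stages, d)
        w = softmax(score(H))           # (num_stages,) weights over history
        mixed = (w[:, None] * H).sum(axis=0)   # filtered summary of history
        out = mixed + f(mixed)          # residual update on the filtered input
        history.append(out)
    return history[-1]
```

With linear toy blocks the standard stack is easy to check by hand: `f1(v) = 0.5v` then `f2(v) = -0.25v` turns `x` into `1.125x`, while the attention variant produces an output of the same shape whose mix over the history depends on the scoring function.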
Is Liang Wenfeng delaying V4 to cure the lobster's amnesia once and for all?
虎嗅APP· 2026-03-17 00:08
Core Viewpoint
- The article discusses the anticipation surrounding the release of DeepSeek's V4, emphasizing the importance of its Long-Term Memory (LTM) feature, which aims to enhance AI's contextual understanding and memory capabilities, setting it apart from competitors like OpenClaw [7][8][17]

Group 1: V4 Development and Features
- DeepSeek's V4 is expected to include a significant architectural overhaul with 1 trillion parameters and native multimodal capabilities, set to be released in April [7][8]
- The core innovation of V4 is the Long-Term Memory (LTM) system, which allows the AI to retain user interactions and preferences over time, improving its contextual understanding [8][11]
- The LTM aims to address the limitations of existing models, particularly OpenClaw, which struggles with memory retention and context management [9][10][22]

Group 2: Challenges and Competitor Analysis
- The AI industry is rapidly evolving, with competitors releasing new features and models, putting pressure on DeepSeek to catch up [38]
- DeepSeek currently lacks multimodal capabilities, being primarily a text-based model, while competitors have advanced to support audio and video processing [39][43]
- The company faces challenges in agent capabilities, AI programming, and search functionalities, which are critical for maintaining competitiveness in the market [45][48][51]

Group 3: Memory and Learning Capabilities
- Current AI models, including OpenClaw, have significant limitations in memory management, leading to issues with context retention and task continuity [18][30]
- Research indicates that many leading models struggle to learn effectively from context, highlighting a gap in their ability to utilize information dynamically [32][34]
- The development of a robust memory system within V4 could potentially transform how AI learns and interacts, making it more adaptable and user-friendly [30][35]
Optimus V2.5集体亮相,V3发布恐要推迟!
Robot猎场备忘录· 2026-03-16 00:02
Multiple Optimus V2.5 units have made their first simultaneous appearance on American streets; Optimus V3 is coming!

For secondary-market "T-chain" investors, the catalyst in March is the debut of Optimus V3, and the key question is whether V3 exceeds expectations. The T-chain names to watch are those that traveled to North America for talks in January, the newly favored names in February (including those under factory audit), and the newly favored names in March, with particular attention to names whose PPA agreements are pending or already signed.

T-chain performance: looking back at the second trading week of March (March 9-13), apart from an upswing on March 10, the market was weak (trending down). Since the holiday, T-chain stocks have been "dismal," with the core names persistently sluggish. Beyond external uncertainty, the core reason, as noted before the holiday, is that "March is the key window for the V3 debut; the V3-debut catalyst is already consensus, and a shakeout before the debut is a necessary step."

As for the March 10 (Tuesday) upswing, as reviewed in the post-market notes, this leg up had no official Tesla catalyst and therefore came more from rotation between sectors ...
Behind the violent surge in tokens are layoffs
小熊跑的快· 2026-03-15 13:14
AI Model Rankings
Based on real usage data from millions of users accessing models through OpenRouter.

[Chart: "Top Models" — weekly usage of models across OpenRouter; y-axis 4.5T / 9T / 18T tokens; x-axis from July 21 through November 24 to March 17, 2025]

LLM Leaderboard — This Week

| Rank | Model | Provider | Weekly tokens | Change |
| --- | --- | --- | --- | --- |
| 1 | MiniMax M2.5 | minimax | 1.82T | 10% |
| 2 | Step 3.5 Flash (free) | stepfun | 1.3T | 193% |
| 3 | Gemini 3 Flash Pre ... | google | 1.01T | ↓4% |
| 4 | DeepSeek V3.2 | deepseek | 1.01T | 125% |
| 5 | Cla ... | | | |
ByteDance suspends launch of video AI model after copyright disputes, The Information reports
Yahoo Finance· 2026-03-14 16:13
Core Viewpoint
- ByteDance has paused the global launch of its AI video generator Seedance 2.0 due to copyright disputes with major Hollywood studios and streaming platforms [1]

Group 1: Legal Issues
- ByteDance is facing legal threats from U.S. studios, including Disney, regarding unauthorized use of intellectual property in Seedance 2.0 [2]
- Disney accused ByteDance of using its characters to train Seedance 2.0 without permission, leading to a cease-and-desist letter [2][3]
- ByteDance's legal team is actively working to identify and resolve potential legal issues related to the model [5]

Group 2: Product Features and Market Position
- Seedance 2.0 is designed for professional film, e-commerce, and advertising use, capable of processing text, images, audio, and video simultaneously to lower content production costs [3]
- The model has garnered attention for its ability to generate cinematic storylines, drawing comparisons to competitors like DeepSeek [4]
- ByteDance had planned to launch Seedance 2.0 globally in mid-March but has since suspended these plans [4]
Nvidia splashes out $26 billion to build AI models itself, directly challenging OpenAI
硬AI· 2026-03-12 09:04
Core Viewpoint
- Nvidia is transitioning from a hardware giant to a full-stack AI company by investing $26 billion over the next five years in developing open-source AI models, directly challenging the market positions of OpenAI, Anthropic, and DeepSeek [2][3][4]

Group 1: Investment and Strategic Shift
- Nvidia's significant investment of $26 billion has been confirmed by company management, marking a strategic shift towards competing directly with top AI laboratories [3][4]
- The launch of the Nemotron 3 Super model, which boasts 128 billion parameters, signifies Nvidia's commitment to advancing its AI capabilities [6]

Group 2: Model Performance and Benchmarking
- The Nemotron 3 Super achieved a score of 37 in the Artificial Intelligence Index, surpassing OpenAI's GPT-OSS score of 33, indicating its competitive performance in the AI model landscape [6]
- Nvidia's model participated in the PinchBench benchmark test, ranking first in evaluating control capabilities, further showcasing its advanced performance [6]

Group 3: Hardware and Software Integration
- Nvidia's strategy involves a deep integration of hardware and software, with future AI models designed not only for chip development but also for optimizing supercomputing data center architectures [10]
- The open-source strategy is expected to foster a developer network around Nvidia's hardware ecosystem, enhancing market stickiness for its chips [10]

Group 4: Industry Reception and Significance
- The research community has reacted positively to Nvidia's strategic move, with experts highlighting its milestone significance in the open-source AI landscape [12]
- Nvidia's investment is viewed as a historic statement of commitment to openness in AI, positioning the company at the forefront of both open and closed AI projects [12]
Nvidia splashes out $26 billion to build AI models itself, directly challenging OpenAI
Hua Er Jie Jian Wen· 2026-03-12 08:02
Core Viewpoint
- Nvidia plans to invest $26 billion over the next five years to develop open-source AI models, marking a strategic shift from being a hardware and software supplier to a full-stack AI company that competes directly with leading AI labs like OpenAI and Anthropic [1]

Group 1: Investment and Strategic Shift
- Nvidia's investment has been confirmed by company management and is aimed at developing open-source AI models [1]
- The company has released its strongest open-source model, Nemotron 3 Super, which reportedly surpasses OpenAI's GPT-OSS in several benchmark tests [1]
- This investment signifies a profound strategic shift for Nvidia, transitioning from a hardware supplier to a competitor in the AI model space [1]

Group 2: Model Performance and Technical Innovations
- The Nemotron 3 Super model features 128 billion parameters, comparable to the largest version of OpenAI's GPT-OSS, and scored 37 in the Artificial Intelligence Index, outperforming GPT-OSS's score of 33 [2]
- Nvidia's model participated in a new benchmark test, PinchBench, where it ranked first in controlling OpenClaw [2]
- The company has disclosed innovative training methods for the model, enhancing its reasoning, long-context processing, and reinforcement learning capabilities [2]

Group 3: Hardware and Software Integration
- Nvidia's strategy is not just about model competition but also about deeply integrating its hardware roadmap with AI model development [4]
- Future AI models will optimize supercomputing data center architectures, stretching the capabilities of Nvidia's systems [4]
- The open-source strategy aims to create a developer network around Nvidia's hardware ecosystem, enhancing market stickiness for its chips [4]

Group 4: Industry Reception
- The research community has reacted positively to Nvidia's strategic move, with notable figures calling it a milestone for open-source AI [6]
- Experts emphasize the importance of government support for open-source models, highlighting Nvidia's investment as a significant statement of commitment to openness in AI [6]
Shrimp farmers are devouring domestic models! 4.19 trillion tokens of usage surge 34.9%, overtaking the U.S.
量子位· 2026-03-11 02:45
Core Insights
- The article highlights the significant rise of Chinese large models in the AI sector, particularly during the recent weeks, showcasing their dominance over American counterparts in terms of usage and performance metrics [2][3][9]

Group 1: Performance Metrics
- The total weekly usage of Chinese large models surged to 4.19 trillion tokens, marking a 34.9% increase, while American models saw a decline of 8.5% to 3.63 trillion tokens [6]
- In the following week, the usage of Chinese models reached 4.12 trillion tokens, surpassing the U.S. models for the first time, which dropped to 2.94 trillion tokens [9]
- By the week of March 16-22, the usage of Chinese models further increased to 5.16 trillion tokens, reflecting 127% growth over three weeks, while U.S. models decreased to 2.7 trillion tokens [9]

Group 2: Leading Models
- The top three models in usage were Kimi K2.5, Step 3.5 Flash, and MiniMax M2.5, each exceeding 1 trillion tokens [5][34]
- MiniMax M2.5 maintained a strong performance, consistently ranking at the top globally, while Step 3.5 Flash emerged as a significant contender [13][15]
- Chinese models dominated the global top five rankings, with three positions occupied by domestic products [12]

Group 3: Application and Context
- The article emphasizes the popularity of the OpenClaw application among users, which has consumed a total of 9.16 trillion tokens since January, establishing itself as a major player in the market [32]
- In terms of context length usage, different models excelled in various token ranges, with MiniMax M2.5 and DeepSeek V3.2 being preferred for tasks requiring 10K-100K tokens [23][25]

Group 4: Competitive Landscape
- The article notes that while Chinese models are gaining traction, they still need to improve in terms of speed and cost-effectiveness compared to leading models from Google and OpenAI [44]
- The PinchBench ranking, which evaluates models based on success rate, speed, and cost, indicates that while Chinese models like Kimi K2.5 and MiniMax M2.1 are performing well, they lag in speed compared to some competitors [39][41]
China AI expert call takeaways: diverging strategic focus among key players
2026-03-10 10:17
Summary of the Conference Call on China Internet Sector and AI Development

Industry Overview
- **Industry**: China Internet Sector, specifically focusing on AI development
- **Key Players**: Major internet companies including ByteDance, Alibaba, Baidu, Tencent, and emerging AI labs like MiniMax, Zhipu AI, and others

Core Insights and Arguments

Diverging Strategies
- Internet leaders are focusing on the domestic market in China, integrating AI chatbots into consumer-facing super apps to create new traffic gateways [2][3]
- Emerging AI labs are prioritizing enterprise services and international markets, with a focus on AI agent products like OpenClaw, leveraging model performance and cost advantages [2][4]

Model Capabilities
- **ByteDance**: Strong multimodal capabilities and rapid model iteration, utilizing data from Douyin/TikTok [3]
- **Alibaba**: Integrates AI models within its ecosystem for both enterprise and consumer users, exemplified by DingTalk and Qwen [3]
- **Baidu**: Notable for its reasoning capabilities and cost efficiency, particularly in finance and healthcare sectors [3]
- **Tencent**: Gradual AI rollout with diverse applications across social and gaming ecosystems [3]

Monetization Challenges
- Internet leaders are expected to focus on consumer-facing traffic gateways rather than immediate monetization through advertising or subscriptions, as market consolidation may take time [3]

Emerging AI Labs
- **MiniMax**: Strong consumer user base and efficiency in enterprise business [4]
- **Zhipu AI**: Generates significant revenue from local model deployment for business and government clients [4]
- **GLM 5.0**: Achieved improvements in coding capabilities [4]
- **Moonshot's Kimi**: Transitioning from consumer chatbots to enterprise services [4]
- **DeepSeek**: Innovation-driven, focusing on advancing model capabilities towards AGI [4]

Investment Opportunities
- **MiniMax**: Initiated coverage, positioned to benefit from AI trends in China and globally [5]
- **Alibaba and Baidu**: Favorable due to their full-stack AI capabilities [5]
- **Tencent and Kuaishou**: Potential in AI applications noted [5]

Risks Identified
- Key risks to the sector include:
  1. Evolving competitive landscape and intensifying competition [7]
  2. Rapid technological changes and shifting user preferences [7]
  3. Uncertain monetization strategies [7]
  4. Rising costs associated with traffic acquisition and content promotion [7]
  5. Maintenance of IT systems [7]
  6. Challenges in international market expansion [7]
  7. Regulatory changes impacting market sentiment [7]

Additional Important Points
- The report emphasizes the importance of understanding the evolving dynamics within the AI sector and the competitive strategies of both established internet companies and emerging AI labs [2][4]
- The potential for AI to enhance production efficiency in various sectors, including video generation and gaming, is highlighted as a long-term market opportunity [2]
X @Cointelegraph
Cointelegraph· 2026-03-10 03:00
🔥 UPDATE: ChatGPT is the most used AI in most countries, except in China, where DeepSeek dominates, per Andreessen Horowitz. https://t.co/YscyeXuz20 ...