DualPath
DeepSeek releases another new paper! Is the new V4 model getting closer?
Di Yi Cai Jing· 2026-02-27 07:01
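The paper continues DeepSeek's usual style, pushing performance optimization to the extreme at the engineering level. While the industry eagerly awaits the next-generation flagship model DeepSeek V4, the DeepSeek team has quietly released a new academic paper. Co-authored by DeepSeek with Peking University and Tsinghua University, it targets a key factor in the real-world deployment of large models, inference speed, and offers an efficient low-level system solution for increasingly complex AI agents.
The introduction notes that large models are rapidly evolving from single-turn chatbots and standalone reasoning models into agentic systems that can plan autonomously, call tools, and solve real tasks through multi-turn interaction. This shift in application paradigm is driving a major change in inference workloads: from traditional human-LLM interaction to human-LLM-environment interaction, with tens or even hundreds of turns.
Context accumulates across turns and can eventually reach extreme lengths. At that point the model needs little computation but must frequently read the KV-Cache of the historical context from disk. In existing systems only the prefill engine reads the KV-Cache, so its NIC bandwidth is saturated, while the NIC bandwidth of the decode engine, which generates the output, sits largely idle, and the whole system is bottlenecked. The paper therefore proposes DualPath, which redesigns the KV-Cache loading logic of modern inference architectures for agentic workloads, addressing large models' agentic ...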
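US software leaders surge; Goldman Sachs: the software rebound is not over! Tuo Wei Information hits the daily limit, Software ETF Huitianfu (159590) jumps over 2%! Jensen Huang makes a major statement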
Xin Lang Cai Jing· 2026-02-27 05:30
Group 1
- Salesforce, a leading software company, saw its stock rise over 4% after its earnings report, contributing to a strong rebound in the A-share software sector; the Software ETF Huitianfu (159590) rose over 2% with trading volume exceeding 50 million yuan [1][2]
- Major component stocks of the software ETF performed well: Shunwang Technology rose over 11%, Tuo Wei Information hit the daily limit, and Runhe Software rose over 6% [3]
- Trading data for key software stocks shows significant gains: Tuo Wei Information rose 10% on a transaction volume of 3.037 billion yuan, while Runhe Software and Shunwang Technology also reported substantial trading volumes [4]
Group 2
- Multiple institutions believe the recent decline in the software industry was excessive and that the rebound is likely to continue. Main Street Research's CIO noted that the software sell-off has reached a bottom, while Goldman Sachs indicated that the recent rebound could persist despite high short-selling levels [5]
- AI model usage in China has surged, surpassing that of the U.S. for the first time, with a record 4.12 trillion tokens called in a week, indicating strong growth momentum in the Chinese AI sector [6]
- HSBC's report titled "Software Will Eat AI" argues that software will not be threatened by AI but will instead be the key means for large enterprises to use AI effectively. The report emphasizes that traditional software giants will lead in developing the best AI software thanks to their deep domain expertise and established customer trust [7][8]
Group 3
- HSBC predicts that 2026 will mark the beginning of significant monetization in the software industry, which is currently undervalued. The report suggests that the total addressable market for software is on the verge of a large-scale expansion cycle [8]
- Zhongyou Securities anticipates that AI agents will become a crucial commercial application of large models, with significant deployments across various industries, indicating a shift towards specialized applications in vertical fields [9]
- Dongfang Securities acknowledges that concerns about AI models disrupting the software industry are reasonable, but expects a K-shaped divergence: software with unique data resources will be less threatened than horizontal software that lacks such advantages [10]
DeepSeek's new paper previews a new V4 framework, using idle NICs to accelerate agent inference and breaking the PD-disaggregation bottleneck
36Kr· 2026-02-27 02:29
Core Insights
- DualPath, a new inference framework for agents, addresses the I/O bottleneck in long-context inference by speeding up the loading of KV-Cache from external storage [1][3].
Group 1: DualPath Framework
- DualPath extends the traditional Storage-to-Prefill loading mode with a second path, Storage-to-Decode, allowing for more efficient data handling [3][6].
- The framework uses the idle storage network interface card (SNIC) bandwidth of the decode engine (DE) to read caches and then transfers the data to the prefill engine (PE) over the high-speed compute network (RDMA), achieving global pooling of storage bandwidth and dynamic load balancing [3][13].
Group 2: Performance Improvements
- In tests with a production-level model of 660 billion parameters, DualPath increased offline inference throughput by 1.87 times and online serving throughput by 1.96 times on average [3][14].
- The framework significantly improves time to first token (TTFT) under high load while keeping time per output token (TPOT) stable [5][14].
Group 3: Technical Innovations
- DualPath allows KV-Cache to be loaded into the decode engine first and then transmitted to the prefill engine, relieving bandwidth pressure on the prefill side [7][9].
- The architecture includes a central scheduler that dynamically allocates tasks based on I/O pressure and computational load, preventing congestion on any single network interface or computational resource [14][18].
Group 4: Research and Development
- The first author of the paper, Wu Yongtong, is a PhD student at Peking University who works on system software and large model infrastructure, particularly on optimizing inference systems for large-scale deployment [15][16].
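The dual-path idea summarized above can be illustrated with a small sketch. This is not the paper's scheduler: the `Engine` class, the `choose_path` and `schedule` functions, the 50 Gbps NIC figure, and the 4 GB cache size are all hypothetical, and the sketch models only I/O pressure on the two storage NICs, ignoring the computational load and the cost of the RDMA hop from DE to PE that the summary also mentions. Each KV-Cache read is simply sent over whichever storage NIC would finish it first.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """A prefill or decode engine with its own storage NIC (hypothetical model)."""
    name: str
    snic_bandwidth_gbps: float   # storage NIC bandwidth, assumed figure
    pending_bytes: float = 0.0   # bytes already queued on this storage NIC

    def eta_seconds(self, extra_bytes: float) -> float:
        """Time for this NIC to drain its queue plus one additional read."""
        bytes_per_second = self.snic_bandwidth_gbps * 1e9 / 8
        return (self.pending_bytes + extra_bytes) / bytes_per_second


def choose_path(prefill: Engine, decode: Engine, kv_bytes: float) -> str:
    """Route one KV-Cache read to the storage NIC that would finish it first."""
    if decode.eta_seconds(kv_bytes) < prefill.eta_seconds(kv_bytes):
        decode.pending_bytes += kv_bytes
        return "storage-to-decode"
    # Ties go to the traditional path, which needs no extra hop to the PE.
    prefill.pending_bytes += kv_bytes
    return "storage-to-prefill"


def schedule(request_sizes, prefill: Engine, decode: Engine):
    """Assign a loading path to each request, in arrival order."""
    return [choose_path(prefill, decode, size) for size in request_sizes]


pe = Engine("PE", snic_bandwidth_gbps=50.0)
de = Engine("DE", snic_bandwidth_gbps=50.0)
paths = schedule([4e9] * 4, pe, de)  # four 4 GB KV-Cache reads
print(paths)
```

With two equally fast NICs and equal-sized reads, the requests alternate between the two paths, so neither storage NIC carries the full load. A scheduler closer to the one described would also weigh compute load on each engine and the transfer cost over the compute network.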