DeepSeek

DeepSeek Thanks Tencent's Technical Team: The DeepEP Optimization Is a "Huge Speedup" Code Contribution
Sina Tech · 2025-05-07 11:12
Core Insights
- Tencent's technical team has optimized the DeepEP communication framework, improving performance by 100% on RoCE networks and by 30% on IB networks, which makes training large AI models more efficient [1][2]
Group 1: Technical Enhancements
- The optimization replaced IBRC with IBGDA and assigned a distinct Queue Pair (QP) to each channel for parallel data transmission, improving the robustness and communication performance of the normal kernels [1]
- In RDMA scenarios, the optimized framework reached an algorithm bandwidth of 58 GB/s, with the corresponding physical bandwidth calculated at 43.5 GB/s [1]
Group 2: Industry Impact
- Since DeepSeek open-sourced its projects, including DeepEP, in February, the framework has demonstrated a 300% increase in communication efficiency, addressing the dependence of MoE-architecture large models on NVIDIA NCCL [2]
- The optimizations have been applied successfully in Tencent's Hunyuan model projects and have shown excellent versatility in high-performance environments built on Tencent's Xingmai network and H20 servers [2]
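The two bandwidth figures reported above can be related with a quick calculation. The sketch below only computes the ratio implied by the published numbers; the summary does not say how the physical figure was derived, so the variable names are illustrative and the result should be read as a consistency check, not as DeepEP's own formula.

```python
# Figures reported in the summary, in GB/s.
algorithm_bw = 58.0
physical_bw = 43.5

# Ratio implied by the two reported numbers. The summary does not state
# how the physical bandwidth was calculated, so this is an assumption-free
# consistency check on the published values and nothing more.
ratio = physical_bw / algorithm_bw
print(f"physical / algorithm = {ratio:.2f}")  # prints "physical / algorithm = 0.75"
```

In other words, the reported physical bandwidth is three quarters of the reported algorithm bandwidth.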
Liang Wenfeng and Yang Zhilin "Collide" Again
创业家 (Chuangyejia) · 2025-05-07 09:57
Core Viewpoint
- The article examines competition in the AI large-model sector, focusing on the advances and challenges of DeepSeek and Kimi and on how larger players such as Alibaba and Baidu affect their market positions [2][5][13].
Group 1: Model Developments
- DeepSeek launched its new model, DeepSeek-Prover-V2, with 671 billion parameters, far more than the previous version's 7 billion, improving efficiency and accuracy on mathematical tasks [3][4].
- DeepSeek-Prover-V2 reached 88.9% on the miniF2F test and solved 49 problems on PutnamBench, outperforming Kimi's model, which had an 80.7% pass rate and solved 10 problems [3][4].
- DeepSeek's model lines evolve in step with one another, with Prover-series updates running from March 2024 to the latest releases in 2025 [8][9].
Group 2: Competitive Landscape
- DeepSeek and Kimi face growing competition from major companies such as Alibaba and Baidu, which are rapidly advancing their own AI models [5][15].
- Alibaba's new model, Qwen3, is described as a "hybrid reasoning model" that outperforms DeepSeek's R1 despite having only one-third of its parameters [15][16].
- Kimi grew rapidly, reaching 20 million monthly active users within a year, but it has since been overtaken in user numbers by Tencent's Yuanbao [14][15].
Group 3: Future Directions
- DeepSeek's founder has identified three paths toward AGI: mathematics and code, multimodal learning, and natural language [7].
- The upcoming R2 model is expected to enhance DeepSeek's capabilities, with a shorter development cycle than the more extensive update expected for V4 [9][10].
- The market is eager for DeepSeek's new models, and there is speculation that R2 will use Huawei's Ascend chips, although there are concerns about their robustness for large-model development [10][11].
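[Industrial Internet Weekly] Alibaba Tongyi Loses More Key Figures: Yan Zhijie and Bo Liefeng Depart Within Three Months; EU Fines TikTok 530 Million Euros; NVIDIA: China-Specific GPU to Launch in June
TMTPost APP · 2025-05-07 09:00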
Financial Performance
- Palantir's Q1 revenue rose 39% to $884 million, beating analyst expectations of $863 million; adjusted EBITDA was $397.3 million, above the forecast of $371 million [2]
- Amazon's Q1 net sales reached $155.67 billion, up 9% year-over-year; AWS revenue was $29.27 billion, up 17% but below expectations, and the stock fell more than 3% [3]
- Microsoft's Q3 revenue hit $51.87 billion, driven by cloud and AI; Microsoft Cloud revenue was $42.4 billion, up 20%, and Azure grew 33% [4]
Industry Developments
- Xiaomi announced the open-sourcing of its first reasoning model, Xiaomi MiMo, which outperformed OpenAI's o1-mini in mathematical reasoning and coding assessments [8]
- DeepSeek released the Prover-V2 model with 671 billion parameters, using a more efficient file format and supporting multiple computational precisions [5]
- Huawei launched an AI data lake solution to improve model training and inference efficiency [6]
Corporate Strategies
- Tencent restructured its TEG division, creating new departments for large language models and multimodal models to improve efficiency and reduce wasted resources [12]
- Ant Group plans a separate Hong Kong listing for its overseas segment, Ant International, which accounts for about 20% of the group's total revenue [10]
- OpenAI is reportedly acquiring the AI coding tool Windsurf for approximately $3 billion, its largest acquisition to date [14]
Market Trends
- China's market for MaaS and AI large-model solutions is projected to reach 710 million yuan in 2024, a 215.7% increase from 2023 [22]
- China's AI industry is expected to exceed 700 billion yuan in 2024, maintaining a growth rate of over 20% [23]
- The new version of the National Intelligent Manufacturing Standard System Construction Guide emphasizes integrating AI with manufacturing [24]
Computer Industry Weekly: DeepSeek-Prover-V2 Sets a New High in Mathematical Reasoning; Alibaba's Tongyi Qianwen Launches the Qwen3 Model
Huaxin Securities· 2025-05-07 08:23
Investment Rating
- The report maintains a "Buy" rating on several companies in the AI and computing sector, including 亿道信息 (Yidao Information), 科大讯飞 (iFlytek), 唯科科技 (Weike Technology), 泓淋电力 (Honglin Electric), 嘉和美康 (Jiahe Meikang), 寒武纪 (Cambricon), 鼎通科技 (Dingtong Technology), and 迈信林 (Maixinlin) [15][50].
Core Insights
- The computer industry has shown strong relative performance, with a 1-month return of 14.6% compared to the Shanghai Composite Index's 6.1% [2].
- The launch of DeepSeek-Prover-V2 marks a significant advance in mathematical reasoning models, achieving state-of-the-art performance in neural theorem proving [4][21].
- Alibaba's Tongyi Qianwen (阿里通义千问) has introduced the Qwen3 model, which posts competitive results on various benchmarks and uses a significantly larger pre-training dataset [6][30].
Summary by Sections
1. Computing Dynamics
- Rental prices for computing power remain stable: A100-40G configurations are priced at 28.64 RMB/hour on Tencent Cloud and 31.58 RMB/hour on Alibaba Cloud [20].
- DeepSeek-Prover-V2 was released on April 30 and achieved advanced performance in theorem proving, solving 6 of 15 selected problems from the AIME competition [21][22].
2. AI Application Dynamics
- Gemini's average visit duration increased by 3.45%, indicating growing user engagement [26].
- The Qwen3 model supports two thinking modes, one for deep reasoning and one for quick responses, giving users more flexibility [28].
3. AI Financing Trends
- Persona Identities Inc. completed a $200 million Series D funding round at a valuation of $2 billion, highlighting growing demand for AI-driven identity verification [34][36].
4. Market Review
- The AI computing index and the AI application index both fluctuated, with notable gains in specific companies such as 天源迪科 (Tianyuan Dike) and 鸿博股份 (Hongbo Shares) [39][45].
5. Investment Recommendations
- The report suggests focusing on companies such as 嘉和美康 (Jiahe Meikang) and 科大讯飞 (iFlytek) for potential growth driven by advances in AI and computing technologies [48][49].
A Long Read to Help You Understand Reinforcement Learning: Can Decentralized Reinforcement Learning Be Achieved?
机器之心 (Synced) · 2025-05-07 04:34
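Selected from Symbolic Capital. Author: Sam Lehman. Compiled by 机器之心 (Synced).
A (Very) Brief History of AI / Machine Learning Scaling
Reinforcement learning (RL) is one of the hottest terms in AI today. A recent long-form essay reviews the role that the new RL paradigm plays in improving models and also explores what RL means for decentralization.
Original article: https://www.symbolic.capital/writing/the-worlds-rl-gym
"There are decades where nothing happens, and there are weeks where decades happen." This line describes today's AI field better than any other. Almost every day a new breakthrough model, training method, or company appears, forcing us to rethink what is possible in AI. Earlier this year it was DeepSeek, then the Stargate project, and now Qwen, Manus, MCP, and more. Who knows what will come next?
At present, the leading approach to building better models is scaling through pre-training and, more recently, test-time compute. With the release of DeepSeek-R1 and R1-Zero, however, people have begun to favor a different way of scaling models: reinforcement learning (RL). This article aims to explore the implications of RL-based model improvement, and will in particular ...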
Foreign LPs Face Up to "the East Rising and the West Declining"
36Kr · 2025-05-07 01:38
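Amid a profound adjustment of the global economic landscape, "the East rising and the West declining" is no longer merely a strategic consensus within China; it is gradually becoming a reality reflected in global capital markets. The tariff war the United States launched against China has not only failed to curb China's development but has also triggered worldwide distrust of US economic policy. Meanwhile, soaring US Treasury yields and persistently high US equity valuations have made the credit risk of dollar assets increasingly apparent. Against this backdrop, China, with its steady economic growth and increasingly mature investment environment, has become a new choice for global capital seeking a safe haven. Foreign LPs have quietly begun positioning themselves in the Chinese market, looking for new growth points and investment opportunities. This trend reflects global capital's reassessment of the Chinese market and signals a profound change in China's role in global capital flows.
Foreign Capital "Rediscovers" China
Since April this year, the tariff war that the United States has launched against the world, and against China in particular, has provoked broad international resistance. Dollar assets, once regarded as a "safe haven," now face a crisis of confidence. Soaring Treasury yields, a high-valuation bubble in US stocks, and the continued escalation of the tariff war against China have led investors to re-examine the safety of US assets. At the same time, the Chinese market, with its stable policy environment and capacity for technological innovation, is gradually becoming a safe haven for global capital.
The US Treasury market has long been regarded as the safest investment in the world. However, total US federal debt has surged to $36.2 trillion, of which about $9.2 trillion will mature in 2025, equal to nearly 31% of GDP. This massive ...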
Google Drops a Surprise Release That Sweeps the AI Coding Leaderboards! Netizens: No Need to Buy Cursor Anymore
量子位 (QbitAI) · 2025-05-07 01:09
Core Viewpoint
- The article discusses the early release of Gemini 2.5 Pro Preview, highlighting its improved coding capabilities and its performance across AI arenas, particularly in text, visual, and web development tasks [1][15][21].
Group 1: Model Performance
- Gemini 2.5 Pro Preview ranks first on every LMArena leaderboard, surpassing Claude in all text, visual, and WebDev categories [4][5].
- The model scores 1448 in the WebDev Arena, 147 points higher than the previous version [6][7].
- The model has been widely praised for its coding and multimodal reasoning capabilities, indicating a significant improvement in performance [16][18].
Group 2: New Features and Applications
- The update lets users create applications from simple prompts, such as transforming sketches into audio or generating interactive learning applications from YouTube videos [2][10].
- New functionality includes replicating a style from a single prompt, which streamlines user interface design [12][13].
- Developers can use the updated Gemini 2.5 Pro through Google AI Studio and Vertex AI to build new applications [14].
Group 3: Market Impact and Reception
- The model was originally scheduled to be announced later, at the Google I/O conference, but was released early because of its popularity [15][20].
- The model's success is seen as a sign that the competitive landscape in AI is shifting, with Google making significant strides in the field [21][22].
Domestic AI Chips in High Demand: Inference Demand Surges as the Industry Chain Works to Raise Efficiency
21st Century Business Herald · 2025-05-06 13:04
Core Insights
- Demand for AI inference is driving strong growth at domestic AI chip companies, with markedly better financial results at key players such as Cambricon (寒武纪) and Hygon (海光信息) [1][2][11]
- Cambricon has ended six consecutive years of losses, turning profitable in Q4 2024 and remaining profitable in Q1 2025 on a substantial revenue increase [2][6]
- Hygon's performance remains stable, with strong revenue growth driven by innovation in general-purpose computing products [11][14]
Cambricon's Performance
- Cambricon reported 2024 revenue of 1.174 billion yuan, up 65.56% year-on-year, and a net loss of 452 million yuan, 46.69% narrower than the previous year's loss [2]
- In Q1 2025, Cambricon's revenue surged to 1.111 billion yuan, a 40-fold increase year-on-year, although quarter-on-quarter growth slowed [6]
- The company posted a net profit of 355 million yuan in Q1 2025, a 256.39% improvement over the 227 million yuan loss in the same quarter of the previous year [7]
Hygon's Stability
- Hygon reported 2024 revenue of 9.162 billion yuan, up 52.4% year-on-year, and a net profit of 1.931 billion yuan, up 52.87% [11]
- In Q1 2025, Hygon's revenue was 2.4 billion yuan, up 50.76% year-on-year, and its net profit was 506 million yuan, up 75.33% [11][14]
- The growth is attributed to continuous technological innovation and an expanding market share in general-purpose computing products [11][15]
Challenges in Specialized Markets
- Companies focused on specialized markets, such as Jingjia Micro (景嘉微) and Loongson Technology (龙芯中科), are under performance pressure; Jingjia Micro's revenue fell 34.62% in 2024 [17]
- Loongson Technology reported a slight revenue decrease of 0.28% in 2024 and a net loss of 625 million yuan, reflecting the difficulty of building its ecosystem [20]
- Both companies are trying to move from specialized to more general market applications to improve growth [17][20]
Industry Trends and Innovations
- The rapid adaptation of AI chip manufacturers to DeepSeek's models is seen as a significant step toward internationalization and a stronger domestic AI chip ecosystem [22][23]
- Integrated computing products are gaining traction, with more than a hundred models available, although there are concerns about how consistently they perform [24]
- Vendors are pursuing gains in computing efficiency and transmission rates, exemplified by Huawei's new CloudMatrix architecture, which significantly increases resource interconnect bandwidth [26]
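The year-on-year profit change quoted above for Q1 2025 (a 355 million yuan profit against a 227 million yuan loss) can be reproduced with a short calculation. The sketch below is an illustration only: the article does not state its formula, and dividing by the absolute value of the prior-period figure is an assumed convention, chosen because it yields a sensible sign when the base period is a loss. The function name `yoy_change_pct` is hypothetical.

```python
def yoy_change_pct(current: float, previous: float) -> float:
    """Percentage change relative to the magnitude of the prior-period value.

    Dividing by abs(previous) keeps the sign meaningful when the prior
    period was a loss (a negative number). This convention is an
    assumption, not something the article specifies.
    """
    if previous == 0:
        raise ValueError("previous period is zero; percentage change is undefined")
    return (current - previous) / abs(previous) * 100.0


# Q1 figures from the summary, in millions of yuan (loss entered as negative).
q1_2025_profit = 355.0
q1_2024_loss = -227.0

change = yoy_change_pct(q1_2025_profit, q1_2024_loss)
print(f"{change:.2f}%")  # prints "256.39%"
```

Under this convention the rounded inputs give the same 256.39% that the summary reports, which suggests (but does not prove) that the source used a similar calculation.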
China's AI Investors: Two and a Half Years of Practice
Founder Park· 2025-05-06 12:05
Digital-Intelligent Chain Leaders in the AI Era: Trends and Outlook
Sohu Finance · 2025-05-06 08:28
Core Insights
- Competition among digital chain leaders is inherently global, driven by rapid advances in AI and smart technologies that are disrupting traditional chain leaders [2][3]
- Digital and intelligent transformation is becoming a new trend in global production networks, with the potential to revolutionize how people produce and live [2][3]
- The emergence of digital chain leaders, or "smart chain leaders," is crucial because they integrate physical materials and data through AI, strengthening production capabilities and decision-making intelligence [3][5]
Group 1: Impact of AI on Traditional Chain Leaders
- Accelerating intelligent transformation is displacing traditional chain leaders, and smart chain leaders are racing to be the first to put AI into practical use at scale [5][6]
- History shows that once AI crosses certain thresholds it can cause disruptive change across industries, as seen in the evolution of Go and the automation of parking systems [6][7]
- Businesses feel an urgent need to embrace AI, and entrepreneurs are increasingly anxious to understand and apply AI technologies [7][8]
Group 2: Differentiation Between Digitalization and Intelligentization
- Digitalization is recognized for its potential to improve efficiency, but its benefits are often indirect and limited, whereas intelligentization can raise production efficiency dramatically [8][9]
- Competition among smart chain leaders is global because breakthroughs in intelligentization can bring large productivity gains, posing an existential threat to traditional chain leaders [8][9]
Group 3: Technical Routes and Responsibilities of Smart Chain Leaders
- The debate over AI's development route, AI hegemony versus AI equality, highlights the role of smart chain leaders in driving industry-specific AI applications [9][10]
- Smart chain leaders must digitalize deeply to meet the needs of intelligentization, moving beyond superficial digital efforts to the detailed digitization of processes [12][13]
- They also need to keep pace with rapid AI iteration through continuous learning in order to remain competitive [13][14]
Group 4: Long-term Process of Societal Digitalization
- Societal digitalization is expected to be a long process, with industry reshuffling comparable to the internet's impact on various sectors [15]
- The development of artificial general intelligence (AGI) and of industry-specific AI applications are critical areas of future focus, requiring collaboration among industry players to establish smart chain leaders [15]