AI Large Model Training
Large-Model Communication Performance Up 30%: DeepSeek Thanks Tencent for Contributing Its Large-Model Network Acceleration Scheme
Shenzhen Shangbao · 2025-05-11 22:32
Core Insights
- Tencent's technical team has optimized the DeepEP communication framework, improving performance by 100% on RoCE networks and by 30% on IB (InfiniBand) networks, which makes large AI model training more efficient [2][3]
- The optimization targets the main bottlenecks that limited wider adoption of the original DeepEP framework, namely poor bandwidth utilization and CPU control latency [2][3]

Group 1
- Topology-aware multi-QP chaining allocates bandwidth intelligently, so that the full bandwidth of dual-port network cards is used and none is wasted [3]
- Tencent removed the CPU control bottleneck in GPU communication by moving control-plane operations off the CPU path, which reduces latency and energy consumption [3]
- A new "QP internal sequencing lock" mechanism keeps data transfers among multiple GPUs accurate and in order, even with more than 1,000 data transfer tasks in flight at once [3]

Group 2
- The optimized DeepEP framework has been fully open-sourced and applied in training and inference for Tencent's Hunyuan large model, and it has proved versatile in high-performance environments built on Tencent's Xingmai network and H20 servers [3]
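The dual-port bandwidth point above can be illustrated with a toy mapping. This is not Tencent's implementation; it is a minimal sketch that assumes each queue pair (QP) is bound to exactly one physical port, so spreading QPs evenly across both ports keeps either port from sitting idle. The function name `assign_qps_to_ports` and the round-robin policy are hypothetical.

```python
from collections import Counter


def assign_qps_to_ports(num_qps: int, num_ports: int = 2) -> list[int]:
    """Return the port index for each QP using round-robin placement.

    Hypothetical illustration only: a real topology-aware scheme would
    also take the switch behind each port into account.
    """
    if num_qps < 0 or num_ports <= 0:
        raise ValueError("num_qps must be >= 0 and num_ports must be > 0")
    return [qp % num_ports for qp in range(num_qps)]


if __name__ == "__main__":
    mapping = assign_qps_to_ports(num_qps=8)
    # Count how many QPs land on each port; an even split means
    # neither port is left underused.
    print(Counter(mapping))
```

With an even number of QPs the two ports carry the same number of QPs; with an odd number they differ by one.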
DeepSeek Thanks Tencent's Technical Team: The DeepEP Optimization Is a "Huge Speedup" Code Contribution
Sina Tech · 2025-05-07 11:12
Core Insights
- Tencent's technical team has optimized the DeepEP communication framework, improving performance by 100% on RoCE networks and by 30% on IB (InfiniBand) networks, which raises the efficiency of large AI model training [1][2]

Group 1: Technical Enhancements
- The optimization replaced IBRC with IBGDA and assigned a distinct Queue Pair (QP) to each channel for parallel data transmission, improving both the robustness and the communication performance of the normal kernels [1]
- In RDMA scenarios the optimized framework reached an algorithm bandwidth of 58 GB/s, which corresponds to a calculated physical bandwidth of 43.5 GB/s [1]

Group 2: Industry Impact
- Since DeepSeek open-sourced DeepEP along with other components in February, the framework has demonstrated a 300% increase in communication efficiency and has addressed the dependence of MoE-architecture large models on NVIDIA NCCL [2]
- The optimizations have been applied in Tencent's Hunyuan model projects and have proved versatile in high-performance environments built on Tencent's Xingmai network and H20 servers [2]
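The two bandwidth figures above differ by a factor of 43.5 / 58 = 0.75. One reading consistent with that ratio, offered here as an assumption rather than something the article states, is that algorithm bandwidth counts every byte a GPU dispatches, while physical bandwidth counts only the bytes that actually cross the RDMA network. If traffic is spread uniformly over `num_nodes` nodes, a fraction `(num_nodes - 1) / num_nodes` leaves the local node. The node count of 4 and the helper name below are hypothetical.

```python
def physical_bandwidth(algo_gb_per_s: float, num_nodes: int) -> float:
    """Estimate physical RDMA bandwidth from algorithm bandwidth.

    Assumes uniform traffic across nodes, so only the share destined
    for other nodes, (num_nodes - 1) / num_nodes, uses the network.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return algo_gb_per_s * (num_nodes - 1) / num_nodes


if __name__ == "__main__":
    # Assumed 4-node setup; 58 GB/s is the figure quoted in the article.
    print(physical_bandwidth(58.0, num_nodes=4))  # 43.5
```

Under this assumption the quoted 43.5 GB/s is reproduced exactly; a different node count or a non-uniform traffic pattern would give a different ratio.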
Technology and Green Transformation Advance in Tandem: Runze Technology Posts Steady Growth in Its Q1 Report
Core Insights
- The company reported revenue of 1.198 billion yuan and net profit of 430 million yuan for Q1 2025, indicating healthy financial metrics [1]
- As a leading provider of intelligent computing infrastructure in China, the company is relying on technological innovation and green development to build a future-oriented computing foundation [1]
- The company has established seven AIDC intelligent computing clusters across key economic regions; all delivered and upcoming computing centers have secured production orders and are expected to be operational by 2025 [1]

Technological Developments
- The company is deepening the commercialization of liquid cooling technology, having delivered the industry's first fully liquid-cooled green computing center in 2023 [1]
- The Power Usage Effectiveness (PUE) of its liquid-cooled computing centers has been reduced to approximately 1.15, a significant gain in energy efficiency [1]
- The company is stepping up energy-saving renovations in existing computing centers and has achieved industry-leading PUE levels in its Langfang park, supporting AI model training with reliable and efficient computing infrastructure [1]

Green Development Strategy
- The company is actively promoting a "low-carbon green" transition for its computing centers; its A-7 and A-18 centers have been recognized as national green data centers for their energy-saving performance [2]
- In 2024 the company completed a total of 800 million kilowatt-hours of green electricity transactions, underscoring its commitment to energy-saving technology research and green transformation [2]

Strategic Expansion
- The company's strategic layout in the Hainan Free Trade Port aligns with national policy: the State Council has approved the establishment of cross-border e-commerce comprehensive pilot zones in Hainan and other cities [3]
- The company is constructing an intelligent computing infrastructure cluster in Danzhou, Hainan, with a planned capacity of approximately 30,000 cabinets, aimed at enhancing cross-border operations [3]
- This initiative supports the digital economy development directive in the Hainan Free Trade Port construction plan and lays the groundwork for the company's expansion into overseas markets [3]
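For context on the PUE figure of about 1.15 cited above: Power Usage Effectiveness is total facility energy divided by the energy delivered to IT equipment, so a value of 1.15 means roughly 0.15 kWh of cooling and other overhead for every 1 kWh of IT load. The sketch below works through that arithmetic. The energy inputs are made-up illustrative numbers, not Runze figures, and the helper names are hypothetical.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("it_equipment_kwh must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value: float) -> float:
    """Fraction of total facility energy not consumed by IT equipment."""
    if pue_value < 1:
        raise ValueError("PUE cannot be below 1")
    return 1 - 1 / pue_value


if __name__ == "__main__":
    # Illustrative inputs only: 1,150 kWh drawn by the facility,
    # of which 1,000 kWh reaches the IT equipment.
    value = pue(total_facility_kwh=1150.0, it_equipment_kwh=1000.0)
    print(f"PUE = {value:.2f}")
    print(f"Non-IT share of total energy = {overhead_share(value):.1%}")
```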