RNL Technology
Lenovo Proposes RNL Technology, Using Multi-Dimensional Perception and Related Techniques to Solve Challenges in AI Training
Sina Tech (Xin Lang Ke Ji) · 2025-11-28 11:09
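Looking ahead, Lenovo plans to extend RNL technology to scenarios such as high-performance storage and HPC, and to introduce deep learning algorithms to improve its congestion prediction. Lenovo will also validate the technology's overall performance in large AI clusters at the scale of one thousand and ten thousand accelerator cards, and will continue to drive innovation and iteration in AI networking technology.

Sina Tech, evening of November 28: A paper by Lenovo's Wanquan Heterogeneous Intelligent Computing R&D team was recently accepted by the IEEE CyberSciTech 2025 conference and will be included in IEEE DL and EI-indexed. In the paper, Lenovo proposes an innovative RNL technology that, through multi-dimensional perception, path load balancing optimization, and incremental traffic migration, effectively addresses the long-standing problem of RoCE network load balancing in AI training and inference scenarios.

As the parameter counts of large language models grow explosively and AI clusters keep expanding, RoCEv2 (RDMA over Converged Ethernet v2) has become the mainstream protocol for AI networks. However, AI training and inference transfer data through communication primitives (such as all-gather and all-reduce). This pattern tends to produce "low-entropy, elephant-flow" traffic, which easily causes load imbalance and link congestion and severely limits bandwidth utilization and overall performance.

According to Lenovo, to address these pain points the team proposed RNL technology, which builds a closed-loop system of "multi-dimensional perception + path load balancing + incremental migration" that combines algorithmic innovation with practical value: first ...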
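The "low-entropy, elephant-flow" problem described above can be illustrated with a small sketch. It assumes a hash-based multipath scheme in which each flow's 5-tuple is hashed onto one of several equal-cost links; the article does not say which path-selection scheme the affected networks use, so this is an assumption. The addresses, flow sizes, and the helper names `pick_link` and `link_loads` are hypothetical.

```python
import zlib

ROCEV2_UDP_PORT = 4791  # UDP destination port used by RoCEv2


def pick_link(five_tuple, num_links):
    """Hash a flow's 5-tuple onto one of num_links equal-cost links (assumed scheme)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % num_links


def link_loads(flows, num_links):
    """Return the offered load in Gbps on each link for (five_tuple, gbps) flows."""
    loads = [0.0] * num_links
    for five_tuple, gbps in flows:
        loads[pick_link(five_tuple, num_links)] += gbps
    return loads


# Hypothetical traffic: 8 long-lived 50 Gbps flows between accelerator nodes.
# Few flows, each very large, is the "low-entropy, elephant-flow" shape.
flows = [
    ((f"10.0.0.{i}", f"10.0.1.{i}", 49152 + i, ROCEV2_UDP_PORT, "udp"), 50.0)
    for i in range(8)
]

loads = link_loads(flows, num_links=8)
print(loads)
```

With only a handful of large flows, a static hash is likely to place several of them on the same link while other links stay idle, because there are too few distinct flows for the hash to average out. That imbalance is what a load-aware scheme has to correct.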
Paper by Lenovo's Wanquan Heterogeneous Intelligent Computing R&D Team Accepted by IEEE CyberSciTech 2025
Huanqiu.com (Huan Qiu Wang) · 2025-11-28 09:37
Core Insights
- Lenovo's RNL technology addresses the long-standing challenge of RoCE network load balancing in AI training and inference scenarios through multi-dimensional perception, path load balancing optimization, and incremental flow migration [1][2].

Group 1: RNL Technology Overview
- RNL integrates multi-dimensional perception, path load balancing optimization, and incremental flow migration into a closed-loop system that offers both algorithmic innovation and practical value [1].
- The multi-dimensional perception mechanism provides real-time awareness of the network topology, the network demands of AI tasks, and the load status of RoCE links, forming the data foundation for dynamic scheduling [1].
- Path load balancing optimization uses virtual-physical network mapping and path scoring algorithms to select the best data transmission paths and maximize bandwidth utilization [1].

Group 2: Performance and Cost Efficiency
- RNL is reported to be highly reliable and to deliver two benefits: higher AI workload efficiency and lower total cost of ownership (TCO) [2].
- Reported performance gains include a 50% improvement in communication primitive performance, bandwidth utilization of 85%, and a 90% reduction in load-balancing dispersion [2].
- In AI inference scenarios, transactions per second (TPS) increased by 26%, time to first token (TTFT) decreased by 30%, and time per output token (TPOT) decreased by 22%, while overall deployment costs fell by 60% [2].

Group 3: Strategic Implications
- RNL has been incorporated into Lenovo's heterogeneous computing platform, strengthening its technological barriers to entry in the AI heterogeneous computing market and enhancing its industry influence and core competitiveness [4].
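The summary names path scoring and incremental flow migration but does not give the algorithms. The sketch below is a toy illustration of the general idea only, not Lenovo's method: the scoring rule (free capacity fraction), the migration rule (move one flow at a time from the most loaded to the least loaded path), the dispersion metric (coefficient of variation), and all names and numbers are assumptions.

```python
from statistics import mean, pstdev


def dispersion(loads):
    """Coefficient of variation of per-path load; 0 means perfectly even."""
    avg = mean(loads)
    return pstdev(loads) / avg if avg else 0.0


def score(path_load, capacity):
    """Hypothetical path score: fraction of the path's capacity still free."""
    return (capacity - path_load) / capacity


def place(flows, num_paths, capacity):
    """Greedy placement: largest flows first, each on the best-scoring path."""
    loads = [0.0] * num_paths
    assignment = {}
    for name, gbps in sorted(flows.items(), key=lambda kv: -kv[1]):
        best = max(range(num_paths), key=lambda p: score(loads[p], capacity))
        assignment[name] = best
        loads[best] += gbps
    return assignment, loads


def path_loads(flows, assignment, num_paths):
    """Sum the load on each path for a given flow-to-path assignment."""
    loads = [0.0] * num_paths
    for name, path in assignment.items():
        loads[path] += flows[name]
    return loads


def migrate_incrementally(flows, assignment, num_paths, max_moves):
    """Move at most max_moves flows, one per step, from the most loaded path
    to the least loaded path, leaving every other flow where it is."""
    assignment = dict(assignment)
    loads = path_loads(flows, assignment, num_paths)
    moves = []
    for _ in range(max_moves):
        hot = max(range(num_paths), key=loads.__getitem__)
        cold = min(range(num_paths), key=loads.__getitem__)
        gap = loads[hot] - loads[cold]
        # Moving a flow smaller than the gap shrinks the hot/cold difference.
        movable = [n for n, p in assignment.items() if p == hot and flows[n] < gap]
        if not movable:
            break
        name = min(movable, key=lambda n: abs(flows[n] - gap / 2))
        assignment[name] = cold
        loads[hot] -= flows[name]
        loads[cold] += flows[name]
        moves.append((name, hot, cold))
    return assignment, loads, moves


# Hypothetical flows (Gbps) and a skewed starting assignment over 4 paths.
flows = {"f0": 40.0, "f1": 40.0, "f2": 40.0, "f3": 40.0, "f4": 20.0, "f5": 20.0}
skewed = {"f0": 0, "f1": 0, "f2": 0, "f3": 1, "f4": 1, "f5": 2}

before_loads = path_loads(flows, skewed, 4)
after, after_loads, moves = migrate_incrementally(flows, skewed, 4, max_moves=3)
print(dispersion(before_loads), dispersion(after_loads), moves)

fresh, fresh_loads = place(flows, num_paths=4, capacity=100.0)
print(fresh_loads)
```

Capping the number of moves is the point of the incremental variant in this sketch: re-running `place` from scratch could reassign every flow, whereas `migrate_incrementally` touches only a few flows per round and leaves the rest undisturbed.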