
Core Insights
- DeepSeek showcased its five core technologies (FlashMLA, DeepEP, DeepGEMM, DualPipe & EPLB, and the 3FS file system) during a five-day "Open Source Week," attracting significant global attention [1]
- JD Cloud announced full-stack adaptation of these technologies, delivering a 50% performance improvement in inference scenarios [1][2]

Group 1: Technology Enhancements
- FlashMLA optimizes GPU memory and compute usage, addressing the resources that traditional methods waste when processing variable-length sequences [1] (quantified in the first sketch below)
- JD Cloud's vGPU AI computing platform supports FlashMLA's FP8 format, cutting the per-token KV Cache memory footprint to roughly 1/57 of standard Multi-Head Attention while maintaining high throughput and low latency under high concurrency [1] (see the worked calculation below)

Group 2: Communication and Performance
- JD Cloud's vGPU AI computing platform fully supports distributed inference with the DeepEP communication library, significantly increasing inference throughput [2]
- By integrating DeepEP, JD Cloud uses NVLink for intra-machine communication and NVSHMEM for inter-machine communication, improving GPU resource utilization and reducing performance bottlenecks [2] (the dispatch/combine pattern is sketched below)

Group 3: Local Deployment and Adaptation
- JD Cloud has helped multiple local governments deploy DeepSeek on their existing infrastructure, letting local enterprises access the service without additional resource investment [3]
- The platform has completed adaptation across domestic chips, covering more than ten domestic AI computing solutions and ensuring autonomy and controllability from foundational computing up to large-model applications [2]
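
The variable-length waste FlashMLA targets is easy to quantify: when a batch is padded to its longest sequence, every padding slot still consumes KV-cache memory and attention compute. A minimal sketch, using a hypothetical batch of prompt lengths (the numbers are illustrative, not from the article):

```python
# Illustrative only: quantify the padding waste that variable-length
# (varlen) attention kernels such as FlashMLA are designed to avoid.
# The sequence lengths below are made-up examples, not from the article.

seq_lens = [37, 512, 129, 1024, 64, 700]   # hypothetical batch of prompts
max_len = max(seq_lens)

padded_tokens = max_len * len(seq_lens)    # dense [batch, max_len] layout
real_tokens = sum(seq_lens)                # packed varlen layout
waste = 1 - real_tokens / padded_tokens

print(f"padded slots: {padded_tokens}, real tokens: {real_tokens}")
print(f"memory/compute spent on padding: {waste:.0%}")
# ~60% of the dense layout is padding in this example; a varlen kernel
# processes only the real tokens instead.
```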
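The 1/57 figure is consistent with a back-of-the-envelope count using the attention dimensions published in the DeepSeek-V2/V3 papers: Multi-Head Attention caches full keys and values for every head, while MLA caches only a compressed latent vector plus a small decoupled RoPE key. This is a sanity check of the ratio, not JD Cloud's measurement:

```python
# Per-token, per-layer KV-cache element counts, using dimensions from the
# DeepSeek-V2/V3 papers (128 heads, head dim 128, latent dim 512, RoPE dim 64).
# A sanity check of the ~57x ratio, not JD Cloud's measurement.

n_heads, d_head = 128, 128       # attention heads and per-head dimension
d_latent, d_rope = 512, 64       # MLA compressed KV latent + decoupled RoPE key

mha_elems = 2 * n_heads * d_head     # K and V cached for every head: 32768
mla_elems = d_latent + d_rope        # one shared latent + RoPE key:  576

print(f"MHA elements/token/layer: {mha_elems}")
print(f"MLA elements/token/layer: {mla_elems}")
print(f"reduction factor: {mha_elems / mla_elems:.1f}x")   # ~56.9x, i.e. ~57x
# The ~57x factor comes from the element count alone; storing the MLA
# cache in FP8 rather than FP16 halves the bytes again on top of this.
```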
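DeepEP accelerates the expert-parallel all-to-all that a Mixture-of-Experts model performs at inference time: tokens are dispatched to the GPUs hosting their routed experts, and the expert outputs are combined back afterward. The sketch below mimics that communication pattern with plain torch.distributed; it is a conceptual stand-in, not DeepEP's actual API, and assumes top-1 routing with an already-initialized process group:

```python
# Conceptual sketch of MoE expert-parallel dispatch/combine, the all-to-all
# pattern that DeepEP accelerates. Plain torch.distributed is used here for
# illustration; this is NOT DeepEP's API. Assumes dist.init_process_group()
# has been called and uses top-1 routing for simplicity.

import torch
import torch.distributed as dist

def dispatch_combine(tokens: torch.Tensor, expert_rank: torch.Tensor,
                     expert_fn, world_size: int) -> torch.Tensor:
    """tokens: [n, d] hidden states; expert_rank: [n] destination rank per token."""
    # Sort tokens by destination rank so each rank's slice is contiguous.
    order = torch.argsort(expert_rank)
    send = tokens[order]
    send_counts = torch.bincount(expert_rank, minlength=world_size)

    # Exchange counts so every rank knows how much it will receive.
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)

    # Dispatch: all-to-all moves each token to the rank hosting its expert.
    recv = send.new_empty(int(recv_counts.sum()), tokens.shape[1])
    dist.all_to_all_single(recv, send,
                           recv_counts.tolist(), send_counts.tolist())

    out = expert_fn(recv)  # local expert computation on the received tokens

    # Combine: the reverse all-to-all returns outputs to their source rank.
    back = send.new_empty(send.shape)
    dist.all_to_all_single(back, out,
                           send_counts.tolist(), recv_counts.tolist())

    # Undo the sort so outputs line up with the original token order.
    result = torch.empty_like(back)
    result[order] = back
    return result
```

The two all-to-all transfers are exactly the steps the article says JD Cloud routes over NVLink within a machine and NVSHMEM across machines when DeepEP is integrated.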