Inference Performance Optimization
UCloud (优刻得) and Qingcheng Jizhi (清程极智) Reach Strategic Cooperation
Zheng Quan Shi Bao Wang· 2025-12-16 10:13
Renmin Caixun (人民财讯), December 16 — According to UCloud, on the afternoon of December 16 UCloud and Qingcheng Jizhi signed a strategic cooperation agreement. Under the partnership, the two companies will collaborate closely on integrating domestic computing power resources, optimizing inference performance, and building model service platforms. UCloud will draw on Qingcheng Jizhi's technical strengths in inference engines and system software to jointly optimize domestic computing clusters, improving compute utilization and inference cost-effectiveness, and turning idle domestic computing resources into stable AI inference capacity that can be delivered at scale. The two sides will also explore cooperation on model aggregation and service platforms, continuously introducing more mainstream models to offer enterprises and developers more cost-effective token services and model invocation experiences. ...
Experts from Tencent, Huawei, Microsoft, and Alibaba Gather to Discuss Inference Optimization Practices | AICon
AI前线· 2025-04-23 07:28
Core Viewpoint
- The article emphasizes the rapid evolution of artificial intelligence and the critical role of optimizing inference performance in large models to address computational challenges, memory bottlenecks, and communication pressure [1].

Summary by Sections

Inference Performance Optimization
- Current optimization efforts focus on three main areas: model optimization, inference acceleration, and engineering optimization. Techniques such as model quantization, pruning, and distillation are employed to reduce computational complexity and enhance inference efficiency [1] (a quantization sketch appears after this summary).
- The DeepSeek-R1-Distill-Qwen-32B model uses a distillation strategy to significantly compress resource expenditure while maintaining high performance [1].

AICon Conference
- The AICon Global AI Development and Application Conference will take place on May 23-24, featuring a special forum on "Strategies for Optimizing Inference Performance of Large Models" led by industry practitioners [1][10].

Expert Presentations
- **Xiang Qianbiao - Tencent**: His presentation will cover the AngelHCF inference acceleration framework, detailing its comprehensive exploration of operator design, communication optimization, and architecture adjustments, which delivers significant cost and performance advantages [1][2].
- **Zhang Jun - Huawei**: He will discuss optimization practices on Huawei's Ascend AI framework, focusing on the advantages of hybrid models, kernel optimization, and strategies for alleviating communication bottlenecks in ultra-large MoE models [3][4].
- **Jiang Huiqiang - Microsoft**: His talk will address efficient long-text methods centered on KV caching, exploring the challenges and strategies of the inference process [5][7] (a KV-cache sketch appears after this summary).
- **Li Yuanlong - Alibaba Cloud**: He will present cross-layer optimization practices in large-model inference, covering operator fusion, model quantization, and dynamic batching techniques to maximize hardware resource efficiency [6][8].

Technical Trends and Future Directions
- The article highlights the importance of understanding the full lifecycle of the KV cache and its impact on long-text processing, as well as the need for comprehensive optimization strategies spanning model architecture through hardware acceleration [7][8].
- The conference will also explore collaborative optimization strategies and the future landscape of inference performance enhancement, including model parallelism and hardware selection [10].
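As a concrete illustration of the model-quantization technique listed under model optimization above, here is a minimal sketch of symmetric int8 post-training weight quantization. The per-tensor scale, the toy matrix, and the helper names are illustrative assumptions, not code from any framework discussed at the conference.

```python
import numpy as np

# Minimal sketch (assumption): symmetric per-tensor int8 quantization of a
# weight matrix, followed by dequantization to inspect the reconstruction error.

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0                 # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)  # toy weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())    # small reconstruction error
```

Storing the int8 tensor plus one scale cuts weight memory roughly 4x versus float32, which is the trade-off the summary refers to when it pairs quantization with reduced computational complexity.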
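To make the KV-caching idea behind the long-text talk more concrete, below is a minimal sketch of single-head autoregressive attention with a growing key/value cache, written in plain NumPy. The shapes, the single-head setup, and all variable names are simplifying assumptions rather than any speaker's actual implementation.

```python
import numpy as np

# Minimal sketch (assumption): each decode step appends the new token's K/V to a
# cache and attends over it, so the prefix's keys/values are never recomputed.

d_model = 64
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

k_cache = np.zeros((0, d_model))   # grows by one row per decoded token
v_cache = np.zeros((0, d_model))

def decode_step(x_t):
    """Attend the newest token embedding x_t (shape [d_model]) over all cached K/V."""
    global k_cache, v_cache
    q = x_t @ W_q
    k_cache = np.vstack([k_cache, x_t @ W_k])       # O(1) new K/V work per step
    v_cache = np.vstack([v_cache, x_t @ W_v])
    scores = softmax(q @ k_cache.T / np.sqrt(d_model))
    return scores @ v_cache                          # context vector for this step

for _ in range(8):                                   # toy 8-step decode loop
    out = decode_step(rng.standard_normal(d_model))
print(out.shape, k_cache.shape)                      # (64,) (8, 64)
```

The cache is also why long-text inference becomes memory-bound: it grows linearly with sequence length, which is the lifecycle and eviction problem the "Technical Trends" bullets point to.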