Fine-Grained Mixed-Precision Framework

V3.1 Adapted for Domestic FP8-Precision Chips
小熊跑的快· 2025-08-22 01:12
Core Viewpoint
- The successful implementation of DeepSeek R1 is attributed to its use of the FP8 data format within a fine-grained mixed-precision framework: most compute-intensive operations run at FP8 precision, while a few critical operations keep their original data formats [1].

Group 1
- Earlier libraries were optimized for CUDA, which gave NVIDIA cards an advantage. Domestic chips supported only FP16, which resulted in a 37% efficiency loss when running R1 [1].
- The recent FP8 adaptation for domestic chips is expected to reduce costs and benefit local hardware [1].
- Further advances are anticipated in both software and hardware: NVIDIA's B-series cards may lower precision to FP4, while several domestic companies are expected to support native FP8 in their next-generation products [2].

Group 2
- The focus on low-cost solutions is aimed at expanding into global markets [3].
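To make the "fine-grained" idea in the core viewpoint concrete, here is a minimal pure-Python sketch of block-wise FP8 scaling. It is an illustration under stated assumptions, not DeepSeek's implementation: the function names, the block size of 128, the choice of the E4M3 variant of FP8 (largest finite value 448), and the demo data are all hypothetical. FP8 rounding is simulated in software, and the per-block scale factors stay in ordinary Python floats, standing in for the values that remain in a higher-precision format.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 variant of FP8


def round_to_e4m3(x: float) -> float:
    """Simulate rounding x to the nearest FP8 E4M3 value (saturating)."""
    if x == 0.0:
        return 0.0
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))  # saturate out-of-range values
    _, e = math.frexp(x)          # x = m * 2**e with 0.5 <= |m| < 1
    e = max(e, -5)                # below 2**-6 the format is subnormal
    quantum = 2.0 ** (e - 4)      # spacing for 4 significant bits (1 implicit + 3 stored)
    return round(x / quantum) * quantum


def quantize_block(block):
    """Scale one block so its largest magnitude maps to the FP8 maximum."""
    amax = max(abs(v) for v in block)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0  # kept in full precision
    return [round_to_e4m3(v / scale) for v in block], scale


def quantize_fine_grained(values, block_size=128):
    """Quantize each block of `block_size` values with its own scale."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def total_abs_error(values, block_size):
    """Quantize, dequantize, and sum the absolute reconstruction error."""
    restored = []
    for q, scale in quantize_fine_grained(values, block_size):
        restored.extend(v * scale for v in q)
    return sum(abs(a - b) for a, b in zip(values, restored))


# Hypothetical demo data: small values plus a single large outlier.
values = [0.01 * (i % 7 + 1) for i in range(256)]
values[0] = 10000.0

print("one scale for all 256 values:", total_abs_error(values, 256))
print("one scale per 128 values:    ", total_abs_error(values, 128))
```

With a single scale for the whole vector, the outlier pushes the small values toward the bottom of the FP8 range, where they lose most of their precision. With one scale per block, only the block that contains the outlier is affected, which is the motivation for scaling at a fine granularity.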