NVFP4

Tencent Research Institute AI Digest 20250828
腾讯研究院 (Tencent Research Institute) · 2025-08-27 16:01
Generative AI

I. NVIDIA's NVFP4: a fundamental shift in large-model training, with a 7x efficiency boost
1. NVIDIA has introduced the new NVFP4 format, which delivers 16-bit training accuracy at 4-bit precision and could change how LLMs are developed; on Blackwell Ultra it performs 7x better than the Hopper architecture;
2. NVFP4 combines micro-block scaling, high-precision E4M3 block encoding, Hadamard transforms, and stochastic rounding to address the dynamic-range, gradient-volatility, and numerical-stability problems of low-precision training (see the sketch after this digest);
3. NVIDIA has already partnered with AWS, Google Cloud, OpenAI, and other leading organizations; experiments show that NVFP4 converges stably at trillion-token scale, saving substantial compute and power costs.
https://mp.weixin.qq.com/s/FFspAOcdW6Ro0UsOasny-Q

II. Gemini 2.5 Flash officially launched: 95% cheaper per image than OpenAI
1. Google has officially released the image-generation model gemini-2.5-flash-image-preview (previously codenamed nano banana), offering SOTA image generation and editing, strong character consistency, and very high speed;
2. The model supports a 32k context and can generate and edit images at roughly ¥0.28 ($0.039) per image, 95% cheaper than OpenAI; it is already available on ...
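To make point 2 of the NVFP4 item more concrete, below is a minimal Python sketch of block-scaled FP4 (E2M1) quantization with stochastic rounding. It illustrates the general technique, not NVIDIA's implementation: the 16-element block size, keeping the per-block scale as a plain float (rather than E4M3), and omitting the Hadamard transform and any second-level per-tensor scale are all simplifying assumptions.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block_fp4(block, rng):
    """Quantize one micro-block with a shared scale and stochastic rounding.

    Sketch only: real NVFP4 stores the per-block scale in E4M3 and applies a
    Hadamard transform beforehand; here the scale stays a plain float.
    """
    amax = np.abs(block).max()
    scale = amax / FP4_GRID[-1] if amax > 0 else 1.0  # map block max onto FP4 max (6.0)
    x = block / scale                                  # values now lie in [-6, 6]

    sign, mag = np.sign(x), np.abs(x)
    # Locate the two neighbouring FP4 grid points around each magnitude.
    hi = np.searchsorted(FP4_GRID, mag, side="left").clip(0, len(FP4_GRID) - 1)
    lo = np.maximum(hi - 1, 0)
    lo_v, hi_v = FP4_GRID[lo], FP4_GRID[hi]
    # Stochastic rounding: round up with probability proportional to distance,
    # so the rounding error is zero in expectation (helps with noisy gradients).
    p_up = (mag - lo_v) / np.where(hi_v > lo_v, hi_v - lo_v, 1.0)
    q = np.where(rng.random(mag.shape) < p_up, hi_v, lo_v)
    return sign * q, scale

rng = np.random.default_rng(0)
block = rng.normal(size=16).astype(np.float32)   # one 16-element micro-block
q, s = quantize_block_fp4(block, rng)
print(np.abs(q * s - block).max())               # worst-case reconstruction error
```

The shared per-block scale is what recovers dynamic range: each block is rescaled so its largest value lands on the top of the tiny FP4 grid before the 4-bit rounding happens.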
DeepSeek just mentioned FP8, and NVIDIA is already pushing FP4 precision into pre-training: faster and cheaper
机器之心 (Machine Heart) · 2025-08-27 10:40
机器之心 report. Editors: 冷猫, 杜伟

A few days ago, in the comment section of the DeepSeek V3.1 release post, DeepSeek mentioned the UE8M0 FP8 quantization design, stating that it was designed for the next generation of domestic Chinese chips due for release.

The remark set off a huge reaction, not only around next-generation domestic chip design and training large models on domestic chips, but also drawing broader attention to quantization strategies for large models.

FP8, short for 8-bit floating point, is an ultra-low-precision data representation format. Compared with traditional floating-point formats such as FP32 (single precision) or FP16 (half precision), FP8 further reduces storage and compute overhead while preserving numerical stability and model accuracy as far as possible (see the 机器之心 article: "How good is FP8 for training large models? Microsoft: 64% faster than BF16, 42% less memory").

Beyond NVIDIA, Microsoft, Meta, Intel, AMD, and others are also researching FP8 training and inference, and it is trending toward becoming the industry's new "gold standard".

Now, by adopting the non-mainstream UE8M0 FP8 quantization strategy, DeepSeek hints at a different development path: hardware-software co-optimization between domestic large models and domestic chips, as opposed to NVIDIA's high-compatibility strategy.

UE8M0 FP8 carries clear strategic significance. DeepSeek chose to adopt it first on the model side and publicly state that it uses UE8 ...
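For readers unfamiliar with the name, here is a brief, hedged sketch of what a UE8M0 value is. The article does not spell out the format; assuming it matches the exponent-only E8M0 scale layout described by the OCP Microscaling spec (unsigned, 8 exponent bits, 0 mantissa bits, bias 127), a scale factor is stored as a single power-of-two exponent:

```python
import math

def encode_ue8m0(scale: float) -> int:
    """Encode a positive scale as an 8-bit exponent-only code (assumed bias 127).

    Only powers of two are representable; the exponent is rounded up so that
    data divided by the decoded scale never overflows.
    """
    code = math.ceil(math.log2(scale)) + 127
    if not 0 <= code <= 254:               # code 255 is typically reserved
        raise ValueError("scale out of UE8M0 range")
    return code

def decode_ue8m0(code: int) -> float:
    return 2.0 ** (code - 127)

code = encode_ue8m0(0.37)
print(code, decode_ue8m0(code))            # 126 0.5 -- 0.37 rounds up to 2**-1
```

One plausible reading of the co-design angle: dropping the mantissa trades scale precision for a format any chip can apply with a simple exponent add, which is attractive when the model and the silicon are being tuned together.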
Inference costs plunge 75%! gpt-oss uses a new data type for 4x inference speed, and an 80GB GPU can run a 120-billion-parameter model
量子位 (QbitAI) · 2025-08-11 07:48
Core Insights
- OpenAI's latest open-source model gpt-oss utilizes the MXFP4 data type, resulting in a 75% reduction in inference costs and a fourfold increase in token-generation speed [1][5][4]

Group 1: Cost Reduction and Performance Improvement
- The MXFP4 data type allows a 120-billion-parameter model to fit into an 80GB GPU, and even a 16GB GPU can run a 20-billion-parameter version [2]
- MXFP4 compresses memory usage to one-fourth of the equivalent BF16 model while significantly enhancing token-generation speed [5][4]
- MXFP4 quantization is applied to approximately 90% of the weights in gpt-oss, primarily to reduce operational costs [4][5]

Group 2: Technical Mechanism
- The operational cost of models is driven by weight storage and memory bandwidth, and changing the data type directly affects both [7][10]
- Traditional models store weights in FP32 at 4 bytes per parameter, while MXFP4 reduces this to half a byte, an 87.5% reduction in weight-storage size (see the sketch after this summary) [11][12]
- This compression not only decreases storage space but also allows faster data reads and writes, speeding up inference [13][14]

Group 3: MXFP4 Characteristics
- MXFP4 stands for Micro-scaling Floating Point 4-bit, defined by the Open Compute Project (OCP) [15]
- MXFP4 balances data-size reduction against precision by using a shared scaling factor for each group of higher-precision values [20][22]
- Chip throughput can roughly double with each halving of floating-point precision, significantly improving inference throughput [24]

Group 4: Industry Implications
- OpenAI's adoption of MXFP4 suggests it is adequate for broader applications, potentially influencing industry standards [34][35]
- The MXFP4 data type is not new, having been discussed in OCP reports, but its practical application in large language models is gaining traction [28]
- While MXFP4 is an improvement over plain FP4, it may still face quality issues compared to FP8, prompting alternatives such as Nvidia's NVFP4 [32][33]
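As a back-of-the-envelope check of the figures in Groups 1 and 2, the memory arithmetic can be run directly. The 32-element block size and the one-byte shared scale are assumptions taken from the OCP MX description of MXFP4, not from the article itself:

```python
# Rough weight-memory footprint of a 120B-parameter model in different formats.
params = 120e9
bytes_fp32  = params * 4                        # 4 bytes per weight
bytes_bf16  = params * 2                        # 2 bytes per weight
block       = 32                                # assumed MX block size
bytes_mxfp4 = params * 0.5 + params / block     # 4-bit weights + one 1-byte scale per block

print(f"FP32 : {bytes_fp32  / 2**30:6.1f} GiB")    # ~447 GiB
print(f"BF16 : {bytes_bf16  / 2**30:6.1f} GiB")    # ~224 GiB
print(f"MXFP4: {bytes_mxfp4 / 2**30:6.1f} GiB")    # ~59 GiB -- fits in an 80GB GPU
print(f"weight shrink vs FP32: {1 - 0.5 / 4:.1%}") # the 87.5% figure from Group 2
```

The roughly 4x shrink versus BF16 matches the "one-fourth of the equivalent BF16 model" claim; the per-block scale is the small overhead that keeps the practical ratio slightly under 4x.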