Moore Threads New-Generation GPU

Zhipu Releases GLM-4.6, Teams Up with Cambricon and Moore Threads on Integrated Model-Chip Solutions
Guan Cha Zhe Wang· 2025-10-01 01:37
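On September 30, Zhipu, one of the "six little dragons" of China's large-model sector, released its new GLM-4.6 model. As the latest version in the GLM series, GLM-4.6 improves on real-world programming, long-context processing, reasoning, information search, writing, and agent applications.
According to official information, on public benchmarks and real-world programming tasks GLM-4.6's coding ability is on par with Claude Sonnet 4; the context window grows from 128K to 200K to accommodate longer code and agent tasks; the new model reasons better and supports tool calls during reasoning; and on search, the model's tool calling and search agents are strengthened.
In addition, "model-chip linkage" is a focus of this release. GLM-4.6 has been deployed on Cambricon's domestic chips with FP8+Int4 mixed quantization, which is also the industry's first FP8+Int4 integrated model-chip solution put into production on domestic chips. It lowers inference cost while keeping accuracy unchanged, and explores a feasible path for running large models locally on domestic chips.
In the adaptation itself, the core parameters of a large model, which take up 60%-80% of total memory, shrink to 1/4 of their FP16 size once quantized to Int4, greatly easing the pressure on chip memory. The temporary conversation data accumulated during inference can also be compressed with Int4 while keeping precision loss "slight". FP8, meanwhile, can be targeted at the parts of the model that are "numerically sensitive and affect inference accuracy" ...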
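The 1/4 figure follows from storing each weight in 4 bits instead of 16. The sketch below illustrates this with symmetric per-group Int4 quantization in plain Python. It is not Zhipu's or Cambricon's implementation: the group size of 32, the symmetric scaling, and the function names `quantize_int4` and `dequantize_int4` are all assumptions chosen for illustration.

```python
import random
import struct


def quantize_int4(weights, group_size=32):
    """Symmetric per-group Int4 quantization (illustrative assumption).

    Returns the packed 4-bit codes as bytes plus one float scale per group.
    """
    if group_size % 2 or len(weights) % group_size:
        raise ValueError("group_size must be even and divide len(weights)")
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Map the largest magnitude in the group to code 7; avoid a zero scale.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        for w in group:
            q = max(-8, min(7, round(w / scale)))
            codes.append(q & 0xF)  # keep the low 4 bits (two's complement)
    # Pack two 4-bit codes per byte, low nibble first.
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return packed, scales


def dequantize_int4(packed, scales, group_size=32):
    """Unpack the nibbles, sign-extend them, and rescale to floats."""
    codes = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]  # made-up weights

packed, scales = quantize_int4(weights)
fp16_bytes = len(struct.pack(f"<{len(weights)}e", *weights))  # 2 bytes each
restored = dequantize_int4(packed, scales)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"FP16: {fp16_bytes} bytes, Int4: {len(packed)} bytes")
print(f"max round-trip error: {max_err:.6f}")
```

The packed codes take exactly a quarter of the FP16 bytes. A real deployment also stores the per-group scales, so the effective ratio is slightly above 1/4.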
Zhipu Releases GLM-4.6; Cambricon and Moore Threads Complete Adaptation
Guan Cha Zhe Wang· 2025-10-01 01:36
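According to official information, on public benchmarks and real-world programming tasks GLM-4.6's coding ability is on par with Claude Sonnet 4; the context window grows from 128K to 200K to accommodate longer code and agent tasks; the new model reasons better and supports tool calls during reasoning; and on search, the model's tool calling and search agents are strengthened.
In addition, "model-chip linkage" is a focus of this release. GLM-4.6 has been deployed on Cambricon's domestic chips with FP8+Int4 mixed quantization, which is also the industry's first FP8+Int4 integrated model-chip solution put into production on domestic chips. It lowers inference cost while keeping accuracy unchanged, and explores a feasible path for running large models locally on domestic chips.
FP8 is an 8-bit floating-point (Floating-Point 8) data type with a wide dynamic range and small precision loss. Int4 is a 4-bit integer (Integer 4) data type with a very high compression ratio and the smallest memory footprint; it suits low-compute hardware, but its precision loss is comparatively noticeable. The "FP8+Int4 mixed" mode tried here does not simply stack the two formats. It assigns a quantization format according to the "functional differences between modules" of a large model, so that Int4 squeezes memory as far as possible where memory should be saved and FP8 holds the line where precision must be preserved, giving a sensible allocation of resources.
In the adaptation itself, the core parameters of a large model, which take up 60%-80% of total memory, are quantized with Int ...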
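The per-module allocation described above can be sketched as a simple policy: sensitive tensors get FP8, everything else gets Int4. The tensor names, parameter counts, and the choice of which tensors count as "sensitive" below are hypothetical; the source does not say which GLM-4.6 modules receive which format.

```python
# Storage width of each format, in bits per parameter.
BITS = {"fp16": 16, "fp8": 8, "int4": 4}

# Hypothetical tensors: (name, parameter count, numerically sensitive?).
# The names and the sensitivity flags are illustrative assumptions.
TENSORS = [
    ("ffn.weight", 6_000_000, False),
    ("attn.qkv.weight", 3_000_000, False),
    ("router.weight", 500_000, True),
    ("lm_head.weight", 500_000, True),
]


def assign_format(sensitive):
    """Mixed policy: FP8 where precision matters, Int4 where memory matters."""
    return "fp8" if sensitive else "int4"


def total_bytes(tensors, format_for):
    """Sum the storage of all tensors under a given format policy."""
    return sum(count * BITS[format_for(sensitive)] // 8
               for _, count, sensitive in tensors)


baseline = total_bytes(TENSORS, lambda sensitive: "fp16")
mixed = total_bytes(TENSORS, assign_format)

for name, _, sensitive in TENSORS:
    print(f"{name:18s} -> {assign_format(sensitive)}")
print(f"FP16 baseline: {baseline} bytes, mixed: {mixed} bytes")
```

With these made-up sizes, the mixed policy needs 5,500,000 bytes against a 20,000,000-byte FP16 baseline. The ratio depends entirely on how many parameters are flagged as sensitive.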
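Zhipu Officially Releases and Open-Sources Its New-Generation Large Model GLM-4.6; Cambricon and Moore Threads Complete Adaptation of Zhipu's GLM-4.6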
Zheng Quan Shi Bao Wang· 2025-09-30 07:58
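Cambricon and Moore Threads have completed their adaptation of GLM-4.6, a sign that domestic GPUs can now iterate in step with frontier large models, and a step that speeds up the building of an independent, controllable AI technology ecosystem. The combination of GLM-4.6 and domestic chips will first be offered to enterprises and the public through Zhipu's MaaS platform, unlocking broader social and industrial value.
Looking ahead, deep collaboration between the home-grown GLM series of large models and domestic chips will keep improving both performance and efficiency in model training and inference, building a more open, controllable, and efficient AI infrastructure.
Zhipu officially announced that GLM-4.6 has been deployed for inference with FP8+Int4 mixed quantization on Cambricon's leading domestic AI chips, the first FP8+Int4 integrated model-chip solution put into production on domestic chips. While keeping model accuracy unchanged, the solution substantially cuts inference cost, offering a feasible path and a reference example for running large models locally on domestic chips.
Meanwhile, Moore Threads completed its adaptation of GLM-4.6 on the vLLM inference framework. Its new-generation GPUs can run the model stably at native FP8 precision, which validates the strengths of the MUSA architecture and full-function GPUs in ecosystem compatibility and rapid adaptation.
On September 30, Zhipu, a leading Chinese large-model company, officially released and open-sourced its new-generation large model GLM-4.6, with a major leap in core capabilities such as Agentic Coding. This follows DeepSeek- ...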
Zhipu Releases GLM-4.6; Cambricon and Moore Threads Have Adapted It
Mei Ri Jing Ji Xin Wen· 2025-09-30 07:47
Core Insights
- Zhipu, a leading domestic large-model company, has officially released and open-sourced its next-generation large model GLM-4.6, with significant advances in core capabilities such as Agentic Coding [1]
- The release follows the launches of DeepSeek-V3.2-Exp and Claude Sonnet 4.5, making it another major industry development ahead of the National Day holiday [1]
- Zhipu announced that GLM-4.6 has been deployed on Cambricon's leading domestic AI chips using FP8+Int4 mixed quantization inference, the first FP8+Int4 integrated model-chip solution put into production on domestic chips [1]
- Moore Threads has completed its adaptation of GLM-4.6 on the vLLM inference framework, allowing its new-generation GPUs to run the model stably at native FP8 precision [1]
Zhipu Teams Up with Cambricon to Launch an Integrated Model-Chip Solution
Di Yi Cai Jing· 2025-09-30 07:38
Core Insights
- Zhipu, a domestic AI startup, has released its latest model, GLM-4.6, with improvements in programming, long-context handling, reasoning, information retrieval, writing, and agent applications [3]
Model Enhancements
- GLM-4.6's coding ability is on par with Claude Sonnet 4 on public benchmarks and real-world programming tasks [3]
- The context window has grown from 128K to 200K, allowing for longer code and agent tasks [3]
- The new model reasons better and supports tool calls during reasoning [3]
- The model's tool calling and search-agent capabilities have improved [3]
Chip Integration
- "Model-chip linkage" is a key focus of the release: GLM-4.6 has been deployed with FP8+Int4 mixed quantization on domestic Cambricon chips, the industry's first FP8+Int4 integrated model-chip solution put into production on domestic hardware [3]
- The approach keeps accuracy unchanged while reducing inference cost, exploring a feasible path for running large models locally on domestic chips [3]
Quantization Techniques
- FP8 (Floating-Point 8) offers a wide dynamic range with small precision loss, while Int4 (Integer 4) offers a very high compression ratio and the lowest memory usage at the price of more noticeable precision loss [4]
- The "FP8+Int4 mixed" mode assigns quantization formats according to the functional differences between the model's modules: Int4 where memory should be saved, FP8 where precision must be preserved [4]
Memory Efficiency
- The core parameters of a large model, which account for 60%-80% of total memory, shrink to 1/4 of their FP16 size under Int4 quantization, significantly reducing pressure on chip memory [5]
- Temporary conversation data accumulated during inference can also be compressed with Int4 while keeping precision loss slight [5]
- FP8 is used for numerically sensitive modules to minimize precision loss and retain fine-grained semantic information [5]
Ecosystem Development
- Cambricon and Moore Threads have both completed adaptation of GLM-4.6; Moore Threads did so on the vLLM inference framework, and its new-generation GPUs run the model stably at native FP8 precision [5]
- The adaptation signals that domestic GPUs can now iterate in step with cutting-edge large models, accelerating the development of an independent, controllable AI technology ecosystem [5]
- The combination of GLM-4.6 and domestic chips will be offered to enterprises and the public through Zhipu's MaaS platform [5]
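As a rough check on the memory figures, the overall footprint can be estimated from the share of memory that the weights occupy. The sketch below assumes an FP16 baseline and assumes that everything other than the weights stays the same size; both are simplifications not stated in the source, and `relative_footprint` is a hypothetical helper name.

```python
def relative_footprint(weight_share, weight_ratio=0.25):
    """Total memory after quantization, relative to the FP16 baseline.

    weight_share: fraction of baseline memory held by core parameters.
    weight_ratio: compressed weight size over FP16 size (1/4 for Int4).
    Assumes all non-weight memory is left unchanged.
    """
    if not 0.0 <= weight_share <= 1.0:
        raise ValueError("weight_share must be between 0 and 1")
    return (1.0 - weight_share) + weight_share * weight_ratio


for share in (0.6, 0.8):
    print(f"weights = {share:.0%} of memory -> "
          f"{relative_footprint(share):.2f}x of baseline")
```

Under these assumptions, a 60%-80% weight share gives a total footprint of roughly 0.55x to 0.40x of the baseline. Compressing the temporary inference data with Int4 as well would lower it further.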
Zhipu's Flagship Model GLM-4.6 Goes Live; Cambricon and Moore Threads Have Completed Adaptation
Hua Er Jie Jian Wen· 2025-09-30 07:13
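According to Zhipu, the latest GLM-4.6 model is now live. Its coding ability is 27% higher than that of its predecessor, GLM-4.5, and it performs strongly in real-world programming, long-context processing, reasoning, and other areas. GLM-4.6 reaches the highest level among domestic models on public benchmarks and outperforms other domestic models on 74 real-world programming tasks. Zhipu officially announced that GLM-4.6 has been deployed for inference with FP8+Int4 mixed quantization on Cambricon's leading domestic AI chips, the first FP8+Int4 integrated model-chip solution put into production on domestic chips. Meanwhile, Moore Threads completed its adaptation of GLM-4.6 on the vLLM inference framework, and its new-generation GPUs can run the model stably at native FP8 precision.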