Moore Threads' Next-Generation GPU
Zhipu Releases GLM-4.6, Joins Hands with Cambricon; Moore Threads Launches Integrated Model-Chip Solution
Guan Cha Zhe Wang· 2025-10-01 01:37
Core Insights
- The latest model GLM-4.6 from Zhipu, one of the domestic large-model "Six Little Dragons," has been released, showcasing improvements in programming, long-context handling, reasoning capabilities, information retrieval, writing skills, and agent applications [1]

Group 1: Model Enhancements
- GLM-4.6 demonstrates coding capabilities on par with Claude Sonnet 4 in public benchmarks and real programming tasks [4]
- The context window has been increased from 128K to 200K, allowing for longer code and intelligent-agent tasks [4]
- The new model improves reasoning abilities and supports tool invocation during reasoning [4]

Group 2: Technological Innovations
- "Model-chip linkage" is a key focus of the new model, with GLM-4.6 achieving FP8+Int4 mixed-precision deployment on domestic Cambricon chips, marking the industry's first production deployment of an FP8+Int4 model-chip solution on domestic hardware [4]
- FP8 (8-bit floating point) offers a wide dynamic range with minimal precision loss, while Int4 (4-bit integer) provides high compression ratios and low memory usage but more significant precision loss [4][5]

Group 3: Resource Optimization
- The mixed FP8+Int4 mode allocates quantization formats according to the functional differences of the model's modules, optimizing memory usage [5]
- Core parameters, which account for 60%-80% of total memory, can be compressed to 1/4 of their FP16 size through Int4 quantization, significantly reducing chip memory pressure [5]
- Temporary dialogue data accumulated during inference can be compressed with Int4 while keeping precision loss "slight" [5]

Group 4: Industry Collaboration
- Moore Threads has completed adaptation of GLM-4.6 based on the vLLM inference framework, demonstrating the advantages of the MUSA architecture and full-function GPUs in ecosystem compatibility and rapid adaptation [5]
- The collaboration between Cambricon and Moore Threads signifies that domestic GPUs can now iterate in step with cutting-edge large models, accelerating the establishment of a self-controlled AI technology ecosystem [5]
- GLM-4.6, combined with domestic chips, will first be offered to enterprises and the public through the Zhipu MaaS platform [5]
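The 1/4-of-FP16 compression figure above follows directly from the bit widths. As a rough sketch, the snippet below applies generic per-tensor symmetric Int4 quantization to a random FP16 weight tensor; this is a textbook illustration of the arithmetic, not GLM-4.6's actual quantization scheme.

```python
import numpy as np

# Generic per-tensor symmetric Int4 quantization sketch (illustrative only):
# shows why Int4 storage is 1/4 of FP16, and that the precision loss is real.
rng = np.random.default_rng(0)
w_fp16 = rng.standard_normal(1024 * 1024).astype(np.float16)

# Int4 represents integers in [-8, 7]; the scale maps the max magnitude onto 7.
scale = float(np.abs(w_fp16).max()) / 7.0
q = np.clip(np.round(w_fp16.astype(np.float32) / scale), -8, 7).astype(np.int8)

# Two Int4 values pack into one byte; FP16 needs two bytes per value.
int4_bytes = q.size // 2
compression = w_fp16.nbytes / int4_bytes
print(compression)  # 4.0

# Dequantize to gauge the precision loss the articles call "more significant".
w_hat = q.astype(np.float32) * scale
w_ref = w_fp16.astype(np.float32)
rel_err = float(np.mean(np.abs(w_hat - w_ref)) / np.mean(np.abs(w_ref)))
```

In production Int4 schemes the scales are usually per-channel or per-group rather than per-tensor, which keeps the relative error far smaller than this coarse sketch suggests.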
Zhipu Releases GLM-4.6; Cambricon and Moore Threads Complete Adaptation
Guan Cha Zhe Wang· 2025-10-01 01:36
Core Insights
- The latest model GLM-4.6 from Zhipu, one of the "Six Little Dragons" of domestic large models, has been released, showcasing improvements in programming, long-context handling, reasoning ability, information retrieval, writing skills, and intelligent applications [1]

Model Enhancements
- GLM-4.6 aligns its coding capabilities with Claude Sonnet 4 in public benchmarks and real programming tasks [4]
- The context window has been increased from 128K to 200K, allowing for longer code and intelligent-agent tasks [4]
- The new model enhances reasoning capabilities and supports tool invocation during reasoning [4]
- The model's tool invocation and search intelligence have been improved [4]

Chip Integration and Cost Efficiency
- A key focus of the new model is "model-chip linkage," with GLM-4.6 achieving FP8+Int4 mixed-precision deployment on domestic Cambricon chips, marking the industry's first implementation of this approach on domestic chips [4]
- This mixed-precision approach reduces inference costs while maintaining accuracy, exploring feasible paths for localized operation of large models on domestic chips [4]
- FP8 (8-bit floating point) offers a wide dynamic range with minimal precision loss, while Int4 (4-bit integer) provides high compression ratios and low memory usage but relatively higher precision loss [4]

Memory Optimization
- Core parameters of the large model, which account for 60%-80% of total memory, can be compressed to 1/4 of their FP16 size through Int4 quantization, significantly reducing chip memory pressure [5]
- Temporary dialogue data accumulated during inference can be compressed with Int4 while keeping precision loss "slight" [5]
- FP8 is used for numerically sensitive modules to minimize precision loss and retain fine semantic information [5]

Ecosystem Development
- The adaptation of GLM-4.6 by Cambricon and Moore Threads signifies that domestic GPUs are capable of collaborating and iterating with cutting-edge large models, accelerating the construction of a self-controlled AI technology ecosystem [6]
- The combination of GLM-4.6 and domestic chips will first be offered to enterprises and the public through the Zhipu MaaS platform [6]
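The memory figures quoted above imply simple back-of-envelope arithmetic: if core parameters are 60%-80% of total memory and Int4 shrinks only them to 1/4 of their FP16 size, the overall footprint falls to roughly 40%-55% of the original. The sketch below works that out; the 100 GB baseline is a hypothetical, not a published GLM-4.6 figure.

```python
# Back-of-envelope footprint arithmetic from the articles' figures: core
# parameters are 60%-80% of memory, and Int4 cuts them to 1/4 of FP16 size.
def footprint_after_int4(total_gb: float, core_frac: float) -> float:
    """Total memory after quantizing only the core parameters to Int4."""
    return total_gb * (1 - core_frac) + total_gb * core_frac * 0.25

# With a hypothetical 100 GB FP16 deployment:
print(round(footprint_after_int4(100.0, 0.60), 1))  # 55.0 (core = 60% of memory)
print(round(footprint_after_int4(100.0, 0.80), 1))  # 40.0 (core = 80% of memory)
```

Quantizing the accumulated dialogue state (the KV cache) with Int4 as well, as the articles describe, would push the total lower still.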
Zhipu Officially Releases and Open-Sources Next-Generation Large Model GLM-4.6; Cambricon and Moore Threads Complete Adaptation
Zheng Quan Shi Bao Wang· 2025-09-30 07:58
Core Insights
- The release of GLM-4.6 by Zhipu marks a significant advancement in large-model capabilities, particularly in Agentic Coding and other core functionalities [1]
- GLM-4.6 has achieved comprehensive alignment with Claude Sonnet 4 in code generation, establishing itself as the strongest coding model in China [1]
- The model has undergone extensive upgrades in long-context processing, reasoning, information retrieval, text generation, and agent applications, surpassing the performance of DeepSeek-V3.2-Exp [1]
- As an open-source model, GLM-4.6 is among the strongest general-purpose large models globally, enhancing the position of domestic large models in the global competitive landscape [1]

Technological Developments
- Zhipu has implemented FP8+Int4 mixed-precision quantization inference deployment on leading domestic AI chips from Cambricon, marking the first production of an FP8+Int4 model-chip integrated solution on domestic chips [1]
- This solution significantly reduces inference costs while maintaining model accuracy, providing a feasible path for local operation of large models on domestic chips [1]
- Moore Threads has adapted GLM-4.6 based on the vLLM inference framework, demonstrating the advantages of the MUSA architecture and full-featured GPUs in ecosystem compatibility and rapid adaptation [2]

Industry Implications
- The collaboration between Cambricon and Moore Threads signifies that domestic GPUs are now capable of iterating in synergy with cutting-edge large models, accelerating the construction of a self-controlled AI technology ecosystem [2]
- The combination of GLM-4.6 and domestic chips will initially be offered to enterprises and the public through the Zhipu MaaS platform, unlocking broader social and industrial value [2]
- The deep collaboration between domestically developed GLM-series large models and domestic chips will continue to drive dual optimization of performance and efficiency in model training and inference, fostering a more open, controllable, and efficient AI infrastructure [2]
Zhipu Releases GLM-4.6; Cambricon and Moore Threads Have Completed Adaptation
Mei Ri Jing Ji Xin Wen· 2025-09-30 07:47
Core Insights
- Zhipu, a key domestic large-model enterprise, has officially released and open-sourced its next-generation large model GLM-4.6, achieving significant advancements in core capabilities such as Agentic Coding [1]
- This release follows the major technology launches of DeepSeek-V3.2-Exp and Claude Sonnet 4.5, marking another significant industry development before the National Day holiday [1]
- Zhipu announced that GLM-4.6 has been deployed on leading domestic AI chips from Cambricon using FP8+Int4 mixed-precision quantization inference, representing the first production of an FP8+Int4 model-chip integrated solution on domestic chips [1]
- Additionally, Moore Threads has completed the adaptation of GLM-4.6 based on the vLLM inference framework, allowing its new generation of GPUs to stably run the model at native FP8 precision [1]
Zhipu Joins Hands with Cambricon to Launch Integrated Model-Chip Solution
Di Yi Cai Jing· 2025-09-30 07:38
Core Insights
- The latest model GLM-4.6 from the domestic AI startup Zhipu has been released, showcasing improvements in programming, long-context handling, reasoning capabilities, information retrieval, writing skills, and agent applications [3]

Model Enhancements
- GLM-4.6 aligns its coding capabilities with Claude Sonnet 4 in public benchmarks and real programming tasks [3]
- The context window has been increased from 128K to 200K, allowing for longer code and agent tasks [3]
- The new model enhances reasoning abilities and supports tool invocation during reasoning [3]
- The model's tool invocation and search capabilities have been improved [3]

Chip Integration
- "Model-chip linkage" is a key focus of the new model, with GLM-4.6 achieving FP8+Int4 mixed-quantization deployment on domestic Cambricon chips, marking the industry's first production of an FP8+Int4 model-chip solution on domestic hardware [3]
- This approach maintains accuracy while reducing inference costs, exploring feasible paths for localized operation of large models on domestic chips [3]

Quantization Techniques
- FP8 (8-bit floating point) offers a wide dynamic range with minimal precision loss, while Int4 (4-bit integer) provides high compression ratios and lower memory usage but more noticeable precision loss [4]
- The "FP8+Int4 mixed" mode allocates quantization formats according to the functional differences of the model's modules, optimizing memory usage [4]

Memory Efficiency
- Core parameters of the large model, which account for 60%-80% of total memory, can be compressed to 1/4 of their FP16 size through Int4 quantization, significantly reducing memory pressure on chips [5]
- Temporary dialogue data accumulated during inference can also be compressed with Int4 while keeping precision loss minimal [5]
- FP8 is used for numerically sensitive modules to minimize precision loss and retain fine semantic information [5]

Ecosystem Development
- Cambricon and Moore Threads have successfully adapted GLM-4.6 based on the vLLM inference framework, demonstrating the capability of the new generation of GPUs to run the model stably at native FP8 precision [5]
- This adaptation signifies that domestic GPUs are now capable of collaborating and iterating with cutting-edge large models, accelerating the development of a self-controlled AI technology ecosystem [5]
- The combination of GLM-4.6 and domestic chips will be offered to enterprises and the public through the Zhipu MaaS platform [5]
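The module-wise allocation described above (Int4 for bulk weight matrices, FP8 for numerically sensitive modules) amounts to a simple assignment policy. The sketch below illustrates the idea; the keyword list and module names are illustrative assumptions, not GLM-4.6's actual layer layout.

```python
# Illustrative format-assignment policy for FP8+Int4 mixed quantization:
# bulk weight matrices get Int4, numerically sensitive modules stay in FP8.
# The keyword list below is an assumption for illustration only.
SENSITIVE_KEYWORDS = ("layernorm", "embed", "lm_head", "router")

def choose_format(module_name: str) -> str:
    """Return 'fp8' for precision-sensitive modules, 'int4' for bulk weights."""
    name = module_name.lower()
    return "fp8" if any(k in name for k in SENSITIVE_KEYWORDS) else "int4"

# Hypothetical module names, loosely following common transformer naming:
for m in ("layers.0.mlp.down_proj", "layers.0.input_layernorm", "lm_head"):
    print(m, "->", choose_format(m))
```

Since the bulk projection matrices dominate the parameter count, routing only them to Int4 captures most of the memory savings while leaving the error-sensitive paths in the higher-fidelity FP8 format.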
Zhipu's Flagship Model GLM-4.6 Goes Live; Cambricon and Moore Threads Have Completed Adaptation
Hua Er Jie Jian Wen· 2025-09-30 07:13
Core Insights
- The latest GLM-4.6 model has been launched, showcasing a 27% improvement in coding capabilities over its predecessor GLM-4.5 and excelling in real programming, long-context handling, and reasoning [1]
- GLM-4.6 achieved the highest domestic standard in public benchmark tests and surpassed other domestic models in 74 real programming tasks [1]
- The model has been deployed on leading domestic AI chips from Cambricon using FP8+Int4 mixed-precision quantization, marking the first production of an FP8+Int4 model-chip integrated solution on domestic chips [1]
- Additionally, Moore Threads has adapted GLM-4.6 to the vLLM inference framework, enabling its new generation of GPUs to run the model stably at native FP8 precision [1]