Workflow
Zhipu Releases GLM-4.6; Joining Hands with Cambricon and Moore Threads, It Launches an Integrated Model-Chip Solution

Core Insights
- GLM-4.6, the latest model from Zhipu (one of the domestic large-model "Six Little Dragons"), has been released, with improvements in programming, long-context handling, reasoning, information retrieval, writing, and agent applications [1]

Group 1: Model Enhancements
- GLM-4.6's coding capability is on par with Claude Sonnet 4 in both public benchmarks and real programming tasks [4]
- The context window has been expanded from 128K to 200K tokens, accommodating longer code and intelligent-agent tasks [4]
- Reasoning is improved, and the model can now invoke tools during the reasoning process [4]

Group 2: Technological Innovations
- "Model-chip linkage" is a key focus of the new model: GLM-4.6 achieves FP8+Int4 mixed-precision deployment on domestic Cambricon chips, the industry's first production-grade FP8+Int4 model-chip solution on domestic hardware [4]
- FP8 (8-bit floating point) offers a wide dynamic range with minimal precision loss, while Int4 (4-bit integer) offers a high compression ratio and low memory usage at the cost of a larger precision loss [4][5]

Group 3: Resource Optimization
- The mixed FP8+Int4 scheme assigns a quantization format to each of the model's modules according to its function, optimizing memory usage [5]
- Core parameters, which account for 60%-80% of total memory, can be compressed to 1/4 of their FP16 size via Int4 quantization, substantially reducing memory pressure on the chip [5]
- Temporary dialogue data accumulated during inference can likewise be compressed with Int4 while keeping precision loss "slight" [5]

Group 4: Industry Collaboration
- Moore Threads has completed adaptation of GLM-4.6 on the vLLM inference framework, demonstrating the advantages of its MUSA architecture and full-function GPU in ecosystem compatibility and rapid adaptation [5]
- The collaboration with Cambricon and Moore Threads signals that domestic GPUs can now iterate alongside cutting-edge large models, accelerating the build-out of a self-controlled AI technology ecosystem [5]
- GLM-4.6 paired with domestic chips will first be offered to enterprises and the public through Zhipu's MaaS platform [5]
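To make the memory arithmetic behind the Int4 claims concrete, the sketch below implements generic symmetric per-tensor Int4 quantization in NumPy. It is an illustrative assumption, not Zhipu's or Cambricon's actual kernel: the tensor shape, scaling scheme, and error metric are all hypothetical, but the 4x size reduction relative to FP16 (two 4-bit values packed per byte versus two bytes per FP16 value) matches the "1/4 of FP16 size" figure cited above.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-tensor quantization: map floats onto the 16 integer levels [-8, 7]."""
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the Int4 codes."""
    return q.astype(np.float32) * scale

# Hypothetical weight matrix standing in for one transformer layer's parameters.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4096, 4096)).astype(np.float16)

q, scale = quantize_int4(w.astype(np.float32))
w_hat = dequantize(q, scale)

fp16_bytes = w.size * 2   # 2 bytes per FP16 value
int4_bytes = w.size // 2  # two 4-bit codes packed per byte
print(f"FP16: {fp16_bytes / 2**20:.1f} MiB, Int4: {int4_bytes / 2**20:.1f} MiB "
      f"({fp16_bytes // int4_bytes}x smaller)")

# Coarse 16-level grid: the error is noticeable, which is why the article
# reserves Int4 for memory-bound data and keeps FP8 where precision matters.
rel_err = np.abs(w_hat - w.astype(np.float32)).mean() / np.abs(w).mean()
print(f"mean relative quantization error: {rel_err:.1%}")
```

Production deployments typically refine this with per-channel or per-group scales to shrink the error, but the storage ratio against FP16 stays the same 4:1.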