Core Insights
- Zhipu, one of the "Six Little Dragons" of China's domestic large-model companies, has released its latest model, GLM-4.6, with improvements in programming, long-context handling, reasoning, information retrieval, writing, and agent applications [1]

Model Enhancements
- GLM-4.6's coding ability is on par with Claude Sonnet 4 in public benchmarks and real-world programming tasks [4]
- The context window has grown from 128K to 200K tokens, accommodating longer code and longer agent tasks [4]
- Reasoning has been strengthened, and the model can now invoke tools during the reasoning process [4]
- Tool calling and search-agent capabilities have been improved [4]

Chip Integration and Cost Efficiency
- A key focus of the release is "model-chip linkage": GLM-4.6 has achieved FP8+Int4 mixed-precision inference deployment on domestic Cambricon chips, the industry's first such deployment on domestic chips [4]
- This mixed-precision approach lowers inference cost while preserving accuracy, and explores a viable path for running large models locally on domestic chips [4]
- FP8 (8-bit floating point) offers a wide dynamic range with minimal precision loss, while Int4 (4-bit integer) offers a high compression ratio and low memory use at the cost of comparatively greater precision loss [4]

Memory Optimization
- The model's core parameters, which account for 60%-80% of total memory, can be compressed to 1/4 of their FP16 size through Int4 quantization, significantly easing pressure on the chip's onboard memory [5]
- Temporary conversation data accumulated during inference (the KV cache) can also be compressed with Int4 while keeping precision loss "slight" [5]
- FP8 is used for numerically sensitive modules to minimize precision loss and preserve fine-grained semantic information [5]

Ecosystem Development
- Cambricon's and Moore Threads' adaptation of GLM-4.6 shows that domestic GPUs can evolve in step with cutting-edge large models, accelerating the construction of a self-reliant AI technology ecosystem [6]
- GLM-4.6 running on domestic chips will be offered to enterprises and the public first through Zhipu's MaaS platform [6]
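The memory savings described under Memory Optimization follow from simple bit-width arithmetic: a 4-bit code occupies a quarter of the space of a 16-bit value. The Python sketch below illustrates this under stated assumptions. It uses plain symmetric per-group Int4 quantization with a group size of 32, a hypothetical 10-billion-parameter model, and a hypothetical KV-cache shape (48 layers, 8 key/value heads, head dimension 128) at a 200K-token context. None of these values come from the source, and the scheme is a generic textbook one, not the actual FP8+Int4 recipe used for GLM-4.6, which the source does not describe.

```python
# Minimal sketch (assumptions, not the GLM-4.6 recipe): symmetric
# per-group Int4 quantization plus FP16-vs-Int4 memory arithmetic.

GROUP_SIZE = 32              # assumed: number of weights sharing one scale
INT4_MIN, INT4_MAX = -8, 7   # range of a signed 4-bit integer


def quantize_int4(values):
    """Map floats to signed Int4 codes with one scale per group."""
    codes, scales = [], []
    for start in range(0, len(values), GROUP_SIZE):
        group = values[start:start + GROUP_SIZE]
        max_abs = max(abs(v) for v in group)
        # The largest magnitude in the group maps to code 7.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        for v in group:
            q = round(v / scale)
            codes.append(max(INT4_MIN, min(INT4_MAX, q)))
    return codes, scales


def dequantize_int4(codes, scales):
    """Approximately reconstruct floats from Int4 codes and group scales."""
    return [q * scales[i // GROUP_SIZE] for i, q in enumerate(codes)]


def weight_gib(num_params, bits):
    """Weight storage in GiB (ignores the small per-group scale overhead)."""
    return num_params * bits / 8 / 2**30


def kv_cache_gib(layers, kv_heads, head_dim, tokens, bits):
    """KV-cache size in GiB: one key and one value vector per token per layer."""
    return 2 * layers * kv_heads * head_dim * tokens * bits / 8 / 2**30


if __name__ == "__main__":
    # Rounding error on a small deterministic toy vector in [-0.5, 0.5].
    toy = [((i * 37) % 101 - 50) / 100 for i in range(64)]
    codes, scales = quantize_int4(toy)
    restored = dequantize_int4(codes, scales)
    print("max abs error:", max(abs(a - b) for a, b in zip(toy, restored)))

    # Hypothetical 10B-parameter model: Int4 weights are 1/4 of FP16.
    n = 10_000_000_000
    print(f"weights FP16 {weight_gib(n, 16):.1f} GiB, "
          f"Int4 {weight_gib(n, 4):.1f} GiB")

    # Hypothetical KV-cache shape at a 200K-token context.
    shape = dict(layers=48, kv_heads=8, head_dim=128, tokens=200_000)
    print(f"KV cache FP16 {kv_cache_gib(bits=16, **shape):.1f} GiB, "
          f"Int4 {kv_cache_gib(bits=4, **shape):.1f} GiB")
```

The per-group scale bounds the rounding error of each value to half a scale step, but a 4-bit code can still distinguish only 16 levels. That coarseness is the trade-off behind reserving FP8 for numerically sensitive modules while using Int4 for the bulk of the weights and cache.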
Zhipu Releases GLM-4.6; Cambricon and Moore Threads Complete Adaptation