Zhipu AI finally releases the GLM-4.5 technical report: details from pre-training to post-training fully disclosed
机器之心 (Jiqizhixin) · 2025-08-11 07:12
Core Viewpoint
- The article highlights the release of GLM-4.5 and GLM-4.5-Air, which integrate reasoning, coding, and agentic capabilities into a single model and achieve the highest ranking among domestic and open-source models across 12 global benchmarks [2][11][19].

Group 1: Model Performance and Reception
- GLM-4.5 placed third in the global rankings across 12 recognized benchmarks, ahead of all domestic and open-source models [2][19].
- The announcement drew significant attention, with over 1.2 million views on social media and seven consecutive days atop the Hugging Face trending list [2][3].
- The GLM-4.5 technical report was voted the #1 Paper of the Day by Hugging Face users [13].

Group 2: Technical Innovations
- GLM-4.5 employs a Mixture-of-Experts (MoE) architecture, improving computational efficiency during both training and inference [21][24] (a minimal routing sketch appears at the end of this summary).
- Training proceeds in stages: pre-training on 15 trillion tokens followed by mid-training on 7 trillion tokens, with the maximum sequence length expanded from 4K to 128K [25][27].
- The slime framework supports efficient reinforcement learning training and addresses common rollout bottlenecks in agentic tasks [31][34] (a decoupled rollout sketch appears at the end of this summary).

Group 3: Key Capabilities
- GLM-4.5 integrates three core capabilities: agentic ability for real-world interaction, complex reasoning for multi-step problem-solving, and advanced coding skills for software-engineering tasks [19][22].
- In agentic benchmarks such as TAU-bench and BFCL V3, GLM-4.5 showed superior results against competing models [44].
- In reasoning benchmarks, including AIME 24 and SciCode, GLM-4.5 outperformed OpenAI's models [47][50].

Group 4: Code Task Performance
- GLM-4.5 excelled in code-related benchmarks, outperforming GPT-4.1 and Claude Sonnet 4 on SWE-bench Verified and Terminal-Bench [52][53].
- Its overall coding performance positions it as a strong competitor to Claude Sonnet 4 [53].

Group 5: Future Implications
- The technical report offers insight into the development direction of domestic open-source large models and serves as a key reference for future research [56][57].
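The MoE efficiency claim in Group 2 is easiest to see in code. Below is a minimal sketch of top-k expert routing in PyTorch; the class name, dimensions, expert count, and k are illustrative assumptions, not GLM-4.5's actual architecture or hyperparameters.

```python
# Minimal sketch of a top-k Mixture-of-Experts (MoE) feed-forward layer.
# Illustrative only: each token runs through only k of n_experts experts,
# so total parameters grow with n_experts while per-token compute stays roughly flat.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, n_experts=8, k=2):  # hypothetical sizes
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalize over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens that picked expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

x = torch.randn(16, 1024)
print(TopKMoE()(x).shape)  # torch.Size([16, 1024])
```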
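The rollout bottleneck that slime targets in Group 2 comes from agentic episodes of very uneven length stalling a synchronous trainer. The sketch below shows the general decoupled producer/consumer pattern under that assumption; all names (generate_rollout, rollout_worker, train_loop) are hypothetical placeholders and do not reflect slime's actual API.

```python
# Sketch of decoupling rollout generation from training for agentic RL.
# A queue buffers trajectories so the trainer never waits on one slow episode.
import queue
import random
import threading
import time

rollout_queue = queue.Queue(maxsize=64)   # buffer between rollout workers and trainer
latest_policy_version = 0                 # stand-in for a weight-sync mechanism

def generate_rollout(version):
    # Agentic rollouts (tool calls, multi-turn episodes) have highly variable duration.
    time.sleep(random.uniform(0.01, 0.1))
    return {"policy_version": version, "trajectory": [random.random() for _ in range(8)]}

def rollout_worker():
    while True:
        rollout_queue.put(generate_rollout(latest_policy_version))

def train_loop(steps=10, batch_size=4):
    global latest_policy_version
    for step in range(steps):
        batch = [rollout_queue.get() for _ in range(batch_size)]
        # ... compute the policy update from `batch` here ...
        latest_policy_version += 1            # new weights become visible to workers
        versions = sorted({b["policy_version"] for b in batch})
        print(f"step {step}: trained on rollouts from policy versions {versions}")

for _ in range(4):                            # several rollout producers run in parallel
    threading.Thread(target=rollout_worker, daemon=True).start()
train_loop()
```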