Core Insights
- The article discusses SWE-Lego, a software engineering code model from Huawei's research team trained solely with supervised fine-tuning (SFT), which achieves state-of-the-art (SOTA) performance without the complexity of reinforcement learning (RL) [2][5][43].

Group 1: Challenges and Motivation
- Software engineering tasks demand complex capabilities such as long-sequence reasoning, multi-file operations, and tool use, which existing training methods struggle to provide due to high computational cost and data scarcity [4][9].
- Traditional approaches often rely on complex training paradigms such as RL, which raises training complexity and cost and puts them out of reach for smaller teams [5][9].

Group 2: Three Core Components of SWE-Lego
- Hybrid dataset construction: SWE-Lego's dataset comprises 32,119 high-quality task instances and 18,110 validated trajectories, mixing real-world data from GitHub pull requests with synthetic data generated by injecting bugs into working code [14][17].
- Improved supervised fine-tuning: SWE-Lego employs two key techniques: step-level error masking, so the model learns only from correct steps, and difficulty-based curriculum learning, which gradually increases task complexity [26][28].
- Test-Time Scaling (TTS): TTS improves performance at inference time by allocating additional compute, comparing sequential versus parallel scaling strategies and favoring generative scoring over regression scoring [34][40].

Group 3: Performance Metrics and Results
- SWE-Lego-Qwen3-8B and SWE-Lego-Qwen3-32B achieve scores of 42.2% and 52.6% respectively, surpassing many larger closed-source models [5][13].
- The hybrid dataset contributes the largest share of the gain (25.6 percentage points), while the improved SFT and TTS contribute 3.8 and 6.2 points respectively, for a total improvement of 35.6 percentage points [13][25].
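The synthetic half of the hybrid dataset is built by injecting bugs into working code. A minimal sketch of that idea, assuming a simple mutation-based injector (flipping one comparison operator); the names and the mutation strategy here are illustrative, not SWE-Lego's actual pipeline:

```python
import ast

# Illustrative sketch: starting from working code, inject a small bug by
# flipping the first comparison operator, then pair the buggy version with
# the original as a (task, reference-fix) instance.

FLIPS = {ast.Lt: ast.GtE, ast.Gt: ast.LtE, ast.LtE: ast.Gt, ast.GtE: ast.Lt}

class CompareFlipper(ast.NodeTransformer):
    """Flip the first comparison operator encountered in the AST."""
    def __init__(self):
        self.done = False

    def visit_Compare(self, node):
        if not self.done:
            for i, op in enumerate(node.ops):
                flip = FLIPS.get(type(op))
                if flip is not None:
                    node.ops[i] = flip()
                    self.done = True
                    break
        return node

def make_task(source: str):
    """Return (buggy_source, reference_fix), or None if no injection point."""
    tree = ast.parse(source)
    flipper = CompareFlipper()
    tree = flipper.visit(tree)
    if not flipper.done:
        return None
    return ast.unparse(tree), source

good = "def is_adult(age):\n    return age >= 18\n"
task = make_task(good)  # (buggy code with `age < 18`, original code)
```

In a real pipeline each generated pair would also be checked against the project's test suite, so that only instances where the injected bug actually breaks a test survive validation.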
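Step-level error masking, described above as learning only from correct steps, can be sketched as follows. This is a hypothetical illustration: tokens belonging to steps judged incorrect get the ignore label (-100, the convention most training frameworks use to exclude positions from the cross-entropy loss), while still remaining in the input context. The trajectory format and the per-step `ok` flags are assumptions for the sketch:

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def build_labels(steps):
    """steps: list of (token_ids, ok) pairs, one per agent step.
    Returns flat input_ids and labels, with incorrect steps masked out."""
    input_ids, labels = [], []
    for token_ids, ok in steps:
        input_ids.extend(token_ids)
        # Correct steps keep their tokens as labels; wrong steps are masked
        # so the model conditions on them but receives no gradient from them.
        labels.extend(token_ids if ok else [IGNORE_INDEX] * len(token_ids))
    return input_ids, labels

trajectory = [
    ([101, 102, 103], True),   # correct step: included in the loss
    ([104, 105], False),       # failed tool call: masked, kept in context
    ([106, 107, 108], True),   # correct fix step: included in the loss
]
ids, labels = build_labels(trajectory)
```

The key design choice is that failed steps stay in the input sequence (the model still sees what went wrong) but contribute nothing to the loss, so the model is never trained to reproduce them.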
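The parallel test-time scaling strategy with generative scoring can be sketched as a best-of-n loop: sample several candidate trajectories, have a verifier model *generate* a judgment for each (the likelihood of a "yes" verdict serves as the score, rather than a regression head emitting a scalar), and keep the highest-scoring candidate. `sample_patch` and `generative_score` below are stand-ins for real model calls, not SWE-Lego's API:

```python
def best_of_n(task, sample_patch, generative_score, n=8):
    """Sample n candidate patches for `task` and return the best-scoring one."""
    candidates = [sample_patch(task) for _ in range(n)]
    # Generative scoring: the verifier produces a judgment whose "yes"
    # probability is read off as the score.
    scores = [generative_score(task, c) for c in candidates]
    return max(zip(scores, candidates))[1]

# Toy usage with deterministic stubs in place of model calls.
patches = iter(["patch-a", "patch-b", "patch-c"])
yes_prob = {"patch-a": 0.2, "patch-b": 0.9, "patch-c": 0.5}
best = best_of_n("fix the failing test", lambda t: next(patches),
                 lambda t, c: yes_prob[c], n=3)  # picks "patch-b"
```

Sequential scaling would instead spend the extra compute on one long trajectory (e.g. retrying after failures); the article reports a comparison of the two regimes, with the sketch above covering only the parallel side.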
Group 4: Future Directions
- The article concludes that SWE-Lego demonstrates that lightweight methods can achieve SOTA performance without complex RL or iterative training, underscoring the importance of data quality and strict validation [43].
- Future work will explore larger models, additional programming languages, and real-world software development workflows [43].
Huawei launches the software engineering code agent SWE-Lego, unlocking peak SFT training performance (original title: 华为推出软工代码智能体SWE-Lego,解锁SFT训练极致性能)
机器之心 (Machine Heart) · 2026-01-13 04:08