Core Insights
- The article discusses the challenges large models face in long-text reasoning, highlighting misleadingly high performance metrics and difficulties with multi-hop reasoning tasks [2][3]
- It introduces QwenLong-L1.5, a new model designed to address these challenges through a comprehensive post-training framework that combines data synthesis, reinforcement learning optimization, and memory management [4][32]

Group 1: Challenges in Long-Text Reasoning
- Models often score highly on simple tasks but struggle with complex multi-hop reasoning, which reveals the limits of their deep understanding [2]
- Training data for long-text tasks is complex and heterogeneous, which destabilizes reinforcement learning algorithms and can degrade performance [14][16]
- Physical memory limitations restrict how much knowledge a model can process at once, forcing compromises that can lose critical information [3]

Group 2: QwenLong-L1.5 Model Features
- QwenLong-L1.5 is built on the Qwen3-30B-A3B architecture and aims to provide a systematic solution to long-text reasoning challenges [4]
- The model incorporates a high-quality data synthesis pipeline that generates multi-hop reasoning tasks, strengthening the model's ability to reason across multiple steps [9]
- It employs a stable and efficient reinforcement learning strategy to address problems such as distributional drift and credit assignment [12][17]

Group 3: Performance Improvements
- QwenLong-L1.5 shows significant performance improvements, with an average score increase of 9.9 points over its predecessor [26]
- The gains are most evident in complex reasoning tasks, with notable improvements on benchmarks such as MRCR and CorpusQA [26][27]
- It handles ultra-long tasks well, showing that it can process information beyond traditional memory limits [28][29]

Group 4: Conclusion and Open Source
- The article concludes that combining data synthesis, reinforcement learning optimization, and memory management in QwenLong-L1.5 provides a validated path for addressing long-text reasoning challenges [32]
- The company encourages open collaboration and sharing of the technology, with details available in the published paper and on GitHub [32]
QwenLong-L1.5 Released: One Recipe, Three Key Techniques, Bringing a 30B MoE Model's Long-Text Reasoning on Par with GPT-5
Jiqizhixin (机器之心) · 2025-12-29 04:44