Eight Rounds of Repeated Checking: Meituan Launches an Open-Source, Ready-to-Try "Rethink" Model
Xin Jing Bao·2026-01-16 13:18

Core Insights
- Meituan's LongCat team has released an upgraded open-source model, LongCat-Flash-Thinking-2601, which achieves state-of-the-art performance in key evaluation benchmarks such as Agentic Search and Tool Use [1][3]
- The new model demonstrates significant advantages in tool usage generalization, outperforming Claude-Opus-4.5-Thinking in complex tasks that rely on tool invocation, thereby reducing training costs for adapting to new tools in real-world scenarios [1][3]
- The model supports a "rethink" mode, allowing it to activate eight independent "thinkers" to execute tasks simultaneously, enhancing the depth of analysis [1][2]

Model Performance and Analysis
- In a test scenario regarding the winter of 2010, the model provided multiple analyses, ultimately concluding that it was a "warm early winter, cold mid-winter" due to the influence of a strong La Niña event, despite not strictly meeting the cold winter criteria [2]
- The system's analysis of the reasons behind the failure of Smartisan Technology highlighted issues such as internal turmoil, lack of management experience, funding difficulties, and an overemphasis on design and marketing at the expense of supply chain management [2]

Technical Approach
- The LongCat team has developed a diverse and high-intensity training environment for the model, integrating over 60 tools to create complex interdependencies, which enhances the model's generalization capabilities in unknown scenarios [3][4]
- The training infrastructure has been expanded to support stable parallel training of large-scale multi-environment agents, maximizing training efficiency and resource utilization by intelligently distributing computational power based on task difficulty and training progress [3]
- To address real-world uncertainties, the team has incorporated various types of noise into the training data, simulating API failures and incomplete data scenarios, thereby improving the model's decision-making under adverse conditions [4]
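
The noise-injection idea described above (simulated API failures and incomplete data) can be pictured with a small Python sketch. This is an illustration only, not the LongCat team's implementation: the wrapper `with_noise`, the exception `ToolError`, the tool `lookup_weather`, and the parameters `failure_rate` and `drop_rate` are all hypothetical names introduced here.

```python
import random
from typing import Any, Callable, Dict


class ToolError(Exception):
    """Hypothetical error type standing in for a failed API call."""


def with_noise(
    tool: Callable[..., Dict[str, Any]],
    failure_rate: float,
    drop_rate: float,
    rng: random.Random,
) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so that calls sometimes fail or return partial data.

    failure_rate: probability that a call raises ToolError (simulated API failure).
    drop_rate: probability that each field of the result is removed (incomplete data).
    Both parameters are assumptions made for this sketch.
    """

    def noisy_tool(**kwargs: Any) -> Dict[str, Any]:
        # Simulate an API failure before the real tool is ever called.
        if rng.random() < failure_rate:
            raise ToolError("simulated API failure")
        result = tool(**kwargs)
        # Simulate incomplete data by dropping each field independently.
        return {k: v for k, v in result.items() if rng.random() >= drop_rate}

    return noisy_tool


def lookup_weather(city: str) -> Dict[str, Any]:
    """Hypothetical tool returning a fixed record, used only for the demo."""
    return {"city": city, "temp_c": -3, "condition": "snow"}


if __name__ == "__main__":
    noisy = with_noise(lookup_weather, failure_rate=0.3, drop_rate=0.2,
                       rng=random.Random(0))
    for _ in range(5):
        try:
            print(noisy(city="Beijing"))
        except ToolError as err:
            print("tool failed:", err)
```

An agent trained against such a wrapper has to cope with both exceptions and missing fields, which is one plausible reading of "decision-making under adverse conditions"; the actual noise types and rates used in training are not stated in the source.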
