Heavy Thinking Mode
Eight Rounds of Checking: Meituan Launches an Open-Source "Heavy Thinking" Model That Anyone Can Try
Xin Jing Bao · 2026-01-16 13:18
Core Insights
- Meituan's LongCat team has released an upgraded open-source model, LongCat-Flash-Thinking-2601, which achieves state-of-the-art performance on key evaluation benchmarks such as Agentic Search and Tool Use [1][3]
- The new model generalizes markedly better across tools: it outperforms Claude-Opus-4.5-Thinking on complex tasks that depend on tool invocation, which reduces the training cost of adapting to new tools in real-world scenarios [1][3]
- The model supports a "Heavy Thinking" mode, in which it activates eight independent "thinkers" that work on a task simultaneously, deepening the analysis [1][2]

Model Performance and Analysis
- In a test question about the winter of 2010, the model produced multiple analyses and ultimately concluded that the season was "warm in early winter, cold in mid-winter" under the influence of a strong La Niña event, even though it did not strictly meet the criteria for a cold winter [2]
- Asked why Smartisan Technology failed, the system pointed to internal turmoil, a lack of management experience, funding difficulties, and an overemphasis on design and marketing at the expense of supply chain management [2]

Technical Approach
- The LongCat team built a diverse, high-intensity training environment for the model, integrating more than 60 tools with complex interdependencies, which improves the model's generalization in unknown scenarios [3][4]
- The training infrastructure has been expanded to support stable parallel training of large-scale multi-environment agents. It allocates compute according to task difficulty and training progress to maximize training efficiency and resource utilization [3]
- To address real-world uncertainty, the team injected several types of noise into the training data, simulating API failures and incomplete data, which improves the model's decision-making under adverse conditions [4]
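The noise injection described in the last point can be illustrated with a small sketch. This is not the LongCat team's implementation, whose details the summary does not give. It only shows one plausible way to simulate API failures and incomplete data around a tool call. The names `with_noise`, `ToolError`, `failure_rate`, and `drop_rate` are hypothetical.

```python
import random
from typing import Any, Callable, Dict


class ToolError(Exception):
    """Hypothetical error type standing in for a failed API call."""


def with_noise(
    tool: Callable[..., Dict[str, Any]],
    failure_rate: float,
    drop_rate: float,
    rng: random.Random,
) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so that it sometimes fails or returns partial data.

    Assumption: tools return flat dictionaries. With probability
    `failure_rate` the call raises ToolError; otherwise each field of
    the result is dropped independently with probability `drop_rate`.
    """

    def noisy(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        if rng.random() < failure_rate:
            raise ToolError("simulated API failure")
        result = tool(*args, **kwargs)
        # Keep a field only if its random draw clears the drop threshold.
        return {k: v for k, v in result.items() if rng.random() >= drop_rate}

    return noisy


def lookup_item(query: str) -> Dict[str, Any]:
    """Toy tool used only for illustration."""
    return {"title": query, "price": 10, "stock": 3}


noisy_lookup = with_noise(lookup_item, failure_rate=0.2, drop_rate=0.3,
                          rng=random.Random(0))
```

An agent trained against `noisy_lookup` instead of `lookup_item` has to handle both raised errors and missing fields, which is the kind of imperfect condition the summary says the training data simulates.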
Meituan Ships Another New Model: With Eight Thinkers Working at Once, Can It Match a Zhuge Liang?
Jiqizhixin (机器之心) · 2026-01-16 08:13
Core Insights
- The article discusses Meituan's LongCat-Flash-Thinking-2601, a model with 560 billion parameters built on an innovative MoE architecture [1][41][62]
- The model introduces a Heavy Thinking Mode that reasons along multiple paths simultaneously, making its conclusions more reliable and comprehensive [4][48][62]
- LongCat-Flash-Thinking-2601 shows significant improvements in agent capabilities, achieving top performance on various benchmark tests and better generalization in out-of-distribution (OOD) scenarios [6][62]

Model Features
- In Heavy Thinking Mode, the model activates eight independent thinkers that explore different reasoning paths, which reduces errors and improves answer quality [4][48][50]
- The architecture supports parallel thinking and iterative summarization, allowing broader and deeper exploration of complex problems [41][50]
- A new evaluation method for agent model generalization has been introduced, which generates complex tasks based on given keywords, enhancing the model's adaptability to unknown scenarios [8][10][11]

Performance Testing
- In real-world testing on logical reasoning tasks, the model used Heavy Thinking Mode to reach reliable answers through collaborative reasoning [12][15][16]
- The model's programming abilities were tested by having it generate games such as Flappy Bird and Conway's Game of Life, showcasing its versatility despite the high computational cost of running multiple thinkers [26][32]
- In a comparison with Claude 4.5 Opus, LongCat-Flash-Thinking-2601 achieved a 100% standard coverage rate, outperforming its competitor in handling complex tool dependencies [38][62]

Technological Innovations
- The model incorporates techniques such as environment scaling and multi-environment reinforcement learning, which improve its training and performance in diverse scenarios [41][51][53]
- LongCat's training process introduces noise to improve robustness, allowing the model to perform well in real-world conditions that are often imperfect [60][62]
- The upcoming LongCat ZigZag Attention mechanism aims to support a context of up to 1 million tokens, further expanding the model's capabilities [63]

Development Timeline
- Meituan's AI model development has been rapid, with consistent updates since the initial launch in September 2025 that focus on response speed, logical reasoning, and multi-modal capabilities [65][67]
- The company aims to create a model that can effectively solve real-world problems, aspiring towards a future where "model as a service" becomes a reality [68]
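The Heavy Thinking Mode described above, with eight independent thinkers followed by a combining step, can be approximated in a few lines. The sketch below is only an illustration under stated assumptions. The article mentions parallel thinking and iterative summarization but does not specify how answers are merged, so a simple majority vote is swapped in here as a stand-in. `heavy_think`, `Thinker`, and `toy_thinker` are hypothetical names, and a real thinker would be a model call rather than a local function.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# A thinker takes the question and a seed (to diversify its reasoning path)
# and returns a final answer as a string.
Thinker = Callable[[str, int], str]


def heavy_think(
    question: str, thinker: Thinker, n_thinkers: int = 8
) -> Tuple[str, float, List[str]]:
    """Run several thinkers in parallel and merge their answers.

    Assumption: answers are merged by majority vote, a stand-in for the
    summarization step that the article does not describe in detail.
    Returns the winning answer, the share of thinkers that agreed with
    it, and all individual answers in seed order.
    """
    with ThreadPoolExecutor(max_workers=n_thinkers) as pool:
        answers = list(
            pool.map(lambda seed: thinker(question, seed), range(n_thinkers))
        )
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_thinkers, answers


def toy_thinker(question: str, seed: int) -> str:
    """Toy stand-in for a model call: every fourth path makes an error."""
    return "41" if seed % 4 == 0 else "42"


answer, agreement, all_answers = heavy_think("What is 6 * 7?", toy_thinker)
```

The agreement share gives a rough confidence signal: when the paths disagree widely, a caller might run another round or fall back to a single careful pass. It also makes the cost visible, since every question is answered `n_thinkers` times.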