Meituan's First Large Model Reportedly Runs End-to-End on a Domestic Training Stack, Can Be Trained on Domestic Accelerator Cards
Guan Cha Zhe Wang·2025-09-01 13:29

Core Insights
- Meituan has officially launched LongCat-Flash-Chat, an open-source model available on GitHub and Hugging Face, marking a significant step in its AI strategy [1][3] (a hedged loading sketch follows this list)
- The model uses a novel Mixture-of-Experts (MoE) architecture with 560 billion total parameters, balancing computational efficiency against performance [1][3]
- LongCat-Flash-Chat shows superior performance on agent tasks while achieving faster inference speeds, making it suitable for complex applications [1][4]

Group 1: Model Architecture and Performance
- LongCat-Flash employs a "Zero-Computation Experts" mechanism, activating only 18.6 billion to 31.3 billion parameters per token depending on contextual needs, so compute is spent only where it is required [3][4] (see the routing sketch below)
- The training process uses a PID controller to adjust expert biases in real time, stabilizing average activation at approximately 27 billion parameters per token [3][4] (see the controller sketch below)
- LongCat-Flash generates more than 100 tokens per second on the H800 platform, at a cost of only 5 yuan per million output tokens [6]

Group 2: Strategic Implications
- Meituan's entry into the large-model arena aligns with its AI strategy of AI at work, AI in products, and building large language models (LLMs) [3]
- The launch of LongCat-Flash-Chat builds on Meituan's broader AI advances, which have included applications such as an AI Coding Agent and AI decision-making assistants [3][4]
- The model's design and training optimizations position it competitively against both larger and smaller models in the industry, highlighting Meituan's commitment to innovation in AI [6]
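
For readers who want to try the release, the sketch below shows one plausible way to load the open weights with the Hugging Face transformers library. The article only states that the model is published on GitHub and Hugging Face; the repo id, the need for trust_remote_code, and the generation settings here are illustrative assumptions, not details from the announcement.

```python
# A minimal loading sketch, assuming a repo id of
# "meituan-longcat/LongCat-Flash-Chat" (hypothetical) and that the release
# ships custom MoE model code inside the repo (hence trust_remote_code).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "meituan-longcat/LongCat-Flash-Chat"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",      # load in the checkpoint's native precision
    device_map="auto",       # shard the 560B-parameter model across devices
    trust_remote_code=True,  # run the custom architecture code from the repo
)

prompt = "Explain what a Mixture-of-Experts model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that sharding via device_map="auto" requires the accelerate package, and a model of this size will not fit on a single consumer GPU.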
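To make the "Zero-Computation Experts" idea concrete, here is a minimal PyTorch sketch of a router that chooses among real FFN experts and identity experts that return the token unchanged at essentially zero cost. All sizes, the top-k value, and the class and parameter names are illustrative assumptions, not LongCat-Flash's actual implementation.

```python
# Sketch of an MoE layer with zero-computation (identity) experts: tokens
# routed to identity experts activate no FFN parameters, so the per-token
# activated parameter count varies with context.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroComputeMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_real=8, n_zero=4, top_k=2):
        super().__init__()
        self.n_real, self.n_zero, self.top_k = n_real, n_zero, top_k
        self.router = nn.Linear(d_model, n_real + n_zero)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_real)
        )
        # Per-expert bias on router logits; an external controller can adjust
        # it to steer how often zero-computation experts are selected.
        self.expert_bias = nn.Parameter(
            torch.zeros(n_real + n_zero), requires_grad=False
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x) + self.expert_bias
        weights, idx = torch.topk(F.softmax(logits, dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(self.n_real + self.n_zero):
                mask = idx[:, slot] == e
                if not mask.any():
                    continue
                w = weights[mask, slot].unsqueeze(-1)
                if e < self.n_real:
                    out[mask] += w * self.experts[e](x[mask])  # real FFN expert
                else:
                    out[mask] += w * x[mask]  # identity expert: zero compute
        return out
```

The design point: because identity experts cost no FLOPs, the router can reserve real expert capacity for hard tokens, which is what lets the activated parameter count float between the 18.6-billion and 31.3-billion bounds described above.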
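The article says a PID controller adjusts expert biases in real time so that average activation stays near 27 billion parameters per token. The sketch below wires a textbook PID loop to the hypothetical expert_bias from the previous example; the gains, target, and measurement function are illustrative assumptions, not Meituan's values.

```python
# A minimal PID controller sketch for stabilizing the average number of
# activated parameters per token around a target (~27B per the article).
class ExpertBiasPID:
    def __init__(self, target=27e9, kp=1e-11, ki=1e-12, kd=1e-12):
        self.target = target
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, observed_avg_activated_params, dt=1.0):
        # Positive error means too few parameters are being activated, i.e.
        # the router is over-selecting zero-computation experts.
        error = self.target - observed_avg_activated_params
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical use inside a training loop, assuming a per-step measurement
# of average activated parameters from the batch's routing statistics:
#   delta = pid.update(measure_avg_activated_params(routing_stats))
#   moe.expert_bias.data[moe.n_real:] -= delta  # demote/promote zero experts
```

Lowering the bias on zero-computation experts makes the router favor real experts (raising activated parameters), and vice versa, so the controller can hold the average near the target as token difficulty shifts during training.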