Meituan's "Building LLM" Progress Revealed for the First Time: LongCat-Flash-Chat Released and Open-Sourced, with Output Cost as Low as 5 Yuan per Million Tokens
Huan Qiu Wang·2025-09-01 03:49

Group 1
- LongCat-Flash uses an innovative Mixture-of-Experts (MoE) architecture with 560 billion total parameters, of which between 18.6 billion and 31.3 billion (roughly 27 billion on average) are activated per token, balancing computational efficiency and performance [2][4]
- While activating only a small fraction of its parameters, LongCat-Flash-Chat performs on par with leading mainstream models and is particularly strong on agentic tasks [2]
- The model features a Zero-Computation Experts mechanism that allocates computational resources on demand, so easier tokens consume less compute; a minimal routing sketch follows the lists below [4]

Group 2
- LongCat-Flash introduces inter-layer channels that let communication and computation run in parallel, significantly improving training and inference efficiency; see the overlap sketch below [5]
- After just 30 days of efficient training, the model delivers a per-user inference speed of more than 100 tokens per second on the H800 platform [5]
- Systems-level optimization lets LongCat-Flash sustain a generation speed of 100 tokens per second at an output cost as low as 5 yuan per million tokens; a back-of-envelope check appears after the sketches [7]

Group 3
- Meituan has made significant AI advances this year, launching multiple AI applications, including the AI coding agent NoCode and an AI business-decision assistant [4]
- LongCat-Flash was comprehensively optimized across the whole training pipeline, using multi-agent methods to generate diverse, high-quality trajectory data [7]
- The company's AI strategy is built on three levels: AI at work, AI in products, and Building LLM; open-sourcing this model marks a major milestone in its Building LLM progress [4]
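
The zero-computation experts idea from Group 1 can be pictured with a toy router: alongside ordinary FFN experts, the routing pool also contains identity "experts" that cost no FLOPs, so per-token compute varies with how many real experts the router happens to pick. The sketch below is a minimal illustration under assumed toy sizes; the expert counts, hidden width, and the random stand-in router are all hypothetical and not LongCat-Flash's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_REAL = 8    # hypothetical number of ordinary FFN experts
NUM_ZERO = 4    # hypothetical number of zero-computation (identity) experts
TOP_K = 2       # experts selected per token
D_MODEL = 16    # toy hidden size

# Toy FFN experts; a zero-computation expert has no weights at all.
expert_weights = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_REAL)]

def moe_layer(x):
    """x: (num_tokens, D_MODEL). Routes each token to TOP_K experts drawn
    from a pool mixing real and zero-computation experts, and counts how
    many real-expert calls were actually paid for."""
    logits = rng.standard_normal((x.shape[0], NUM_REAL + NUM_ZERO))  # stand-in for a learned router
    out = np.zeros_like(x)
    real_calls = 0
    for t in range(x.shape[0]):
        topk = np.argsort(logits[t])[-TOP_K:]      # indices of the top-k scores
        gates = np.exp(logits[t][topk])
        gates /= gates.sum()                       # softmax over the chosen experts
        for gate, e in zip(gates, topk):
            if e < NUM_REAL:
                out[t] += gate * (x[t] @ expert_weights[e])  # real expert: pays FLOPs
                real_calls += 1
            else:
                out[t] += gate * x[t]              # zero-computation expert: identity, free
    return out, real_calls

x = rng.standard_normal((32, D_MODEL))
_, calls = moe_layer(x)
print(f"real-expert calls: {calls} of {32 * TOP_K} routing slots")
```

Because the router sometimes picks identity experts, the number of paid expert calls, and with it the activated parameter count, fluctuates per token; that per-token variability is the mechanism behind the 18.6B to 31.3B activation band reported above.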
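The "inter-layer channels" from Group 2 can be read as the classic communication/computation overlap pattern: launch the expert-parallel all-to-all exchange, do dense work while it is in flight, then consume the result. The asyncio sketch below only illustrates that scheduling idea; the function names, latencies, and mechanism are assumptions for exposition, not Meituan's implementation:

```python
import asyncio
import time

async def all_to_all(tokens):
    # Stand-in for the expert-parallel all-to-all exchange (network-bound).
    await asyncio.sleep(0.05)   # simulated communication latency
    return tokens

def dense_block(tokens):
    # Stand-in for dense computation (e.g., attention) on the local device.
    time.sleep(0.05)            # simulated compute time
    return tokens

async def overlapped_layer(tokens):
    # Kick off the MoE path's communication, then run dense compute while
    # the exchange is in flight; this is the overlap pattern described above.
    comm = asyncio.create_task(all_to_all(tokens))
    dense_out = await asyncio.to_thread(dense_block, tokens)
    moe_inputs = await comm
    return dense_out, moe_inputs

start = time.perf_counter()
asyncio.run(overlapped_layer(list(range(8))))
print(f"overlapped wall time: {time.perf_counter() - start:.3f}s (serial would be ~0.10s)")
```

Hiding the exchange behind dense compute is what turns the MoE's network cost into nearly free time, which is consistent with the training and inference efficiency gains the article attributes to these channels.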
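Finally, the quoted throughput and price imply a simple per-stream cost: at 100 tokens per second, one generation stream emits 0.36 million tokens per hour, which at 5 yuan per million output tokens comes to about 1.8 yuan per hour. The arithmetic below is a derived check, not a figure from the article:

```python
# Derived check of the article's figures: 100 tokens/s and 5 yuan per
# million output tokens imply ~1.8 yuan/hour for one generation stream.
tokens_per_sec = 100
yuan_per_million_tokens = 5
tokens_per_hour = tokens_per_sec * 3600          # 360,000 tokens
cost_per_hour = tokens_per_hour / 1e6 * yuan_per_million_tokens
print(f"{tokens_per_hour / 1e6:.2f}M tokens/h -> {cost_per_hour:.2f} yuan/h")
```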