Meituan's "Building LLM" progress revealed for the first time: LongCat released and open-sourced

Group 1
- LongCat-Flash uses an innovative Mixture-of-Experts (MoE) architecture with 560 billion total parameters, of which only 18.6 billion to 31.3 billion (about 27 billion on average) are activated per token, balancing computational efficiency against performance [2][4]
- LongCat-Flash-Chat performs on par with leading mainstream models while activating only a small fraction of its parameters, and is particularly strong on agentic tasks [2]
- The model features a Zero-Computation Experts mechanism that allocates computational resources on demand, token by token, for efficient utilization (a minimal routing sketch follows these summaries) [4]

Group 2
- LongCat-Flash adds inter-layer channels (shortcut connections) that let communication run in parallel with computation, significantly improving training and inference efficiency (see the overlap sketch below) [5]
- After an efficient 30-day training run, the model delivers per-user inference speeds of over 100 tokens per second on H800 GPUs [5]
- Systems-level optimization sustains generation at 100 tokens per second while keeping the output cost at 5 yuan per million tokens; at that rate, an hour of continuous generation yields 360,000 tokens for roughly 1.8 yuan [7]

Group 3
- The model was optimized throughout the training process, including multi-agent methods that generate diverse, high-quality trajectory data, yielding superior agentic capabilities (a toy trajectory-collection sketch follows below) [7]
- LongCat-Flash's co-design of algorithms and engineering gives it significant cost and speed advantages over similarly sized or smaller models in the industry [7]
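To make the Zero-Computation Experts idea concrete, here is a minimal, illustrative PyTorch sketch, not LongCat-Flash's actual code: the router scores both real FFN experts and identity "zero" experts, so tokens routed to the latter incur no FFN FLOPs. The class name, sizes, and expert counts are invented for the example.

```python
# Minimal sketch (illustrative assumptions, not LongCat-Flash source):
# a router mixing real FFN experts with identity "zero-computation" experts.
# Tokens routed to a zero expert pass through unchanged, so the amount of
# activated compute varies from token to token.
import torch
import torch.nn as nn

class ZeroComputeMoE(nn.Module):
    def __init__(self, d_model=64, n_real=8, n_zero=4, top_k=2):
        super().__init__()
        self.n_real, self.n_total, self.top_k = n_real, n_real + n_zero, top_k
        self.router = nn.Linear(d_model, self.n_total)  # scores all experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_real))

    def forward(self, x):                               # x: (tokens, d_model)
        weight, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(self.n_total):
                mask = idx[:, k] == e
                if not mask.any():
                    continue
                # Real experts run an FFN; zero experts are the identity.
                y = self.experts[e](x[mask]) if e < self.n_real else x[mask]
                out[mask] += weight[mask, k].unsqueeze(-1) * y
        return out
```

Tokens whose top-k routing choices land on zero experts skip the FFN entirely, which is how a 560-billion-parameter model can average only about 27 billion activated parameters per token.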
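The inter-layer-channel efficiency claim comes down to issuing the expert dispatch communication asynchronously and doing useful computation on another branch while it is in flight. Below is a minimal distributed-PyTorch sketch of that overlap pattern under stated assumptions: `dense_ffn` and `run_experts` are hypothetical callables, and a process group is assumed to be initialized.

```python
# Sketch of communication/computation overlap (an assumption-laden
# illustration, not the published implementation). Requires an initialized
# process group, e.g. dist.init_process_group("nccl").
import torch
import torch.distributed as dist

def shortcut_moe_step(tokens, dense_ffn, run_experts):
    send = tokens.contiguous()
    recv = torch.empty_like(send)
    # Start the expert all-to-all dispatch without blocking (communication).
    handle = dist.all_to_all_single(recv, send, async_op=True)
    # Overlap: run the dense shortcut branch while tokens are in transit.
    shortcut_out = dense_ffn(tokens)
    handle.wait()                  # synchronize only when experts need input
    expert_out = run_experts(recv)
    return shortcut_out + expert_out
```

The gain comes from hiding the all-to-all latency behind useful compute instead of serializing the two phases, which is what improves both training and inference throughput.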
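For the multi-agent trajectory generation mentioned above, here is a deliberately toy sketch of the general pattern: two scripted agents interact around a tool, and each completed episode is recorded as a step-by-step trajectory. All agent logic, roles, and the tool are hypothetical stand-ins; the report's actual data pipeline is not described here.

```python
# Toy sketch of multi-agent trajectory collection (hypothetical setup):
# a scripted "user" agent and an "assistant" agent interact around a toy
# calculator tool, and each episode is saved as a trajectory record.
import json
import random

def user_agent(turn):
    queries = ["add 2 3", "add 10 5", "add 7 1"]
    return random.choice(queries) if turn == 0 else "thanks"

def assistant_agent(message):
    if message.startswith("add"):
        _, a, b = message.split()
        result = int(a) + int(b)               # toy "calculator" tool call
        return {"tool": "calc", "args": [a, b], "result": result}
    return {"reply": "you're welcome"}

def collect_trajectory(max_turns=2):
    steps = []
    for turn in range(max_turns):
        msg = user_agent(turn)
        act = assistant_agent(msg)
        steps.append({"turn": turn, "user": msg, "assistant": act})
    return steps

if __name__ == "__main__":
    dataset = [collect_trajectory() for _ in range(3)]
    print(json.dumps(dataset, indent=2))
```

Scaling this pattern up, with richer agents, real tools, and quality filtering, is one way such pipelines produce the diverse trajectory data used for agentic training.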