Meituan's "Building LLM" Progress Revealed for the First Time: LongCat Released and Open-Sourced
Huan Qiu Wang · 2025-09-01 05:07
Group 1
- LongCat-Flash uses an innovative Mixture-of-Experts (MoE) architecture with 560 billion total parameters, activating between 18.6 billion and 31.3 billion parameters per token (27 billion on average), optimizing both computational efficiency and performance [2][4]
- LongCat-Flash-Chat performs on par with leading mainstream models while activating only a small fraction of its parameters, excelling in particular at agentic tasks [2]
- The model features a Zero-Computation Experts mechanism, allowing computational resources to be allocated on demand and used efficiently [4]

Group 2
- LongCat-Flash incorporates inter-layer channels that enable parallel communication and computation, significantly improving training and inference efficiency [5]
- After 30 days of efficient training, the model achieved a user-facing inference speed of over 100 tokens per second on H800 [5]
- LongCat-Flash's system optimization sustains a generation speed of 100 tokens per second at an output cost as low as 5 yuan per million tokens [7]

Group 3
- The model was optimized throughout the training process, including multi-agent methods for generating diverse, high-quality trajectory data, resulting in superior agentic capabilities [7]
- LongCat-Flash's co-design of algorithms and engineering gives it significant cost and speed advantages over similarly sized or smaller models in the industry [7]
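The Zero-Computation Experts idea described above can be illustrated with a toy router: the expert pool includes identity "zero" experts that simply pass the token through, so tokens routed to them consume essentially no FLOPs while the gating math stays uniform. This is a minimal sketch, not LongCat-Flash's actual implementation; the expert count, hidden size, and single-layer "experts" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8        # toy hidden size (assumption; real models use thousands)
N_FFN = 4    # real FFN experts
N_ZERO = 2   # zero-computation (identity) experts
TOP_K = 2    # experts activated per token

# Toy experts: one linear map each (real MoE experts are full FFN blocks).
ffn_weights = [rng.standard_normal((D, D)) * 0.1 for _ in range(N_FFN)]
# The router scores every expert, including the zero experts.
router_w = rng.standard_normal((D, N_FFN + N_ZERO)) * 0.1

def moe_layer(x):
    """Route token x to its top-k experts. Zero experts return x unchanged,
    so tokens sent there cost (almost) no compute."""
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]                        # chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax gates
    out = np.zeros(D)
    for g, e in zip(gates, top):
        if e < N_FFN:
            out += g * np.tanh(x @ ffn_weights[e])  # real expert: does compute
        else:
            out += g * x                            # zero expert: identity
    return out

x = rng.standard_normal(D)
y = moe_layer(x)
print(y.shape)  # (8,)
```

Because the router can send "easy" tokens to zero experts, the number of truly activated parameters varies per token, which is consistent with the reported 18.6B–31.3B activation range around a 27B average.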
Meituan's "Building LLM" Progress Revealed for the First Time: LongCat-Flash-Chat Released and Open-Sourced, Output Cost as Low as 5 Yuan per Million Tokens
Huan Qiu Wang · 2025-09-01 03:49
Group 3
- The company has made significant AI advances this year, launching multiple AI applications including the AI Coding Agent NoCode and an AI business-decision assistant [4]
- LongCat-Flash was comprehensively optimized throughout the training process, using multi-agent methods to generate diverse, high-quality trajectory data [7]
- The company's AI strategy is built on three levels: AI at work, AI in products, and Building LLM; open-sourcing the model marks a significant milestone in its Building LLM progress [4]
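The two headline serving figures (about 100 tokens per second generation and 5 yuan per million output tokens) make per-request cost and latency easy to estimate. A back-of-envelope sketch, using only the numbers reported in the articles:

```python
# Back-of-envelope estimate from the reported serving figures:
# ~100 tokens/s generation speed, 5 yuan per million output tokens.
SPEED_TOK_PER_S = 100
COST_YUAN_PER_M_TOK = 5

def generation_cost(n_output_tokens):
    """Return (output cost in yuan, wall-clock generation time in seconds)."""
    cost = n_output_tokens * COST_YUAN_PER_M_TOK / 1_000_000
    seconds = n_output_tokens / SPEED_TOK_PER_S
    return cost, seconds

# Example: a ~2,000-token agent response.
cost, secs = generation_cost(2_000)
print(cost, secs)  # 0.01 yuan, 20.0 seconds
```

At these rates, even long agentic trajectories stay cheap, which matches the articles' emphasis on suitability for time-consuming, complex agent applications.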
Meituan Releases and Open-Sources LongCat-Flash-Chat: Dynamic Computation Ushers in an Era of Efficient AI
Zhong Jin Zai Xian · 2025-09-01 02:28
▲ Meituan releases and open-sources LongCat-Flash-Chat (file photo)

LongCat-Flash reportedly adopts an innovative Mixture-of-Experts (MoE) architecture with 560B total parameters, of which 18.6B–31.3B (27B on average) are activated, achieving both computational efficiency and strong performance. According to a comprehensive evaluation across multiple benchmarks, LongCat-Flash-Chat, as a non-thinking foundation model, matches today's leading mainstream models while activating only a small fraction of its parameters, with a particularly strong advantage in agentic tasks. Moreover, thanks to its inference-efficiency-oriented design and innovations, LongCat-Flash-Chat delivers markedly faster inference, making it better suited to long-running, complex agent applications.

On September 1, Meituan announced the official release of LongCat-Flash-Chat, open-sourced on GitHub and Hugging Face and simultaneously launched on the official site https://longcat.ai/ .

Specifically, at the architecture level the LongCat-Flash model introduces a "Zero-Computation Experts" mechanism: with 560B total parameters, each token activates only 18.6B–31.3B parameters depending on its contextual needs, allocating compute on demand and using it efficiently. To control ...