Meituan's New Standalone App: You Can't Order Food, Only AI
量子位 (QbitAI) · 2025-11-03 03:12

Core Viewpoint
- Meituan is building on its delivery-service expertise to develop advanced AI models. The latest, LongCat-Flash-Omni, is multimodal and reaches state-of-the-art results among open-source models on multimodal benchmarks [2][8].

Group 1: Model Performance and Features
- LongCat-Flash-Omni surpasses models such as Qwen3-Omni and Gemini-2.5-Flash on comprehensive multimodal benchmarks, reaching open-source state-of-the-art status [2].
- The model also performs strongly on each individual modality (text, image, audio, and video), so adding modalities has not come at the cost of capability in any one of them [3].
- With 560 billion total parameters and only 27 billion active parameters, the model uses a "large total parameters, small active" MoE architecture, which keeps inference efficient while retaining extensive knowledge [4].

Group 2: User Experience and Accessibility
- LongCat-Flash-Omni is the first open-source model capable of real-time multimodal interaction [8].
- The model is available for free in Meituan's LongCat app and on the web, and accepts text, voice, and image-upload input [9][10].
- Users report smooth interaction, quick response times, and effective handling of complex multimodal tasks [25][26].

Group 3: Development Strategy
- Meituan's iterative model development strategy focuses on speed, specialization, and comprehensive capabilities, aiming for an AI that can understand and interact with complex real-world scenarios [29][31].
- The company has a clear path for expanding its AI capabilities, moving from basic chatbots to advanced multimodal models and laying the groundwork for a "world model" that deeply understands reality [47][62].
- Meituan's investments in embodied intelligence and robotics are part of a broader strategy to connect the digital and physical worlds, improving service efficiency and user experience [42][56].

Group 4: Challenges and Innovations
- Developing multimodal models raises challenges such as difficult integration across modalities, real-time interaction performance, and training efficiency [33][36].
- LongCat-Flash-Omni addresses these challenges through architectural choices that include a unified end-to-end architecture and progressive training methods that build up multimodal capabilities [38][39].
- The model's design allows low-latency real-time interaction, setting it apart from existing models that struggle with responsiveness [36][39].
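
To make the "large total parameters, small active" point from Group 1 concrete, the sketch below computes the share of parameters used per token from the two figures quoted in the article (560 billion total, 27 billion active). The article does not describe how experts are selected, so the `top_k_route` helper is a hypothetical toy router added only to illustrate sparse activation in general; it is not LongCat-Flash-Omni's actual routing.

```python
# Figures quoted in the article, in billions of parameters.
TOTAL_PARAMS_B = 560
ACTIVE_PARAMS_B = 27


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that participate in one forward pass."""
    return active_b / total_b


def top_k_route(scores: list[float], k: int) -> list[int]:
    """Hypothetical toy router: indices of the k highest-scoring experts.

    Assumption for illustration only; the article gives no routing details.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"Active share per token: {fraction:.1%}")

# Four made-up expert scores; only the top two experts would run.
chosen = top_k_route([0.1, 0.7, 0.2, 0.9], k=2)
print(f"Experts selected: {chosen}")
```

Under these figures, roughly one parameter in twenty is active for any given token, which is the basis of the efficiency claim: per-token compute scales with the active parameters rather than the total.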