Meituan's video generation model is here, and it debuts as open-source SOTA!
量子位 (QbitAI) · 2025-10-27 05:37
Core Viewpoint: Meituan has launched LongCat-Video, an open-source video generation model that supports both text-to-video and image-to-video generation, marking a significant advance in video generation technology [1][39].

Group 1: Model Features
- LongCat-Video has 13.6 billion parameters and can generate videos up to five minutes long, showing a strong grasp of real-world physics and semantics [1][12][39].
- The model generates 720p, 30 fps video with strong semantic understanding and visual quality, ranking among the best open-source models [18][62].
- It keeps generated videos consistent and handles challenges such as fine-detail capture and complex lighting effects [19][24].

Group 2: Technical Innovations
- LongCat-Video unifies three tasks (text-to-video, image-to-video, and video continuation) in a single Diffusion Transformer framework [41].
- The model is pre-trained directly on the video continuation task, which mitigates error accumulation in long video generation [46][48].
- It uses techniques such as block sparse attention and a coarse-to-fine generation paradigm to make video generation more efficient [52][53].

Group 3: Performance Evaluation
- On internal benchmarks, LongCat-Video outperformed models such as PixVerse-V5 and Wan2.2-T2V-A14B in overall quality, with strong results in both visual quality and motion quality [62][63].
- The model achieved the top score on the common-sense dimension, indicating a strong ability to model the physical world [64].

Group 4: Broader Context
- This is not Meituan's first AI release. The company previously released models including LongCat-Flash-Chat and LongCat-Flash-Thinking, reflecting its ongoing commitment to AI [65][68].
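The summary names block sparse attention but does not describe how LongCat-Video implements it. The sketch below illustrates the general idea only and is not Meituan's code. It assumes one common variant: queries and keys are split into fixed-size blocks, and each query block is scored against the mean of every key block. Full softmax attention is then computed only over the `top_k` highest-scoring key blocks. The function name `block_sparse_attention`, its parameters, and the mean-pooling selection rule are all illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_rows(rows: Matrix) -> List[float]:
    # Column-wise mean of a block of token vectors (coarse block summary).
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def block_sparse_attention(q: Matrix, k: Matrix, v: Matrix,
                           block: int, top_k: int) -> Matrix:
    """Hypothetical block-sparse attention: each query block attends only to
    the top_k key blocks ranked by (mean query) . (mean key)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    n_kv_blocks = math.ceil(len(k) / block)
    k_means = [mean_rows(k[j * block:(j + 1) * block]) for j in range(n_kv_blocks)]

    out: Matrix = []
    for start in range(0, len(q), block):
        q_block = q[start:start + block]
        q_mean = mean_rows(q_block)
        # Coarse step: rank key blocks by block-level similarity, keep top_k.
        ranked = sorted(range(n_kv_blocks),
                        key=lambda j: dot(q_mean, k_means[j]), reverse=True)
        keep = sorted(ranked[:top_k])
        # Token indices covered by the kept key blocks.
        idx = [t for j in keep for t in range(j * block, min((j + 1) * block, len(k)))]
        # Fine step: ordinary softmax attention restricted to those tokens.
        for qi in q_block:
            scores = [dot(qi, k[t]) * scale for t in idx]
            m = max(scores)  # subtract the max for numerical stability
            w = [math.exp(s - m) for s in scores]
            z = sum(w)
            out.append([sum(w[n] * v[t][c] for n, t in enumerate(idx)) / z
                        for c in range(len(v[0]))])
    return out
```

When `top_k` equals the number of key blocks, this reduces to ordinary dense attention. Smaller values skip most query-key products, which is where the efficiency gain for long, high-resolution video token sequences would come from.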
Which AI is best at ordering takeout? Meituan's LongCat team built a comprehensive benchmark
量子位 (QbitAI) · 2025-10-20 01:16
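Contributed by the Meituan LongCat team, from Aofeisi (凹非寺). 量子位 (QbitAI) | WeChat official account: QbitAI

The Meituan LongCat team has released VitaBench (Versatile Interactive Tasks Benchmark), an evaluation benchmark for LLM agents that closely mirrors real-life scenarios and targets complex problems.

VitaBench uses three high-frequency everyday scenarios as its core settings: food-delivery ordering, in-restaurant dining, and travel. Around them it builds an interactive evaluation environment with 66 tools and designs composite tasks that span scenarios. For example, in a travel-planning task the agent must reason, call tools, and interact with the user to complete the entire workflow, from buying tickets to booking a restaurant.

For the first time, the team quantitatively decomposes agent tasks along three dimensions (deep reasoning, tool use, and user interaction), which allows complex problems to be constructed in a controlled way.

The evaluation results show that even today's most advanced reasoning models reach a success rate of only about 30% on the main leaderboard (complex cross-scenario tasks). This reveals a significant gap between current agents and the demands of real-life applications.

VitaBench is now fully open source. It aims to provide key infrastructure for advancing agent research and deployment in real-life scenarios.

Research background: a wide gap between agent evaluation and real-world applications

As large language models rapidly improve at complex reasoning and tool calling, LLM-based agents are increasingly being applied in real-life scenarios. ...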