Sekiro: Shadows Die Twice

First VLA model built specifically for 3D action games outperforms human players in Black Myth: Wukong and Sekiro | ICCV 2025
QbitAI (量子位) · 2025-08-19 05:25
Core Insights
- CombatVLA, a 3B multimodal model, surpasses GPT-4o and human players in combat tasks within action role-playing games, demonstrating significant advancements in real-time decision-making and tactical reasoning [1][4][52].

Group 1: CombatVLA Overview
- CombatVLA integrates visual, semantic, and action control to enhance embodied intelligence, addressing challenges in 3D combat scenarios such as visual perception, combat reasoning, and efficient inference [6][8].
- The model achieves a 50-fold acceleration in combat execution speed compared to existing models, with a higher success rate than human players [4][11][52].

Group 2: Action Tracking and Benchmarking
- An action tracker was developed to collect human action sequences in games, providing extensive training data for the combat understanding model [15][17].
- The CUBench benchmark was established to evaluate the model's combat intelligence based on three core capabilities: information acquisition, understanding, and reasoning [20][21].

Group 3: CombatVLA Model and Training
- The Action-of-Thought (AoT) dataset was created to facilitate the model's understanding of combat actions, structured in a way that enhances reasoning speed [24][25].
- CombatVLA employs a three-stage progressive training paradigm, gradually refining the model's combat strategies from video-level to frame-level optimization [27][33].

Group 4: Experimental Results
- In combat understanding evaluations, CombatVLA achieved a top average score of 63.61 on CUBench, outperforming other models significantly [46].
- The model demonstrated robust generalization capabilities, performing comparably to baseline models in general benchmarks while excelling in task-level evaluations [47][48].
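
The summary does not say how the action tracker in Group 2 pairs recorded inputs with game footage. As a rough illustration only, and not CombatVLA's actual implementation, one plausible approach is to attach each timestamped input event to the most recent captured frame. The `Frame` class, the `align_actions` function, and the action names below are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical record: one captured frame plus the inputs seen after it."""
    timestamp: float  # capture time in seconds
    actions: list[str] = field(default_factory=list)


def align_actions(frame_times, events):
    """Attach each (time, action) event to the latest frame captured at or before it.

    This is an assumed alignment rule for illustration, not the paper's method.
    """
    frames = [Frame(t) for t in sorted(frame_times)]
    times = [f.timestamp for f in frames]
    for event_time, action in sorted(events):
        idx = bisect_right(times, event_time) - 1
        if idx >= 0:  # drop events that precede the first frame
            frames[idx].actions.append(action)
    return frames


# Example: three frames captured 100 ms apart and three player inputs.
frames = align_actions(
    [0.0, 0.1, 0.2],
    [(0.05, "dodge"), (0.12, "light_attack"), (0.19, "heavy_attack")],
)
for frame in frames:
    print(frame.timestamp, frame.actions)
```

Under this assumed rule, each frame ends up labeled with the inputs the player issued before the next frame was captured, which is one way to obtain frame-level action sequences for training.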