Core Viewpoint
- ByteDance has developed a gaming agent named Lumine that can autonomously play games such as Genshin Impact, showing advanced skills in exploration, combat, and puzzle-solving [1][4][16].

Group 1: Agent Capabilities
- Lumine can perform complex tasks such as dynamic enemy tracking, precise long-range shooting, and smooth character switching, and it handles a variety of game scenarios effectively [4][6][10].
- The agent shows a strong grasp of boss battles and can solve intricate puzzles, which indicates high spatial awareness [6][8][10].
- Lumine can execute GUI operations and, when given clear prior information, follow complex instructions, which makes it more usable in games [12][14].

Group 2: Technical Framework
- Lumine is built on the Qwen2-VL-7B-Base model and leverages the multimodal understanding and generation capabilities that model acquired from extensive training on web data [16].
- The agent models both operations and reasoning in a unified language space, so that perception, reasoning, and action are integrated seamlessly [16][19].
- Three core mechanisms are designed for Lumine: Observation Space for visual input processing, Hybrid Thinking for efficient decision-making, and Keyboard and Mouse Modelling for operational commands [19][22][23].

Group 3: Training Process
- Training consists of three phases: pre-training for basic actions, instruction-following training for task comprehension, and decision reasoning training for long-term task execution [25][27][29].
- The Lumine-Base model emerges with core capabilities such as object interaction and basic combat, while the Lumine-Instruct model achieves a success rate above 80% on short tasks [26][28].
- The Lumine-Thinking model can complete long-term tasks autonomously, without human intervention, which reflects its advanced planning and reasoning abilities [30].
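The unified language space and the Keyboard and Mouse Modelling mechanism described under Group 2 can be illustrated with a small sketch. The source does not give Lumine's actual action format, so the `KEY`/`MOUSE` text syntax, the class names, and both functions below are hypothetical. The sketch only shows the general idea: keyboard and mouse commands written as plain text, which a language model could emit alongside its reasoning, and parsed back into executable actions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class KeyPress:
    key: str          # key name, e.g. "W" or "Space" (hypothetical naming)
    duration_ms: int  # how long the key is held, in milliseconds


@dataclass(frozen=True)
class MouseMove:
    dx: int  # relative horizontal movement, in pixels
    dy: int  # relative vertical movement, in pixels


Action = Union[KeyPress, MouseMove]


def encode_actions(actions: List[Action]) -> str:
    """Serialize actions into one line of text (assumed format, not Lumine's)."""
    parts = []
    for action in actions:
        if isinstance(action, KeyPress):
            parts.append(f"KEY {action.key} {action.duration_ms}")
        elif isinstance(action, MouseMove):
            parts.append(f"MOUSE {action.dx} {action.dy}")
        else:
            raise TypeError(f"unsupported action: {action!r}")
    return " ; ".join(parts)


def decode_actions(text: str) -> List[Action]:
    """Parse the text form back into action objects, rejecting unknown chunks."""
    actions: List[Action] = []
    for chunk in text.split(";"):
        fields = chunk.split()
        if not fields:
            continue  # skip empty chunks, e.g. from an empty string
        if fields[0] == "KEY" and len(fields) == 3:
            actions.append(KeyPress(fields[1], int(fields[2])))
        elif fields[0] == "MOUSE" and len(fields) == 3:
            actions.append(MouseMove(int(fields[1]), int(fields[2])))
        else:
            raise ValueError(f"unrecognized action: {chunk!r}")
    return actions
```

Because the encoded form is ordinary text, it can sit in the same output stream as natural-language reasoning, and a strict parser such as `decode_actions` rejects malformed output before anything is sent to the game.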
Group 4: Performance Evaluation
- In comparative tests, Lumine-Base succeeds in over 90% of basic interactions but lacks goal-oriented behavior in untrained areas [39].
- Lumine-Instruct outperforms mainstream VLMs in task completion rate, achieving 92.5% on simple tasks and 76.8% on difficult tasks, and shows superior tactical planning [41].
- Lumine-Thinking completes the main story tasks in Genshin Impact with a 100% completion rate in 56 minutes, significantly outperforming competitors such as GPT-5 [44][45].

Group 5: Industry Implications
- Gaming agents like Lumine represent a significant step towards general-purpose AI that can operate in complex 3D environments [50][55].
- Companies such as Google are exploring a similar path with the SIMA 2 agent, which indicates a broader industry trend of using gaming scenarios to train AI [52][56].
- The expectation that gaming agents will eventually transfer to real-world applications points to the potential of embodied intelligence across sectors [56].
Genshin Impact Agent, Made by ByteDance
猿大侠·2025-11-16 04:11