Core Viewpoint
- ByteDance has developed a new gaming agent named Lumine that can autonomously play games such as Genshin Impact, showing advanced skills in exploration, combat, and puzzle-solving [1][4][9].

Group 1: Agent Capabilities
- Lumine can perform complex tasks in Genshin Impact, including dynamic enemy tracking, precise long-range shooting, and smooth character switching [4][5].
- The agent shows a strong grasp of boss battles and can solve a variety of puzzles, such as collecting items based on environmental cues [6][12].
- Lumine can carry out GUI operations and follow complex instructions by drawing on information from earlier tasks [7][8].

Group 2: Technical Framework
- Lumine is built on the Qwen2-VL-7B-Base model, leveraging the multimodal understanding and generation capabilities it gained from training on extensive web data [9][10].
- The agent relies on three core mechanisms: Observation Space for processing visual input, Hybrid Thinking for efficient decision-making, and Keyboard and Mouse Modelling for representing actions [12][14][15].
- Training proceeded in three phases: pre-training on basic actions, instruction-following training, and decision-reasoning training, which together led to high task completion rates [17][20][23].

Group 3: Performance Metrics
- Lumine-Base shows a stepwise emergence of capabilities, achieving over 90% success in basic interactions but lacking goal-directed behavior [38].
- Lumine-Instruct outperforms mainstream VLMs on short-horizon tasks, with a success rate of 92.5% on simple tasks and 76.8% on difficult tasks [33][35].
- Lumine-Thinking performs exceptionally well on long-horizon tasks, completing the main storyline of Genshin Impact in 56 minutes with a 100% task completion rate, significantly faster than competitors [41][42].

Group 4: Cross-Game Adaptability
- Lumine-Thinking adapts well across different games, successfully completing tasks in titles such as Honkai: Star Rail and Black Myth: Wukong, which points to its character as a general agent [45][46].
- The agent's ability to navigate unfamiliar environments and execute complex tasks highlights its potential for broader applications beyond gaming [45][46].

Group 5: Industry Implications
- Lumine reflects a wider industry trend: companies such as Google are also building agents that can operate in 3D game environments, which the article reads as a clear path towards embodied AGI [48][51].
- The belief that gaming agents will eventually move into real-world applications underscores the significance of advances in AI and gaming technology [51].
A Genshin Impact Agent, Made by ByteDance
QbitAI · 2025-11-14 12:10