Genshin Impact Agent (Lumine)
Genshin Impact Agent, from ByteDance
猿大侠 · 2025-11-16 04:11
Core Viewpoint: ByteDance has developed a new gaming agent named Lumine that can autonomously play games such as Genshin Impact, showing advanced skills in exploration, combat, and puzzle-solving [1][4][16].

Group 1: Agent Capabilities
- Lumine can perform complex tasks such as dynamic enemy tracking, precise long-range shooting, and smooth character switching, handling a wide range of game scenarios [4][6][10].
- The agent shows a strong grasp of boss battles and can solve intricate puzzles, indicating high spatial awareness [6][8][10].
- Lumine can carry out GUI operations and follow complex instructions when given clear prior information about the task, which broadens its usefulness in games [12][14].

Group 2: Technical Framework
- Lumine is built on the Qwen2-VL-7B-Base model, drawing on the multimodal understanding and generation capabilities that model acquired from large-scale training on web data [16].
- The agent models both operations and reasoning in a single unified language space, so that perception, reasoning, and action are integrated seamlessly [16][19].
- Lumine relies on three core mechanisms: Observation Space, which processes visual input; Hybrid Thinking, which makes decision-making more efficient; and Keyboard and Mouse Modelling, which represents operational commands [19][22][23].

Group 3: Training Process
- Training proceeds in three phases: pre-training for basic actions, instruction-following training for task comprehension, and decision-reasoning training for long-horizon task execution [25][27][29].
- Core capabilities such as object interaction and basic combat emerge in the Lumine-Base model, while the Lumine-Instruct model achieves over 80% success on short tasks [26][28].
- The Lumine-Thinking model can complete long-horizon tasks autonomously, without human intervention, demonstrating advanced planning and reasoning [30].
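The summary says Lumine models keyboard and mouse operations in the same language space as its reasoning. As a rough illustration of what that can mean, the Python sketch below serializes one step of mouse movement plus held keys into a plain-text string a language model could emit, then parses it back. The `<act>` markers, the field layout, and the names `Action`, `encode`, and `decode` are all hypothetical; the source does not describe Lumine's actual action format.

```python
from dataclasses import dataclass, field

# Hypothetical text format for one step of keyboard/mouse input.
# The "<act>" markers and field layout are illustrative assumptions,
# not Lumine's published action format.
ACT_OPEN, ACT_CLOSE = "<act>", "</act>"


@dataclass
class Action:
    dx: int = 0  # relative mouse movement along x, in pixels
    dy: int = 0  # relative mouse movement along y, in pixels
    keys: list[str] = field(default_factory=list)  # keys held during the step


def encode(action: Action) -> str:
    """Serialize an action into plain text that a language model could emit."""
    keys = " ".join(action.keys) if action.keys else "none"
    return f"{ACT_OPEN} {action.dx} {action.dy} ; {keys} {ACT_CLOSE}"


def decode(text: str) -> Action:
    """Parse the text form back into a structured action for an input driver."""
    body = text.strip()
    if not (body.startswith(ACT_OPEN) and body.endswith(ACT_CLOSE)):
        raise ValueError(f"not an action string: {text!r}")
    body = body[len(ACT_OPEN):-len(ACT_CLOSE)]
    move, _, key_part = body.partition(";")
    dx_str, dy_str = move.split()
    keys = key_part.split()
    if keys == ["none"]:
        keys = []
    return Action(int(dx_str), int(dy_str), keys)


if __name__ == "__main__":
    step = Action(dx=12, dy=-3, keys=["W", "Shift"])
    text = encode(step)
    print(text)                  # <act> 12 -3 ; W Shift </act>
    print(decode(text) == step)  # True
```

Keeping actions as ordinary text lets the same decoder that writes reasoning also write commands. A real system would also need timing information and mouse-button events, which this sketch omits.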
Group 4: Performance Evaluation
- In comparative tests, Lumine-Base achieves over 90% success on basic interactions but lacks goal-oriented behavior in areas it was not trained on [39].
- Lumine-Instruct outperforms mainstream VLMs in task completion rate, reaching 92.5% on simple tasks and 76.8% on difficult tasks, which reflects stronger tactical planning [41].
- Lumine-Thinking completes the Genshin Impact main story tasks with a 100% completion rate in 56 minutes, significantly outperforming competitors such as GPT-5 [44][45].

Group 5: Industry Implications
- Gaming agents like Lumine represent a significant step toward general-purpose AI that can operate in complex 3D environments [50][55].
- Companies such as Google are exploring similar paths with their SIMA 2 agent, indicating a broader industry trend toward using games as training grounds for AI [52][56].
- The expectation that gaming agents will eventually transfer to real-world applications highlights the potential of embodied intelligence across many sectors [56].
Genshin Impact Agent, from ByteDance
量子位 (QbitAI) · 2025-11-14 12:10
Core Viewpoint: ByteDance has developed a new gaming agent named Lumine that can autonomously play games such as Genshin Impact, showing advanced skills in exploration, combat, and puzzle-solving [1][4][9].

Group 1: Agent Capabilities
- Lumine can perform complex tasks in Genshin Impact, including dynamic enemy tracking, precise long-range shooting, and smooth character switching [4][5].
- The agent shows a strong grasp of boss battles and can solve a variety of puzzles, such as collecting items based on environmental cues [6][12].
- Lumine can carry out GUI operations and follow complex instructions by drawing on prior information about the task [7][8].

Group 2: Technical Framework
- Lumine is built on the Qwen2-VL-7B-Base model, drawing on multimodal understanding and generation capabilities learned from large-scale web data [9][10].
- The agent uses three core mechanisms: Observation Space for processing visual input, Hybrid Thinking for efficient decision-making, and Keyboard and Mouse Modelling for representing actions [12][14][15].
- Training follows a three-phase process (pre-training for basic actions, instruction-following training, and decision-reasoning training), which yields high task completion rates [17][20][23].

Group 3: Performance Metrics
- Lumine-Base shows a stepwise emergence of capabilities, reaching over 90% success on basic interactions but lacking goal-directed behavior [38].
- Lumine-Instruct outperforms mainstream VLMs on short-horizon tasks, with success rates of 92.5% on simple tasks and 76.8% on difficult tasks [33][35].
- Lumine-Thinking performs exceptionally well on long-horizon tasks, completing the Genshin Impact main storyline in 56 minutes with a 100% task completion rate, significantly faster than competing agents [41][42].
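Hybrid Thinking is described here only as a mechanism for efficient decision-making. One plausible reading, assumed for this sketch rather than taken from the source, is that the model decides on each step whether to write out explicit reasoning before acting or to act directly. The Python below splits one step's output into an optional reasoning segment and the action text, and measures how often reasoning was produced. The `<think>` delimiters and the names `split_hybrid_output` and `thinking_ratio` are hypothetical.

```python
from typing import Optional

# Hypothetical delimiters for an optional reasoning segment; the source
# does not specify how Lumine marks its reasoning.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


def split_hybrid_output(text: str) -> tuple[Optional[str], str]:
    """Split one model step into (reasoning, action_text).

    Under the assumed hybrid scheme the model either writes a reasoning
    block before acting or acts directly; reasoning is None when it
    skipped thinking on this step.
    """
    text = text.strip()
    if text.startswith(THINK_OPEN):
        end = text.find(THINK_CLOSE)
        if end == -1:
            raise ValueError("unterminated reasoning block")
        reasoning = text[len(THINK_OPEN):end].strip()
        action = text[end + len(THINK_CLOSE):].strip()
        return reasoning, action
    return None, text


def thinking_ratio(steps: list[str]) -> float:
    """Fraction of steps on which the model produced explicit reasoning."""
    if not steps:
        return 0.0
    thought = sum(1 for s in steps if split_hybrid_output(s)[0] is not None)
    return thought / len(steps)


if __name__ == "__main__":
    steps = [
        "<think>The boss is shielded; switch characters first.</think> press 2",
        "press W",
        "press W",
        "click left",
    ]
    print(split_hybrid_output(steps[0])[1])  # press 2
    print(thinking_ratio(steps))             # 0.25
```

A low ratio means most steps skip the reasoning tokens, which is one way such a scheme could save inference time while reserving deliberate reasoning for harder moments.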
Group 4: Cross-Game Adaptability
- Lumine-Thinking adapts well across different games, successfully completing tasks in titles such as Honkai: Star Rail and Black Myth: Wukong, which points to the characteristics of a general agent [45][46].
- Its ability to navigate unfamiliar environments and execute complex tasks suggests potential applications well beyond gaming [45][46].

Group 5: Industry Implications
- Lumine reflects an industry trend in which companies such as Google are also building agents that operate in 3D game environments, indicating a clear path toward embodied AGI [48][51].
- The expectation that gaming agents will eventually transfer to real-world applications underscores the significance of these advances in AI and gaming technology [51].