Interactive Reinforcement Learning
Task-Level Rewards Boost App Agents' Reasoning: Taotian Proposes Mobile-R1, Where a 3B Model Can Surpass 32B
量子位· 2025-07-20 02:49
**Core Insights**

- Existing Mobile/APP Agents rely primarily on action-level rewards, which restricts their adaptability in dynamic environments [1][2]
- A new interactive reinforcement learning framework, Mobile-R1, is proposed; it incorporates task-level rewards to enhance agent adaptability and exploration capabilities [5][30]
- Mobile-R1 is trained in three stages: format fine-tuning, action-level training, and task-level training, which collectively improve the model's performance [6][31]

**Summary by Sections**

**Existing Limitations**

- Current Mobile/APP Agents struggle with real-time adaptability because they depend on action-level rewards, making changing mobile environments difficult to handle [1][2]
- An example illustrates how existing models fail at complex multi-step tasks [3]

**Proposed Solution**

- A collaboration between Taotian Group's algorithm team and the Future Life Lab introduces a multi-round, task-oriented learning approach that combines online learning with trajectory correction [4]
- Mobile-R1 is designed around task-level rewards, which guide agents through complex tasks more effectively [5]

**Training Methodology**

The training process is divided into three stages:
1. **Format Fine-tuning**: Initial supervised fine-tuning on high-quality trajectory data [16]
2. **Action-level Training**: Group Relative Policy Optimization (GRPO) evaluates action correctness using action-level rewards [17]
3.
**Task-level Training**: Multi-step task-level training enhances the model's generalization and exploration [18][20]

**Experimental Results**

- Mobile-R1 demonstrated superior performance across various benchmarks, achieving a task success rate of 49.40%, significantly higher than the best baseline model [26]
- The results indicate that the three-stage training process effectively improves the model's robustness and adaptability, particularly in dynamic environments [29][30]
- The article concludes that Mobile-R1's integration of interactive reinforcement learning and task-level rewards significantly enhances the capabilities of visual-language-model-based mobile agents [30][32]
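The core of GRPO, used in the action-level stage above, is that each sampled rollout is scored relative to its own group rather than by a learned value function. A minimal sketch of that group-relative advantage, with illustrative reward values (not the paper's actual scores):

```python
# Sketch of GRPO's group-relative advantage: each rollout's reward is
# normalized against the mean and std of its sampling group, so the policy
# update favors rollouts that outperform their peers on the same task.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """Return per-rollout advantages normalized within the group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four rollouts of the same task, scored by an action-level reward.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.5])
```

By construction the advantages sum to zero within the group, so above-average rollouts are reinforced and below-average ones suppressed without training a separate critic.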
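To make the action-level vs. task-level distinction concrete, here is a hedged sketch of a task-level reward that scores a whole multi-step trajectory. The field names (`thought`, `action`), the format-bonus weighting, and the binary completion signal are assumptions for illustration, not the paper's exact formulation:

```python
# Hypothetical task-level reward: a small bonus if every step in the
# trajectory is well-formed (carries both a 'thought' and an 'action'),
# plus a dominant binary signal for completing the overall task.
def task_level_reward(trajectory, task_completed, format_weight=0.1):
    well_formed = all("thought" in step and "action" in step
                      for step in trajectory)
    return format_weight * float(well_formed) + float(task_completed)

# Example trajectory with assumed step structure.
traj = [
    {"thought": "open the shopping app", "action": "tap(home_icon)"},
    {"thought": "search for the item", "action": "type('sneakers')"},
]
r = task_level_reward(traj, task_completed=True)
```

Unlike a per-action reward, this signal only pays out when the full multi-step task succeeds, which is what pushes the agent toward the exploration and long-horizon planning the article credits to the third training stage.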