Agent Training
Training rewards too sparse? CUHK and Meituan give agents a "process score"
机器之心· 2026-02-19 23:43
Core Insights
- The article discusses the limitations of traditional reward systems in training agents, which often consider only the final outcome and neglect the complexity of the reasoning process involved in multi-step tasks [2][3][6].
- The Reagent framework addresses this issue by providing detailed feedback on the entire reasoning process rather than just the final answer, thus enhancing agent training [5][10][12].

Group 1: Problem Identification
- Agents require long-horizon, fine-grained feedback, but most existing systems provide only coarse-grained rewards based on final outcomes [3].
- The traditional approach fails to differentiate between partially successful attempts and completely misguided efforts, discarding valuable learning signal [2][6].

Group 2: Solution Development
- The authors developed a reasoning reward model (Agent-RRM) that evaluates the entire trajectory of an agent's reasoning process, producing both scores and critiques [10][11].
- The model outputs an internal analysis, a critique for the agent, and an overall score, allowing a more nuanced assessment of the agent's performance [10][11].

Group 3: Implementation of the Reagent Framework
- The Reagent framework integrates textual critiques and scoring into the training process, allowing agents to learn from their reasoning [13][15].
- Three levels of integration are proposed:
  1. Adding critiques without modifying the model (Reagent-C) [15].
  2. Incorporating process scores as additional rewards (Reagent-R) [16].
  3. Training with both initial and revised responses (Reagent-U), reported as the most effective method [17][18].

Group 4: Experimental Results
- The Reagent-U method showed significant performance improvements across various tasks, with average scores reaching 43.7% on the GAIA benchmark, comparable to larger models [28][30].
- The integration of process scores made agents more willing to pursue correct reasoning paths even when the final answer was incorrect [27][28].

Group 5: Conclusion
- The Reagent framework successfully incorporates detailed feedback into agent training, demonstrating that even smaller models can achieve competitive results on complex tasks when given comprehensive reasoning evaluations [30][31].
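The Reagent-R idea above, folding a process score into the sparse outcome reward, can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the function name `shaped_reward` and the weight values are hypothetical.

```python
# Hypothetical sketch of Reagent-R-style reward shaping: a dense process
# score from a reasoning reward model is blended with the sparse
# outcome reward. Names and weights are illustrative, not from the paper.

def shaped_reward(outcome_correct: bool, process_score: float,
                  outcome_weight: float = 1.0,
                  process_weight: float = 0.5) -> float:
    """Blend a binary outcome reward with a process score in [0, 1]."""
    outcome_reward = 1.0 if outcome_correct else 0.0
    return outcome_weight * outcome_reward + process_weight * process_score

# A wrong final answer with sound intermediate reasoning still earns
# partial credit, unlike a purely outcome-based reward.
partial = shaped_reward(outcome_correct=False, process_score=0.8)
dead_end = shaped_reward(outcome_correct=False, process_score=0.0)
assert partial > dead_end
```

The key property is that two failed trajectories are no longer indistinguishable: the one with better intermediate reasoning receives a strictly higher reward, which is exactly the learning signal the article says outcome-only rewards throw away.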
The era of interaction scaling arrives: Chuangzhi, Fudan, and ByteDance release AgentGym-RL, powered by Ascend, pioneering a new paradigm for agent training
机器之心· 2025-09-11 04:53
Core Insights
- The article emphasizes the transition of artificial intelligence from a "data-intensive" to an "experience-intensive" era, in which true intelligence arises from active exploration and experience accumulation in real environments [10][11][50].
- The AgentGym-RL framework represents a significant advance in training autonomous LLM agents for multi-turn decision-making, addressing the limitations of existing models that rely on single-turn tasks and lack diverse interaction mechanisms [12][50].

Group 1: Framework and Methodology
- AgentGym-RL is the first end-to-end framework for LLM agents that requires no supervised fine-tuning, supports interactive multi-turn training, and has been validated across various real-world scenarios [3][15].
- The framework integrates multiple environments and rich trajectory data, reducing complex environment configuration to modular operations and thereby facilitating effective experience-driven learning [13][19].
- The ScalingInter-RL method introduces a progressive interaction-round expansion strategy, allowing agents to gradually adapt to environments and optimize their interaction patterns while balancing exploration and exploitation [4][23][25].

Group 2: Performance and Results
- The research team achieved strong results with a 7B-parameter model, which demonstrated complex task-handling skills such as understanding task objectives and planning multi-step operations after extensive interaction training [5][29].
- Across testing environments, the model not only surpassed open-source models of over 100B parameters but also matched top commercial models such as OpenAI o3 and Google Gemini 2.5 Pro [5][29].
- The ScalingInter-RL model achieved an overall accuracy of 26.00% on web navigation tasks, significantly outperforming GPT-4o's 16.00% and matching DeepSeek-R1-0528 and Gemini-2.5-Pro [29][30].
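The progressive interaction-round expansion described above can be sketched as a simple staged schedule: the cap on interaction turns grows as training progresses, so the agent first masters short-horizon behavior before exploring longer rollouts. The function name, stage boundaries, and turn caps below are illustrative assumptions, not values from the AgentGym-RL paper.

```python
# Hypothetical sketch of a ScalingInter-RL-style schedule for the
# maximum number of interaction rounds per rollout. Stage boundaries
# and caps are made up for illustration.

def max_turns(step: int,
              stages=((0, 5), (1000, 10), (3000, 20))) -> int:
    """Return the interaction-turn cap in effect at a given training step.

    `stages` is a sequence of (start_step, turn_cap) pairs sorted by
    ascending start_step; the cap of the last stage whose start_step
    has been reached applies.
    """
    cap = stages[0][1]
    for start, turns in stages:
        if step >= start:
            cap = turns
    return cap

assert max_turns(0) == 5       # early training: short rollouts
assert max_turns(1500) == 10   # mid training: longer horizons
assert max_turns(5000) == 20   # late training: full interaction budget
```

The design intuition matches the article's exploration-exploitation framing: short early rollouts keep the credit-assignment problem tractable, and the budget is widened only once shorter interaction patterns have been optimized.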
Group 3: Future Directions
- Future research will focus on upgrading general capabilities so that agents can make efficient decisions in new environments and with unknown tools [51].
- The team aims to expand into more complex scenarios that closely resemble the physical world, such as robotic operations and real-world planning [52].
- They also intend to explore multi-agent collaborative training to unlock more complex group decision-making capabilities [52].