LLM Agents
Verlog, an open-source RL framework built for LLM agents, is here: 400 turns are no problem
机器之心· 2025-10-08 04:13
Core Insights
- The article discusses the challenges faced by intelligent agents in maintaining clear reasoning and robust decision-making over long-term tasks, particularly when the task extends to hundreds of steps [2][3]
- It introduces Verlog, a multi-turn reinforcement learning framework designed to handle long-horizon tasks effectively, overcoming limitations of traditional frameworks [3][20]

Group 1: Framework Overview
- Verlog is built on the foundations of VeRL and BALROG, incorporating specialized optimization techniques to ensure stable and efficient training across tasks that can extend beyond 400 steps [3][20]
- The framework has been validated in complex environments such as BabyAI, BabaIsAI, and Crafter, demonstrating strong performance in tasks with varying episode lengths [3][19]

Group 2: Methodology
- The base model for Verlog is the Qwen-2.5 Instruct variant, which allows seamless integration with BALROG and facilitates the use of benchmark testing prompts with minimal modifications [6][7]
- A memory mechanism is employed to retain only the latest n + 1 rounds of interactions, optimizing performance for the 3B parameter Qwen model [9][10]

Group 3: Algorithmic Innovations
- The Dual Discounting GAE algorithm is introduced to decouple tokens from steps, encouraging agents to complete tasks in fewer environment steps [11][20]
- The recursive calculation of GAE enhances the stability of training, allowing for effective learning even in sparse reward scenarios [12][14]

Group 4: Experimental Results
- Verlog was tested on three challenging benchmarks: Crafter, BabyAI, and BabaIsAI, showcasing its ability to adapt to long-duration tasks with sparse rewards [16][19]
- The training of the Qwen2.5-7B-Instruct model in the Crafter environment utilized 8 H100 GPUs over approximately 36 hours, while the Qwen2.5-3B-Instruct model for BabyAI and BabaIsAI was trained on 4 A40 GPUs for about 24 hours [19]

Group 5: Future Directions
- Verlog aims to serve as a flexible research platform to advance the development of long-horizon LLM-Agent reinforcement learning [21][20]
- The framework addresses key engineering challenges such as managing long interaction histories, ensuring training stability under sparse rewards, and handling variable trajectory lengths [20][23]
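The Dual Discounting GAE idea summarized under Group 3 can be illustrated with a small sketch. It assumes two pairs of discount factors, one applied between tokens inside a turn and one applied when crossing from one environment step to the next, and it computes advantages with the usual backward GAE recursion. The function name, the parameter names (`gamma_token`, `lam_token`, `gamma_step`, `lam_step`, `is_step_end`) and the default values are illustrative assumptions, not Verlog's actual API or settings.

```python
from typing import List


def dual_discount_gae(
    rewards: List[float],
    values: List[float],
    is_step_end: List[bool],
    gamma_token: float = 1.0,   # assumed: no decay between tokens of one turn
    lam_token: float = 1.0,
    gamma_step: float = 0.99,   # assumed: decay applied once per environment step
    lam_step: float = 0.95,
) -> List[float]:
    """Backward GAE over a flat token sequence that spans several turns.

    rewards[t]     : reward credited to token t (often nonzero only at turn ends)
    values[t]      : critic estimate V(s_t) for token t
    is_step_end[t] : True if token t is the last token of an environment step
    """
    n = len(rewards)
    advantages = [0.0] * n
    next_value = 0.0       # value after the final token (episode assumed finished)
    next_advantage = 0.0
    for t in reversed(range(n)):
        # Moving from the last token of a turn to the next turn uses the
        # step-level factors; moving between tokens uses the token-level ones.
        if is_step_end[t]:
            gamma, lam = gamma_step, lam_step
        else:
            gamma, lam = gamma_token, lam_token
        delta = rewards[t] + gamma * next_value - values[t]
        advantages[t] = delta + gamma * lam * next_advantage
        next_value = values[t]
        next_advantage = advantages[t]
    return advantages


# Two turns of two tokens each, with a single reward at the very end.
adv = dual_discount_gae(
    rewards=[0.0, 0.0, 0.0, 1.0],
    values=[0.0, 0.0, 0.0, 0.0],
    is_step_end=[False, True, False, True],
    gamma_step=0.5,
    lam_step=1.0,
)
print(adv)
```

With token-level factors of 1.0, every token in a turn receives the same advantage regardless of how long the response is, while each additional environment step discounts the final reward once. Under these assumptions, that is what pushes the agent toward finishing in fewer steps rather than toward shorter responses.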
How to write tools for LLM agents? Anthropic's official tutorial is here
机器之心· 2025-09-12 11:31
Core Insights
- The article emphasizes the need to rethink tool development for agentic AI systems, moving away from traditional deterministic logic to accommodate the non-deterministic nature of AI agents [1][3][10]
- It highlights that the effectiveness of AI agents is heavily dependent on the tools provided to them, and outlines a path for optimizing these tools [1][3][4]

Tool Definition and Development
- Tools for AI agents are defined as new software forms that bridge deterministic systems and non-deterministic agents, requiring a different approach to design [8][9][10]
- The article suggests a rapid prototyping approach for tool development, followed by comprehensive evaluations to assess performance and make iterative improvements [12][14]

Evaluation Process
- Evaluation tasks should be generated based on real-world scenarios and data sources, ensuring that prompts are paired with verifiable responses [23][25]
- The article advises against overly simplistic testing environments, advocating for complex conditions that can effectively stress-test the tools [27]

Tool Design Principles
- It is recommended to build a limited number of well-thought-out tools that align with high-value workflows, rather than creating numerous redundant tools [43][47]
- Tools should be designed with clear and independent objectives to prevent confusion among AI agents when selecting the appropriate tool [45][50]

Naming and Response Optimization
- Implementing namespaces for tools can help clarify their functions and reduce confusion for AI agents [48][51]
- Tools should return high-signal information, prioritizing context relevance over flexibility, to enhance the agent's performance [52][56]

Future Outlook
- The article concludes that the development of efficient tools for AI agents requires a shift from predictable deterministic patterns to non-deterministic approaches, with a focus on iterative, evaluation-driven processes [66]
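The points on namespacing and high-signal responses can be made concrete with a minimal sketch. It assumes a simple in-process registry that prefixes each tool name with its namespace, and a helper that trims a backend record down to the fields an agent is likely to act on. `ToolRegistry`, `high_signal`, `tickets_search` and the sample record are all hypothetical names invented for illustration; none of them come from the article or from any real SDK.

```python
from typing import Any, Callable, Dict, List, Optional

Tool = Callable[..., Dict[str, Any]]


class ToolRegistry:
    """Holds tools under prefixed names such as 'tickets_search' (hypothetical)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, namespace: str, name: str, fn: Tool) -> str:
        # The namespace prefix keeps similarly named tools apart.
        full_name = f"{namespace}_{name}"
        if full_name in self._tools:
            raise ValueError(f"duplicate tool name: {full_name}")
        self._tools[full_name] = fn
        return full_name

    def names(self, namespace: Optional[str] = None) -> List[str]:
        prefix = "" if namespace is None else f"{namespace}_"
        return sorted(n for n in self._tools if n.startswith(prefix))

    def call(self, full_name: str, **kwargs: Any) -> Dict[str, Any]:
        return self._tools[full_name](**kwargs)


def high_signal(record: Dict[str, Any], keep: List[str]) -> Dict[str, Any]:
    """Return only the listed fields, dropping everything else in the record."""
    return {key: record[key] for key in keep if key in record}


def tickets_search(query: str) -> Dict[str, Any]:
    # Hypothetical backend record with low-signal fields mixed in.
    raw = {
        "uuid": "9f1c-0000",
        "title": f"Result for {query}",
        "status": "open",
        "mime_type": "text/plain",
    }
    return high_signal(raw, keep=["title", "status"])


registry = ToolRegistry()
registry.register("tickets", "search", tickets_search)
registry.register("docs", "search", lambda query: {"title": f"Doc about {query}"})

print(registry.names())
print(registry.call("tickets_search", query="login bug"))
```

In this sketch the two search tools stay distinguishable by prefix, and the ticket tool returns only a title and a status instead of the full record. Which fields count as high-signal depends on the workflow, which is why the summary pairs this advice with evaluation-driven iteration.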