WebDancer

Search documents
阿里发布信息检索Agent,可自主上网查资料,GAIA基准超越GPT-4o | 模型&数据开源
量子位· 2025-06-27 04:40
Core Viewpoint - Alibaba has introduced WebDancer, an autonomous information retrieval agent capable of understanding and navigating the web like a human, enhancing the capabilities of traditional models through multi-step reasoning and tool usage [1][3]. Group 1: WebDancer's Capabilities - WebDancer can perform complex tasks such as web browsing, information searching, and question answering, demonstrating its ability to execute multi-step reasoning [9]. - The model achieved a Pass@3 score of 61.1% on GAIA and 54.6% on WebWalkerQA, outperforming baseline models and some open-source frameworks [4][34]. - WebDancer employs a four-stage training paradigm, which includes data construction, trajectory sampling, supervised fine-tuning, and reinforcement learning to enhance its reasoning and decision-making capabilities [10][28]. Group 2: Training Methodology - The first stage involves constructing browsing data to create complex QA pairs that require multiple interactions, simulating human behavior [12][15]. - The second stage focuses on generating high-quality Thought-Action-Observation trajectories, utilizing a dual-path sampling method for both short and long reasoning chains [20][22]. - The supervised fine-tuning stage integrates these trajectories to teach the model basic task decomposition and tool usage while preserving its original reasoning abilities [25][27]. - The reinforcement learning stage aims to optimize the agent's decision-making and generalization capabilities in real-world web environments [28][30]. Group 3: Performance Analysis - WebDancer's performance was tested on challenging datasets, including BrowseComp in English and Chinese, where it demonstrated robust capabilities in handling difficult reasoning and information retrieval tasks [36]. - The analysis of Pass@1 and Pass@3 metrics indicates that reinforcement learning significantly improves the sampling of correct responses, while consistency in language reasoning models shows notable improvement [38].
通义实验室最新成果WebDancer:开启自主智能Deep Research的新时代
机器之心· 2025-06-12 06:08
作者介绍: 本文作者来自通义实验室 RAG 团队,致力于面向下一代 RAG 技术进行基础研究。该团队 WebWalker 工作近期也被 ACL 2025 main conference 录 用。 它得能看懂网页,能做多步决策; 它得能适应开放动态环境; 它得能自主提问、自主行动、自主修正…… 一、背景:信息检索的新需求与挑战 在当今信息爆炸的时代,解决复杂问题不再仅仅是简单的知识检索,而是需要深入的信息挖掘和多步推理。从医学研究到科技创新,从商业决策到学术探索,每 一个领域都呼唤着能够自主思考、自主决策的智能体。Deep Research 等系统已经为我们展示了自主多步研究的巨大潜力,但构建这样的智能体并非易事。它们需 要在复杂的网络环境中感知、决策、行动,还要面对任务复杂度高、泛化能力弱等诸多挑战。 但打造这样一个 Deep Research 类智能体智能体,并不简单! 在这种背景下,WebDancer 的出现,走出了一条复现 Deep Research 类智能体的可行路径。 自主信息检索智能体的构建,或者如何复现 Deep Research 类的模型一直面临着两大棘手难题:高质量训练数据的稀缺与开放环境训 ...
阿里智能体多轮推理超越GPT-4o,开源模型也能做Deep Research
量子位· 2025-06-06 04:01
WebDancer团队 投稿 量子位 | 公众号 QbitAI 能够完成多步信息检索任务,涵盖多轮推理与连续动作执行的智能体来了。 通义实验室推出WebWalker(ACL2025)续作自主信息检索智能体WebDancer。 WebDancer 通过系统化的训练范式——涵盖从数据构建到算法设计的全流程——为构建具备长期信息检索能力的智能体提供了明确路径。 同时,该框架也为在开源模型上复现Deep Research系统提供了可行的指导。团队将进一步在更开放的环境中、结合更多工具,持续拓展和 集成Agentic能力,推动通用智能体的落地与演进。 一、背景:信息检索的新需求与挑战 在信息爆炸的时代,传统的搜索引擎已难以满足用户对深层次、多步骤信息获取的需求。从医学研究到科技创新,从商业决策到学术探索,复 杂问题的解决需要深入的信息挖掘和多步推理能力。这催生了对能够自主思考、自主决策的智能体的需求。 然而,构建这样的智能体面临诸多挑战: 二、突破训练数据难获得问题 在自主信息检索领域,高质量的训练数据至关重要。然而,现有的数据集如2WIKI,HotpotQA多为浅层次问题,难以支持复杂多步推理的训 练需求。 数据过滤 ...