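Understand the Behavior First, Then Train the Agent: CMU Open-Sources the First Agentic Search Log Dataset and Takes the Agent Apart for You
机器之心 (Synced) · 2026-02-09 01:18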

Core Insights
- The article notes that how agents issue queries, rewrite them, and use retrieved information in LLM-driven Agentic Search has not yet been systematically characterized or analyzed [2][7].

Group 1: Research Contributions
- The CMU team compiled more than 14 million Agentic Search requests, forming approximately 4 million sessions, from six months of real traffic. They released these as the first open-source Agentic Search behavior log dataset [7][8].
- They proposed a three-layer analytical framework [2][8]:
  - session intent (Declarative / Procedural / Reasoning);
  - trajectory moves (Specialization / Generalization / Exploration / Repetition);
  - the Context-driven Term Adoption Rate (CTAR), which measures how much retrieved information is adopted into later queries.

Group 2: Data and Platform Overview
- The DeepResearchGym (DRGym) platform was built for research use. It provides a unified search API based on dense retrieval over fixed web-corpus snapshots [12].
- The dataset covers traffic from 25 countries and nearly 600 IP addresses, reflecting diverse usage. The logs were cleaned and anonymized before release [13][14].

Group 3: Session Analysis Methodology
- A joint semantic-and-temporal sessionization strategy grouped requests into approximately 4 million sessions for behavior analysis. These sessions are characterized by high-frequency, iterative querying [16][19].
- Most queries fall in a broad, dispersed region of semantic space and overlap little with the tasks in common Agentic Benchmarks [18].

Group 4: Intent and Trajectory Dynamics
- Multi-turn sessions were classified into three session intents: Declarative, Procedural, and Reasoning. These intents differ in session length and retrieval configuration [22][25].
- Four types of trajectory moves were identified: Specialization, Generalization, Exploration, and Repetition. Agents showed a notable "drill-down bias" [27][32].

Group 5: CTAR Insights
- CTAR shows that more than half of the new terms introduced in queries can be traced back to previously retrieved documents. This indicates that agents rely heavily on context retrieved earlier in the session [34][35].
- CTAR varies significantly across trajectory moves. Specialization and Exploration adopt retrieved terms at higher rates than Repetition [36][37].

Group 6: System Design Implications
- Repeated moves may signal that the agent's search is stagnating. In that case, the system could intervene to trigger exploration or generalization [41].
- The retrieval budget should adapt to task intent and trajectory state, enabling better document coverage and query refinement [42].
- Adding CTAR and similar metrics to system monitoring can help assess whether agents are making effective use of retrieved information [43].

Group 7: Overall Contributions
- The research provides the first open-source dataset of Agentic Search behavior logs, a reproducible data foundation for future studies [46].
- It introduces an analytical framework for understanding Agentic Search processes, offering tools for behavior modeling and strategy comparison [47].
- It translates empirical observations into quantifiable design recommendations for improving agentic search systems [48].
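
Group 3 names a joint semantic-and-temporal sessionization strategy but does not give its rules. The sketch below is a minimal illustration under assumptions not taken from the source, not the CMU team's method:
- each record carries a hypothetical anonymized `client_id`, a timestamp in seconds, and the query text;
- token-overlap (Jaccard) similarity stands in for semantic similarity, which a real pipeline would more likely compute from embeddings;
- the thresholds `max_gap=300` seconds and `min_sim=0.2` are illustrative values only.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class LogRecord:
    client_id: str    # hypothetical anonymized client key
    timestamp: float  # seconds since epoch
    query: str


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a cheap stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sessionize(records, max_gap=300.0, min_sim=0.2):
    """Split each client's time-ordered query stream into sessions.

    A query joins the current session only if it arrives within `max_gap`
    seconds of the previous query AND is at least `min_sim` similar to it;
    otherwise it opens a new session.
    """
    sessions = []
    ordered = sorted(records, key=lambda r: (r.client_id, r.timestamp))
    for _, stream in groupby(ordered, key=lambda r: r.client_id):
        current = []
        for rec in stream:
            if current:
                prev = current[-1]
                close_in_time = rec.timestamp - prev.timestamp <= max_gap
                related = jaccard(rec.query, prev.query) >= min_sim
                if not (close_in_time and related):
                    sessions.append(current)  # boundary: close the session
                    current = []
            current.append(rec)
        if current:
            sessions.append(current)
    return sessions
```

Because both conditions are required, a topic switch splits a session even inside a short burst of traffic. Likewise, a long pause splits it even when the topic is unchanged.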
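
The four trajectory moves in Group 4 can be illustrated with a simple rule over consecutive queries. This is a hypothetical term-set heuristic, not the paper's classifier:
- a reformulation that keeps every earlier term and adds more counts as Specialization;
- one that keeps only a subset counts as Generalization;
- an identical term set counts as Repetition;
- anything else counts as Exploration.

The `is_stalled` helper and its `window=3` threshold are also hypothetical. They illustrate the Group 6 suggestion that runs of Repetition could trigger an intervention.

```python
def terms(query: str) -> frozenset:
    """Lowercased whitespace tokens of a query (a simplifying assumption)."""
    return frozenset(query.lower().split())


def classify_move(prev_query: str, next_query: str) -> str:
    """Label a reformulation by comparing the term sets of two consecutive queries."""
    prev, nxt = terms(prev_query), terms(next_query)
    if nxt == prev:
        return "Repetition"
    if nxt > prev:   # proper superset: keeps every old term and adds new ones
        return "Specialization"
    if nxt < prev:   # proper subset: drops some old terms, adds none
        return "Generalization"
    return "Exploration"  # partial or no overlap: a lateral move


def trajectory(queries):
    """Moves between each pair of consecutive queries in a session."""
    return [classify_move(a, b) for a, b in zip(queries, queries[1:])]


def is_stalled(moves, window=3):
    """Hypothetical trigger: True if the last `window` moves are all Repetition."""
    return len(moves) >= window and all(m == "Repetition" for m in moves[-window:])
```

A monitoring loop could call `is_stalled(trajectory(session_queries))` after each turn and, when it returns `True`, prompt the agent toward an Exploration or Generalization move.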
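
CTAR (Group 5) can be sketched as the fraction of newly introduced query terms that already appeared in documents retrieved earlier in the same session. The original study's exact definition, tokenization, and stopword handling may differ. Here, whitespace tokenization, lowercasing, and pooling counts over a whole session are assumptions.

```python
def tokenize(text: str) -> set:
    """Lowercased whitespace tokens (an assumed, simplistic tokenizer)."""
    return set(text.lower().split())


def ctar(turns):
    """Context-driven Term Adoption Rate over one session (illustrative definition).

    `turns` is an ordered list of (query, retrieved_docs) pairs. For each turn
    after the first, "new" terms are those absent from all earlier queries; a
    new term counts as "adopted" if it appears in any document retrieved on an
    earlier turn. Returns adopted / new, or None if no new terms appeared.
    """
    seen_query_terms = set()
    seen_doc_terms = set()
    new_total = adopted_total = 0
    for i, (query, docs) in enumerate(turns):
        q_terms = tokenize(query)
        if i > 0:
            new_terms = q_terms - seen_query_terms
            new_total += len(new_terms)
            adopted_total += len(new_terms & seen_doc_terms)
        seen_query_terms |= q_terms
        # Documents from this turn only count as context for later turns.
        for doc in docs:
            seen_doc_terms |= tokenize(doc)
    return adopted_total / new_total if new_total else None
```

Under this definition, a session of pure Repetition introduces no new terms and yields `None` rather than a rate. A monitoring dashboard would need to handle that case separately.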
