DeepAnalyze from Renmin University and Tsinghua University Turns LLMs into Data Scientists
机器之心·2025-10-30 08:52

Core Viewpoint
- DeepAnalyze is the first agentic LLM designed for autonomous data science, capable of performing complex data science tasks through autonomous orchestration and adaptive optimization [25].

Group 1: Overview of DeepAnalyze
- DeepAnalyze drew significant attention on release, receiving over 1,000 GitHub stars and 200,000 social media views within a week [2].
- The model is open-source, and researchers and practitioners are invited to contribute and collaborate [5].

Group 2: Capabilities of DeepAnalyze
- DeepAnalyze-8B can simulate the behavior of a data scientist, autonomously orchestrating and optimizing operations to complete complex data science tasks [2][10].
- It supports a range of data-centric tasks, including automated data preparation, analysis, modeling, visualization, insight generation, and report creation [4].

Group 3: Training and Methodology
- Existing methods for applying LLMs to autonomous data science rely on workflow-based agents, which limits them; DeepAnalyze aims to overcome this by moving to a trainable agentic LLM [6].
- The model introduces Curriculum-based Agentic Training and Data-grounded Trajectory Synthesis to address reward sparsity and trajectory scarcity in complex scenarios [14][25].

Group 4: Performance Metrics
- DeepAnalyze-8B outperforms all open-source models on DataSciBench, achieving a 59.91% success rate, comparable to GPT-4o [12].
- In specific tasks such as data analysis and modeling, DeepAnalyze shows superior performance, which the article attributes to its agentic model design [12][18].

Group 5: Research and Development
- The research team behind DeepAnalyze includes experts from Renmin University and Tsinghua University who focus on integrating AI with data science [27][29].
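
To make the contrast between a fixed workflow and autonomous orchestration more concrete, here is a minimal Python sketch of an agentic loop, in which a policy chooses each next action and sees the result of any code it runs before deciding again. This is an illustration under stated assumptions, not DeepAnalyze's implementation: the action names (`plan`, `code`, `answer`), the functions `run_agentic_loop`, `toy_policy`, and `toy_execute`, and the toy data are all hypothetical and do not come from the article.

```python
from typing import Callable, List, Tuple

# Hypothetical action format; the article does not list the model's real action set.
Action = Tuple[str, str]  # (kind, payload)


def run_agentic_loop(policy: Callable[[List[str]], Action],
                     execute: Callable[[str], str],
                     max_steps: int = 10) -> Tuple[str, List[str]]:
    """Let the policy pick each next action until it emits an answer."""
    history: List[str] = []
    for _ in range(max_steps):
        kind, payload = policy(history)
        if kind == "answer":
            return payload, history
        if kind == "code":
            # Feed the execution result back so the next decision can adapt to it.
            history.append(f"code: {payload}")
            history.append(f"observation: {execute(payload)}")
        else:
            history.append(f"{kind}: {payload}")
    return "", history  # step budget exhausted without an answer


# Toy stand-ins for a trained model and a sandbox, for illustration only.
def toy_policy(history: List[str]) -> Action:
    if not history:
        return ("plan", "inspect the data, then summarize")
    if not any(h.startswith("observation") for h in history):
        return ("code", "sum([3, 4, 5])")
    return ("answer", f"total is {history[-1].split(': ')[1]}")


def toy_execute(code: str) -> str:
    # Evaluates a single trusted expression; a real system would use a sandbox.
    return str(eval(code, {"__builtins__": {}}, {"sum": sum}))


answer, trace = run_agentic_loop(toy_policy, toy_execute)
print(answer)
```

In a workflow-based agent the order of steps would be hard-coded by the developer; in this sketch the order comes entirely from `policy`, which is the part a trainable agentic LLM would replace.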