DeepAnalyze
Can LLMs Replace Data Scientists? DeepAnalyze Takes the Manual Work Out of Data Analysis
QbitAI (量子位) · 2025-11-01 03:59
Core Insights
- DeepAnalyze is presented as a specialized "data scientist" that automates data analysis and a range of other data science tasks from a single instruction [1][5].
- DeepAnalyze is the first agentic LLM designed for data science. It can complete complex data tasks on its own, without predefined workflows [5][6].

Data Science Tasks
- DeepAnalyze supports automated data preparation, analysis, modeling, visualization, and insight generation [3].
- It can conduct open-ended deep research across unstructured, semi-structured, and structured data and produce comprehensive research reports [3][16].

Training Methodology
- DeepAnalyze uses a curriculum-based agentic training paradigm that lets LLMs complete complex data science tasks autonomously [10][12].
- Training has two phases: fine-tuning on individual capabilities, followed by multi-capability agentic training in real task environments [13].

Curriculum-Based Agentic Training
- The method mirrors how human data scientists learn, moving the LLM from simple tasks to progressively more complex ones [12].
- It mitigates the sparse-reward problem in reinforcement learning by ensuring that the model still receives positive feedback during training [11][12].

Data-Grounded Trajectory Synthesis
- DeepAnalyze introduces a method that synthesizes 500,000 data science reasoning and interaction trajectories to guide LLMs through long-chain problems [14].
- The synthesis covers both reasoning trajectories and interaction trajectories, giving the LLM effective guidance as it explores the solution space [15].

Research Capabilities
- DeepAnalyze can automatically generate analyst-grade research reports. These reports outperform those of existing closed-source LLMs in both content depth and report structure [16].
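The curriculum idea summarized above starts with simple tasks and then unlocks harder ones, so the model is not stuck with near-zero rewards. A small task scheduler can illustrate it. This is a minimal sketch under our own assumptions, not DeepAnalyze's training code. The `Task` class, the three-level difficulty scale, the linear pacing rule in `allowed_level`, and the example task names are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    difficulty: int  # hypothetical scale: 1 = single skill, higher = longer multi-step task


def allowed_level(step: int, total_steps: int, max_level: int) -> int:
    """Highest difficulty unlocked at this training step (linear pacing, an assumption)."""
    progress = step / max(total_steps - 1, 1)  # goes from 0.0 to 1.0 over training
    return 1 + int(progress * (max_level - 1))


def curriculum_batch(tasks, step, total_steps, batch_size, rng):
    """Sample a batch only from tasks at or below the currently unlocked difficulty.

    Early batches contain only easy tasks, which a partially trained policy can
    sometimes solve, so it sees non-zero rewards instead of an all-zero (sparse)
    reward signal from tasks that are still out of reach.
    """
    max_level = max(t.difficulty for t in tasks)
    level = allowed_level(step, total_steps, max_level)
    pool = [t for t in tasks if t.difficulty <= level]
    return [rng.choice(pool) for _ in range(batch_size)]


# Hypothetical task pool, ordered from single-skill to end-to-end work.
tasks = [
    Task("clean_column", 1),
    Task("groupby_summary", 1),
    Task("fit_regression", 2),
    Task("end_to_end_report", 3),
]
rng = random.Random(0)
for step in (0, 5, 9):
    batch = curriculum_batch(tasks, step, total_steps=10, batch_size=4, rng=rng)
    print(step, allowed_level(step, 10, 3), [t.name for t in batch])
```

The fixed, step-based schedule keeps the example deterministic. A rule that unlocks harder tasks once the success rate on easier ones passes a threshold would be an equally plausible design.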
Renmin University and Tsinghua University's DeepAnalyze Turns LLMs into Data Scientists
Synced (机器之心) · 2025-10-30 08:52
Core Viewpoint
- DeepAnalyze is the first agentic LLM designed for autonomous data science. It performs complex data science tasks through autonomous orchestration and adaptive optimization [25].

Group 1: Overview of DeepAnalyze
- DeepAnalyze drew significant attention, gaining over 1,000 GitHub stars and 200,000 social media views within a week of its release [2].
- The model is open source, and the team invites researchers and practitioners to contribute and collaborate [5].

Group 2: Capabilities of DeepAnalyze
- DeepAnalyze-8B can emulate the behavior of a data scientist, autonomously orchestrating and optimizing operations to complete complex data science tasks [2][10].
- It supports a range of data-centric tasks, including automated data preparation, analysis, modeling, visualization, insight generation, and report writing [4].

Group 3: Training and Methodology
- Existing approaches to applying LLMs to autonomous data science have limitations. DeepAnalyze aims to overcome them by moving from workflow-based agents to trainable agentic LLMs [6].
- The model introduces Curriculum-based Agentic Training and Data-grounded Trajectory Synthesis to address reward sparsity and trajectory scarcity in complex scenarios [14][25].

Group 4: Performance Metrics
- On DataSciBench, DeepAnalyze-8B outperforms all open-source models, reaching a 59.91% completion rate, comparable to GPT-4o [12].
- On specific tasks such as data analysis and modeling, DeepAnalyze performs especially well, which the article attributes to its agentic-model approach [12][18].

Group 5: Research and Development
- The research team behind DeepAnalyze includes experts from Renmin University of China and Tsinghua University who focus on integrating AI with data science [27][29].
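The article describes a shift from fixed workflows to an agentic LLM that decides its own next step. A toy act-observe loop can illustrate it. This is a minimal sketch, not DeepAnalyze's interface. The `Action` type, the three toy operations, the in-memory table, and `scripted_policy` (a hard-coded stand-in for the trained model) are all hypothetical. The loop shows that the sequence of steps is chosen at run time from observations, so the cleaning step runs only when missing values are actually found.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Action:
    kind: str      # "inspect", "drop_missing", "mean", or "finish"
    arg: str = ""  # column name, or the final answer when kind == "finish"


Table = Dict[str, list]
History = List[Tuple[Action, str]]


def execute(action: Action, table: Table) -> str:
    """Toy environment: run one operation on an in-memory table and return an observation."""
    if action.kind == "inspect":
        missing = sum(v is None for v in table[action.arg])
        return f"{action.arg}: {len(table[action.arg])} rows, {missing} missing"
    if action.kind == "drop_missing":
        keep = [i for i, v in enumerate(table[action.arg]) if v is not None]
        for name in table:  # drop the same rows from every column
            table[name] = [table[name][i] for i in keep]
        return f"kept {len(keep)} rows"
    if action.kind == "mean":
        col = table[action.arg]
        return f"mean={sum(col) / len(col):.2f}"
    raise ValueError(f"unknown action: {action.kind!r}")


def scripted_policy(history: History) -> Action:
    """Hypothetical stand-in for the LLM: chooses the next action from the last observation."""
    if not history:
        return Action("inspect", "price")
    last, observation = history[-1]
    if last.kind == "inspect":
        # Parse "<col>: N rows, M missing" and clean only if M > 0.
        missing = int(observation.split(", ")[1].split()[0])
        return Action("drop_missing", "price") if missing else Action("mean", "price")
    if last.kind == "drop_missing":
        return Action("mean", "price")
    return Action("finish", observation)


def run_agent(policy: Callable[[History], Action], table: Table,
              max_steps: int = 10) -> Tuple[Optional[str], History]:
    """Act-observe loop: the policy picks each step; stops at "finish" or after max_steps."""
    history: History = []
    for _ in range(max_steps):
        action = policy(history)
        if action.kind == "finish":
            return action.arg, history
        history.append((action, execute(action, table)))
    return None, history


table = {"price": [10.0, None, 14.0, 12.0], "qty": [1, 2, 3, 4]}
answer, history = run_agent(scripted_policy, table)
print(answer)  # mean=12.00
print([a.kind for a, _ in history])  # ['inspect', 'drop_missing', 'mean']
```

A fixed workflow would hard-code the cleaning step whether or not it is needed. Here the policy skips it when `inspect` reports zero missing values. In an agentic LLM, the policy would be the model itself, conditioned on the full interaction history.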