FinSearchComp

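Musk Reposts New Benchmark from ByteDance Seed & Columbia Business School: In Finance, Large Models Can Get Even a Stock Price Lookup Wrong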
Sou Hu Cai Jing · 2025-09-21 02:34
Core Insights - The article discusses the launch of FinSearchComp, an open-source financial search and reasoning benchmark developed by ByteDance's Seed team in collaboration with Columbia Business School to evaluate AI performance on financial analysis tasks [1][3][5].
Evaluation Results - The best-performing model, Grok 4 (web), achieved an accuracy of 68.9% on the global dataset, still 6.1 percentage points behind human experts. On the Greater China dataset, Doubao (web) led with an accuracy of 53.3%, more than 34 percentage points short of human experts' 88.3% [1][11].
Task Design - FinSearchComp includes three task categories of increasing difficulty that reflect the complexity of financial analysts' daily work:
1. Time-sensitive data fetching, focused on real-time data such as stock prices [7]
2. Simple historical lookup, requiring retrieval of facts fixed at a specific point in time [7]
3. Complex historical investigation, requiring aggregation and analysis across multiple periods [7]
Data Reliability - The benchmark's quality is supported by ByteDance's Xpert platform, which supplies expert knowledge and high-quality AI training data. The project involved 70 financial experts, and data reliability was ensured through cross-validation against official sources and professional financial databases [9][10].
Importance of Search Capability - The evaluation highlighted the critical role of search: models with web search showed significant performance improvements across tasks, while models without search scored zero on time-sensitive tasks, underscoring the need for real-time data access in financial analysis [11][12].
Industry Implications - The findings suggest that although AI can assist with financial data retrieval, it still has considerable room for improvement. The article advocates a comprehensive evaluation system for financial AI, akin to a "driving license" for AI products, to establish reliability before such products can fully replace human analysts [13].
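The reported gaps are absolute differences in percentage points. A minimal Python check, using only the figures quoted in the summaries, shows the human-expert accuracy implied for the global dataset (68.9 + 6.1 = 75.0%, a derived value rather than one stated in the summary) and confirms the Greater China gap (88.3 - 53.3 = 35.0 points, consistent with "more than 34"). The helper function names are illustrative only.

```python
# Accuracy figures (in percent) as quoted in the summaries above.
GROK4_WEB_GLOBAL = 68.9          # best model, global dataset
GLOBAL_GAP_PP = 6.1              # reported shortfall vs. human experts
DOUBAO_WEB_GREATER_CHINA = 53.3  # best model, Greater China dataset
HUMAN_GREATER_CHINA = 88.3       # human experts, Greater China dataset


def implied_human_accuracy(model_acc: float, gap_pp: float) -> float:
    """Human accuracy implied by a model score plus its percentage-point gap."""
    return round(model_acc + gap_pp, 1)


def gap_in_points(human_acc: float, model_acc: float) -> float:
    """Absolute gap in percentage points (a difference, not a ratio)."""
    return round(human_acc - model_acc, 1)


def relative_shortfall(human_acc: float, model_acc: float) -> float:
    """The same gap expressed as a percentage of the human score."""
    return round(100.0 * (human_acc - model_acc) / human_acc, 1)


human_global = implied_human_accuracy(GROK4_WEB_GLOBAL, GLOBAL_GAP_PP)
gc_gap = gap_in_points(HUMAN_GREATER_CHINA, DOUBAO_WEB_GREATER_CHINA)
gc_relative = relative_shortfall(HUMAN_GREATER_CHINA, DOUBAO_WEB_GREATER_CHINA)

print(human_global)  # 75.0
print(gc_gap)        # 35.0
print(gc_relative)   # 39.6
```

Keeping percentage points separate from relative percentages matters when reading these results: a 35.0-point gap on an 88.3% baseline means the best model falls roughly 40% short of the human score in relative terms.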
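Musk Reposts New Benchmark from ByteDance Seed & Columbia Business School: In Finance, Large Models Can Get Even a Stock Price Lookup Wrong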
QbitAI (量子位) · 2025-09-21 02:11
Core Viewpoint - The article discusses the challenges AI faces in financial analysis and highlights the launch of FinSearchComp, an open-source benchmark for evaluating AI's financial search and reasoning capabilities [1][5].
Evaluation Results
- The best-performing model, Grok 4 (web), achieved an accuracy of 68.9% on the global dataset, still trailing human experts by 6.1 percentage points [2].
- On the Greater China dataset, Doubao (web) led the other models but fell more than 34 percentage points short of human experts' 88.3% accuracy [2].
Importance of Financial AI Assessment
- The results indicate significant room for improvement in how AI systems handle complex financial analysis tasks [3].
- The evaluation has sparked widespread discussion in the industry and drew the attention of prominent figures such as Elon Musk [5][7].
Task Design and Complexity
- FinSearchComp features three task categories of increasing difficulty, designed to reflect the daily work of financial analysts [9].
- The tasks are time-sensitive data retrieval, simple historical lookup, and complex historical investigation, which together test timeliness, accuracy, and the integration of evidence [10][11].
Data Reliability and Expert Support
- The benchmark's quality is supported by ByteDance's Xpert platform, which contributes expert knowledge and experience to produce high-quality AI training data [13].
- The project involved 70 financial experts, and data reliability was ensured through cross-validation against official sources and professional financial databases [14].
Key Findings on AI Performance
- The evaluation confirmed that search capability is crucial: models with web search showed significant performance improvements [16].
- Financial plugins also proved valuable, with models using them gaining 31.9 percentage points [18].
Implications for Financial Analysts
- There are approximately 370,000 financial professionals in the U.S. and over 1 million globally, many of whom still collect data manually for information retrieval tasks [19].
- The article suggests that if AI can perform these tasks accurately, it could significantly boost productivity in financial analysis [19].
Future Considerations
- The article advocates a comprehensive evaluation system for financial AI, akin to a "driving test," to establish reliability before AI can fully support financial decision-making [19].
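The three task categories can be thought of as a small taxonomy against which graded answers are tallied. The sketch below is purely illustrative: the `TaskCategory` names, the `GradedAnswer` record, and the `accuracy_by_category` function are hypothetical and do not reflect FinSearchComp's actual data format or scoring code, which the summaries do not describe. It assumes accuracy is simply the share of correctly answered items within each category.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class TaskCategory(Enum):
    # Hypothetical labels for the three categories named in the summaries.
    TIME_SENSITIVE_FETCH = "time-sensitive data fetching"
    SIMPLE_HISTORICAL_LOOKUP = "simple historical lookup"
    COMPLEX_HISTORICAL_INVESTIGATION = "complex historical investigation"


@dataclass
class GradedAnswer:
    # Hypothetical record: one benchmark item and whether the answer was judged correct.
    category: TaskCategory
    correct: bool


def accuracy_by_category(results: list[GradedAnswer]) -> dict[TaskCategory, float]:
    """Percentage of correct answers per category (only categories that appear)."""
    totals: dict[TaskCategory, int] = defaultdict(int)
    hits: dict[TaskCategory, int] = defaultdict(int)
    for r in results:
        totals[r.category] += 1
        hits[r.category] += int(r.correct)
    return {cat: 100.0 * hits[cat] / totals[cat] for cat in totals}


# Made-up toy results for illustration only; these are not benchmark data.
toy_results = [
    GradedAnswer(TaskCategory.TIME_SENSITIVE_FETCH, False),
    GradedAnswer(TaskCategory.TIME_SENSITIVE_FETCH, False),
    GradedAnswer(TaskCategory.SIMPLE_HISTORICAL_LOOKUP, True),
    GradedAnswer(TaskCategory.SIMPLE_HISTORICAL_LOOKUP, False),
]
scores = accuracy_by_category(toy_results)
for category, acc in scores.items():
    print(f"{category.value}: {acc:.1f}%")
```

Scoring per category rather than only in aggregate is what makes a pattern like the reported zero scores on time-sensitive tasks for models without search visible, since a single overall accuracy would blend it with the other two categories.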