Musk Reposts New Benchmark from ByteDance Seed & Columbia Business School: LLMs Doing Finance Can Get Even a Stock Price Lookup Wrong
QbitAI (量子位) · 2025-09-21 02:11
Core Viewpoint
- The article discusses the challenges AI faces in financial analysis and introduces FinSearchComp, an open-source benchmark for evaluating AI models' financial search and reasoning capabilities [1][5].

Evaluation Results
- The best-performing model, Grok 4 (web), achieved 68.9% accuracy on the global dataset, still 6.1 percentage points behind human experts [2].
- On the Greater China dataset, Doubao (web) led all other models but still trailed the human experts' 88.3% accuracy by more than 34 percentage points [2].

Importance of Financial AI Assessment
- The results show that AI systems still have substantial room for improvement on complex financial analysis tasks [3].
- The benchmark has drawn wide discussion in the industry, and Elon Musk was among those who shared it [5][7].

Task Design and Complexity
- FinSearchComp comprises three task categories of increasing difficulty, designed to mirror the daily work of financial analysts [9].
- The categories are time-sensitive data retrieval, simple historical lookups, and complex historical investigations. Together they test timeliness, accuracy, and the ability to integrate evidence [10][11].

Data Reliability and Expert Support
- The benchmark's data quality is backed by ByteDance's Xpert platform, which draws on domain experts' knowledge and experience to produce high-quality AI training data [13].
- Seventy financial experts took part in the project. Answers were cross-validated against official sources and professional financial databases to ensure reliability [14].

Key Findings on AI Performance
- Search capability proved crucial: models with web search enabled performed substantially better than those without it [16].
- Financial plugins also proved their value: models using them improved by 31.9 percentage points [18].

Implications for Financial Analysts
- There are roughly 370,000 financial professionals in the U.S. and more than 1 million worldwide. Many of them still collect data by hand for information-retrieval tasks [19].
- The article suggests that AI able to perform these tasks accurately could substantially raise productivity in financial analysis [19].

Future Considerations
- The article calls for a comprehensive evaluation system for financial AI, akin to a "driving test," to verify reliability before AI is trusted to fully support financial decision-making [19].
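The evaluation results mix absolute accuracies with gaps quoted in percentage points (pp). A pp gap is a plain difference between two accuracies, not a relative change. The Python sketch below makes that arithmetic explicit. It uses only the figures quoted above. The helper names `pp_gap` and `relative_gap` are hypothetical and are not part of FinSearchComp. The 75.0% human-expert figure for the global dataset is derived from 68.9% + 6.1 pp rather than stated directly. For the Greater China dataset, only an upper bound on Doubao (web)'s accuracy can be derived, because the article gives the gap only as "more than 34" pp.

```python
# Hypothetical helpers (not from FinSearchComp) for reading the percentage-point
# (pp) figures quoted in the article. A pp gap is an absolute difference between
# two accuracies, not a relative change.


def pp_gap(reference_pct: float, model_pct: float) -> float:
    """Absolute gap in percentage points, rounded to one decimal place."""
    return round(reference_pct - model_pct, 1)


def relative_gap(reference_pct: float, model_pct: float) -> float:
    """The same gap expressed as a fraction of the reference accuracy."""
    return (reference_pct - model_pct) / reference_pct


# Global dataset: Grok 4 (web) scored 68.9% and trailed human experts by 6.1 pp,
# which implies (derived, not stated) a human-expert accuracy of about 75.0%.
grok4_web = 68.9
human_global = round(grok4_web + 6.1, 1)

# Greater China dataset: human experts scored 88.3%, and Doubao (web) trailed by
# more than 34 pp, so its accuracy must lie below this bound.
human_china = 88.3
doubao_upper_bound = round(human_china - 34, 1)

if __name__ == "__main__":
    print(f"implied human accuracy (global): {human_global}%")
    print(f"Grok 4 (web) gap: {pp_gap(human_global, grok4_web)} pp, "
          f"{relative_gap(human_global, grok4_web):.1%} relative")
    print(f"Doubao (web) on Greater China: below {doubao_upper_bound}%")
```

When run as a script, it reports an implied human accuracy of 75.0%. It shows the Grok 4 (web) gap as 6.1 pp, or about 8.1% in relative terms, and puts the upper bound for Doubao (web) at 54.3%. The 31.9 pp gain reported for financial plugins should be read the same way: it is an absolute increase in accuracy, not a relative one.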