Version 2.0 of the Financial Large Model Evaluation System Released in Shanghai

Core Insights
- The 2025 Financial Large Model Evaluation System was released in Shanghai. Version 2.0 is the system's first comprehensive upgrade since its initial launch last year and centers on four themes: standard leadership, data-driven methods, security and trustworthiness, and ecosystem co-construction [1]
- The evaluation system serves as a scientific benchmark for the industry, offering a reliable way to assess the performance, safety, and reliability of financial large models [1]

Group 1
- The system combines 4 public datasets and 22 self-built datasets, totaling approximately 36,000 evaluation items, and uses randomized item selection and diverse prompts [1]
- It automates and standardizes the entire evaluation process, giving banks, brokerages, fund companies, and investment institutions in Shanghai authoritative and precise assessments of large model capabilities [1]

Group 2
- The latest results show a marked improvement: the industry average score rose from 71.9 to 87.37 this year, a gain of 15.47 points [2]
- Domestic financial large models excel at language understanding, terminology disambiguation, tracking regulatory policy updates, and compliance alignment, while foreign models lead in mathematical calculation, step-by-step reasoning, cross-language reasoning, and long-text processing [2]