2025 Annual Evaluation of Chinese Large Models

Investment Rating
- The report does not explicitly provide an investment rating for the industry

Core Insights
- The evaluation results indicate that Chinese large models are rapidly closing the gap with international counterparts: the top eight Chinese models score nearly on par with leading overseas models [2]
- Large models have become "knowledge encyclopedia experts," achieving near-perfect scores on knowledge-related questions across tests [2]
- Logical reasoning and mathematical capabilities vary significantly across models, which makes these areas the critical differentiators of model strength [3]
- Chinese large models hold a cost advantage over international models, with an average price of 38.2 yuan per 1 million tokens compared with 158.3 yuan for international models, a roughly fourfold difference (158.3 / 38.2 ≈ 4.1) [3]

Summary by Sections

Industry Overview
- The development of large models has progressed through three stages: an initial focus on modality understanding, an expansion to modality generation, and an advanced stage of arbitrary modality conversion and intelligent integration [13][14]
- AI technology has significantly improved work efficiency: 96.3% of respondents report efficiency gains, particularly in repetitive tasks and activities with high cognitive load [20][24]
- Despite advances in text and image generation, AI still lags behind humans in language style, creativity, coherence, error rates, and the handling of complex scenarios [27][32]

Evaluation Background
- The competitive landscape of large models in China has stabilized at approximately 20 main competitors, primarily internet companies, cloud computing giants, and AI startups [37][38]
- The report aims to comprehensively assess the technical strength and application progress of large models in both language and multimodal capabilities [39]

Language Model Evaluation
- The comprehensive results of the language model evaluation show that international models generally outperform Chinese models, although several Chinese models surpass the international average [47][48]

Multimodal Evaluation
- In the multimodal evaluation, models from Alibaba Cloud, SenseTime, and Tencent ranked as the top three, owing to their outstanding multimodal understanding and generation capabilities [48]
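
The price comparison under Core Insights can be checked with simple arithmetic. The sketch below is illustrative only: it uses the two average prices quoted in the report (38.2 and 158.3 yuan per 1 million tokens), while the function names and the 50-million-token workload are assumptions made for the example and do not come from the report.

```python
# Average prices quoted under Core Insights (yuan per 1 million tokens).
CHINESE_AVG_PRICE = 38.2
INTERNATIONAL_AVG_PRICE = 158.3


def price_ratio(international: float, domestic: float) -> float:
    """Return how many times higher the international average price is."""
    return international / domestic


def cost_for_tokens(price_per_million: float, tokens: int) -> float:
    """Return the cost in yuan of processing `tokens` tokens at the given price."""
    return price_per_million * tokens / 1_000_000


ratio = price_ratio(INTERNATIONAL_AVG_PRICE, CHINESE_AVG_PRICE)
print(f"International / Chinese average price: {ratio:.2f}x")

# Hypothetical workload (an assumption for illustration): 50 million tokens.
HYPOTHETICAL_TOKENS = 50_000_000
domestic_cost = cost_for_tokens(CHINESE_AVG_PRICE, HYPOTHETICAL_TOKENS)
international_cost = cost_for_tokens(INTERNATIONAL_AVG_PRICE, HYPOTHETICAL_TOKENS)
print(f"Chinese average: {domestic_cost:.0f} yuan")
print(f"International average: {international_cost:.0f} yuan")
```

On the quoted averages, the ratio comes out at about 4.1, and the hypothetical 50-million-token workload would cost about 1,910 yuan at the Chinese average versus about 7,915 yuan at the international average. Averages hide the spread between individual models, so the comparison is indicative only.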