Domestic LLMs Get Their Gaokao Scores: With a Raw Score of 683, Tsinghua or Peking University?
量子位 (QbitAI) · 2025-06-26 06:25
Core Insights
- The article compares how several AI models performed on a simulated gaokao (China's college entrance examination), looking at their scores and capabilities subject by subject [2][12].

Group 1: Overall Performance
- Gemini achieved the highest science-track score with 655 points, while Doubao ranked first in the humanities track with 683 points [2].
- Doubao posted the top score in six subjects; the exceptions were mathematics, chemistry, and biology [3][4].

Group 2: Subject-Specific Analysis
- In the subject breakdown, Doubao scored 128 in Chinese, 141 in mathematics, and 144 in English, while Gemini scored 126 in Chinese and 140 in mathematics [3].
- The models improved significantly in mathematics compared with previous years, with most scoring around 140 points [13].
- Doubao and Gemini outperformed the other models on visual comprehension tasks, particularly in chemistry [22][42].

Group 3: Evaluation Methodology
- The evaluation used a combination of national and provincial exam papers, with a total score of 750 points [9].
- Scoring combined automated assessment with human evaluation to keep the test conditions fair [10][11].

Group 4: Model Development and Improvement
- Doubao's advances are attributed to three key strategies: multi-modal integration, enhanced reasoning capabilities, and dynamic thinking abilities [30][33][40].
- The model was trained in three phases covering text, multi-modal data, and long-context support, which significantly improved its performance on reading comprehension and reasoning tasks [35][36].

Group 5: Future Directions
- The article suggests that combining text and image inputs can significantly enhance model performance, making this a promising area for future exploration [42][43].
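
As a rough check on the figures quoted above, the short Python sketch below works out what Doubao's reported 683-point humanities total implies for the subjects the summary does not break out. The 750-point maximum and the three subject scores come from the summary. The 150-point cap on Chinese, mathematics, and English is an assumption based on the standard gaokao layout, and the function name `implied_remainder` is purely illustrative.

```python
# Figures quoted in the summary for Doubao (humanities track).
REPORTED_CORE = {"Chinese": 128, "Mathematics": 141, "English": 144}
REPORTED_TOTAL = 683   # Doubao's humanities total, as reported
TOTAL_MAX = 750        # full marks, as reported

# Assumption (not stated in the summary): each core subject is out of 150.
CORE_MAX_PER_SUBJECT = 150


def implied_remainder(core_scores, total, total_max, core_max):
    """Return (points, maximum) left for the subjects not listed individually."""
    core_sum = sum(core_scores.values())
    remaining_points = total - core_sum
    remaining_max = total_max - core_max * len(core_scores)
    return remaining_points, remaining_max


points, maximum = implied_remainder(
    REPORTED_CORE, REPORTED_TOTAL, TOTAL_MAX, CORE_MAX_PER_SUBJECT
)
print(f"Core subjects: {sum(REPORTED_CORE.values())} points")
print(f"Remaining subjects: {points} of {maximum} points")
print(f"Overall: {REPORTED_TOTAL / TOTAL_MAX:.1%} of full marks")
```

Under that assumption, the three core subjects account for 413 points, which leaves 270 of a possible 300 for the remaining humanities subjects. The summary itself does not confirm this split.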