Zhejiang Lab's Xue Guirong: When AI Started Doing Scientific Research, I Saw the Ceiling of Large Language Models | GAIR 2025
Leiphone (雷峰网) · 2025-12-24 00:22

Core Viewpoint - The GAIR conference traces AI's move from laboratory research to industrial application. Xue Guirong of Zhejiang Lab (之江实验室) argues that scientific foundation models are needed to overcome the limits large language models face in understanding complex scientific data [2][4].

Group 1: Limitations of Large Language Models
- Large language models are constrained by "language boundaries": they struggle to comprehend high-dimensional, multi-modal scientific data and to independently produce verifiable scientific discoveries [4][22].
- On the HLE (Humanity's Last Exam) benchmark, which covers more than 100 disciplines, the best-performing model reached only 25.4% accuracy, a sign of how limited current models are on scientific problems [4][18].
- The main difference between large language models and scientific foundation models is data representation: the latter use cross-disciplinary scientific data of many types as tokens, rather than text alone [4][26].

Group 2: Scientific Foundation Models
- The 021 scientific foundation model developed by Zhejiang Lab aims to break through the limits of language and unify scientific data, so that reasoning and discovery can span disciplines [4][5].
- Effective tokenization of scientific data is what allows connections to be drawn between different data types, enabling scientific problems to be analyzed across fields [5][28].
- The model supports applications in 19 key disciplines covering 174 areas of scientific knowledge, and aims to shorten processes that traditionally demand extensive time and resources [31][36].

Group 3: Collaborative Efforts and Future Directions
- The initiative brings together national laboratories, universities, and enterprises to co-create and improve the model, building a deeper shared understanding of key scientific data and challenges [36][38].
- An open research platform, zero2x, is being developed to provide access to data and models and to encourage broader participation in scientific discovery and innovation [38].
- The goal is to transform the research paradigm and accelerate the integration of AI into scientific work [38].