Large Model Sets a New SOTA in Deciphering Oracle Bone Script! Fudan Team Introduces a New Framework

Core Viewpoint - The article discusses a novel explainable framework for deciphering oracle bone script based on radical and pictographic analysis, achieving state-of-the-art (SOTA) accuracy in character recognition and zero-shot decoding capabilities [1][5][71].

Group 1: Framework and Methodology
- The proposed method integrates radical recognition and pictographic semantic understanding to bridge the gap between the visual forms and meanings of oracle bone characters [5][71].
- A progressive training strategy is introduced, guiding the model from radical identification to pictographic analysis, culminating in a joint analysis phase [6][15].
- The framework employs a dual matching mechanism that enhances zero-shot decoding performance by selecting appropriate candidates from a dictionary based on analysis results [28][71].

Group 2: Dataset and Training
- The research team created the PD-OBS dataset, which includes 47,157 Chinese characters annotated with oracle bone images and pictographic analysis texts, providing a valuable resource for future studies [9][73].
- The dataset comprises characters linked to oracle bone images, ancient script images, and modern script images, with annotations for radical and pictographic analysis [10][73].

Group 3: Experimental Results
- The proposed method was evaluated against existing methods on the HUST-OBC and EV-OBC datasets, demonstrating superior performance in both validation and zero-shot settings [36][38].
- In zero-shot scenarios, the new method outperformed all other approaches, achieving a Top-10 accuracy improvement of 26.2% on the HUST-OBC dataset and 13.6% on the EV-OBC dataset [45][46].
- The explainability of the model's outputs was quantitatively assessed using BERT-Score, showing significant improvements over other large visual language models [47][49].
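
For readers unfamiliar with the Top-10 metric cited in the experimental results, the sketch below shows how Top-k accuracy is conventionally computed: a sample counts as correct when the ground-truth character appears among the k highest-ranked candidates. It assumes the paper uses this standard definition; the function name `top_k_accuracy` and the sample candidate lists are illustrative and not taken from the article.

```python
from typing import Sequence


def top_k_accuracy(
    ranked: Sequence[Sequence[str]], targets: Sequence[str], k: int = 10
) -> float:
    """Fraction of samples whose target appears in the first k ranked candidates."""
    if len(ranked) != len(targets):
        raise ValueError("ranked and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(1 for cands, target in zip(ranked, targets) if target in cands[:k])
    return hits / len(targets)


# Illustrative data only: each inner list is a model's ranked guesses for one glyph.
ranked = [
    ["日", "月", "目"],
    ["木", "林", "禾"],
    ["人", "大", "天"],
    ["水", "川", "雨"],
]
targets = ["月", "禾", "女", "水"]

top1 = top_k_accuracy(ranked, targets, k=1)  # only the last sample is a hit
top3 = top_k_accuracy(ranked, targets, k=3)  # three of the four samples are hits
```

Widening k from 1 to 3 raises the score here because two of the correct characters sit below the first rank, which is why zero-shot decipherment results are often reported at Top-10 rather than Top-1.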

Group 4: Qualitative Analysis
- The model exhibited strong recognition capabilities in the validation set and demonstrated good generalization in zero-shot settings, even for previously undeciphered characters [66][68].
- The dual analysis of radicals and pictographs provided a comprehensive visual-semantic mapping, enhancing the model's ability to generate semantically grounded and interpretable outputs [68][70].
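
The article does not describe how the dual matching mechanism scores dictionary candidates, so the following is only a hypothetical sketch of the general idea: combine a radical-level match with a match on the pictographic description, then rank dictionary entries by the combined score. Everything here is an assumption made for illustration, including the `Entry` structure, the Jaccard similarity, the `alpha` weight, and the three-entry toy dictionary; the actual framework may work quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary entry: a modern character with its annotations."""
    char: str
    radicals: frozenset
    description: str


def jaccard(a, b) -> float:
    """Set overlap in [0, 1]; 0.0 when both inputs are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(pred_radicals, pred_description, dictionary, alpha=0.5, k=10):
    """Rank entries by a weighted sum of radical overlap and description overlap."""
    pred_tokens = pred_description.lower().split()
    scored = []
    for entry in dictionary:
        radical_score = jaccard(pred_radicals, entry.radicals)
        text_score = jaccard(pred_tokens, entry.description.lower().split())
        scored.append((alpha * radical_score + (1 - alpha) * text_score, entry.char))
    # Stable sort: entries with equal scores keep their dictionary order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [char for _, char in scored[:k]]


# Toy dictionary, for illustration only.
dictionary = [
    Entry("明", frozenset({"日", "月"}), "sun and moon together giving light"),
    Entry("林", frozenset({"木"}), "two trees standing side by side"),
    Entry("休", frozenset({"人", "木"}), "a person resting against a tree"),
]

# Pretend the model analysed an unseen glyph and produced these outputs.
candidates = rank_candidates({"人", "木"}, "a person leaning against a tree", dictionary)
```

In this toy run, 休 ranks first because it matches on both radicals and most description words, 林 follows on a partial radical match alone, and 明 matches on neither. A real system would more plausibly compare descriptions with learned embeddings than with word overlap.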