An explainable oracle bone script decipherment framework based on radical and pictographic analysis
Large model sets a new SOTA for deciphering oracle bone script! Fudan team unveils a new framework
量子位· 2025-09-07 04:36
Core Viewpoint
- The article presents a novel explainable framework for deciphering oracle bone script based on radical and pictographic analysis, achieving state-of-the-art (SOTA) accuracy in character recognition and demonstrating strong zero-shot capabilities [1][5][71].

Group 1: Methodology and Framework
- The proposed method integrates radical recognition and pictographic semantic understanding to bridge the gap between the visual forms and the meanings of oracle bone characters [5][71].
- A progressive training strategy guides the model from radical identification to pictographic analysis, culminating in a joint analysis that strengthens the deciphering process [6][15][22].
- The framework employs a dual matching mechanism that selects suitable candidates from a dictionary based on the analysis results, improving zero-shot performance [28][71] (a toy sketch of this kind of matching appears after the summary).

Group 2: Dataset and Training
- The research team built the PD-OBS dataset, which covers 47,157 Chinese characters annotated with oracle bone images and pictographic analysis texts, providing a valuable resource for future studies [9][73].
- Each character in the dataset is linked to oracle bone images, ancient script images, and modern standard script images, together with radical and pictographic analysis annotations [10][73].

Group 3: Experimental Results
- The new method was evaluated against existing methods on the HUST-OBC and EV-OBC datasets, showing competitive Top-1 and Top-10 accuracy and excelling in particular in zero-shot scenarios [38][45].
- In the zero-shot setting, the proposed method outperformed all other methods, improving Top-10 accuracy by 26.2% on HUST-OBC and 13.6% on EV-OBC [45][46].
- The explainability of the model's outputs was quantified with BERT-Score, showing higher reliability than other large vision-language models [47][50] (see the second sketch after the summary).

Group 4: Qualitative Analysis
- The model showed strong recognition ability in both the validation and zero-shot settings, generating semantically reasonable predictions even for characters that human experts have not yet deciphered [66][68].
- The dual analysis of radicals and pictographs provides a comprehensive visual-semantic mapping, allowing the model to produce interpretable outputs even for undeciphered characters [68][70].
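The summary describes the dual matching mechanism only at a high level: the model's radical analysis and its pictographic analysis are each matched against a dictionary, and the combined scores pick the candidate characters. The sketch below is a minimal, self-contained illustration of that idea, not the Fudan team's implementation; the `DictEntry` fields, the character-bigram Jaccard similarity, and the `alpha` weighting are all assumptions introduced for illustration.

```python
# Hypothetical sketch of a "dual matching" candidate selector. It only
# illustrates the idea from the article (score dictionary entries against
# both the radical analysis and the pictographic analysis, then rank);
# every field name and similarity choice here is an assumption.

from dataclasses import dataclass


@dataclass
class DictEntry:
    char: str              # candidate modern character
    radical_gloss: str     # assumed dictionary text describing its radicals
    pictograph_gloss: str  # assumed dictionary text describing its pictographic origin


def _char_bigrams(text: str) -> set[str]:
    """Character bigrams as a crude, language-agnostic text representation."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def _jaccard(a: str, b: str) -> float:
    """Bigram Jaccard overlap; a stand-in for whatever learned similarity the paper uses."""
    sa, sb = _char_bigrams(a), _char_bigrams(b)
    return len(sa & sb) / len(sa | sb) if sa and sb else 0.0


def dual_match(radical_analysis: str,
               pictograph_analysis: str,
               dictionary: list[DictEntry],
               alpha: float = 0.5,
               top_k: int = 10) -> list[tuple[str, float]]:
    """Rank dictionary candidates by a weighted blend of the two similarities."""
    scored = []
    for entry in dictionary:
        s_rad = _jaccard(radical_analysis, entry.radical_gloss)
        s_pic = _jaccard(pictograph_analysis, entry.pictograph_gloss)
        scored.append((entry.char, alpha * s_rad + (1 - alpha) * s_pic))
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]


# Toy usage with fabricated dictionary entries and analysis strings.
dictionary = [
    DictEntry("日", "single enclosed sun radical", "a circle with a dot depicting the sun"),
    DictEntry("月", "moon radical", "a crescent shape depicting the moon"),
]
print(dual_match("contains the sun radical", "a round shape depicting the sun", dictionary))
```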
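The article also says explainability was quantified with BERT-Score against reference annotations. Below is a minimal sketch of that kind of evaluation using the open-source `bert-score` package; the example candidate/reference strings and the `lang="zh"` setting are assumptions, not the paper's actual protocol.

```python
# Minimal BERT-Score sketch: compare a model-generated analysis text with an
# expert reference annotation. Strings here are made up for illustration.

from bert_score import score  # pip install bert-score

candidates = ["字形从人从木，象人倚树而息"]            # model-generated analysis (fabricated)
references = ["由'人'与'木'构成，表示人靠在树旁休息"]  # expert annotation (fabricated)

# P, R, F1 are torch tensors with one value per candidate-reference pair.
P, R, F1 = score(candidates, references, lang="zh", verbose=False)
print(f"BERT-Score F1: {F1.mean().item():.4f}")
```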