Medical AI faces a major test: Nanyang Technological University releases the first benchmark for LLM electronic health record processing
36Kr · 2025-12-16 03:05

Core Insights
- Researchers from Nanyang Technological University have developed the EHRStruct benchmark to evaluate the ability of large language models (LLMs) to process structured electronic health records (EHRs) [1][2]
- The benchmark comprises 2,200 samples across 11 core tasks, organized by clinical scenario, cognitive level, and functional category [1][2]
- Findings indicate that general-purpose models outperform medical-specific models, with stronger performance on data-driven tasks [1][8]

Benchmark Overview
- EHRStruct is the first comprehensive benchmark for assessing LLMs' capabilities in handling structured EHRs, created collaboratively by computer scientists and medical experts [1][2]
- Its 11 tasks are grouped into data-driven and knowledge-driven scenarios and span both understanding and reasoning levels [3][4]

Task Categories
- The tasks are divided into six typical categories: information retrieval, data aggregation, arithmetic computation, clinical identification, diagnostic assessment, and treatment planning [4][5]
- Data-driven tasks include filtering, aggregation, and arithmetic reasoning, while knowledge-driven tasks focus on clinical code identification and predictive assessments [3][4]

Evaluation Process
- The evaluation systematically assesses 20 LLMs, using 200 question-answer samples per task and testing various input formats [11][10]
- The benchmark also supports in-depth experiments on specific models, including few-shot prompting and fine-tuning [11]

Key Findings
- General-purpose LLMs, particularly the Gemini series, demonstrate superior performance on structured EHR tasks compared to medical-specific models [14][8]
- Data-driven tasks yield better results overall, while knowledge-driven tasks, especially diagnostic assessment, remain challenging for existing models [15][17]
- The EHRMaster framework, when combined with Gemini, significantly enhances performance on both data-driven and knowledge-driven tasks [20][19]

Future Directions
- The EHRStruct 2026 challenge has been launched to provide a standardized platform for researchers to evaluate LLMs' capabilities in structured EHR processing [2]
- Collaboration with international conferences is anticipated to facilitate the submission of research reports and papers based on the challenge [2]
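To make the setup described above more concrete, the sketch below models a benchmark sample and a minimal evaluation loop over models, tasks, and input formats. It is a hypothetical illustration rather than EHRStruct's actual code: the field names, the `serialize_ehr` helper, the `query_llm` stubs, and the exact-match metric are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sample structure mirroring the taxonomy described above:
# tasks grouped by scenario (data-driven / knowledge-driven) and
# cognitive level (understanding / reasoning).
@dataclass
class EHRSample:
    task: str              # e.g. "data aggregation", "diagnostic assessment"
    scenario: str          # "data-driven" or "knowledge-driven"
    level: str             # "understanding" or "reasoning"
    ehr_table: list[dict]  # structured EHR rows (column name -> value)
    question: str
    answer: str

def serialize_ehr(rows: list[dict], fmt: str) -> str:
    """Render structured EHR rows in one of the tested input formats
    (markdown table vs. JSON here; the benchmark's actual formats may differ)."""
    if fmt == "markdown":
        cols = list(rows[0].keys())
        header = "| " + " | ".join(cols) + " |"
        sep = "|" + "---|" * len(cols)
        body = ["| " + " | ".join(str(r[c]) for c in cols) + " |" for r in rows]
        return "\n".join([header, sep, *body])
    import json
    return json.dumps(rows, ensure_ascii=False)

def evaluate(models: dict[str, Callable[[str], str]],
             samples: list[EHRSample],
             fmt: str = "markdown") -> dict[str, float]:
    """Exact-match accuracy per model over all samples (a stand-in metric)."""
    scores = {}
    for name, query_llm in models.items():
        correct = 0
        for s in samples:
            prompt = (f"EHR record:\n{serialize_ehr(s.ehr_table, fmt)}\n\n"
                      f"Question: {s.question}\nAnswer:")
            if query_llm(prompt).strip().lower() == s.answer.strip().lower():
                correct += 1
        scores[name] = correct / len(samples)
    return scores
```

A run at the benchmark's reported scale would pair each of the 20 models with 200 samples per task; varying `fmt` or prepending worked examples to the prompt would correspond to the input-format and few-shot experiments mentioned above.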
