AI Academic Conference Peer Review
A Striking 21% of ICLR 2026 Reviews Were Generated Entirely by AI…
量子位 (QbitAI) · 2025-11-30 06:45
Core Insights
- About 21% of ICLR 2026 reviews are suspected to be entirely AI-generated, a sign of growing AI involvement in academic peer review [1][21][26].

Group 1: Discovery and Analysis
- The investigation began when CMU researcher Graham Neubig noticed an unusually AI-like quality in the peer reviews he received, which prompted him to seek a systematic analysis of ICLR submissions and reviews [2][3].
- Pangram Labs analyzed approximately 19,490 submitted papers and 75,800 reviews from ICLR 2026 and found that 15,899 reviews (21%) were highly likely to be AI-generated [8][9][21].
- The analysis used OCR and text classification models to assess the content of both submissions and reviews while minimizing interference from formatting issues [11][12][13].

Group 2: AI Involvement in Submissions and Reviews
- More than half of the reviews showed some degree of AI involvement. Among papers, 61% were human-written and 199 (1%) were entirely AI-generated [22][24].
- Papers with more AI-generated content received lower average review scores, which suggests that AI writing may not yet match the quality of human-authored work [34].
- Conversely, reviews with higher AI involvement tended to give higher scores, which suggests that AI-generated reviews are more lenient [38].

Group 3: Ethical Considerations and Guidelines
- ICLR has established clear guidelines on the use of AI in submissions and reviews, emphasizing disclosure and adherence to ethical standards [29][31].
- Authors may use AI to assist with writing but must acknowledge its use, while reviewers are discouraged from relying solely on AI for their evaluations because of confidentiality and authenticity concerns [32][31].
- The emergence of AI-generated reviews raises questions about the integrity of the peer review process and the importance of maintaining human judgment in academic evaluations [51].
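
As a quick sanity check on the percentages quoted in Group 1 and Group 2, the short sketch below recomputes them from the reported counts. It assumes the approximate totals given in this summary (75,800 reviews, 19,490 papers) are the right denominators. The `share` helper and the constant names are hypothetical and are not part of the Pangram Labs analysis.

```python
# Counts as reported in the summary above; the totals are approximate.
TOTAL_REVIEWS = 75_800
AI_FLAGGED_REVIEWS = 15_899
TOTAL_PAPERS = 19_490
AI_GENERATED_PAPERS = 199


def share(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Hypothetical helper applied to the two headline figures.
review_share = share(AI_FLAGGED_REVIEWS, TOTAL_REVIEWS)
paper_share = share(AI_GENERATED_PAPERS, TOTAL_PAPERS)

print(f"Reviews suspected to be fully AI-generated: {review_share:.1f}%")
print(f"Papers flagged as fully AI-generated: {paper_share:.1f}%")
```

Both values round to the figures in the summary (21% of reviews and 1% of papers), so the reported counts and percentages are consistent with each other.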