Core Viewpoint
- The article examines the limitations of AI models in understanding distorted text: while humans comprehend such text easily, leading AI models struggle badly, pointing to a fundamental gap in their text-recognition capabilities [2][23].

Group 1: AI Performance in Text Recognition
- A research team spanning multiple institutions ran experiments showing that leading AI models, including OpenAI's GPT-5 and Google's Gemini, perform poorly on distorted Chinese characters and English words [2][4].
- On a test set of 100 four-character Chinese idioms, the AI models achieved a maximum average matching rate of only 12.1%, while human readers scored 100% [7]; a minimal sketch of one way such a matching rate could be computed appears at the end of this note.
- On 100 eight-letter English words, the AI models likewise failed to produce correct answers, showing a consistent inability to recognize and interpret distorted text [10][20].

Group 2: Human vs. AI Understanding
- Humans read distorted text easily thanks to robust visual processing and knowledge of language structure, whereas AI relies on pattern matching without grasping the underlying structure of the text [9][24].
- Because AI cannot decompose and recombine symbols, it fails on even slightly altered text that humans can still understand [26][28].
- The research emphasizes that human reading comprehension is a multi-modal process that draws on multiple sensory inputs and reasoning abilities, unlike the capabilities of current AI [29].

Group 3: Implications for AI Development
- The findings suggest that improving AI's resilience in text recognition requires rethinking how vision-language models (VLMs) integrate visual and textual information, potentially with new training data and structural insights [28].
- The results also highlight the difficulties AI faces in practical applications such as education and security, where recognizing non-standard text is crucial [30].
Humans get it in a second, AI falls apart: one simple test makes GPT-5, Gemini, and other top models collectively "crash"
量子位 (QbitAI) · 2025-09-09 12:20
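
The article reports the idiom-test result only as an "average matching rate" and does not spell out how that score is computed. The sketch below shows one plausible reading, assuming a simple position-wise character match averaged over test items; the function names and the example idioms are illustrative and are not taken from the study.

```python
# Hypothetical scoring sketch: position-wise character matches between a
# model's transcription and the reference four-character idiom, averaged
# over all test items. The study's exact metric is not described in the
# article; this is only one plausible interpretation.

def char_match_rate(prediction: str, reference: str) -> float:
    """Fraction of reference characters reproduced at the correct position."""
    if not reference:
        return 0.0
    hits = sum(1 for p, r in zip(prediction, reference) if p == r)
    return hits / len(reference)


def average_match_rate(pairs: list[tuple[str, str]]) -> float:
    """Mean per-item match rate over (prediction, reference) pairs."""
    if not pairs:
        return 0.0
    return sum(char_match_rate(p, r) for p, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative pairs only; not the idioms or model outputs from the study.
    test_pairs = [
        ("画蛇添足", "画蛇添足"),  # fully correct -> 1.00
        ("画龙点睛", "画蛇添足"),  # one character -> 0.25
        ("不知所云", "画蛇添足"),  # no characters -> 0.00
    ]
    print(f"average matching rate: {average_match_rate(test_pairs):.1%}")
```

Under this simple scheme a prediction that is shorter than the reference is silently truncated by `zip`, so missing characters count as misses, which matches the intuition that an incomplete transcription should score lower.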