GPT-5 Still Flunks Letter Counting! Marcus: Generalization Remains Unsolved, and Scaling Cannot Achieve AGI

Core Viewpoint
- The article discusses the limitations and bugs of GPT-5, particularly its inability to count letters in words accurately, highlighting a specific incident involving the word "blueberry" [2][20][39].

Group 1: GPT-5's Counting Errors
- Kieran Healy, a Duke University professor, tested GPT-5 by asking it how many 'b's are in "blueberry." GPT-5 answered three; the correct count is two [2][4].
- Despite multiple attempts to clarify and correct the count, including asking the model to spell out the 'b's, GPT-5 stood by its incorrect answer [8][9][11].
- Eventually, after persistent efforts from users, GPT-5 acknowledged the correct count but claimed the error came from misinterpreting the word [15].

Group 2: General Bugs and Limitations
- Gary Marcus, a notable critic, compiled various bugs found in GPT-5, including failures on basics such as Bernoulli's principle and the rules of chess [20][23].
- The model also struggled with reading comprehension and with images whose features had been altered, such as a zebra with five legs [26][28].
- Marcus argues that GPT-5's problems point to broader weaknesses in large models, particularly their inability to generalize effectively, which he attributes to long-standing issues such as distribution shift [38][39][41].
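
For reference, the letter count at the center of the "blueberry" test can be checked deterministically in a few lines. The following is a minimal illustrative sketch, not anything taken from the article; the helper name `count_letter` is hypothetical and introduced only for this example.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word.

    Hypothetical helper for illustration only.
    """
    return word.lower().count(letter.lower())


word = "blueberry"

# 1-based positions of every 'b', mirroring the "spell it out" check.
positions = [i + 1 for i, ch in enumerate(word) if ch == "b"]

print(count_letter(word, "b"))  # 2
print(positions)                # [1, 5]
```

Plain string operations see individual characters directly, which is why a check this simple gives a stable answer to the question Healy asked.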