Alibaba opens public beta of Qwen to rival ChatGPT, but it still "slips up" on whether 9.9 or 9.11 is larger
BABA (US:BABA) · Di Yi Cai Jing · 2025-11-17 08:31

Core Insights
- The article discusses the performance of various AI models, particularly Alibaba's Qwen model, in answering a simple mathematical question about whether 9.9 or 9.11 is larger, highlighting the challenges AI faces in common-sense reasoning [1][9][10].

Group 1: AI Model Performance
- Alibaba's Qwen model initially answered, incorrectly, that 9.11 is greater than 9.9, but later corrected itself after breaking down its reasoning process [1][9].
- Other prominent AI models, including ChatGPT-4o and Google's Gemini Advanced, also failed to answer the question correctly, indicating a broader issue in how AI handles basic arithmetic and common-sense reasoning [10][11].

Group 2: Self-Correction and Learning
- The Qwen model demonstrated self-correction capabilities by analyzing its initial mistake and providing the correct answer upon further questioning [9][10].
- The initial error was attributed to a mismatch between the reasoning process and the final conclusion, as well as cognitive biases related to the numerical representation of 9.11 [9].

Group 3: Market Position and Strategy
- Alibaba is positioning the Qwen model as a competitive alternative to ChatGPT in the global market, with plans to integrate it into various consumer applications such as maps, food delivery, and shopping [11].
- The Qwen series has achieved significant global traction, with over 600 million downloads, showcasing its growing influence and competitiveness in the AI landscape [10][11].
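
For readers who want to check the 9.9 versus 9.11 comparison themselves, the following minimal Python sketch contrasts two ways of reading the same strings. It is an illustration only: the article does not say how any model compares numbers internally, and the version-number reading is one commonly suggested explanation, assumed here rather than taken from the source. The function names `compare_as_decimals` and `compare_as_versions` are hypothetical.

```python
from decimal import Decimal


def compare_as_decimals(a: str, b: str) -> str:
    """Return whichever string is larger when both are read as decimal numbers."""
    return a if Decimal(a) > Decimal(b) else b


def compare_as_versions(a: str, b: str) -> str:
    """Return whichever string is larger when each dot-separated part is read
    as a separate integer, as in software version numbers (an assumed reading,
    not one described in the article)."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


# Numeric reading: 9.9 is 9.90, which is greater than 9.11.
print(compare_as_decimals("9.9", "9.11"))  # 9.9

# Version-style reading: part 11 is greater than part 9, so "9.11" ranks higher.
print(compare_as_versions("9.9", "9.11"))  # 9.11
```

The two functions disagree on the same input, which shows how a surface reading of "9.11" as "nine point eleven" can lead to the wrong numeric answer.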