CICERO
I Was Ganged Up On by Three AIs in an Online Game, and Only Kimi Tried to Save Me
36Kr · 2026-01-25 23:46
Group 1
- The article discusses a new AI-versus-AI game built on a game-theoretic design originally devised by John Nash, highlighting its strategic complexity compared to traditional social-deduction games like Werewolf [2][4]
- Players use chips to strategize, aiming to eliminate opponents while forming temporary alliances, which underscores the dynamic nature of AI interactions [4][15]
- Observations from over 160 games revealed that different AI models adopt distinct strategies with varying effectiveness; notably, Gemini's win rate rose sharply as game complexity increased [11][12]

Group 2
- Performance varies significantly with game complexity: GPT-OSS dominated simpler scenarios but fell to a mere 10% win rate in more complex settings, where Gemini surged to 90% (the bookkeeping behind such figures is sketched below) [12]
- Models like Gemini adapt their strategies to opponents' weaknesses, demonstrating a capacity for manipulation and strategic deception [15][21]
- The research indicates that AI bluffing and manipulation are driven not by malice but by outcome optimization, reflecting a calculated approach to gameplay [17][21]
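The win-rate comparison above comes down to simple aggregation over game logs. Below is a minimal Python sketch of that bookkeeping; the GameRecord structure and the toy data are hypothetical, since the article does not describe how games were recorded:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class GameRecord:
    """One logged game: complexity tier, models at the table, and the winner."""
    complexity: str           # e.g. "simple" or "complex" (hypothetical labels)
    players: tuple[str, ...]  # model names seated in this game
    winner: str               # model name of the winner

def win_rates(games: list[GameRecord]) -> dict[tuple[str, str], float]:
    """Win rate per (model, complexity) pair: games won / games played."""
    played: dict[tuple[str, str], int] = defaultdict(int)
    won: dict[tuple[str, str], int] = defaultdict(int)
    for g in games:
        for model in g.players:
            played[(model, g.complexity)] += 1
        won[(g.winner, g.complexity)] += 1
    return {key: won[key] / n for key, n in played.items()}

# Toy data only -- the kind of aggregation behind claims like
# "GPT-OSS wins simple games, Gemini wins ~90% of complex ones".
games = [
    GameRecord("simple", ("GPT-OSS", "Gemini", "Kimi"), "GPT-OSS"),
    GameRecord("complex", ("GPT-OSS", "Gemini", "Kimi"), "Gemini"),
]
for (model, complexity), rate in sorted(win_rates(games).items()):
    print(f"{model:8s} {complexity:8s} {rate:.0%}")
```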
Don't Rush to Crown OpenAI Just Yet! Terence Tao: The Value of This "Gold Medal" Depends on the "Competition Format"
机器之心 · 2025-07-20 03:11
Core Viewpoint
- OpenAI's new reasoning model achieved gold-medal-level performance at the International Mathematical Olympiad (IMO), solving five of six problems for a score of 35 out of 42 points (see the scoring note after these groups), which has generated excitement in the AI community [2][6][10]

Group 1: Model Performance
- The model was tested under strict conditions mirroring those of human competitors: two 4.5-hour exam sessions with no tools or internet access [3][6]
- The announcement of the model's success came after other AI models, such as Gemini 2.5 Pro and OpenAI's o3, performed poorly on the same problems, scoring only 13 and 7 points respectively [10]

Group 2: Expert Opinions
- Mathematician Terence Tao urged caution in interpreting AI models' IMO results, emphasizing that standardized testing conditions are needed before AI and human performance can be meaningfully compared [11][15]
- Tao highlighted that AI capability can vary significantly with the resources and methods used during testing, suggesting the reported results may not reflect true performance [15][18]

Group 3: Model Development and Future
- OpenAI's reasoning research lead, Noam Brown, acknowledged considerable remaining room for improvement in the model's test-time computation and efficiency [34]
- The model that achieved the IMO gold-medal score is not GPT-5, and its release may still be several months away [34]

Group 4: Research Background
- Alexander Wei, who led the model's development, has a strong background in enhancing reasoning capabilities in large language models, particularly mathematical reasoning and natural-language proof generation [37][38]
- Wei previously earned recognition at the International Olympiad in Informatics and has contributed to AI systems that reached human-level performance in strategic games [40]
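A note on the arithmetic behind the 35/42 score, using the standard IMO grading rule of six problems each marked out of 7 points (the rule itself is left implicit in the summary):

```latex
% Standard IMO grading: 6 problems, each scored 0--7
\[
  \underbrace{5 \times 7}_{\text{problems fully solved}} = 35
  \qquad \text{out of} \qquad
  \underbrace{6 \times 7}_{\text{maximum score}} = 42
\]
```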