Game Arena

Nobel laureate on the "AGI touchstone": AIs inventing their own games and teaching each other
36Kr· 2025-08-19 00:00
Core Insights
- The interview with Demis Hassabis, CEO of Google DeepMind, discusses the evolution of AI technology and its future trends, focusing on the development of artificial general intelligence (AGI) and the significance of world models like Genie 3 [2][3].

Group 1: Genie 3 and World Models
- Genie 3 is a product of multiple research branches at DeepMind, aimed at creating a "world model" that helps AI understand the physical world, including physical structures, material properties, fluid dynamics, and biological behaviors [3].
- The development of AI has transitioned from specialized intelligence to more comprehensive models, with a focus on understanding the physical world as a foundation for AGI [3][4].
- Genie 3 can generate consistent virtual environments, maintaining the state of the scene when users return, which demonstrates its understanding of the world's operational logic [4].

Group 2: Game Arena and AGI Evaluation
- Google DeepMind has partnered with Kaggle to launch Game Arena, a new testing platform designed to evaluate progress toward AGI by having models play a variety of games that test their capabilities [6].
- Game Arena provides a pure testing environment with objective performance metrics, and game difficulty adjusts automatically as AI capabilities improve [9].
- The platform aims to create a comprehensive assessment of AI's general capabilities across multiple domains, ultimately enabling AI systems to invent new games and teach them to each other [9][10].

Group 3: Challenges in AGI Development
- Current AI systems exhibit inconsistent performance, being capable in some areas while failing at simpler tasks, which poses a significant barrier to AGI development [7].
- There is a need for more challenging and diverse benchmarks that encompass understanding of the physical world, intuitive physics, and safety features [8].
- Demis emphasizes the importance of understanding human goals and translating them into useful reward functions for optimization in AGI systems [10].

Group 4: Future Directions in AI
- The evolution of thinking models, such as Deep Think, represents a crucial direction for AI, focusing on reasoning, planning, and optimization through iterative processes [12].
- The transition from weight models to complete systems is highlighted, where modern AI can integrate tool usage, planning, and reasoning capabilities for more complex functionalities [13].
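Tencent Hunyuan open-sources several small models; Xiaomi launches a 168 umbrella; JD's "Zhenhu Price" found to constitute unfair competition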
Guan Cha Zhe Wang· 2025-08-05 01:21
Group 1
- Tencent's Hunyuan released four open-source small models with 0.5B, 1.8B, 4B, and 7B parameters, which can run on consumer-grade graphics cards and are suitable for low-power scenarios like laptops and smart homes [1]
- Amap (Gaode Map) launched an AI-native map application called Gaode Map 2025, featuring an intelligent agent named "Xiao Gao Teacher" [1]
- Google introduced a new LLM evaluation platform called Game Arena in collaboration with Kaggle, allowing LLMs to compete in strategic games for objective assessments [2]

Group 2
- The Pudong New Area government in Shanghai issued a plan to support the application of AI large models in the financial sector, encouraging financial institutions to collaborate with fintech companies [2]
- Apple reported strong Q3 results for FY2025, with total revenue of $94.04 billion, a 10% year-over-year increase, and net profit of $23.43 billion, a 9% increase, attributing part of the growth to government subsidies in China [3]
- Xiaomi launched a customized umbrella priced at 169 yuan, designed specifically for car owners [3]

Group 3
- Pony.ai (Xiaoma Zhixing) announced the launch of a Robotaxi service in Shanghai's Pudong area, operating from 7:30 AM to 9:30 PM on weekdays [4]
- Faraday Future responded to accusations of plagiarism regarding its FX Super One vehicle, stating it was developed in collaboration with Chinese partners [5]
- JD Auto responded to a legal dispute with Tuhu Auto regarding unfair competition, with a court ruling against JD Auto for misleading advertising practices [6][7]

Group 4
- Apple has quietly formed a new internal team named "Answers, Knowledge and Information" to develop a search product aimed at enhancing the response experience of Siri and other Apple services [8]
- China Shipbuilding Industry Corporation announced plans to merge with China Shipbuilding Heavy Industry Company, with the merger approved by the China Securities Regulatory Commission [8]
X @Demis Hassabis
Demis Hassabis· 2025-08-04 18:26
Thrilled to announce the @Kaggle Game Arena, a new leaderboard testing how modern LLMs perform on games (spoiler: not very well atm!). AI systems play each other, making it an objective & evergreen benchmark that will scale in difficulty as they improve. https://t.co/0e2dF2pbtX ...