SIMA 2
An inside view from DeepMind: the Scaling Law isn't dead, and compute is everything
36Kr · 2025-12-31 12:44
Core Insights
- The year 2025 marks a significant turning point for AI, transitioning from curiosity in 2024 to profound societal impact [1]
- Predictions from industry leaders suggest that advancements in AI will continue to accelerate, with Sam Altman forecasting the emergence of systems capable of original insights by 2026 [1][3]
- The debate around the Scaling Law continues, with some experts asserting its ongoing relevance and potential for further evolution [12][13]

Group 1: Scaling Law and Computational Power
- The Scaling Law has shown resilience, with computational power for training AI models growing at an exponential rate of four to five times annually over the past fifteen years [12][13]
- Research indicates a clear power-law relationship between performance and computational power, suggesting that a tenfold increase in computational resources can yield approximately three times the performance gain (a worked example follows this summary) [13][15]
- The concept of "AI factories" is emerging, emphasizing the need for substantial computational resources and infrastructure to support AI advancements [27][31]

Group 2: Breakthroughs in AI Capabilities
- The SIMA 2 project at DeepMind demonstrates a leap from understanding to action, showcasing a general embodied intelligence capable of operating in complex 3D environments [35][39]
- The ability of AI models to exhibit emergent capabilities, such as logical reasoning and complex instruction following, is linked to increased computational power [16][24]
- By the end of 2025, AI's ability to complete tasks has significantly improved, with projections indicating that by 2028 AI may independently handle tasks that currently require weeks of human expertise [41]

Group 3: Future Challenges and Considerations
- The establishment of the Post-AGI team at DeepMind reflects the anticipation of challenges that will arise once AGI is achieved, particularly the management of autonomous, self-evolving intelligent agents [43][46]
- The ongoing discussion about the implications of AI's rapid advancement highlights the need for society to rethink human value in a world where intelligent systems may operate at near-zero cost [43][46]
- The physical limits of power consumption and cooling solutions are becoming critical considerations for the future of AI infrastructure [31][32]
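The tenfold-compute-for-threefold-gain figure above implies a specific power-law exponent, which is easy to check with a few lines of arithmetic. The sketch below only restates the article's numbers under the assumption that gain scales as compute^alpha; the function names are illustrative.

```python
import math

def implied_exponent(compute_ratio: float, gain_ratio: float) -> float:
    """Solve gain_ratio = compute_ratio ** alpha for alpha."""
    return math.log(gain_ratio) / math.log(compute_ratio)

def predicted_gain(compute_ratio: float, alpha: float) -> float:
    """Predict the performance multiple for a given compute multiple under a power law."""
    return compute_ratio ** alpha

alpha = implied_exponent(10, 3)                                   # ~0.477
print(f"implied exponent: {alpha:.3f}")
print(f"100x compute  -> {predicted_gain(100, alpha):.1f}x gain")   # ~9x
print(f"1000x compute -> {predicted_gain(1000, alpha):.1f}x gain")  # ~27x
```

Under that exponent, 100x compute gives roughly 9x gain and 1000x roughly 27x, which is just the same 10x-to-3x ratio compounded.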
miHoYo's Cai Haoyu releases a "ChatGPT for games"
猿大侠· 2025-12-03 04:11
Core Viewpoint
- The article discusses the launch of AnuNeko, an AI chat application developed by Cai Haoyu, co-founder of miHoYo, highlighting its distinctive features and early user experiences.

Group 1: Product Overview
- AnuNeko is an AI chat application that combines elements of gaming and conversation, allowing users to choose different characters for interaction [1][24].
- The application offers a high degree of personalization, with responses varying based on user input and character selection [10][12].

Group 2: User Experience
- Initial user feedback indicates that while AnuNeko responds quickly, its logic can be weak; the product leans toward humanistic, emotional interaction [4][23].
- Users have noted that the AI mimics emotional tone, responding more aggressively to more intense user inputs [12].

Group 3: Company Background
- AnuNeko is a product of Anuttacon, a new company founded by Cai Haoyu, which also launched an experimental AI game called "Whispers from the Star" [24][25].
- The company has filed for the trademark "ANUNEKO" in the US, covering categories including software and entertainment [25].

Group 4: Industry Context
- The integration of AI in gaming is becoming a trend, with other companies such as Google and miHoYo also developing AI-driven characters and interactions in their games [30][33].
- The article mentions the rise of AI agents in gaming, exemplified by the new capabilities in miHoYo's upcoming game "Varsapura" [33][36].
miHoYo's Cai Haoyu releases a "ChatGPT for games"
36Kr · 2025-12-02 10:23
Core Insights
- The article discusses the launch of AnuNeko, an AI chat application developed by Cai Haoyu, co-founder of miHoYo, which combines elements of gaming with AI chat functionality [22][30].
- Initial user feedback indicates that while AnuNeko is strong at emotional engagement, its logical reasoning capabilities are limited [21][3].

Company Overview
- AnuNeko is a product of Anuttacon, a new company founded by Cai Haoyu after his tenure at miHoYo [22].
- The company filed a trademark application for AnuNeko in September 2025, covering software, AI characters, and entertainment [22].

Product Features
- AnuNeko allows users to choose different characters for interaction, each with distinct personality traits, enhancing the user experience [5][8].
- The AI's responses are quick and vary based on user input, demonstrating a high level of human-like interaction [5][21].
- AnuNeko is designed to mimic emotional dialogue, and its responses become more aggressive if the user is confrontational [8].

Market Context
- The integration of AI in gaming is becoming standard practice, with other companies such as Google and miHoYo also developing AI-driven characters and environments [26][28].
- AnuNeko's approach aligns with industry trends in which AI enhances narrative and character interactions in games, indicating a growing market for AI applications in entertainment [26][28].
Cai Haoyu's US company has built another new "game"
36Kr · 2025-12-01 11:52
Core Insights
- Anuttacon, an AI company founded by miHoYo co-founder Cai Haoyu, has launched an AI chat model named AnuNeko, which features a playful and quirky personality [1][12]
- The product aims to create a platform where developers can generate interactive NPCs with AI technology, moving beyond game development alone [1][2]

Product Features
- AnuNeko offers two cat avatars, Orange Cat and Exotic Shorthair, each with a distinct response style: Exotic Shorthair delivers sharper critiques, while Orange Cat is more diplomatic [3][4]
- The AI demonstrates quick response times and actively tries to keep users in conversation, handling a wide range of inquiries [5]

Industry Context
- Major companies such as Google, Ubisoft, and ByteDance are also exploring AI in gaming, with products like Google's SIMA 2 and ByteDance's Lumine showcasing advanced AI capabilities [6][8]
- Anuttacon's recent release of the AI dialogue-based game "Whispers from the Star" indicates a focus on testing the limits of AI conversation and user interaction [10][12]

Future Implications
- The development of AnuNeko reflects a broader trend in the gaming industry toward AI that feels more human-like and capable of engaging dialogue, rather than merely functioning as a tool [13][14]
- Competition in AI gaming may hinge on which company can most effectively build "human-like" qualities into its AI systems, potentially transforming user experiences [14]
Stop grinding! Google releases SIMA 2, and your next gaming buddy may be an AI
深思SenseAI· 2025-11-21 04:14
Core Insights
- Google has launched the next-generation general intelligence agent SIMA 2, which integrates deeply with Gemini, enabling it to understand and execute commands in virtual worlds, plan actions around objectives, and interact with players while continuously improving through trial and error [1][2]

Group 1: SIMA 2 Capabilities
- SIMA 2 can understand and execute complex, multi-step commands in games such as "Minecraft" and "ASKA," significantly improving upon its predecessor SIMA 1, which struggled with such tasks [1][2]
- The agent has been trained on a large dataset of human demonstration videos with language annotations, allowing it to develop initial "conversational collaboration" capabilities, explaining its intentions and next steps to users [2][4]
- SIMA 2's task completion success rate has improved significantly compared to SIMA 1, demonstrating an enhanced ability to follow detailed instructions and provide feedback, akin to interacting with a real player [5][9]

Group 2: Self-Improvement and Learning
- SIMA 2 employs a closed-loop system of "trial and error + Gemini feedback evaluation" during training, allowing it to learn and complete more complex tasks over time (a toy version of this loop is sketched after this summary) [11]
- The experience data accumulated by SIMA 2 can be used to train future, more powerful agents, establishing a foundation for a "general agent" capable of adapting to any world [13]

Group 3: Path to General Intelligence
- The combination of Gemini and SIMA 2 offers a compelling approach to embodied intelligence: train agents in controlled, low-cost virtual 3D environments, where they can gather interaction data [14]
- SIMA 2's ability to operate across varied gaming environments is crucial for developing general embodied intelligence, enabling the agent to master skills, perform complex reasoning, and learn continuously in virtual worlds [15]

Group 4: Implications for Robotics
- The capabilities developed by SIMA 2, including navigation, tool use, and collaborative task execution, are essential modules for future intelligent agents to achieve "intelligent embodiment" in the real world [16]
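The "trial and error + Gemini feedback evaluation" loop described above is essentially a self-improvement cycle: the agent attempts tasks, an external judge scores the attempts, and judged-successful trajectories feed the next round of training. Below is a minimal, hypothetical sketch of that loop; none of the function names (run_episode, judge_score, self_improvement_loop) come from DeepMind's actual system, and the scoring here is random placeholder logic.

```python
import random

def run_episode(policy: dict, task: str) -> list[str]:
    """Placeholder rollout: the agent attempts a task and returns an action trace."""
    return [f"{task}:step{i}" for i in range(random.randint(1, 5))]

def judge_score(task: str, trajectory: list[str]) -> float:
    """Placeholder for an external evaluator (the role Gemini plays in the article)."""
    return random.random()

def self_improvement_loop(tasks: list[str], rounds: int = 3, threshold: float = 0.7):
    policy = {"weights": 0}                  # stand-in for the agent's parameters
    replay = []                              # judged-successful trajectories kept for retraining
    for _ in range(rounds):
        for task in tasks:
            traj = run_episode(policy, task)
            if judge_score(task, traj) >= threshold:
                replay.append((task, traj))  # only attempts the judge accepts are reused
        policy["weights"] += len(replay)     # stand-in for a fine-tuning step on replay data
    return policy, replay

policy, replay = self_improvement_loop(["mine wood", "build raft"])
print(len(replay), "trajectories kept for the next training round")
```

The key property of such a loop is that new training data comes from the agent's own play plus an automated judge, so no additional human demonstrations are needed once the cycle starts.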
A key step toward AGI? DeepMind goes big with SIMA 2, the most capable AI agent in 3D worlds
36Kr · 2025-11-20 02:26
Core Insights
- Google DeepMind has launched SIMA 2, a general AI agent capable of autonomous gameplay, reasoning, and continuous learning in virtual 3D environments, marking a significant step toward general artificial intelligence [1][4]
- SIMA 2 represents a major advance over its predecessor, SIMA, evolving from a passive instruction follower into an interactive gaming companion that can plan and reason in complex environments [4][7]

Development and Capabilities
- SIMA 2 integrates advanced capabilities from the Gemini model, allowing it to understand user intentions, plan actions, and execute them in real time, enhancing its interaction with users [4][11]
- The new architecture enables SIMA 2 to perform multi-step reasoning, transforming the process from language to action into a longer chain of language to intention to planning to action (a minimal sketch of such a pipeline follows this summary) [11][16]
- SIMA 2 demonstrates improved generalization and reliability, successfully executing complex instructions in unfamiliar scenarios, such as new games [16][22]

Learning and Adaptation
- SIMA 2 exhibits self-improvement capabilities, learning through trial and error and feedback from the Gemini model, allowing it to tackle increasingly complex tasks without additional human-generated data [25][28]
- The agent's ability to transfer learned concepts across different games signifies a leap toward human-like cognitive generalization [22][29]

Future Implications
- SIMA 2's performance across varied gaming environments serves as a critical testing ground for general intelligence, enabling the agent to master skills and engage in complex reasoning [29][30]
- The research highlights the potential for SIMA 2 to contribute to robotics, as the skills learned are foundational for future physical AI assistants [30][31]
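The language-to-intention-to-planning-to-action chain mentioned above can be pictured as a three-stage pipeline in which each stage consumes the previous stage's output. The sketch below is a toy illustration of that decomposition, not DeepMind's implementation; the stage names, the hard-coded plan, and the key bindings are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Step:
    description: str
    keys: list[str]          # keyboard/mouse primitives the step maps to

def infer_intention(instruction: str) -> str:
    """Stage 1: reduce a free-form instruction to a goal statement."""
    return f"goal: {instruction.lower()}"

def make_plan(intention: str) -> list[Step]:
    """Stage 2: expand the goal into ordered, executable steps (hard-coded here)."""
    return [
        Step("walk to the tree", ["W", "W", "W"]),
        Step("chop the tree",    ["mouse_left"]),
        Step("collect the wood", ["E"]),
    ]

def execute(plan: list[Step]) -> list[str]:
    """Stage 3: flatten the plan into low-level actions the environment accepts."""
    return [key for step in plan for key in step.keys]

instruction = "Gather some wood for the campfire"
actions = execute(make_plan(infer_intention(instruction)))
print(actions)
```

In the real system each stage would be driven by the model rather than hard-coded; the point of the sketch is only the shape of the chain.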
A key step toward AGI? DeepMind goes big with SIMA 2, the most capable AI agent in 3D worlds
机器之心· 2025-11-20 02:07
Core Viewpoint
- Google DeepMind has launched SIMA 2, a general AI agent capable of autonomous gameplay, reasoning, and continuous learning in virtual 3D environments, marking a significant step toward general artificial intelligence [2][3][6].

Group 1: SIMA 2 Overview
- SIMA 2 represents a major leap from its predecessor, SIMA, evolving from a passive instruction follower into an interactive gaming companion that can autonomously plan and reason in complex environments [6][10].
- The integration of the Gemini model enhances SIMA 2's capabilities, allowing it to understand user intentions, formulate plans, and execute actions through a multi-step cognitive chain [15][20].

Group 2: Performance and Capabilities
- SIMA 2 can understand and execute complex instructions with higher success rates, even in unfamiliar scenarios, showcasing its ability to generalize across different tasks and environments [24][30].
- The agent demonstrates self-improvement capabilities, learning through trial and error and using feedback from the Gemini model to enhance its skills without additional human-generated data [35][39].

Group 3: Future Implications
- SIMA 2's ability to operate across varied gaming environments serves as a critical testing ground for general intelligence, enabling the agent to master skills and engage in complex reasoning [41][43].
- The research highlights the potential for SIMA 2 to contribute to robotics and physical AI applications, as it learns skills essential for future AI assistants in the physical world [43].
Tencent Research Institute AI Digest 20251117
腾讯研究院· 2025-11-16 16:01
Group 1: openEuler and AI Operating Systems
- The openEuler community has launched a new 5-year development plan, with the first AI-focused supernode operating system (openEuler 24.03 LTS SP3) set for release by the end of 2025, involving more than 2,100 member organizations and over 23,000 global contributors [1]
- The operating system features global resource abstraction, heterogeneous resource integration, and a global resource view, aimed at maximizing the computational potential of supernodes and accelerating application innovation [1]
- The Lingqu Interconnection Protocol 2.0 will contribute support for supernode operating-system plugins, providing key capabilities such as unified memory addressing and low-latency communication for heterogeneous computing [1]

Group 2: Google and AI Models
- Google's CEO hinted at the anticipated launch of Gemini 3.0 the following week with a cryptic two-emoji reply, and 69% of netizens bet on the release of this next-generation AI model, which is expected to be a significant turning point for Google [2]
- Early testing suggests Gemini 3.0 can generate operating systems and build websites in seconds, showing impressive front-end design capabilities and earning it the label "the end of front-end engineers" [2]
- Warren Buffett has invested $4.3 billion in Google stock, with high expectations for Gemini 3.0's performance, which will determine whether Google can challenge for AI leadership [2]

Group 3: Gaming AI Developments
- Google DeepMind has introduced SIMA 2, an AI agent that plays games the way a human does, using virtual input devices, moving beyond simple command following to demonstrate reasoning and learning [3]
- SIMA 2 can tackle new games without pre-training, understands multimodal prompts, and improves itself through self-learning and feedback from Gemini [3]
- The system integrates Gemini as its core engine and is intended to serve as a foundational module for future robotic applications, though it still faces limitations on complex tasks [3]

Group 4: Long-term Memory Operating Systems
- EverMemOS, developed by Chen Tianqiao's team, scored 92.3% and 82% on the LoCoMo and LongMemEval-S benchmarks, significantly surpassing state-of-the-art levels [4]
- Inspired by human memory mechanisms, the system uses a four-layer architecture (agent layer, memory layer, index layer, interface layer) and "layered memory extraction" to address the limits of pure text-similarity retrieval [4]
- An open-source version is available on GitHub, and a cloud service version is expected later this year, aimed at providing enterprises with data persistence and scalable experiences [4]

Group 5: AI Wearable Technology
- Sandbar has launched the Stream smart ring, priced at $249-$299, which drops health-monitoring features to focus on AI voice interaction [5]
- The ring uses a "fist whisper" gesture to activate recording and dynamically switches between multiple large models, but its 16-20 hour battery life trails traditional smart rings [5]
- The accompanying iOS app uses ElevenLabs to generate voice models that mimic the user's voice, with end-to-end encryption and no storage of original audio, although the privacy and value propositions remain in question [5]

Group 6: NotebookLM and Research Tools
- Google NotebookLM has introduced a Deep Research feature that automatically gathers relevant web sources and organizes them into a contextual list, creating a dedicated knowledge base within minutes [7]
- The system supports a context of up to 25 million tokens and grounds all responses in user-provided sources with citations, improving verifiability and reducing AI hallucination [7]
- Its video overview feature converts documents, web pages, and videos into interactive videos, and Google has committed not to use personal data for model training [7]

Group 7: AI in Physics
- A team from Peking University has developed the AI-Newton system, which uses symbolic regression to rediscover fundamental physical laws without prior knowledge [8]
- The system is backed by a knowledge base of symbolic concepts, specific laws, and universal laws, identifying on average about 90 physical concepts and 50 general laws across test cases [8]
- AI-Newton shows progressive and diverse discovery behavior; still at the research stage, it offers a new paradigm for AI-driven autonomous scientific discovery, with potential applications in embodied intelligence [8]

Group 8: OpenAI's Research on Explainability
- OpenAI has released new interpretability research proposing sparse models with fewer neuron connections but more neurons, making the model's internal mechanisms easier to understand [9]
- The research team identified the "minimal circuit" for specific tasks, quantifying explainability through the geometric mean of edge counts (a small numerical illustration follows this digest), and found that larger, sparser models can yield more capable yet functionally simpler circuits [9]
- The paper's corresponding author, Leo Gao, is a former member of Ilya's superalignment team; the research is still at an early stage, and the sparse models are far smaller and less efficient than frontier models [9]

Group 9: Elon Musk's AI Vision
- Elon Musk is advancing xAI on the X and Tesla platforms; the Colossus supercomputer data center deployed 200,000 H100 GPUs in 122 days to train Grok-4 and the upcoming Grok-5 [10]
- xAI follows a "truth-seeking, no taboos" approach, using AI-generated synthetic data to rebuild knowledge systems into a "Grok Encyclopedia," while Tesla's next-generation AI5 chip is expected to deliver a 40x performance boost [10]
- Grok is set to be integrated into Tesla vehicles; Musk predicts that by 2030 AI capability may surpass that of all humanity, and xAI plans to open-source the Grok-2.5 model and release Grok-3 in six months [10]
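The explainability item in Group 8 mentions quantifying interpretability through the geometric mean of edge counts over the minimal circuits found per task. The exact metric in OpenAI's paper may be defined differently; the snippet below only shows how such a geometric-mean score would be computed, with made-up task names and edge counts.

```python
import math

def geometric_mean(values: list[float]) -> float:
    """Geometric mean: the n-th root of the product, computed in log space for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical edge counts of the minimal circuit recovered for each probe task;
# smaller circuits give a lower score, i.e. a more interpretable model under this metric.
edge_counts = {"bracket matching": 12, "variable binding": 40, "quote closing": 8}
score = geometric_mean(list(edge_counts.values()))
print(f"interpretability score (lower is simpler): {score:.1f}")
```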
A Genshin Impact agent, from ByteDance
猿大侠· 2025-11-16 04:11
Core Viewpoint
- ByteDance has developed a new gaming agent named Lumine, capable of autonomously playing games such as Genshin Impact and showcasing advanced skills in exploration, combat, and puzzle-solving [1][4][16].

Group 1: Agent Capabilities
- Lumine can perform complex tasks such as dynamic enemy tracking, precise long-range shooting, and smooth character switching, effectively handling varied game scenarios [4][6][10].
- The agent demonstrates strong understanding in boss battles and can solve intricate puzzles, indicating high spatial awareness [6][8][10].
- Lumine is capable of executing GUI operations and can follow complex instructions given clear prior information, enhancing its usability in games [12][14].

Group 2: Technical Framework
- Lumine is built on the Qwen2-VL-7B-Base model, leveraging multimodal understanding and generation capabilities acquired from extensive training on web data [16].
- The agent models operations and reasoning in a unified language space, facilitating seamless integration of perception, reasoning, and action (a toy serialization of such actions is sketched after this summary) [16][19].
- Three core mechanisms are designed for Lumine: Observation Space for visual input processing, Hybrid Thinking for decision-making efficiency, and Keyboard and Mouse Modelling for operational commands [19][22][23].

Group 3: Training Process
- The training process consists of three phases: pre-training for basic actions, instruction-following training for task comprehension, and decision-reasoning training for long-horizon task execution [25][27][29].
- The Lumine-Base model develops core capabilities such as object interaction and basic combat, while the Lumine-Instruct model achieves over 80% success on short tasks [26][28].
- The Lumine-Thinking model can autonomously complete long-horizon tasks without human intervention, showcasing advanced planning and reasoning abilities [30].

Group 4: Performance Evaluation
- In comparative tests, Lumine-Base succeeds at over 90% of basic interactions but lacks goal-oriented behavior in untrained areas [39].
- Lumine-Instruct outperforms mainstream VLMs in task completion, achieving 92.5% on simple tasks and 76.8% on difficult tasks, demonstrating superior tactical planning [41].
- Lumine-Thinking completes the main story tasks in Genshin Impact with a 100% completion rate in 56 minutes, significantly outperforming competitors such as GPT-5 [44][45].

Group 5: Industry Implications
- Gaming agents like Lumine represent a significant step toward general-purpose AI capable of operating in complex 3D environments [50][55].
- Companies such as Google are exploring a similar path with the SIMA 2 agent, indicating a broader industry trend toward using gaming scenarios to train AI [52][56].
- The belief that gaming agents will eventually transition into real-world applications highlights the potential of embodied intelligence across sectors [56].
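Group 2 notes that Lumine models keyboard-and-mouse operations and reasoning in a single unified language space. One common way to realize this is to serialize low-level inputs as text tokens that the model can emit in the middle of its reasoning; the encoding below is a guess at what such a scheme could look like, not ByteDance's actual format.

```python
import re

def encode_action(kind: str, **kwargs) -> str:
    """Serialize a low-level action as a text token a language model could emit inline."""
    if kind == "key":
        return f"<key {kwargs['name']} {kwargs['ms']}ms>"
    if kind == "mouse_move":
        return f"<mouse_move {kwargs['dx']} {kwargs['dy']}>"
    if kind == "click":
        return f"<click {kwargs['button']}>"
    raise ValueError(f"unknown action kind: {kind}")

def decode_actions(text: str) -> list[tuple]:
    """Parse key, mouse-move, and click tokens back out of mixed reasoning/action text."""
    actions: list[tuple] = []
    for name, ms in re.findall(r"<key (\w+) (\d+)ms>", text):
        actions.append(("key", name, int(ms)))
    for dx, dy in re.findall(r"<mouse_move (-?\d+) (-?\d+)>", text):
        actions.append(("mouse_move", int(dx), int(dy)))
    for button in re.findall(r"<click (\w+)>", text):
        actions.append(("click", button))
    return actions

# A "plan" that interleaves natural-language reasoning with executable action tokens:
plan = (
    "The enemy is to the left, so strafe and attack: "
    + encode_action("key", name="A", ms=400)
    + encode_action("mouse_move", dx=-120, dy=0)
    + encode_action("click", button="left")
)
print(decode_actions(plan))
```

The appeal of this kind of design is that a single autoregressive model handles both reasoning and control with one output vocabulary, so no separate action head is required.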
A Genshin Impact agent, from ByteDance
量子位· 2025-11-14 12:10
Core Viewpoint
- ByteDance has developed a new gaming agent named Lumine, capable of autonomously playing games such as Genshin Impact and showcasing advanced skills in exploration, combat, and puzzle-solving [1][4][9].

Group 1: Agent Capabilities
- Lumine can perform complex tasks in Genshin Impact, including dynamic enemy tracking, precise long-range shooting, and smooth character switching [4][5].
- The agent demonstrates strong understanding in boss battles and can solve varied puzzles, such as collecting items based on environmental cues [6][12].
- Lumine is capable of executing GUI operations and can follow complex instructions by understanding prior task information [7][8].

Group 2: Technical Framework
- Lumine is built on the Qwen2-VL-7B-Base model, leveraging multimodal understanding and generation capabilities from extensive web-data training [9][10].
- The agent employs three core mechanisms: Observation Space for visual input processing, Hybrid Thinking for decision-making efficiency (see the sketch after this summary), and Keyboard and Mouse Modelling for action representation [12][14][15].
- A three-phase training process was used, covering pre-training for basic actions, instruction-following training, and decision-reasoning training, leading to high task completion rates [17][20][23].

Group 3: Performance Metrics
- Lumine-Base shows a stepwise emergence of capabilities, achieving over 90% success on basic interactions but lacking goal-directed behavior [38].
- Lumine-Instruct outperforms mainstream VLMs on short-horizon tasks, with a success rate of 92.5% on simple tasks and 76.8% on difficult tasks [33][35].
- Lumine-Thinking excels at long-horizon tasks, completing the main storyline of Genshin Impact in 56 minutes with a 100% task completion rate, significantly faster than competitors [41][42].

Group 4: Cross-Game Adaptability
- Lumine-Thinking adapts well across different games, successfully completing tasks in titles such as Honkai: Star Rail and Black Myth: Wukong, showcasing its general-agent characteristics [45][46].
- The agent's ability to navigate unfamiliar environments and execute complex tasks highlights its potential for broader applications beyond gaming [45][46].

Group 5: Industry Implications
- The development of Lumine reflects an industry trend in which companies such as Google are also building agents that operate in 3D game environments, indicating a clear path toward embodied AGI [48][51].
- The belief that gaming agents will eventually transition into real-world applications underscores the significance of these advances in AI and gaming technology [51].
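The Hybrid Thinking mechanism listed in Group 2 is described as a way to keep decision-making efficient, which suggests the agent only spends tokens on explicit reasoning when the situation calls for it. The toy gate below illustrates that reading; the heuristic, names, and outputs are assumptions rather than the published design.

```python
def is_routine(observation: str, recent_goals: list[str]) -> bool:
    """Toy heuristic: treat an observation as routine if it matches a recent goal."""
    return any(goal in observation for goal in recent_goals)

def hybrid_step(observation: str, recent_goals: list[str]) -> dict:
    """Act directly on routine observations; emit explicit reasoning otherwise."""
    if is_routine(observation, recent_goals):
        return {"thought": None, "action": "continue_current_plan"}
    return {
        "thought": f"New situation: {observation!r}. Re-plan before acting.",
        "action": "replan",
    }

print(hybrid_step("still walking toward the waypoint", ["walking toward the waypoint"]))
print(hybrid_step("a boss appeared on the bridge", ["walking toward the waypoint"]))
```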