OpenAI Thought GPT-5 Made Big Math News; the Outcome Even Left Hassabis Embarrassed
量子位· 2025-10-20 01:16
Core Viewpoint
- OpenAI's announcement that GPT-5 had solved several Erdős problems was later revealed to be an exaggeration: the AI merely retrieved existing solutions rather than solving the problems independently [5][13][14]

Group 1: Announcement and Initial Reactions
- OpenAI researcher Mark Sellke claimed that GPT-5 had made significant breakthroughs in mathematics by solving 10 previously unsolved Erdős problems [5][7]
- The announcement led to widespread excitement, with many mistakenly believing that GPT-5 had independently cracked long-standing mathematical challenges [9]
- DeepMind CEO Demis Hassabis and Meta's Yann LeCun publicly criticized the claims, highlighting the embarrassment surrounding the situation [3][4][10][16]

Group 2: Clarification and Reality Check
- Thomas Bloom, creator of the website OpenAI referenced, clarified that GPT-5 did not solve the problems but found existing solutions through online searches [12][13]
- The problems' "unsolved" status on the website reflected Bloom's unawareness of the existing solutions, not a genuine open status in the mathematical community [13][14]
- Following the backlash, researcher Sebastien Bubeck deleted his earlier tweet and acknowledged the misunderstanding, emphasizing the genuine difficulty of literature retrieval [15]

Group 3: GPT-5's Capabilities and Context
- Despite the controversy, GPT-5 has demonstrated notable mathematical ability, such as solving complex problems and supplying key proofs in a short time [18][19][22]
- GPT-5's previous successes in mathematics contributed to the inflated expectations surrounding its capabilities [17][22]
- The incident reflects growing desensitization to AI advances: absent genuine breakthroughs, exaggerated claims invite significant misinterpretation [27]
Andrej Karpathy: 2025 Is Not AI's Breakout Year; Where Do the Next Ten Years Lead?
36Kr· 2025-10-20 00:28
Core Insights
- The AI industry is engaged in significant discussion of an "agent era" in 2025, with advances such as DeepSeek surpassing GPT-4o and OpenAI releasing the Agent SDK [1]
- Andrej Karpathy, a former core researcher at OpenAI, argues that the notion of an "explosion year" for AGI is misleading, emphasizing that true AGI development is a gradual process that will take decades [2][4]
- Current AI systems lack memory and continuity, functioning more like "ghosts" that retain neither user identity nor past interactions [5][6][12]

Group 1: Current AI Limitations
- Current AI assistants lack basic memory capabilities, leading to a lack of continuity in interactions [5][7]
- Karpathy defines a true agent as one with persistence over time, memory, and continuity, which current AI lacks [7][8]
- Existing products like ChatGPT and Claude do not remember users; they only hold real-time conversations without retaining context [9][10]

Group 2: Future Directions for AI
- Karpathy outlines three critical development paths toward true AGI: understanding user intent, operating in the real world, and maintaining continuity over time [16][21][25]
- The first path focuses on deepening AI's understanding of language and context, which models like GPT and Claude are already pursuing [17][20]
- The second path emphasizes the need for AI to act in the real world, moving beyond mere conversation to actively assisting users [21][24]
- The third path highlights creating AI that can exist as a long-term companion, integrating memory and task awareness [25][26]

Group 3: Training Methodologies
- Karpathy advocates shifting AI training from data overload to structured learning with clear objectives [28][32]
- He proposes three principles for training AI: a sense of purpose, a focus on actionable tasks, and feedback loops for continuous improvement [34][36][37]
- This approach aims to cultivate AI like a colleague rather than merely feeding it data, fostering a more effective learning environment [38][40]

Group 4: AI's Role in Society
- The future of AI is envisioned as entities with roles and responsibilities, rather than mere tools for specific tasks [41][42]
- As AI assumes roles, questions of accountability and certification arise, pointing toward a new "role market" for AI [43]
- Karpathy suggests AI will not replace humans but will redefine roles, enabling collaboration between humans and AI across professional fields [45][46]
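Karpathy's point that today's assistants lack persistence can be made concrete with a toy sketch. The class below is purely illustrative (the name `PersistentAgent` and the JSON file format are assumptions, not any real product's API): it persists a memory store between sessions, which is exactly what stateless chat models do not do.

```python
import json
from pathlib import Path

class PersistentAgent:
    """Toy agent that keeps memory across sessions, unlike a stateless chat model."""

    def __init__(self, memory_path="agent_memory.json"):
        self.memory_path = Path(memory_path)
        # Reload prior interactions if a memory file already exists.
        if self.memory_path.exists():
            self.memory = json.loads(self.memory_path.read_text())
        else:
            self.memory = []

    def interact(self, user_message: str) -> str:
        # A real agent would condition a model on self.memory here;
        # this sketch just counts prior turns to demonstrate continuity.
        reply = f"(turn {len(self.memory) + 1}) acknowledged: {user_message}"
        self.memory.append({"user": user_message, "agent": reply})
        self.memory_path.write_text(json.dumps(self.memory))
        return reply
```

Constructing a second `PersistentAgent` over the same file continues the turn count, whereas a memoryless assistant would start every session from turn 1.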
A MuJoCo Tutorial Arrives: From Zero Through Reinforcement Learning to Sim2Real
具身智能之心· 2025-10-20 00:03
Core Insights
- The article argues that AI is at a pivotal moment, moving from early symbolic reasoning through deep-learning breakthroughs to the rise of embodied intelligence, which is redefining human-machine relationships [1][3]

Group 1: Embodied Intelligence
- Embodied intelligence is characterized by machines that can understand language commands, navigate complex environments, and make intelligent decisions in real time, moving beyond purely virtual space [1]
- Major tech companies such as Tesla, Boston Dynamics, OpenAI, and Google are actively developing technologies in this disruptive field, indicating a competitive landscape [1][3]
- The potential impact of embodied intelligence spans manufacturing, healthcare, and space exploration, suggesting a transformative effect on the economy and society [1]

Group 2: Technical Challenges and Solutions
- Achieving true embodied intelligence presents unprecedented technical challenges, requiring advances in algorithms, physical simulation, robot control, and perception fusion [3]
- MuJoCo (Multi-Joint dynamics with Contact) is highlighted as a critical technology for embodied intelligence: a high-fidelity simulation engine connecting virtual and real-world environments [4][6]
- MuJoCo lets researchers run millions of trials in simulation, dramatically accelerating learning while minimizing the risks associated with physical hardware [6][8]

Group 3: MuJoCo's Advantages
- MuJoCo's advanced contact-dynamics algorithms enable precise simulation of complex interactions between robots and their environments, making it a standard tool in academia and industry [4][8]
- The engine supports high parallelization, allowing thousands of simulations to run simultaneously and improving training efficiency [4][6]
- Its stability and numerical accuracy ensure reliable long-horizon simulations, making it a preferred choice for leading tech companies [4][6]

Group 4: Educational Initiatives
- A comprehensive MuJoCo development tutorial has been created, focusing on practical applications and theoretical foundations in the context of embodied intelligence [9][11]
- The course is structured into six modules, each with specific learning objectives and practical projects, ensuring a thorough grasp of the technology stack [15][17]
- Participants complete hands-on projects ranging from basic robotic-arm control to complex multi-agent systems, building both theoretical knowledge and practical skills [19][29]

Group 5: Target Audience and Outcomes
- The course targets people with programming or algorithm backgrounds entering embodied robotics, as well as students and professionals seeking to strengthen their practical capabilities [32][33]
- On completion, participants will have a complete embodied-intelligence skill set, including proficiency in MuJoCo, reinforcement learning, and real-world application of simulation techniques [32][33]
- The program aims to combine technical, engineering, and innovative skills, preparing participants to tackle complex problems in the field [33]
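The core job of a simulator like MuJoCo, repeating a physics step millions of times per trial, can be sketched in plain Python. This is a conceptual stand-in (a damped pendulum advanced with semi-implicit Euler), not MuJoCo's actual API; all function names and constants are illustrative.

```python
import math

def step(theta, omega, dt=0.01, g=9.81, length=1.0, damping=0.1):
    """One semi-implicit Euler step of a damped pendulum: the kind of
    update a physics engine repeats millions of times during training."""
    alpha = -(g / length) * math.sin(theta) - damping * omega  # angular acceleration
    omega = omega + dt * alpha   # update velocity first
    theta = theta + dt * omega   # then position (semi-implicit, better stability)
    return theta, omega

def rollout(theta0, steps=1000):
    """Run one simulated trial; many such rollouts can run in parallel."""
    theta, omega = theta0, 0.0
    for _ in range(steps):
        theta, omega = step(theta, omega)
    return theta

# Damping should drive the pendulum toward the stable equilibrium at theta = 0.
print(rollout(0.5))
```

Real engines solve far harder problems (contact, friction, articulated bodies), but the pattern is the same: a cheap, repeatable step function that makes millions of risk-free trials possible before any hardware is touched.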
Why OpenAI Is "Enamored" with Monetization
Hu Xiu· 2025-10-19 03:56
Core Points
- Sam Altman announced on October 15 that OpenAI will introduce adult content in December, emphasizing a more comprehensive age-verification process and treating adult users as adults [1][7]
- OpenAI is not the only company entering the adult-content space; Elon Musk's xAI has also launched a flirty AI companion, indicating diverging strategic approaches between the two companies [2]
- Altman's strategy centers on integrating various third-party applications into ChatGPT to create a "super app" that can handle a wide range of tasks, while Musk's xAI aims for deeper integration with the physical world through "world models" [3][4]

Company Strategies
- OpenAI is pursuing rapid commercialization to establish a market foothold, while Musk has publicly criticized OpenAI for excessive commercialization [5]
- OpenAI has faced user criticism over ChatGPT's human-like interaction experience, leading it to reintroduce GPT-4o after complaints about the new GPT-5 model [8][9]
- In response to concerns about user safety, OpenAI established a "Welfare and AI" committee, though it has been criticized for not including suicide-prevention experts [10]

Industry Context
- The competition between OpenAI and xAI is not just a technical race; it also involves differing philosophies and responsibilities regarding AI development [10]
- OpenAI's introduction of adult content reflects a broader industry trend of exploring new revenue streams while navigating ethical considerations [1][5]
GPT-5 Core Team Member Explains RL: Pre-training Leads to AGI Only When Combined with RL
海外独角兽· 2025-10-18 12:03
Core Insights
- The article discusses the limitations of current large language models (LLMs) and emphasizes reinforcement learning (RL) as the more viable path toward artificial general intelligence (AGI) [2][3][50]
- It highlights the interplay between pre-training and RL, suggesting both are essential to the development of advanced AI systems [16][50]

Group 1: Reinforcement Learning (RL) Insights
- Richard Sutton argues that the current LLM approach, which relies primarily on imitation, is fundamentally flawed and a "dead end" for AGI, whereas RL lets models interact with their environment and learn from experience [2]
- Andrej Karpathy points out that traditional RL is inefficient and that future intelligent systems will not rely solely on it [2]
- Jerry Tworek emphasizes that RL must be built on strong pre-training and that the two processes are interdependent [3][16]

Group 2: Reasoning and Thought Processes
- Reasoning in AI is likened to human thinking: models must search for unknown answers rather than simply retrieve known ones [7][9]
- The "chain of thought" (CoT) concept has language models express their reasoning steps in human language, improving their ability to solve complex problems [10][11]
- Balancing output quality against response time is crucial: longer reasoning generally yields better results, but users prefer quicker responses [12][13]

Group 3: Model Development and Iteration
- OpenAI's model evolution is described as a series of scaling experiments aimed at improving reasoning, with each iteration building on the last [13][15]
- The transition from the initial model (o1) to more advanced versions (o3 and GPT-5) reflects significant advances in reasoning and tool use [15][16]
- Integrating RL with pre-training is seen as a necessary strategy for developing more capable AI systems [16][19]

Group 4: Challenges and Future Directions
- RL's complexity demands careful management of rewards and penalties to train models effectively [20][33]
- Online RL, where models learn in real time from user interactions, holds promise but poses risks that must be managed [36][38]
- Achieving alignment, ensuring models understand right from wrong, remains a critical ongoing challenge of AI development [39][47]
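The reward-and-penalty loop described above can be shown at toy scale. The sketch below is a minimal epsilon-greedy bandit learner, orders of magnitude simpler than RL on an LLM, but it captures the core idea the article contrasts with imitation: behavior is adjusted from scalar feedback rather than copied from examples. All names and constants here are illustrative.

```python
import random

def train_bandit(true_rewards, episodes=5000, lr=0.1, seed=0):
    """Learn action-value estimates from noisy scalar rewards alone."""
    rng = random.Random(seed)
    q = [0.0] * len(true_rewards)  # estimated value per action
    for _ in range(episodes):
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        if rng.random() < 0.1:
            a = rng.randrange(len(q))
        else:
            a = max(range(len(q)), key=lambda i: q[i])
        reward = true_rewards[a] + rng.gauss(0, 0.1)  # noisy feedback signal
        q[a] += lr * (reward - q[a])                  # nudge estimate toward reward
    return q

q = train_bandit([0.2, 0.8, 0.5])
print(q)  # the middle action should end up with the highest estimated value
```

The same shape, act, receive a reward, update, underlies RLHF-style training, where the "reward" comes from a learned preference model rather than a fixed payout table.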
Google DeepMind Launches CodeMender: An Intelligent Agent That Automatically Repairs Code
AI前线· 2025-10-18 05:11
Core Insights
- Google DeepMind has launched CodeMender, an AI-driven intelligent agent designed to automatically detect, fix, and harden software vulnerabilities, aiming to cut the time developers spend identifying and addressing security issues [1][4]
- CodeMender combines automated vulnerability discovery with AI-based repair and validation, having contributed 72 verified patches to open-source projects over the past six months, including projects exceeding 4 million lines of code [1][2]

Group 1
- Traditional vulnerability-detection methods, such as static analysis and fuzzing, require significant manual verification and remediation, which CodeMender seeks to improve upon [1]
- When a vulnerability is detected, the system generates multiple repair candidates and validates them through automated testing to ensure they resolve the issue without introducing new errors [1][4]
- Early repair cases include fixing a heap buffer overflow related to XML stack processing and addressing an object-lifecycle-management vulnerability [2]

Group 2
- Community response to CodeMender has been largely positive, with comments highlighting the impressiveness of automated repairs and the importance of the verification layer for building trust [3]
- Discussions on platforms like Reddit raise concerns about such automation's future impact on cybersecurity, with users speculating that attackers could exploit similar models [4]
- DeepMind emphasizes that all patches generated by CodeMender will undergo human review before formal integration, with reliability and transparency as core principles of the project [4]
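The generate-candidates-then-validate pattern attributed to CodeMender can be sketched generically. Nothing below is DeepMind's actual implementation; it is a hypothetical illustration of the workflow: propose several candidate patches, run each against an automated test suite, and keep only those that fix the bug without breaking existing behavior.

```python
def validate_patches(candidates, test_suite):
    """Keep only candidate patches whose patched function passes every test.
    `candidates` maps a patch name to a patched implementation."""
    accepted = {}
    for name, patched_fn in candidates.items():
        if all(test(patched_fn) for test in test_suite):
            accepted[name] = patched_fn
    return accepted

# Toy vulnerability: an unchecked index into a buffer.
def patch_a(buf, i):
    return buf[i % len(buf)]      # wraps silently -- hides the bug, wrong semantics

def patch_b(buf, i):
    if 0 <= i < len(buf):
        return buf[i]
    return None                   # bounds-checked, rejects bad access

tests = [
    lambda f: f([1, 2, 3], 1) == 2,     # normal access must still work
    lambda f: f([1, 2, 3], 7) != 2,     # out-of-bounds must not alias a valid index
    lambda f: f([1, 2, 3], 7) is None,  # out-of-bounds is rejected, not wrapped
]

good = validate_patches({"a": patch_a, "b": patch_b}, tests)
print(sorted(good))  # only the bounds-checked patch survives validation
```

The test suite acts as the automated gate; per the article, a human review step would still follow before any patch is merged.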
New Models Debut in Batches, Multiple Robotics Technologies Go Open Source, and Other Recent AI News
红杉汇· 2025-10-17 00:04
Group 1
- The emergence of large language models (LLMs) has significantly advanced the automation of scientific discovery, with AI Scientist systems leading the exploration [5][6]
- Current AI Scientist systems often lack clear scientific goals, producing research outputs that can seem immature and lack true scientific value [5]
- A new AI Scientist system, DeepScientist, achieved research progress equivalent to three years of human effort in just two weeks, demonstrating its capability across fields [6]

Group 2
- OpenAI recently held a developer conference with around 1,500 attendees and tens of thousands of online viewers, showcasing its achievements and new tools [8]
- OpenAI's platform has attracted 4 million developers, with ChatGPT reaching 800 million weekly active users and processing nearly 6 billion tokens per minute [8]
- New tools and models were introduced, including the Apps SDK and AgentKit, extending ChatGPT's capabilities and enabling rapid prototyping for developers [8]

Group 3
- The latest version of the image-generation model, Hunyuan Image 3.0, has topped the LMArena leaderboard, outperforming 26 other models [11][12]
- Hunyuan Image 3.0 is the largest open-source image-generation model, with 80 billion parameters and 64 expert networks, showing advanced knowledge reasoning and aesthetic performance [12]

Group 4
- NVIDIA open-sourced several key technologies at the Conference on Robot Learning, including the Newton physics engine and the GR00T reasoning model, aimed at challenges in robot development [13][15]
- These technologies are expected to significantly shorten the robot development cycle and accelerate the deployment of new techniques [15]

Group 5
- The newly released GLM-4.6 model has 355 billion total parameters and a context window expanded to 200,000 tokens, improving performance across a range of tasks [16]
- GLM-4.6 delivers over 30% better token efficiency and a 27% gain in coding capability over its predecessor, making it one of the strongest coding models available [16]

Group 6
- Anthropic has launched Claude Sonnet 4.5, which excels in programming accuracy and stays stable during complex tasks, outperforming previous models [20][22]
- Claude Sonnet 4.5 achieved 82.0% accuracy on the SWE-bench Verified benchmark, surpassing competitors while emphasizing alignment and safety [22]

Group 7
- DeepMind's new video model, Veo 3, demonstrates zero-shot learning capabilities, performing complex visual tasks without task-specific training [24][28]
- Veo 3's grasp of physical laws and abstract relationships suggests it could evolve into a foundational visual model analogous to LLMs [28]
A Chip Industry Veteran Takes On NVIDIA Again
半导体行业观察· 2025-10-16 01:00
Core Insights
- The article traces Naveen Rao and his team from founding Nervana Systems to their new venture, Unconventional, highlighting the evolution of the AI hardware market and the challenges startups face in this space [1][30]

Group 1: Founding of Nervana Systems
- In 2014, Nervana's founders, including Naveen Rao, Amir Khosrowshahi, and Arjun Bansal, recognized the potential of deep learning and aimed to address the hardware limitations in AI processing [2][3]
- The team, all with neuroscience backgrounds, was motivated by a fascination with intelligent machines and aimed to design specialized chips for machine learning [4][7]

Group 2: Acquisition by Intel
- In 2016, Intel acquired Nervana for approximately $350 million to strengthen its position in the deep-learning chip market, which NVIDIA was coming to dominate [10][11]
- Following the acquisition, Rao led Intel's AI platform division, which developed the Nervana NNP series of chips to compete with NVIDIA's offerings [13][15]

Group 3: Challenges and Setbacks
- Despite initial success, Intel announced in 2020 that it would cease Nervana chip development in favor of technology acquired from Habana Labs, a direct competitor to Nervana's products [21][22]
- Habana's chips significantly outperformed Nervana's, casting doubt on Nervana's future within Intel's product lineup [19][21]

Group 4: Launch of Unconventional
- After leaving Intel, Rao founded Unconventional, aiming to raise $1 billion at a target valuation of $5 billion, far above Nervana's previous valuation [26][30]
- Unconventional seeks to rethink the foundations of computing, potentially leveraging neuromorphic-computing principles to create more efficient AI hardware [27][28]

Group 5: Market Dynamics
- The AI hardware market has changed dramatically since 2014: NVIDIA's market cap has soared past $4 trillion amid surging competition from established companies and new startups alike [30][31]
- The current landscape presents both opportunities and challenges for new entrants like Unconventional, including competing against NVIDIA's established ecosystem and overcoming customer inertia [31][32]
Veo 3.1 - Frames to video
Google DeepMind· 2025-10-15 15:56
Control the shot from start to finish. Provide a starting and ending image, and Veo will generate a seamless video that bridges the two, perfect for artful and epic transitions. Try it today in Flow at flow.google. Learn more: https://blog.google/technology/ai/veo-updates-flow
Veo 3.1 - Ingredients to video
Google DeepMind· 2025-10-15 15:56
With "Ingredients to Video," you can use multiple reference images to control the characters, objects and style. Veo uses your ingredients to create a final scene that looks just as you envisioned. Try it today in Flow at flow.google. Learn more: https://blog.google/technology/ai/veo-updates-flow