AlphaEvolve
X @Demis Hassabis
Demis Hassabis· 2025-10-01 15:59
RT Google Research (@GoogleResearch) Today we describe how we leverage AlphaEvolve, a @GoogleDeepMind system for iteratively evolving code, to morph snippets of code towards better proof elements in complexity theory that can be automatically verified by a computer program. Read more at: https://t.co/tZ2KU9znVu https://t.co/ytEGze2AOv ...
Latest from a Transformer author's startup: new open-source framework breaks the evolutionary-computation bottleneck, boosting sample efficiency by tens of times
量子位· 2025-09-28 11:54
Core Insights
- The article covers the launch of ShinkaEvolve, an open-source framework from Sakana AI that significantly improves sample efficiency across computational tasks, achieving with only 150 samples results that previously required thousands of evaluations [1][3][22].

Group 1: Framework Overview
- ShinkaEvolve lets large language models (LLMs) optimize their own code while maintaining efficiency, likened to fitting evolutionary computation with an "acceleration engine" [3][6].
- The framework matches the performance of Google's AlphaEvolve while offering higher sample efficiency and open-source accessibility [6][22].

Group 2: Key Innovations
- Three architectural innovations drive its performance on tasks such as mathematical optimization, agent design, and competitive programming [5][11].
- The first is a parent-sampling technique that balances exploration and exploitation through a layered strategy and multi-method integration [11][13].
- The second is novelty rejection sampling, which cuts wasted computation by filtering out low-novelty variants with a two-tiered mechanism [14][16].
- The third is a multi-armed bandit LLM-selection strategy based on the UCB1 algorithm, which dynamically schedules LLMs according to their performance in different task phases [17][18].

Group 3: Performance Validation
- In mathematical optimization, ShinkaEvolve needed only 150 evaluations to optimize the placement of 26 circles within a unit square, versus the thousands required by AlphaEvolve [20][22].
- In agent design, ShinkaEvolve outperformed baseline models on mathematical reasoning problems, reaching peak performance with just seven LLM queries [23][25].
- On competitive programming benchmarks, ShinkaEvolve improved average scores by 2.3% across ten AtCoder problems without extensive code restructuring [28].
- It also excelled at evaluating load-balancing loss functions in mixture-of-experts models, showing higher accuracy and lower perplexity across multiple downstream tasks [30][32].
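The UCB1-based LLM scheduling described above can be sketched as a standard multi-armed bandit, where each candidate model is an arm and the reward is the fitness gain of the variant it produced. The model names and reward scale below are illustrative assumptions, not ShinkaEvolve's actual implementation.

```python
import math

class UCB1LLMSelector:
    """Multi-armed bandit over candidate LLMs using the UCB1 rule.

    Illustrative sketch only: model names and the reward scale are
    assumptions, not taken from ShinkaEvolve.
    """

    def __init__(self, models):
        self.models = models
        self.counts = {m: 0 for m in models}    # times each model was chosen
        self.values = {m: 0.0 for m in models}  # running mean reward per model

    def select(self):
        # Play every arm once before applying the UCB1 formula.
        for m in self.models:
            if self.counts[m] == 0:
                return m
        total = sum(self.counts.values())
        # UCB1: mean reward + exploration bonus sqrt(2 ln N / n_i).
        return max(
            self.models,
            key=lambda m: self.values[m]
            + math.sqrt(2 * math.log(total) / self.counts[m]),
        )

    def update(self, model, reward):
        # Incremental update of the chosen arm's mean reward.
        self.counts[model] += 1
        self.values[model] += (reward - self.values[model]) / self.counts[model]

# Usage: pick a model for each mutation step, then feed back its fitness gain.
selector = UCB1LLMSelector(["model-a", "model-b", "model-c"])
chosen = selector.select()
selector.update(chosen, reward=0.7)
```

The exploration bonus shrinks for frequently used models, so the scheduler keeps probing weaker models occasionally while concentrating queries on whichever model is currently paying off.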
Scaling law questioned again: is "Degenerative AI" the endgame?
Hu Xiu· 2025-08-04 12:14
Group 1
- The large-model industry is riding the "scaling law" trend, with tech companies and research institutions investing heavily to chase better model performance through ever-larger data scales [1][2]
- Scholars P.V. Coveney and S. Succi warn that the scaling law has significant flaws when it comes to improving the predictive uncertainty of large language models (LLMs), and that blindly expanding data may lead to "Degenerative AI," characterized by catastrophic accumulation of errors and inaccuracies [2][4]
- The core mechanism behind LLM learning, which generates non-Gaussian output from Gaussian input, may be the fundamental cause of error accumulation and information disasters [5]

Group 2
- Current LLMs show impressive natural language capabilities, but the research team argues that machine learning fundamentally operates as a "black box" with no understanding of underlying physics, which limits its application in scientific and social fields [7][9]
- Only a few AI companies can train state-of-the-art LLMs, and their energy demands are extremely high, yet the performance gains appear limited [10][11]
- The research team identifies a low scaling exponent as a root cause of poor LLM scaling: the capacity to improve with larger datasets is extremely limited [14]

Group 3
- Despite the hype around large models, even advanced AI chatbots produce significant errors that fall short of the precision standards required in most scientific applications [15][23]
- The team shows that even with more computational resources, accuracy may not improve and can decline sharply once a certain threshold is crossed, indicating "barriers" to scalability [16][17]
- The accuracy of machine learning applications depends heavily on the homogeneity of training datasets, and accuracy problems can arise even in homogeneous training scenarios [18][19]

Group 4
- The limitations of LLMs in reliability and energy consumption are evident, yet discussion of the technical details remains scarce [24]
- The tech industry is exploring large reasoning models (LRMs) and agentic AI to improve output credibility, although these approaches still rest heavily on empirical foundations [25][26]
- The team suggests a more constructive direction: use LLMs for generative tasks, channeling uncertainty into exploratory value [27][28]

Group 5
- "Degenerative AI" poses a significant risk, particularly for LLMs trained on synthetic data, where errors can accumulate catastrophically [29][30]
- The current scaling exponent is low but positive, so the industry has not yet entered a phase where more data yields less information, but it is in a stage of "extreme diminishing returns" [32]
- The team emphasizes that relying solely on brute force and unsustainable computational expansion could make Degenerative AI a reality [33][34]
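The "low scaling exponent" argument can be made concrete with a toy power-law error model. The exponent value below is an illustrative assumption for the diminishing-returns regime, not a figure from the paper.

```python
def power_law_error(n_samples, alpha=0.05, c=1.0):
    """Toy scaling-law model: error ~ c * n^(-alpha).

    A small exponent alpha means each 10x increase in data buys only a
    modest error reduction -- the "extreme diminishing returns" regime.
    The alpha value here is illustrative, not taken from the paper.
    """
    return c * n_samples ** (-alpha)

# With alpha = 0.05, each 10x increase in data shrinks error by only a
# factor of 10**0.05, roughly 1.12.
for n in (10**6, 10**7, 10**8):
    print(f"n={n:>9}: error={power_law_error(n):.4f}")
```

With a healthy exponent (say alpha = 0.5), the same 10x of data would cut error by a factor of about 3.2; at alpha = 0.05 the curve is nearly flat, which is the sense in which "more data" stops buying information.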
Google's Nobel laureate Hassabis: 50% chance of AGI within five years; games, physics, and life are computation at their core
AI科技大本营· 2025-07-25 06:10
Core Insights
- The conversation between Lex Fridman and Demis Hassabis centers on the future of artificial intelligence (AI), particularly the prospect of achieving Artificial General Intelligence (AGI) within the next five years, to which Hassabis assigns a 50% probability [3][4]
- Hassabis emphasizes the ability of classical machine learning algorithms to model and discover patterns in nature, suggesting that all evolved patterns can be modeled effectively [5][10]
- The discussion also highlights AI's transformative impact on video games, envisioning a future where players co-create personalized, dynamic open worlds [3][28]

Group 1: AI and AGI
- Demis Hassabis predicts a 50% chance of achieving AGI within five years, asserting that all patterns in nature can be modeled by classical learning algorithms [3][4]
- The conversation explores the idea that natural systems carry structure shaped by evolutionary processes, which AI can learn and model [9][12]
- Hassabis believes that building AGI will help scientists answer fundamental questions about the nature of reality [3][4]

Group 2: AI in Gaming
- Hassabis expresses a desire to create games that allow dynamic storytelling and player co-creation [28][32]
- He envisions AI systems that generate content in real time, producing truly open-world experiences where every player's journey is unique [32][33]
- The potential for AI to revolutionize game design is highlighted, with Hassabis reflecting on his early career in game development and the advances in AI since [38][39]

Group 3: Computational Complexity
- The conversation touches on the P vs NP problem, with Hassabis suggesting that many complex problems can be modeled efficiently by classical systems [15][17]
- He believes that understanding a system's dynamics can yield efficient solutions to complex challenges such as protein folding and game strategy [19][20]
- The discussion emphasizes information as a fundamental unit of the universe, which he relates to the P vs NP question [16][17]

Group 4: AI and Scientific Discovery
- Hassabis discusses the potential of AI systems to assist scientific discovery by combining evolutionary algorithms with large language models (LLMs) [49][51]
- He highlights the importance of creativity in science, suggesting AI may still struggle to propose genuinely novel hypotheses, a critical aspect of scientific advancement [59][60]
- The conversation stresses the need for AI not only to solve problems but to generate new ideas and research directions [60][62]

Group 5: Future Aspirations
- Hassabis expresses a long-standing ambition to simulate a biological cell, viewing it as a grand challenge that could yield breakthroughs in understanding life [64][65]
- He reflects on the importance of breaking grand scientific ambitions into manageable steps to achieve meaningful progress [64][65]
- The conversation closes with a vision of AI contributing to both gaming and scientific exploration, merging creativity with computational power [39][64]
AlphaEvolve: a knowledge-discovery agent endorsed by Terence Tao; AI is entering a self-evolution paradigm
海外独角兽· 2025-07-18 11:13
Core Insights
- AlphaEvolve represents a significant advance in AI, enabling continuous exploration and optimization that surfaces valuable discoveries in complex problems [4][54]
- The key to AlphaEvolve's success is an effective evaluator, which is crucial for AI's self-improvement capabilities [4][55]
- Collaboration between AI and human intelligence remains essential: humans define goals and rules while the AI autonomously generates and optimizes solutions [62][63]

Group 1: What is AlphaEvolve?
- AlphaEvolve is an AI system that combines the creative problem-solving of the Gemini model with an automated evaluator, allowing it to discover and design new algorithms [10][12]
- Its core mechanism is an evolutionary algorithm that iteratively develops better-performing programs for a range of challenges [13][25]

Group 2: Key Component - the Evaluator
- The evaluator acts as a quality-control mechanism, ensuring that the solutions AlphaEvolve generates are rigorously tested and validated [43][45]
- It allows diverse solutions to be generated, filtering out ineffective ones while retaining innovative ideas for further optimization [45][46]

Group 3: AI Entering a Self-Improvement Paradigm
- AlphaEvolve has delivered a 23% efficiency improvement in key computational modules of Google's training infrastructure, marking a shift toward recursive self-improvement in AI [54][55]
- AI's current self-improvement is focused on efficiency rather than fundamental cognitive breakthroughs, indicating room for future exploration [55][56]

Group 4: Redefining the Boundaries of Scientific Discovery
- AlphaEvolve currently focuses on mathematics and computer science, but its approach could extend to fields like biology and chemistry, provided effective evaluation mechanisms exist [58][59]
- The integration of AI into scientific research signals a shift toward more rational, systematic knowledge discovery, raising the efficiency of the research process [60][61]
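The propose-evaluate-select loop described above can be sketched generically. The fitness function and mutation operator below are placeholder stand-ins, not AlphaEvolve's actual components (which mutate code via Gemini and score it with an automated evaluator).

```python
import random

def evolve(population, evaluate, mutate, generations=100, keep=4):
    """Generic evolutionary loop of the kind AlphaEvolve builds on.

    `evaluate` plays the role of the automated evaluator: it scores each
    candidate, and only the best survive to seed the next generation.
    All components here are illustrative placeholders.
    """
    for _ in range(generations):
        # Score every candidate with the evaluator (higher is better).
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[:keep]                      # keep the fittest
        children = [mutate(random.choice(parents))   # propose variants
                    for _ in range(len(population) - keep)]
        population = parents + children
    return max(population, key=evaluate)

# Toy usage: evolve an integer toward a target value (seeded for
# reproducibility).
random.seed(0)
target = 42
best = evolve(
    population=[random.randint(0, 100) for _ in range(20)],
    evaluate=lambda x: -abs(x - target),          # evaluator: closeness to 42
    mutate=lambda x: x + random.randint(-3, 3),   # small random edit
)
```

The point of the sketch is the division of labor: the mutation step can be wildly creative, because the evaluator ruthlessly discards anything that does not score well, which is exactly the quality-control role the article attributes to AlphaEvolve's evaluator.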
Chain-of-thought pioneer Jason Wei's latest article: which domains will large models conquer? | Jinqiu Select
锦秋集· 2025-07-16 07:58
Core Viewpoint
- The rapid evolution of large models is turning model capabilities into product functionality, making it crucial for entrepreneurs to stay informed about advances in model technology [1][2].

Group 1: Characteristics of Tasks AI Can Solve
- Tasks AI can quickly master share five characteristics: objective truth, rapid verification, scalable verification, low noise, and continuous reward [2][10].
- "Verification asymmetry," the observation that some tasks are far easier to verify than to solve, is becoming a key idea in AI [3][8].

Group 2: Examples of Verification Asymmetry
- Verifying a solution can be dramatically easier than producing it, as in Sudoku or checking that a website works [4][6].
- For some tasks verification is nearly symmetric with solving, and for others verification takes longer than solving, highlighting the spectrum of verification difficulty [6][7].

Group 3: Importance of Verification
- The "verifier's law" states that the ease of training AI to solve a task correlates with the task's verifiability: tasks that are both solvable and easily verifiable will be solved by AI [8][9].
- Neural networks' learning potential is maximized when tasks meet these verification characteristics, enabling faster iteration and progress in the digital realm [12].

Group 4: Case Study - AlphaEvolve
- Google's AlphaEvolve exemplifies the effective use of verification asymmetry, ruthlessly optimizing problems that satisfy the verifier's-law characteristics [13].
- AlphaEvolve focuses on solving specific problems rather than generalizing to unseen ones, a departure from traditional machine learning [13].

Group 5: Future Implications
- Verification asymmetry suggests a future in which measurable tasks are solved ever more efficiently, producing a jagged frontier of intelligence where AI excels at verifiable tasks [14][15].
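The Sudoku example of verification asymmetry can be made concrete: checking a filled 9x9 grid is a single linear pass over the cells, while solving one from scratch requires backtracking search. The grid construction below is a standard shifted-pattern example, used purely for illustration.

```python
def is_valid_sudoku(grid):
    """Verify a completed 9x9 Sudoku in one pass over the cells.

    Solving the same grid from scratch requires backtracking search --
    this gap is the asymmetry the "verifier's law" is about.
    """
    full = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r + dr][c + dc] for dr in range(3) for dc in range(3)}
        for r in (0, 3, 6) for c in (0, 3, 6)
    ]
    # Every row, column, and 3x3 box must contain exactly the digits 1-9.
    return all(group == full for group in rows + cols + boxes)

# A valid solved grid built from the standard shifted pattern.
grid = [[(i * 3 + i // 3 + j) % 9 + 1 for j in range(9)] for i in range(9)]
print(is_valid_sudoku(grid))  # True
```

Verification here is O(81) set operations regardless of how hard the puzzle was to solve, which is exactly why such tasks make good reinforcement-learning targets.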
Tencent Research Institute AI Digest 20250605
腾讯研究院· 2025-06-04 14:24
Group 1
- OpenAI is rolling out a lightweight memory feature for free ChatGPT users, enabling personalized responses based on users' conversation habits [1]
- The feature supports short-term conversation continuity, giving users basic memory functionality [1]
- It is particularly useful in fields such as writing, financial analysis, and medical tracking, and users can enable or disable it at any time [1]

Group 2
- ChatGPT's Codex programming tool is now available to Plus members, with internet access, PR updates, and voice input [2]
- Codex's internet access is off by default and must be enabled manually, granting access to roughly 70 whitelisted safe websites [2]
- OpenAI has been updating Codex actively, with three updates in two weeks and more features expected soon [2]

Group 3
- AI programming platform Windsurf, set to be acquired by OpenAI for $3 billion, has had its access to Anthropic's Claude models almost entirely cut off [2]
- Windsurf is taking emergency measures, lowering Gemini model prices and halting free users' access to Claude models, citing Anthropic's unwillingness to continue supply [2]
- The industry reads the cutoff as competitive fallout from OpenAI's acquisition, with Anthropic shifting focus to IDEs and plugins that compete directly with Windsurf [2]

Group 4
- Manus has launched a video generation feature that stitches multiple 5-second clips into a complete story, overcoming video-length limits [3]
- Generation proceeds in three steps: task planning, staged reference-image search, and segment stitching [3]
- The feature is currently members-only, with mixed feedback on its quality; a 5-second video costs approximately 166 points [4]

Group 5
- MoonCast is an open-source conversational voice synthesis model that generates natural bilingual (Chinese and English) AI podcasts from a few seconds of voice samples [5]
- It uses an LLM to extract information and write engaging podcast scripts that incorporate natural speech elements [5]
- A 2.5-billion-parameter model and extensive training data, trained in three stages, enable it to generate more than 10 minutes of audio [5]

Group 6
- Turing Award winner Yoshua Bengio has announced a non-profit, LawZero, which has raised $30 million to develop "safe by design" AI systems [6]
- LawZero is building "Scientist AI," a non-autonomous system aimed at understanding the world rather than acting in it, to counter current AI risks [6]
- All three deep learning pioneers are now engaged with AI risk: Bengio founding LawZero, Hinton resigning from Google, and LeCun criticizing mainstream AI approaches [6]

Group 7
- AlphaEvolve has made significant breakthroughs in combinatorial mathematics, advancing a long-standing problem in additive combinatorics by raising the sum-difference-set exponent from 1.14465 to 1.173077 [7]
- The breakthroughs highlight AI-human collaboration: AlphaEvolve discovered the initial constructions and mathematicians refined them [7]
- This is seen as a new paradigm for scientific discovery, showcasing the complementarity of different research methods [7]

Group 8
- Jun Chen, a Chinese scientist, has developed an AI diagnostic pen that analyzes handwriting features to aid early detection of Parkinson's disease, with over 95% accuracy [9]
- The pen combines a magnetoelastic tip with ferromagnetic-fluid ink, sensing writing-pressure changes and generating recordable voltage signals [9]
- The technology offers a lower-cost, portable, user-friendly alternative to traditional diagnostics, especially valuable in resource-limited settings [9]

Group 9
- Sam Altman predicts that the era of AI executors will arrive within 18 months, with AI evolving from a tool into a problem-solving executor by 2026 [10]
- OpenAI's internal use of Codex illustrates the current state of AI agents, which can autonomously receive tasks, query information, and execute multi-step processes [10]
- Companies that invest early in AI will gain a competitive edge through data loops and practical experience, mastering the art of inquiry and problem-solving [10]
Retweeted by Terence Tao! Chinese math postdoc overtakes DeepMind's AI; a math problem stagnant for 18 years broken three times in one month
量子位· 2025-06-04 09:14
Core Viewpoint
- The article discusses collaborative breakthroughs on the "sums and differences of sets" problem by AI and human mathematicians, highlighting the advances made by DeepMind's AlphaEvolve and the subsequent improvements by mathematicians Robert Gerbicz and Fan Zheng [2][4][30].

Group 1: AlphaEvolve's Contributions
- DeepMind's AlphaEvolve improved the matrix multiplication algorithm and broke the record on the "sums and differences of sets" problem, which had been stagnant for 18 years [2][4].
- AlphaEvolve used a semi-automated search, generating numerous candidate solutions with the Gemini model and refining them via an automated evaluation system [14][16].
- Its best algorithm constructed a set of 54,265 integers, raising the lower bound of θ to 1.1584 and surpassing the previous record of 1.14465 set 18 years earlier [18].

Group 2: Human Mathematicians' Improvements
- Hungarian mathematician Robert Gerbicz developed a new method that constructs a large set under specific constraints, achieving θ = 1.173050 and surpassing AlphaEvolve's result [20][25].
- Gerbicz's approach used combinatorial principles to avoid redundant calculations, yielding a set with more than 10^43546 elements [24].
- Fan Zheng then improved the result to θ = 1.173077 by introducing a theoretical-analysis framework, showing that asymptotic analysis offers systematic routes to further improvement [27][29].

Group 3: Collaborative Dynamics
- The results from AlphaEvolve and the subsequent human contributions illustrate a complementary rather than competitive relationship between AI and human mathematicians [30][31].
- AlphaEvolve's strength is broad exploration across many problems, freeing human experts to focus on specific areas for deeper investigation and progress [31][32].
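The "sums and differences of sets" problem compares the sizes of the sumset A+A and the difference set A-A. A tiny sketch with Conway's classic 8-element example shows a set with more sums than differences; this well-known small example is for illustration only and is unrelated to AlphaEvolve's 54,265-element construction.

```python
def sumset(a):
    """All pairwise sums x + y for x, y in a."""
    return {x + y for x in a for y in a}

def diffset(a):
    """All pairwise differences x - y for x, y in a."""
    return {x - y for x in a for y in a}

# Conway's classic MSTD ("more sums than differences") set: even though
# addition is commutative and subtraction is not, |A+A| exceeds |A-A|.
A = {0, 2, 3, 4, 7, 11, 12, 14}
print(len(sumset(A)), len(diffset(A)))  # 26 25
```

Records like θ = 1.1584 quantify how far this imbalance can be pushed asymptotically; the record-breaking constructions are enormously larger than this toy set, but the quantity being optimized is the same comparison of |A+A| against |A-A|.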
Retweeted by Terence Tao! DeepMind open-sources a "standard exercise set for AI mathematical proof"
量子位· 2025-05-31 03:34
Core Viewpoint
- DeepMind has launched an open-source library of formally stated mathematical conjectures, addressing the scarcity of open-conjecture resources and helping AI models improve their mathematical reasoning and proof capabilities [1][6][8].

Group 1
- The library contains a diverse set of mathematical conjectures formalized in Lean, drawn from a variety of sources [9].
- It serves as a formal "exercise set" for computers, letting traditional automated theorem proving (ATP) systems run proof searches over the conjectures it contains [11][12].
- Users can contribute by formalizing new conjectures, suggesting formal problems they would like to see, improving citations, and correcting inaccuracies in existing formalizations [16][17][18].

Group 2
- The library is expected to become a benchmark for testing automated theorem provers and formal tools, thereby helping AI models improve their mathematical reasoning and proof capabilities [7][8].
- The collaboration between DeepMind and mathematician Terence Tao has been significant, with Tao endorsing AI's potential in mathematical discovery [28][29].
- DeepMind's AlphaEvolve project has made strides on long-standing geometric challenges, demonstrating AI's potential in mathematics [35][41].
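A formally stated conjecture in such a library might look like the following Lean 4 / Mathlib sketch. This is an illustrative toy entry, not an actual statement from the DeepMind library; the `sorry` placeholder marks the missing proof that an ATP system would search for.

```lean
import Mathlib

/-- Goldbach's conjecture: every even number greater than 2 is the sum of
two primes. Stated with `sorry` as a proof placeholder, which is how an
open conjecture is presented to an automated theorem prover. -/
theorem goldbach_conjecture :
    ∀ n : ℕ, 2 < n → Even n → ∃ p q : ℕ, p.Prime ∧ q.Prime ∧ p + q = n := by
  sorry
```

Formalizing the statement pins down every quantifier and definition, so any proof a machine finds can be checked mechanically rather than reviewed by hand.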
Formal proofs and large models: co-creating a verifiable future for AI mathematics | QbitAI livestream
量子位· 2025-05-27 03:53
Core Viewpoint
- The article discusses advances in AI's ability to solve mathematical problems, highlighting the competitive landscape among teams and projects in the field [1][2].

Group 1: AI Developments
- Recent releases such as DeepSeek Prover V2, Terence Tao's AI-math livestream, and Google's AlphaEvolve signal significant progress in AI's mathematical capabilities [1].
- The FormalMATH benchmark has drawn attention for evaluating AI performance on automated theorem proving [2].

Group 2: Upcoming Events
- A livestream is scheduled for May 29 at 20:00, featuring discussions on frontier exploration of formal proofs with large language models, with participation from several project teams [2][4].
- Notable speakers include researchers and experts from institutions such as the University of Edinburgh and the Chinese University of Hong Kong, as well as contributors to the 2077AI initiative [3][4].

Group 3: Community Engagement
- The article invites readers to comment and join AI discussions, promoting a collaborative environment for sharing insights and developments [4][5].