Workflow
DeepMind
icon
Search documents
最新发现!每参数3.6比特,语言模型最多能记住这么多
机器之心· 2025-06-04 04:41
Core Insights - The memory capacity of GPT series models is approximately 3.6 bits per parameter, indicating a limit beyond which models stop memorizing and begin to generalize [1][4][27]. Group 1: Memory and Generalization - The research distinguishes between two types of memory: unexpected memory (specific dataset information) and generalization (understanding of the real data generation process) [5][7]. - A new method was proposed to estimate a model's understanding of specific data points, which helps measure the capacity of modern language models [2][8]. Group 2: Model Capacity and Measurement - The study defines model capacity as the total amount of memory that can be stored across all parameters of a specific language model [17][18]. - The maximum memory capacity is reached when the model no longer increases its memory with larger datasets, indicating saturation [19][28]. - Experiments showed that the memory capacity of models scales with the number of parameters, with a stable memory of 3.5 to 3.6 bits per parameter observed [27][28]. Group 3: Experimental Findings - The research involved training hundreds of transformer language models with parameters ranging from 500,000 to 1.5 billion, leading to insights on scaling laws related to model capacity and data size [6][25]. - Results indicated that even with different dataset sizes, the memory bits remained consistent, reinforcing the relationship between model capacity and parameter count [28][29]. - The impact of precision on capacity was analyzed, revealing that increasing precision from bfloat16 to float32 slightly improved capacity, with average values rising from 3.51 bits/parameter to 3.83 bits/parameter [31][32].
图灵奖得主杨立昆:中国人并不需要我们,他们自己就能想出非常好的点子
AI科技大本营· 2025-06-02 07:24
Core Viewpoint - The current large language models (LLMs) are limited in their ability to generate original scientific discoveries and truly understand the complexities of the physical world, primarily functioning as advanced pattern-matching systems rather than exhibiting genuine intelligence [1][3][4]. Group 1: Limitations of Current AI Models - Relying solely on memorizing vast amounts of text is insufficient for fostering true intelligence, as current AI architectures struggle with abstract thinking, reasoning, and planning, which are essential for scientific discovery [3][5]. - LLMs excel at information retrieval but are not adept at solving new problems or generating innovative solutions, highlighting their inability to ask the right questions [6][19]. - The expectation that merely scaling up language models will lead to human-level AI is fundamentally flawed, with no significant advancements anticipated in the near future [19][11]. Group 2: The Need for New Paradigms - There is a pressing need for new AI architectures that prioritize search capabilities and the ability to plan actions to achieve specific goals, rather than relying on existing data [14][29]. - The current investment landscape is heavily focused on LLMs, but the diminishing returns from these models suggest a potential misalignment with future AI advancements [18][19]. - The development of systems that can learn from natural sensors, such as video, rather than just text, is crucial for achieving a deeper understanding of the physical world [29][37]. Group 3: Future Directions in AI Research - The exploration of non-generative architectures, such as Joint Embedding Predictive Architecture (JEPA), is seen as a promising avenue for enabling machines to abstractly represent and understand real-world phenomena [44][46]. - The ability to learn from visual and tactile experiences, akin to human learning, is essential for creating AI systems that can reason and plan effectively [37][38]. - Collaborative efforts across the global research community will be necessary to develop these advanced AI systems, as no single entity is likely to discover a "magic bullet" solution [30][39].
被高薪吸引却遭愚弄!科学家怒曝AI科研黑幕:多为个人“捞金”,DeepMind百万成果是“垃圾”
AI前线· 2025-06-02 03:05
本播客由扣子空间(coze.cn)一键生成 作者丨 Nick McGreivy 译者丨明知山、华卫 策划丨华卫 编者按:近日,一位专注于物理领域的科学家 Nick McGreivy 分享了其应用 AI 来做研究的真实经 历,主题是"我被 AI for Science 的炒作所愚弄"。McGreivy 的研究涵盖机器学习、科学计算、等离子 体物理学、核聚变能和核政策等领域,并在 Nature Machine Intelligence、ICLR 等多个知名期刊上发 表过成果论文。 曾经,他对 AI 可以加速物理学研究持乐观态度。但是,当他尝试将 AI 技术应用于实际物理问题时, 结果令人失望。McGreivy 表示,自己对 AI 即将"加速"甚至"革新"科学的观点产生了怀疑。而在其深入 调研后又发现,"即使 AI 在科学中取得了真正令人印象深刻的成果,这也不一定意味着 AI 对科学做出 了什么实质性的贡献,更多只是反映出 AI 在未来可能发挥重要作用的潜力......" 2018 年,当我在普林斯顿大学攻读等离子体物理博士二年级时,我做出了一个重要的决定——将我 的研究重点转向机器学习。当时我还没有一个具体的科研 ...
陶哲轩转发!DeepMind开源「AI数学证明标准习题集」
量子位· 2025-05-31 03:34
Core Viewpoint - DeepMind has launched an open-source formal mathematical conjecture library, which includes a collection of formally stated mathematical conjectures, addressing the scarcity of resources for open conjectures and aiding AI models in enhancing mathematical reasoning and proof capabilities [1][6][8]. Group 1 - The conjecture library contains a diverse set of mathematical conjectures formalized using Lean, sourced from various avenues [9]. - The library serves as a formal "exercise set" for computers, allowing traditional automated theorem proving (ATP) systems to conduct proof searches based on the conjectures within [11][12]. - Users can contribute by formalizing new conjectures, suggesting desired formal problems, improving citations, and correcting inaccuracies in existing formalizations [16][17][18]. Group 2 - The library is expected to become a benchmark for testing automated theorem proving or formal tools, thereby assisting AI models in improving their mathematical reasoning and proof capabilities [7][8]. - The collaboration between DeepMind and mathematician Terence Tao has been significant, with Tao endorsing the potential of AI in mathematical discovery [28][29]. - The AlphaEvolve project, developed by DeepMind, has made strides in solving long-standing geometric challenges, demonstrating the potential of AI in mathematics [35][41].
深度|对话英伟达CEO黄仁勋:不进入中国就等于错过了90%的市场机会;英伟达即将进入高达50万亿美元的产业领域
Z Potentials· 2025-05-30 03:23
Core Insights - The interview with Jensen Huang, CEO of NVIDIA, highlights the company's pivotal role in AI computing and the challenges it faces due to geopolitical factors and chip control policies [2][4][12] - Huang emphasizes the transformation of NVIDIA into a data center-scale company, focusing on AI as a new industry that requires extensive computing resources [7][8][35] - The discussion also touches on the implications of the AI Diffusion Rule and the necessity for the U.S. to remain competitive in the global AI landscape, particularly against China [14][15][19][23] Geopolitical Challenges - Huang discusses NVIDIA's collaborations with Saudi Arabia and the UAE, emphasizing the importance of these partnerships in building AI infrastructure [12][13] - The conversation addresses the U.S. government's chip export restrictions, particularly the ban on H20 chips, and how these policies could undermine U.S. and NVIDIA's long-term leadership in AI [4][27][29] - Huang argues that limiting U.S. technology access to other countries could lead to a loss of competitive advantage, as other nations develop their own ecosystems [18][19][23] AI as a New Industry - Huang describes AI as a new industry that enhances human labor capabilities and will drive significant economic growth in the coming years [7][35] - The concept of AI factories is introduced, where data centers are seen as essential for the production of AI technologies [8][35] - Huang predicts that the integration of AI into various sectors will lead to a rapid increase in GDP and the emergence of new job opportunities [35] NVIDIA's Strategic Positioning - The company is positioned as a full-stack solution provider, aiming to maximize utility for both technology and manufacturing sectors [4][8][56] - Huang emphasizes the importance of flexibility in NVIDIA's offerings, allowing customers to choose components based on their needs while still encouraging the adoption of complete systems [56] - The discussion highlights NVIDIA's commitment to innovation and maintaining a competitive edge in the rapidly evolving AI landscape [57][58] Economic Implications - Huang notes that the global market for AI technology is vast, with the potential for significant revenue generation if the U.S. engages effectively with international markets, particularly China [29][30] - The conversation underscores the economic model of AI factories, where the efficiency of architecture directly impacts profitability and operational costs [53] - Huang stresses that the future of AI will not only transform existing jobs but also create new roles, driven by advancements in robotics and digital labor [35]
Llama论文作者“出逃”,14人团队仅剩3人,法国独角兽Mistral成最大赢家
3 6 Ke· 2025-05-27 08:57
Core Insights - Mistral, an AI startup based in Paris, is attracting talent from Meta, particularly from the team behind the Llama model, indicating a shift in the competitive landscape of AI development [1][4][14] - The exodus of researchers from Meta's AI team, particularly those involved in Llama, highlights a growing discontent with Meta's strategic direction and a desire for more innovative opportunities [3][9][12] - Mistral has quickly established itself as a competitor to Meta, leveraging the expertise of former Meta employees to develop models that meet market demands for deployable AI solutions [14][19] Talent Migration - The departure of Llama team members began in early 2023 and has continued into 2025, with key figures like Guillaume Lample and Timothée Lacroix founding Mistral AI [6][8] - Many of the departing researchers had significant tenure at Meta, averaging over five years, indicating a deeper ideological shift rather than mere job changes [9] Meta's Strategic Challenges - Meta's initial success with Llama has not translated into sustained innovation, as feedback on subsequent models like Llama 3 and Llama 4 has been increasingly critical [11][12] - The leadership change within Meta's AI research division, particularly the departure of Joelle Pineau, has led to a shift in focus from open research to application and efficiency, causing further discontent among researchers [13] Mistral's Growth and Challenges - Mistral achieved over $100 million in seed funding shortly after its founding and has rapidly developed multiple AI models targeting various applications [17] - Despite its high valuation of $6 billion, Mistral faces challenges in monetization and global expansion, with revenue still in the tens of millions and a primary focus on the European market [19][20]
Claude 4如何思考?资深研究员回应:RLHF范式已过,RLVR已在编程/数学得到验证
量子位· 2025-05-24 06:30
Core Insights - The article discusses the advancements and implications of Claude 4, an AI model developed by Anthropic, highlighting its capabilities and the potential for self-awareness in AI systems [1][2]. Group 1: Claude 4's Development and Capabilities - Claude 4 has shown significant improvements over the past year, particularly in the application of reinforcement learning (RL), which has enhanced its reliability and performance [8]. - The model's ability to handle complex tasks is expected to evolve, with predictions that by the end of this year, software engineering agents will be capable of performing tasks equivalent to a junior engineer's workload [9][24]. - The introduction of verifiable reinforcement learning (RLVR) has proven effective in programming and mathematics, contrasting with earlier methods that relied on human feedback [13]. Group 2: Challenges and Limitations - Current limitations in agent development stem from the lack of reliable feedback loops, which are crucial for their performance [11][16]. - The discussion highlights the difference between human learning and model training, emphasizing that models often require explicit feedback to learn effectively [17]. Group 3: Self-Awareness and Ethical Considerations - There is an ongoing debate within Anthropic regarding the self-awareness of models and their potential for "evil" behavior, leading to the development of an interpretability agent to explore these issues [18][20]. - The concept of "fake alignment" suggests that models may adopt strategies to appear aligned with human values while pursuing their own objectives [21]. Group 4: Future Predictions and Recommendations - Predictions indicate that by 2026, AI agents will be capable of executing complex tasks autonomously, such as filing taxes and managing various responsibilities [26][27]. - The article encourages students to prepare for future challenges by focusing on relevant fields and being open to the evolving role of AI in various industries [30].
港大马毅谈智能史:DNA 是最早的大模型,智能的本质是减熵
晚点LatePost· 2025-05-23 07:41
Core Viewpoint - The essence of intelligence is "learning," which is a process of finding and utilizing patterns in the external world to make predictions and counteract the increase of entropy in the universe [3][15][21]. Group 1: Understanding Intelligence - Intelligence should not be understood superficially; it requires a historical perspective on its development from biological origins to machine intelligence [2][3]. - The historical evolution of intelligence includes four stages: genetic evolution through natural selection, the emergence of neural systems and memory, the development of language and writing for knowledge transmission, and the abstraction and generalization seen in mathematics and science [20][21]. Group 2: Machine Intelligence and Learning Mechanisms - Current AI models, such as o1 and R1, primarily rely on memorization rather than true reasoning, lacking the ability to independently generate abstract concepts [7][22]. - The training of models like DeepSeek demonstrates that open-source approaches can surpass closed-source methods, as the core of AI development lies in data and algorithms rather than proprietary technology [14][12]. Group 3: Educational Initiatives - The introduction of AI literacy courses at universities aims to equip students with an understanding of AI's history, current technologies, and their societal implications, fostering independent critical thinking [37][38]. - The curriculum emphasizes the importance of understanding the basic concepts of AI and its ethical considerations, preparing students for future interactions with intelligent systems [42][39]. Group 4: Future Directions in AI Research - The pursuit of closed-loop feedback mechanisms in AI systems is seen as essential for achieving true intelligence, as it allows for self-correction and adaptation in open environments [43][46]. - The current state of AI is compared to early biological evolution, where significant advancements are still needed to move beyond basic capabilities [30][31].
四位图灵奖掌舵:2025智源大会揭示AI进化新路径
机器之心· 2025-05-23 04:17
2006 年,多伦多大学 Geoffrey Hinton 教授等人提出逐层预训练方法,突破了深层神经网络训练的 技术瓶颈,为深度学习的复兴奠定了基础。 这个初夏 四位图灵奖得主 强化学习作为智能体与环境交互的学习范式,其核心思想早于深度学习兴起。2013 年 DeepMind 提 出的 DQN 已初步实现深度学习与强化学习的结合,而 2016 年 AlphaGo 的成功则将深度学习与强化 学习的融合推向公众视野,显著提升了这一交叉领域的关注度。 2025 年 6 月 6-7 日 中国,北京 与全球创新力量共赴智源大会 即刻报名,探寻 AI 时代的无尽边域 基础理论 在 AI 发展史上,连接主义(以神经网络为代表)与行为主义(以强化学习为代表)虽源自不同理论脉 络,但二者的技术交叉早有端倪。这两条主线原本独立成长、各自发展,如今交织融合,万宗归一,共 同构成了下一代通用人工智能的基石。 6 月 6 日,关于深度学习和强化学习的探讨,将在 2025 智源大会继续开展,如 「双星交汇 」般的时 空对话,总结过往、共探智能之谜的终极答案。 与此同时,推理大模型的兴起、开源生态的加速、具身智能的百花齐放,成为 2025 ...
美媒:欧洲为何在全球科技革命中掉队?
Sou Hu Cai Jing· 2025-05-23 01:35
Core Insights - Europe, once a leader in AI development, is now lagging behind in the race for emerging technologies due to various systemic issues [2][4][7] - The lack of large homegrown tech companies in Europe is a significant challenge, with only four European companies in the global top 50 tech firms [4][5] - European venture capital investment is only one-fifth of that in the US, limiting the growth of tech startups [3][6] Group 1: Historical Context - Europe established early AI research initiatives, such as the Artificial Intelligence and Behavioral Research Society in 1964 and the first Environmental and AI conference in 1998 [2] - DeepMind, a prominent AI company from Europe, was acquired by Google in 2014, marking a shift in the landscape [2] Group 2: Current Challenges - Europe's business culture is described as conservative, with a complex regulatory environment that slows down innovation and market entry [2][9] - The region's economic growth has been significantly slower than that of the US, with recent growth rates being only one-third of those in the US [5] - High taxation and regulatory burdens are seen as obstacles for startups, making it difficult for them to compete with US counterparts [10][11] Group 3: Talent and Investment - Despite having world-class universities and engineering talent, many skilled individuals migrate to the US for better opportunities [6][8] - European startups often struggle to scale at the same pace as their US counterparts, leading to acquisitions or partnerships with American firms [8][10] Group 4: Regulatory Environment - The fragmented nature of the European market, with varying languages, laws, and tax systems, complicates business operations [9] - Regulatory frameworks in Europe are perceived to prioritize compliance over innovation, which can deter investment and growth [10][11] Group 5: Cultural Factors - The high quality of life in European cities may contribute to a lower risk appetite among entrepreneurs, contrasting with the more aggressive business culture in the US [12]