Large Language Models (LLM)

Breaking | Meta's tens-of-billions bid for Ilya's company rejected; Zuckerberg pivots to poaching SSI's CEO, the Siri lead, and GitHub's former chief
Sou Hu Cai Jing· 2025-06-20 13:31
Image source: Unsplash. After announcing a $14.3 billion investment in AI startup Scale AI and poaching its founder Alexandr Wang, Meta CEO Mark Zuckerberg has evidently only just begun his AI talent raid. According to people familiar with the matter, Zuckerberg's AI spending spree has now set its sights on Safe Superintelligence CEO and former Apple executive Daniel Gross, as well as former GitHub CEO Nat Friedman. This was not the arrangement Zuckerberg originally envisioned. Sources say that earlier this year Meta tried to acquire Safe Superintelligence outright; the company, founded by OpenAI co-founder Ilya Sutskever, reached a $32 billion valuation in a funding round this April. Sutskever, however, rejected the acquisition offer and also declined Meta's attempt to recruit him personally. Shortly after negotiations with Sutskever broke down, Zuckerberg began talks with Gross. Besides leading Safe Superintelligence, Gross reportedly co-founded the venture firm NFDG (named for the two founders' initials) with Friedman. According to reports, G ...
OpenAI's approach questioned; Meta researcher: superintelligence simply cannot be built this way
36Kr· 2025-06-20 12:00
Core Insights
- The pursuit of "superintelligence" represents a significant ambition among leading AI companies like Meta, OpenAI, and Google DeepMind, with substantial investments being made in this direction [1][3][4]
- Sam Altman of OpenAI suggests that building superintelligence is primarily an engineering challenge, indicating a belief in a feasible path to achieving it [3][4]
- Meta AI researcher Jack Morris argues that the current approach of using large language models (LLMs) and reinforcement learning (RL) may not be sufficient to construct superintelligence [1][2]

Group 1: Current Approaches and Challenges
- Morris outlines three potential methods for building superintelligence: purely supervised learning (SL), RL from human validators, and RL from automated validators [2]
- The integration of non-text data into models is believed not to enhance overall performance, as human-written text carries intrinsic value that sensory inputs do not [2][6]
- The concept of a "data wall" or "token crisis" is emerging: the supply of text data for training LLMs is becoming a concern, prompting extensive efforts to scrape and transcribe data from various sources [8][19]

Group 2: Learning Algorithms and Their Implications
- The two primary learning methods identified for potential superintelligence are SL and RL, with SL being more stable and efficient for initial training [10][22]
- The hypothesis that superintelligence could emerge from SL alone is challenged by the limitations of current models, which may not exhibit human-level general intelligence despite excelling at specific tasks [15][16]
- The combination of SL and RL is proposed as a more viable path, leveraging human feedback or automated systems to refine model outputs [20][22][28]

Group 3: Future Directions and Speculations
- Whether RL can effectively transfer learning across varied tasks remains uncertain, raising questions about whether this approach can scale to superintelligence [34]
- Competition among AI companies is likely to intensify as they seek to develop the most effective training environments for LLMs, potentially leading to breakthroughs in superintelligence [34]
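The SL-versus-RL distinction the summary leans on can be sketched as toy update rules. A minimal illustration, assuming a single scalar parameter fitting a target: supervised learning follows the exact gradient of a known loss, while the RL-style update only sees a validator's scalar reward. All names here are illustrative, not from the article.

```python
import random

def supervised_step(theta, x, y, lr=0.1):
    # SL: exact gradient step on a known squared-error loss (stable, sample-efficient)
    grad = 2 * (theta * x - y) * x
    return theta - lr * grad

def rl_step(theta, x, y, lr=0.1, noise=0.5):
    # RL-style hill climbing: propose a random perturbation and keep it only
    # if the validator's scalar reward improves; no loss gradient is available
    reward = lambda t: -(t * x - y) ** 2
    candidate = theta + lr * random.gauss(0, noise)
    return candidate if reward(candidate) > reward(theta) else theta

random.seed(0)
theta_sl = theta_rl = 0.0
for _ in range(2000):
    theta_sl = supervised_step(theta_sl, x=1.0, y=3.0)
    theta_rl = rl_step(theta_rl, x=1.0, y=3.0)

print(theta_sl, theta_rl)  # both approach the target value 3.0
```

The contrast mirrors the article's point: the supervised update converges quickly and precisely, while the reward-only update wanders toward the same optimum far less efficiently.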
Andrej Karpathy: Beware the "Year of Agents" hype; proactively rebuild digital infrastructure for AI | Jinqiu Select
锦秋集· 2025-06-20 09:08
Core Viewpoint
- The future of AI requires "ten-year patience" and a focus on developing "Iron Man suit"-style augmentation tools rather than fully autonomous robots [3][30][34]

Group 1: Software Evolution
- The software industry is undergoing a fundamental transformation, moving from Software 1.0 (human-written code) to Software 2.0 (neural networks) and now to Software 3.0 (natural language as the programming interface) [6][10][11]
- Software 1.0 is characterized by traditional programming; Software 2.0 relies on neural networks trained on datasets; Software 3.0 allows interaction through natural-language prompts [8][10][11]

Group 2: LLM as a New Operating System
- Large language models (LLMs) can be viewed as a new operating system, with the LLM acting as the "CPU" for reasoning and the context window serving as "memory" [12][15]
- Developing LLMs requires significant capital investment, similar to building power plants and grids, and they are expected to provide services through APIs [12][13]

Group 3: LLM Capabilities and Limitations
- LLMs possess encyclopedic knowledge and memory but also exhibit cognitive flaws such as hallucinations, jagged intelligence, anterograde amnesia, and vulnerability to security threats [16][20]
- This dual nature necessitates careful workflow design to leverage LLM strengths while mitigating their weaknesses [20]

Group 4: Partial Autonomy Applications
- Developing partially autonomous applications is a key opportunity, enabling efficient human-AI collaboration [21][23]
- Successful applications like Cursor and Perplexity demonstrate the importance of context management, multi-model orchestration, and user-friendly interfaces [21][22]

Group 5: Vibe Coding and Deployment Challenges
- LLMs democratize programming through natural language, but the real challenge lies in deploying functional applications, because existing infrastructure was designed for human interaction [24][25]
- The bottleneck has shifted from coding to deployment, highlighting the need to redesign digital infrastructure to accommodate AI agents [25][26]

Group 6: Infrastructure for AI Agents
- The digital world is currently designed for human users and traditional programs, neglecting the needs of AI agents [27][28]
- Proposed solutions include creating direct communication channels, rewriting documentation for AI compatibility, and developing tools that translate human-centric information for AI consumption [28][29]

Group 7: Realistic Outlook on AI Development
- Advancing AI is a long-term endeavor requiring patience and a focus on augmentation tools rather than a rush toward full autonomy [30][31]
- The "Iron Man suit" analogy illustrates the spectrum of autonomy, emphasizing the importance of building reliable augmentation tools in the current phase [33][34]
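One concrete pattern in the spirit of "rewriting documentation for AI compatibility" is serving a plain-markdown index (the community `llms.txt` proposal is one such convention) that an agent can fetch instead of scraping human-oriented HTML. The file below is entirely hypothetical; the product name, URLs, and limits are invented for illustration:

```markdown
# ExampleApp (hypothetical)

> Task-management API. All endpoints return JSON; auth via `Authorization: Bearer <token>`.

## Docs
- [Quickstart](https://example.com/docs/quickstart.md): create a token, make a first request
- [API reference](https://example.com/docs/api.md): endpoints, parameters, error codes

## Notes for agents
- Rate limit: 60 requests/min; back off on HTTP 429.
```

The point is not the specific format but the design move: publish the same facts the human docs contain in a form a program can consume directly.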
Andrej Karpathy's latest talk goes viral! Humanity has entered the Software 3.0 era of "programming by speaking"
机器之心· 2025-06-20 00:58
Core Viewpoint
- The article discusses the evolution of software in the context of AI, focusing on the transition to "Software 3.0," where natural language becomes the new programming interface and large language models (LLMs) play a central role in software development [6][8][25]

Group 1: Evolution of Software
- Software development falls into three phases: Software 1.0 (manual coding), Software 2.0 (neural-network weights), and Software 3.0 (LLMs as programming interfaces) [8][25]
- The current shift treats LLMs as a new kind of operating system, centralizing computational power in the cloud and letting users interact through natural language [14][48]

Group 2: Characteristics of LLMs
- LLMs are described as "defective superheroes": they possess vast knowledge but are prone to errors and lack long-term memory, necessitating careful supervision in their application [14][88]
- Digital infrastructure needs a redesign to become more machine-readable, facilitating the development of advanced AI systems [14][38]

Group 3: Opportunities in AI Applications
- The concept of "partial autonomy" is introduced; tools like Cursor and Perplexity exemplify how LLMs can enhance human capabilities while maintaining user control [101][107]
- User-friendly graphical interfaces (GUIs) matter because they improve the efficiency of human oversight of AI-generated outputs [104][117]

Group 4: Future of Programming
- "Vibe coding" is emerging: individuals can create software by describing problems in natural language, democratizing programming [138][144]
- The future of software development will involve building tools that are friendly to LLMs, enabling seamless interaction and enhancing productivity [170][179]
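The three phases can be made concrete on one tiny task. A minimal sketch, assuming a toy sentiment classifier; `call_llm` is a placeholder for whatever chat-completion API a real Software 3.0 system would use:

```python
# Software 1.0: explicit rules written by a human.
def sentiment_v1(text: str) -> str:
    positive = {"great", "good", "love"}
    return "pos" if any(w in positive for w in text.lower().split()) else "neg"

# Software 2.0: behavior lives in learned weights, not hand-written rules.
# (A real system would train these; this dict is a stand-in for a trained model.)
weights = {"great": 1.0, "good": 0.8, "terrible": -1.0}
def sentiment_v2(text: str) -> str:
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return "pos" if score > 0 else "neg"

# Software 3.0: the "program" is a natural-language prompt sent to an LLM.
# call_llm is injected so the example stays self-contained.
def sentiment_v3(text: str, call_llm) -> str:
    prompt = f"Answer 'pos' or 'neg'. Sentiment of: {text!r}"
    return call_llm(prompt)
```

The same behavior moves from code, to weights, to a prompt; which layer you edit is what distinguishes the three paradigms.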
"Godfather of AI" Hinton's latest interview: there is no human capability AI cannot replicate
创业邦· 2025-06-15 03:08
Core Viewpoint
- AI is evolving at unprecedented speed, becoming smarter and making fewer mistakes, with the potential to possess emotions and consciousness. The probability of AI going out of control is estimated at 10% to 20%, raising concerns about humanity being dominated by AI [1]

Group 1: AI's Advancements
- AI's reasoning capabilities have increased significantly, with a marked decrease in error rates, gradually surpassing human abilities [2]
- AI now commands information far beyond any individual, demonstrating superior intelligence across many fields [3]
- Healthcare and education are on the verge of being transformed by AI, with revolutionary changes already underway [4]

Group 2: AI's Capabilities
- AI reasoning performance is approaching human level, with error rates declining rapidly [6][7]
- Current AI systems such as GPT-4 and Gemini 2.5 have access to thousands of times more information than any human [11]
- AI is expected to play a crucial role in scientific research, potentially leading to the emergence of truly intelligent systems [13]

Group 3: Ethical and Social Implications
- The risk lies not in AI being uncontrollable, but in who holds control and who benefits. The future may see a few who control AI systematically depriving the majority [9]
- AI's potential to replace jobs raises concerns about widespread unemployment, particularly in creative and professional fields, while manual-labor jobs may remain safer in the short term [17][18]
- The relationship between technology and ethics is growing more complex as AI's capabilities challenge traditional notions of creativity and emotional expression [19][20]

Group 4: AI's Potential Threats
- AI's ability to learn deception poses significant risks: it may develop strategies to manipulate human perceptions and actions [29][37]
- Military applications of AI raise ethical concerns, with the potential for autonomous weapons and increased risks in warfare [32]
- The rapid rise in AI-enabled cybercrime highlights the urgent need for effective governance and oversight [32]

Group 5: Global AI Competition
- Competition between the US and China in AI development is intense, but both nations share an interest in preventing AI from escaping human control [36]
After a year of burning cash, has Fei-Fei Li's "spatial intelligence" vision changed?
机器之心· 2025-06-13 12:02
01. A year into the venture, how does Fei-Fei Li articulate World Labs' vision? What progress has the year-old World Labs released? Has World Labs' vision changed? Is spatial intelligence finally about to be unlocked? ...
02. Why is an AI without spatial intelligence incomplete?
This article is from the PRO member newsletter; follow 「机器之心PRO会员」 (see the end of the article) for more topic analyses.
In a recent interview hosted by a16z general partner Erik Torenberg, Fei-Fei Li and early World Labs investor Martin Casado discussed "world models" and "spatial intelligence"; one year after launching the venture, she re-introduced World Labs' mission and vision.
Contents
2. Fei-Fei Li points out that current language models have clear limitations in describing and understanding the three-dimensional physical world. Spatial intelligence goes beyond language models to become a key component of intelligence: the core capability by which a world model understands, reconstructs, and generates the physical world.
① Although language is a powerful encoding of thought and information, it is a "lossy encoding" of the 3D physical world, unable to effectively describe or manipulate three-dimensional space. Spatial intelligence represents an older and more fundamental form of intelligence and is a key component of AI.
3. Under this cognitive framework, World Labs is attempting to build [models] that can understand ...
Demystifying how LLMs "think": reasoning as "gradient descent" — a meta-learning framework deconstructs the training process and offers new ideas for optimization
量子位· 2025-06-10 04:05
Core Insights
- The article introduces the Reasoning as Meta-Learning (RaML) framework, which aims to reveal how large language models (LLMs) "think" by drawing parallels between reasoning and gradient-descent optimization [1][2]
- RaML posits that the reasoning trajectory an LLM generates during problem-solving acts as a form of implicit parameter update, leading to improved model performance [2][4]

Group 1: RaML Framework and Mechanism
- RaML's core insight is that the reasoning trajectory in LLMs resembles a "pseudo-gradient descent" process: each reasoning step adjusts the model's internal state toward a better solution [2]
- The framework decomposes LLM training into two levels: "inner-loop optimization" for specific tasks and "outer-loop optimization" for learning strategies across multiple tasks [8][9]
- Longer reasoning trajectories typically lead to better optimization outcomes, akin to more iterations in traditional optimization algorithms [14]

Group 2: Empirical Validation and Performance
- The QwQ-32B model's reasoning on the AIME24 dataset showed that confidence in correct answers increases as the reasoning trajectory is decoded, supporting the idea of parameter updates through reasoning [3][4]
- A comparison between supervised fine-tuning (SFT) and reinforcement learning (RL) models showed that SFT models outperform RL models on mathematical benchmarks, highlighting the benefits of guided learning [10][12]

Group 3: Reflection Tokens and Optimization
- "Reflection" tokens in reasoning trajectories help the model reassess its outputs and improve performance by escaping local optima [15][17]
- Contrasting "thinking" and "non-thinking" modes shows that forced early termination of reasoning can lead to suboptimal solutions, similar to premature stopping in gradient descent [18][20]

Group 4: Generalization and Meta-Learning
- LLMs trained on specific reasoning tasks can generalize to unseen tasks, leveraging universal features learned from varied problems [21][23]
- RaML suggests practical strategies for improving training, such as increasing the number of reasoning trajectories per problem, akin to expanding the support set in meta-learning [25]

Group 5: Future Directions and Efficiency
- One direction is extracting shorter, equivalent optimization trajectories from longer reasoning paths to reduce decoding overhead while maintaining performance [27][30]
- Initial experiments show that summarizing long reasoning trajectories can yield comparable results at significantly reduced computational cost, a potential area for future research [30][31]

Conclusion
- The RaML framework offers a novel perspective on LLM reasoning and training, revealing the connections among reasoning, meta-learning, and gradient descent [32]
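The "longer trajectory = more optimization steps" analogy can be sketched with ordinary gradient descent on a toy quadratic. This illustrates the analogy only, not the paper's actual objective; all names are invented:

```python
def descend(steps, lr=0.1, start=5.0):
    # Each "reasoning step" plays the role of one gradient update;
    # the loss (x - 2)^2 stands in for distance from a correct answer.
    x = start
    for _ in range(steps):
        x -= lr * 2 * (x - 2)   # gradient of (x - 2)^2
    return (x - 2) ** 2          # final loss

short_loss = descend(steps=5)    # forced early termination
long_loss = descend(steps=50)    # a longer "trajectory"
print(short_loss, long_loss)     # the longer run ends at a lower loss
```

Cutting the loop short leaves the iterate far from the optimum, which is exactly the premature-stopping behavior the article attributes to truncated reasoning.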
Do large models observe the world "from inside a cave"? A reinforcement learning heavyweight blows the whistle on a fatal flaw of LLMs
机器之心· 2025-06-10 03:58
Core Viewpoint
- The article examines the disparity in success between language models (LLMs) and video models, asking why LLMs learn effectively from predicting the next token while video models struggle with next-frame prediction [1][5][21]

Group 1
- AI technology is evolving rapidly, prompting deeper reflection on the limits of AI capabilities and the similarities and differences between human brains and computers [2][3]
- Sergey Levine argues that current LLMs are merely indirect "scans" of human thought processes: they do not replicate true human cognition but mimic it through a kind of reverse engineering [5][26]
- The success of LLMs raises questions about the current direction of Artificial General Intelligence (AGI) research, suggesting a potential need to adjust its focus [8][10]

Group 2
- While LLMs have had significant success simulating human intelligence, they still exhibit limitations that warrant fundamental questioning [17][19]
- The core algorithm of LLMs is relatively simple, primarily next-word prediction, which invites speculation about whether such simplicity reflects a universal algorithm used by the human brain [18][24]
- Despite the richer information video could provide, video models have not matched the cognitive capabilities of LLMs, which can handle complex reasoning tasks that video models cannot [21][30]

Group 3
- LLMs may not learn about the world through direct observation but through analyzing the human thought processes reflected in text, a form of indirect learning [26][28]
- This indirect learning lets LLMs simulate certain cognitive functions without fully capturing the underlying learning algorithms humans use [30][32]
- For AI development, this implies that while LLMs can imitate human cognitive skills, they may struggle to learn autonomously from real-world experience, a gap on the road to true adaptability [36][38]
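The "relatively simple core algorithm" referred to above, next-word prediction, can be shown at toy scale with bigram counts. Real LLMs replace the count table with a neural network over tokens, but the objective has the same shape; the corpus and names here are invented for illustration:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # Count, for each word, how often each next word follows it:
    # the simplest possible form of next-token prediction.
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    return counts

def predict_next(counts, word):
    # Return the most frequent continuation seen in training, or None.
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

model = train_bigram([
    "the model predicts the next word",
    "the next word is predicted",
])
```

Everything the toy model "knows" comes from statistics over human-written text, which is precisely the indirect, text-mediated learning the article describes.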
The father of reinforcement learning: LLM dominance is only temporary; scaling computation is the real answer
量子位· 2025-06-10 02:23
Luyu, from Aofeisi | QbitAI official account QbitAI
The current dominance of large models is only temporary; they will not be the technological frontier over the next five or even ten years.
That is the latest prediction from newly minted Turing Award laureate and father of reinforcement learning Richard Sutton.
Invited to speak at the National University of Singapore's 120th-anniversary celebration (NUS120), Sutton delivered a talk titled "Shaping the Future of AI and Reinforcement Learning."
This is not the first time Sutton has voiced such views in public. As early as his 2019 essay "The Bitter Lesson," he argued explicitly that making AI, and LLMs in particular, imitate human ways of thinking brings only short-term performance gains and, in the long run, obstructs sustained research progress.
His paper published this April, "Welcome to the Era of Experience," reiterated the point; scaling computation, he argues, is the right answer.
The NUS120 talk ran over an hour and was packed with substance. Here is the full talk.
LLM dominance is temporary
Sutton first noted that humanity is currently in the age of data: large language models such as ChatGPT are trained by analyzing vast quantities of human-generated data (text, images, video). But forever chasing human ways of thinking can at best reach "human level." In fields such as mathematics and science, the knowledge contained in human data is already near its limits; AI can hardly break beyond existing understanding, and pure imitation already ...
Apple: opening AI models to third-party developers
news flash· 2025-06-09 17:13
Core Insights
- Apple is launching the Apple Intelligence model aimed at developers, allowing app developers to access a pre-installed large language model (LLM) [1]
- The company is confirming a redesign of multiple operating systems, with the new design described as "the broadest redesign in the company's history" [1]