Musk: researchers no longer exist, only engineers; LeCun: dead wrong
机器之心· 2025-08-04 01:36
机器之心 report. Editor: Panda

For a long time, the roles of scientist (researcher) and engineer have been sharply delineated. The divide exists not only in academia but is also deeply rooted in popular culture: in the American sitcom The Big Bang Theory, physicist Sheldon Cooper styles himself a "real scientist" and routinely sneers at engineer Howard Wolowitz, and the gap between their professions is a running source of the show's comedy.

To put it loosely but intuitively: scientists work to discover the laws of nature and understand why the world is the way it is, while engineers care more about what we can do with that knowledge, turning established scientific principles into real-world technologies, tools, and systems. One pursues truth; the other pursues feasibility.

But Elon Musk, the world's richest man, recently charged at this deeply held distinction. Commenting on a repost of a hiring tweet from an xAI employee, Musk declared that the mislabeled split between "researcher" and "engineer" is really a veiled description of a two-tier engineering system. He announced that from today xAI no longer distinguishes between them: "There are only engineers here." He added: "Researcher is an archaic academic term." Under the same tweet, Musk continued with a touch of mockery: "SpaceX has done more meaningful, cutting-edge 'research' in rocket and satellite development than all the university academic labs on Earth combined. But we don't use the self-important term 'researcher' ...
Everyone is waiting for GPT-5; the superalignment team's parting work becomes a key clue, and Altman says there are "many surprises"
机器之心· 2025-08-03 04:21
机器之心 report. Editors: +0, 张倩

Lately the whole AI community seems fixated on GPT-5: leaks are flying everywhere, yet the model itself is nowhere to be seen. Yesterday we covered the long inside story on GPT-5 dug up by The Information; today Altman apparently could not sit still either, tweeting that there are "many surprises, worth the wait."

While we wait, let's look at one of GPT-5's rumored trump cards: the universal verifier. According to people familiar with the matter, OpenAI has been building something its researchers call a "universal verifier," which may be an important technique behind GPT-5.

The concept traces back to a paper OpenAI published last year. The problem it addresses: when an LLM is optimized only for answer correctness, its reasoning process (e.g., Chain-of-Thought) becomes hard for humans or smaller models to understand and verify, degrading interpretability. But in high-stakes applications, users need ...

Paper title: Prover-Verifier Games improve legibility of LLM outputs
Paper link: https://arxiv.org/pdf/2407.13692
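The incentive problem described here, that optimizing only for answer correctness degrades legibility, can be sketched as a reward that mixes correctness with a legibility score from a small verifier. The `small_verifier_score` proxy and the weighting below are illustrative assumptions for the sketch, not OpenAI's actual method (a real verifier would itself be a model).

```python
# Toy sketch of a verifier-shaped reward: correctness alone would let the
# chain of thought collapse into something unreadable, so a legibility bonus
# from a small verifier is added. All names and weights here are hypothetical.

def small_verifier_score(chain_of_thought: str) -> float:
    """Stand-in for a small verifier: a crude proxy that rewards reasoning
    laid out in explicit, marked steps."""
    steps = [s for s in chain_of_thought.split("\n") if s.strip()]
    if not steps:
        return 0.0
    marked = sum(1 for s in steps if s.lstrip().startswith(("Step", "-")))
    return marked / len(steps)

def shaped_reward(answer: str, gold: str, chain_of_thought: str,
                  legibility_weight: float = 0.3) -> float:
    """Correctness plus a legibility bonus, so the training signal does not
    reward only the final answer."""
    correctness = 1.0 if answer.strip() == gold.strip() else 0.0
    return correctness + legibility_weight * small_verifier_score(chain_of_thought)

cot = "Step 1: factor 12 = 3 * 4\nStep 2: so 12 / 3 = 4"
print(shaped_reward("4", "4", cot))  # → 1.3
```

A correct answer with a fully step-marked chain scores higher than the same answer with an opaque chain, which is the incentive the paper's prover-verifier game is meant to create.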
OpenAI's IMO gold-medal team reveals: the AI declined to answer Problem 6
机器之心· 2025-08-03 04:21
机器之心 report. Editor: 张倩

The model that won OpenAI an IMO gold medal had only three core developers behind it? That is what OpenAI's IMO team disclosed in a recent media interview. The three are project lead Alexander Wei, research engineer Sheryl Hsu, and senior research scientist Noam Brown; Hsu only joined this March. They also revealed that the project was a sprint pulled together in two or three months, and the result surprised everyone.

1. When did the project start? Winning an IMO gold medal has long been a goal in the AI field, and inside OpenAI in particular, with discussions dating back to 2021. Although the relevant reinforcement learning algorithms and underlying ideas had been brewing for about six months, the concentrated push for this breakthrough only began two or three months before the IMO.

2. How big was the team? The core team was just Alex, Sheryl, and Noam, with Alex leading the main technical development. Alex's new technique initially met skepticism, but as he produced strong evidence, especially marked progress on "hard-to-verify tasks," his approach gradually won over the team and the company.

3. What is the model's proof style ...
Diffusion architecture or "NoThinking": how can AI conversation break the "1Hz barrier"?
机器之心· 2025-08-03 01:30
机器之心PRO · Member Newsletter, Week 31

1. Diffusion architecture or "NoThinking": how can AI conversation break the "1Hz barrier"? How does Eric Jang's "intelligence spectrum" explain AI capability? What is AI's "1Hz barrier"? How fast does each type of AI application need to react? What speed tiers can the diffusion-architecture and "NoThinking" routes unlock? What are the prerequisites for an agent with "Ultra Instinct"? Why must a general-purpose agent span 0.1Hz to 50Hz? ...

2. A deep conversation with Demis Hassabis: is AI's bottleneck a missing sense of "taste"? What is a "learnable natural system"? Must physical laws be learned through "interaction"? How does AI's lack of "taste" show itself? Is the biggest opportunity for next-generation AI building truly open-ended worlds? ...

This week's issue unpacks these 2 noteworthy AI & Robotics industry stories. The full newsletter contains the 2 feature analyses plus 30 quick updates across the AI & Robotics space: 8 on technology, 14 domestic, and 8 international.

From the full issue: at the fast end of the spectrum, "extremely fast intelligence" corresponds to very high-frequency decisions, such as the force and friction a human's fingers apply when flipping through a book ...
GPT-5 stalls; foreign media report: gains are modest, and OpenAI executives publicly lost their cool on Slack
机器之心· 2025-08-02 04:43
Core Viewpoint
- The article discusses the anticipated release of GPT-5, highlighting its expected improvements over previous models, while also noting the challenges and limitations faced by OpenAI in achieving significant performance leaps compared to earlier versions [10][12][15].

Group 1: Developments and Features of GPT-5
- GPT-5 is expected to show real improvements in areas such as programming and reasoning, but these enhancements may not match the performance leaps seen between earlier models like GPT-3 and GPT-4 [15][20].
- OpenAI has reportedly found ways to enhance the model's capabilities in coding and complex task handling, allowing it to follow intricate instructions more effectively [15][21].
- Despite these advancements, the performance improvements are described as gradual rather than revolutionary, indicating a slowdown in the pace of AI development at OpenAI [14][16].

Group 2: Challenges and Internal Dynamics
- OpenAI is facing various technical challenges that are hindering the progress of its models, including the transition of the o3 model to a chat-based version, which resulted in diminished performance [14][32].
- The company is also experiencing internal pressures due to talent loss to competitors like Meta, which has raised concerns about maintaining its competitive edge [25][26].
- There are ongoing tensions in the relationship between OpenAI and Microsoft, particularly regarding the terms of their collaboration and the future direction of OpenAI's business model [24][27].

Group 3: Financial Aspects and Market Position
- OpenAI has successfully raised $8.3 billion in funding, bringing its valuation to $300 billion, as part of a broader strategy to secure $40 billion in total funding this year [42][43].
- The company's revenue is projected to reach $20 billion by the end of the year, driven by a significant user base of over 700 million weekly active users [42][41].
- The strong financial backing and market interest reflect confidence in OpenAI's future prospects, despite the challenges it faces in model development and competition [40][41].
19-year-old drops out of Berkeley to found a startup, raises $28 million, with OpenAI among the investors
机器之心· 2025-08-02 04:43
机器之心 report. Editor: 冷猫

Silicon Valley startup lore could hardly be more familiar: "a genius, a dropout, and a garage" has practically become the stereotype of Valley founders. The startup in this story sounds just like those classics, straight out of the HBO series Silicon Valley.

Conversion, a marketing-automation startup founded by two UC Berkeley dropouts, announced on July 30 a $28 million Series A round led by Abstract, with participation from True Ventures and HOF Capital, along with top angel investors from OpenAI and elsewhere in the AI and GTM worlds.

A classic Silicon Valley startup story. The Conversion team is strikingly young: co-founder and CEO Neil Tewari is only 24, and co-founder and CTO James Jiao was Tewari's roommate at Berkeley.

Neil Tewari and James Jiao

Although marketing teams already have a variety of tools deeply embedded in their workflows, they voice similar complaints about the parts that cannot be automated. That is where the two founders found their direction. A friend then introduced them to more marketing executives, which also helped them ...
ICCV 2025 | EPD-Solver: Westlake University releases a parallel-accelerated diffusion sampling algorithm
机器之心· 2025-08-02 04:43
Core Viewpoint
- The article discusses the advancements in diffusion models, particularly the introduction of the Ensemble Parallel Direction Solver (EPD-Solver), which enhances the efficiency and quality of image generation while addressing the latency issues associated with traditional methods [2][3][27].

Group 1: Diffusion Models Overview
- Diffusion models have rapidly become mainstream technologies for generating images, videos, audio, and 3D content due to their high-quality output [2].
- The core mechanism of diffusion models involves a "denoising" process that iteratively refines a random image into a clear one, which, while ensuring quality, leads to significant inference delays [2].

Group 2: Acceleration Strategies
- Researchers proposed three main acceleration strategies: using ODE solvers to reduce iteration steps, model distillation to compress multi-step processes, and parallel computing to speed up inference [3].
- Each method has limitations, such as quality loss with fewer iterations, high costs of retraining models, and underutilization of parallelism in low-step scenarios [3].

Group 3: EPD-Solver Innovation
- The EPD-Solver combines the advantages of the aforementioned strategies, utilizing a numerical solver framework, lightweight distillation for a small set of learnable parameters, and parallel computation of gradients [3][4].
- This method effectively reduces numerical integration errors without significant modifications to the model or additional latency, achieving high-quality image generation with only 3-5 sampling steps [3][4].

Group 4: Performance and Results
- EPD-Solver can be integrated as a "plugin" into existing solvers, significantly enhancing their generation quality and efficiency [4].
- Experimental results show that EPD-Solver outperforms baseline solvers in various benchmarks like CIFAR-10, FFHQ, and ImageNet, demonstrating its potential in low-latency, high-quality generation tasks [21][25].

Group 5: Key Advantages
- The method offers parallel efficiency and precision improvements by introducing multiple gradient evaluations, which significantly enhance ODE integration accuracy while maintaining zero additional inference delay [28].
- EPD-Solver is lightweight and can be easily integrated into existing ODE samplers, avoiding the costly retraining of diffusion models [28].
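As a toy numerical picture of the "multiple parallel directions" idea, the step below combines several gradient evaluations taken at intermediate points of one solver step. EPD-Solver learns its offsets and weights through lightweight distillation and applies them to a diffusion ODE; here they are fixed by hand on a scalar ODE, so this is only a sketch of the mechanism, not the paper's algorithm.

```python
import math

# One solver step that mixes gradient evaluations at several intermediate
# points. The evaluations are independent given the current state, so they
# could run in parallel; here they are simply a loop. Offsets/weights are
# hand-picked (Simpson-like), whereas EPD-Solver distills them.

def f(x, t):
    # Example ODE dx/dt = -x (exact solution: x0 * exp(-t)).
    return -x

def epd_like_step(x, t, h, offsets=(0.0, 0.5, 1.0), weights=(1/6, 4/6, 1/6)):
    """Evaluate the gradient at several intermediate points reached by a
    cheap Euler predictor, then take one step along their weighted sum."""
    grads = [f(x + off * h * f(x, t), t + off * h) for off in offsets]
    direction = sum(w * g for w, g in zip(weights, grads))
    return x + h * direction

x, t, h = 1.0, 0.0, 0.1
for _ in range(10):  # integrate from t=0 to t=1 in 10 steps
    x = epd_like_step(x, t, h)
    t += h
print(x)  # close to the exact value exp(-1) ≈ 0.3679
```

The multi-direction step tracks the exact solution far better than plain Euler at the same step count, which is the same trade the paper makes: extra (parallelizable) gradient evaluations per step in exchange for fewer steps.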
The right paradigm toward L3? Li Auto i8 debuts VLA advanced assisted driving worldwide, and we tried it for you
机器之心· 2025-08-02 04:43
Core Viewpoint
- The article discusses the launch of the Li Auto i8, which features the new VLA driver model that significantly enhances its assisted driving capabilities through advanced technologies such as the Vision-Language-Action model and NVIDIA's Thor-U chip [2][20].

Group 1: VLA Driver Model Development
- The VLA driver model represents a paradigm shift in assisted driving, moving from traditional methods to a more integrated approach that combines visual, language, and behavioral understanding [2][6].
- Li Auto's assisted driving technology has evolved significantly, with the MPI (Miles Per Intervention) level improving from a few kilometers to 100 kilometers within a year, indicating a tenfold increase in performance [5][24].
- The company has implemented "super alignment" techniques to enhance model output and has improved data selection standards, resulting in a twofold increase in model performance from March to May [5][24].

Group 2: Technical Enhancements
- The VLA model incorporates reasoning capabilities, allowing for better decision-making and understanding of driving scenarios, which was a limitation in previous models [6][11].
- The system can now process environmental data at a speed of 10Hz, translating sensor inputs into actionable driving decisions [11][13].
- The driving style has shifted from imitating "experienced drivers" to a more stable approach akin to "chauffeur drivers," which is expected to be more appealing to users [15][20].

Group 3: User Interaction and Experience
- The VLA model allows for natural language interaction, enabling users to give commands directly to the vehicle, enhancing the overall user experience [9][17].
- The system's memory capabilities allow it to remember user preferences for specific routes, improving personalization [17][20].
- The VLA model has learned defensive driving techniques, enabling it to anticipate potential hazards and react accordingly, which enhances safety [20][21].

Group 4: Data and Simulation
- Li Auto has accumulated 4.3 billion kilometers of user driving data, with 1.2 billion kilometers of effective data collected by July this year, which is crucial for training the VLA model [24][25].
- The company employs data synthesis techniques to create balanced datasets for rare driving scenarios, improving the model's performance in complex situations [25][26].
- The use of simulation environments has drastically reduced testing costs and time, allowing for rapid iteration and improvement of the assisted driving system [28][29].

Group 5: Future Prospects
- Li Auto aims to provide a "personal driver" experience to a broader user base, with expectations of achieving a 1000 km MPI in the near future [20][32].
- The company has established a fully simulated environment at its headquarters to enhance training efficiency, indicating a commitment to advancing its technology [32][34].
Just now: Google's "IMO gold" model went live in Gemini, and a mathematician immediately proved a conjecture
机器之心· 2025-08-02 00:55
Core Viewpoint
- Google has launched the Deep Think feature for Google AI Ultra subscribers, utilizing the Gemini 2.5 Deep Think model, which has shown significant improvements over earlier versions and is designed to assist researchers and mathematicians in solving complex problems [1][3][4].

Summary by Sections

Model Improvements
- The Gemini 2.5 Deep Think model has been enhanced based on feedback from early testers and research breakthroughs, showing notable improvements since its initial release at the I/O conference [3].
- This model variant is derived from the one that won a gold medal at the International Mathematical Olympiad (IMO), and it has been optimized for faster reasoning and better user experience [4].

User Experience
- Google AI Ultra subscribers can access Deep Think through the Gemini app by selecting the 2.5 Pro model and switching to "Deep Think" in the prompt bar [6].
- The model integrates with tools like code execution and Google Search, allowing for longer and more detailed responses [6].

Performance Metrics
- Deep Think has achieved impressive benchmark results: 34.8% on HLE (without external tools), 87.6% on LiveCodeBench V6, 60.7% on IMO 2025, and 99.2% on AIME 2025, showcasing its strong reasoning capabilities in complex problem-solving and programming [18][20].

Problem-Solving Capabilities
- The model employs parallel thinking techniques to generate multiple ideas simultaneously, allowing it to explore different hypotheses and arrive at creative solutions over extended reasoning periods [12].
- Deep Think excels in tasks requiring creativity and strategic planning, such as iterative development and design, where it can enhance both aesthetics and functionality with a single prompt [14].

Future Developments
- Google plans to release Deep Think with and without tools via the Gemini API to trusted testers in the coming weeks, aiming to better understand its usability in developer and enterprise contexts [11].
- The company is also focused on enhancing the safety and security of the Gemini model during its training and deployment phases, with improvements in content safety and objectivity compared to previous versions [20].
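The "parallel thinking" pattern described above can be sketched as: spawn several candidate lines of thought concurrently, score each, and keep the best. The `propose` and `score` functions below are toy stand-ins, not Gemini's internals.

```python
from concurrent.futures import ThreadPoolExecutor

# Minimal sketch of parallel idea generation plus selection. In a real
# system, propose() would be a model sampling one reasoning path and
# score() a verifier or critic ranking the results; here both are toys.

def propose(seed: int) -> str:
    # Stand-in for one independent "line of thought."
    return f"answer-{seed * seed % 7}"

def score(candidate: str) -> int:
    # Stand-in for a critic that ranks candidate solutions.
    return int(candidate.rsplit("-", 1)[1])

def parallel_think(n: int = 8) -> str:
    """Generate n candidates concurrently, then keep the highest-scoring
    one. Executor.map preserves submission order, so selection is
    deterministic."""
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(propose, range(n)))
    return max(candidates, key=score)

print(parallel_think())  # → answer-4
```

The point of the pattern is that the candidates are independent, so wall-clock time grows with the slowest single rollout rather than with the number of hypotheses explored.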
A single model beats DeepSeek R1 and V3 with 671B parameters and a cost under $3.5 million
机器之心· 2025-08-02 00:55
机器之心 report. 机器之心 editorial team

Deep Cogito, a little-known AI startup headquartered in San Francisco and founded by a former Google employee, has drawn wide attention with four newly open-sourced hybrid reasoning models. Each model can answer directly (standard LLM mode) or self-reflect before answering (like a reasoning model).

The largest of them, a 671B MoE model, is among the most powerful open models in the world: its performance matches or exceeds the latest DeepSeek v3 and DeepSeek R1, and approaches closed frontier models such as o3 and Claude 4 Opus.

Deep Cogito's core method is Iterated Distillation and Amplification (IDA). Rather than relying on hand-crafted prompts or a static teacher model, it uses the model's own evolving insight to guide training. Instead of boosting performance by extending inference time, the model internalizes the reasoning process through iterative policy improvement. This is a new scaling paradigm in which the model gradually develops stronger intuition, and it is a strong validation of the concept of AI self-improvement. Because Cogito models have better intuition about reasoning paths during search, their reasoning chains are 60% shorter than DeepSeek R1's ...
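The IDA loop described above can be caricatured numerically: amplification spends extra compute to improve the current policy's answer, and distillation folds that improvement back so the next direct answer starts from a better place. The Newton-step "amplifier" and dict-based "policy" below are toy stand-ins chosen for this sketch, not Deep Cogito's implementation, where both sides are a neural model.

```python
# Toy picture of Iterated Distillation and Amplification (IDA): the policy's
# direct answer improves each round because amplified answers are distilled
# back into it. Here the "policy" is a lookup table and the "extra compute"
# is one Newton step toward sqrt(target); both are illustrative stand-ins.

def amplify(policy_guess: float, target: float) -> float:
    """Inference-time amplification: refine the policy's current guess for
    sqrt(target) with one Newton step."""
    return 0.5 * (policy_guess + target / policy_guess)

def distill(policy: dict, target: float, amplified: float) -> None:
    """Training-time distillation: the policy memorizes the amplified
    answer, so its next direct (no extra compute) answer is already better."""
    policy[target] = amplified

policy = {2.0: 1.0}  # initial crude guess for sqrt(2)
for _ in range(5):   # each round: amplify, then distill
    better = amplify(policy[2.0], 2.0)
    distill(policy, 2.0, better)
print(policy[2.0])   # converges toward sqrt(2) ≈ 1.41421
```

The key property the sketch preserves is that improvement comes from iterating the amplify-distill cycle, not from making any single inference step longer, which matches the article's description of IDA as a scaling paradigm.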