PKU Alumnus and Chinese Scholar Chi Jin Takes on a New Role: Tenured Associate Professor at Princeton
机器之心· 2025-10-04 05:30
机器之心 Report | 机器之心 Editorial Team

Today, Chinese scholar Chi Jin announced his promotion to tenured associate professor at Princeton. Jin joined Princeton's Department of Electrical and Computer Engineering in 2019 as an assistant professor; over his six years there, his academic influence in AI has risen rapidly. Homepage: https://sites.google.com/view/cjin/

The associate-professor appointment takes effect on January 16, 2026. It is not only a milestone in Jin's academic career but also recognition of his foundational contributions to machine learning theory, contributions that supplied key mathematical groundwork for the rise of today's LLMs. Last February, Jin received a 2024 Sloan Research Fellowship alongside fellow Chinese scholars Diyi Yang and Simon Shaolei Du.

Major contributions: Jin's career coincided with the deep learning revolution. After AlexNet ignited the boom in 2012, academia and industry could train large non-convex models by the mid-2010s, yet a fundamental question remained open: why do simple optimizers such as stochastic gradient descent (SGD) work so well, especially around saddle points, where theory was lacking. Jin's collaborations with his advisor Michael I. Jordan and others met this challenge head-on, giving deep learning's practical success a solid theoretical footing. Meanwhile, ...
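The saddle-point issue mentioned above can be seen in a toy example (this is an illustration, not Jin's actual construction): on f(x, y) = x² − y², the origin is a saddle where the gradient vanishes, so plain gradient descent started there never moves, while an arbitrarily small perturbation lets it escape along the −y² direction.

```python
import random

def grad(x: float, y: float) -> tuple[float, float]:
    """Gradient of f(x, y) = x^2 - y^2."""
    return 2 * x, -2 * y

def descend(x: float, y: float, steps: int = 100, lr: float = 0.1):
    """Plain gradient descent from (x, y)."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

stuck = descend(0.0, 0.0)                   # gradient is exactly zero: no motion
random.seed(0)
px, py = 0.0, 1e-6 * random.uniform(-1, 1)  # tiny random perturbation
escaped = descend(px, py)
print(stuck == (0.0, 0.0), abs(escaped[1]) > 1.0)
```

The y-coordinate grows by a factor of (1 + 2·lr) per step, so even a 1e-6 perturbation escapes the saddle within 100 steps.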
Believe It or Not: GPT-5's Computer-Use Performance Is Now Within About 2% of Human Level
机器之心· 2025-10-04 03:38
Core Insights - The article discusses the advancements in computer-use agents (CUA), particularly focusing on the performance improvements of Agent S3, which has achieved a success rate of 69.9%, nearing human-level performance of 72% [1][15][16]. Technical Developments - Agent S3 builds on Agent S2, simplifying the framework and introducing a native code agent, which enhances performance from 62.6% to 69.9% [2][12]. - The introduction of the Behavior Best-of-N (bBoN) framework allows for parallel execution of agents, selecting the best outcomes from multiple runs, which significantly improves accuracy [2][8]. Performance Metrics - Agent S3's performance metrics show a 13.8% increase in success rate compared to Agent S2, with a reduction in the number of LLM calls per task by 52.3% and a decrease in average task completion time by 62.4% [15][18]. - The article highlights that when running 10 parallel agents, the performance peaks at 69.9% for GPT-5 and 60.2% for GPT-5 Mini [19]. Comparative Analysis - The bBoN framework demonstrates superior performance compared to traditional methods, achieving a success rate of 66.7% when combining models like GPT-5 and Gemini 2.5 Pro, indicating the importance of model diversity [21][22]. - Behavior narratives, as a representation method, outperform other trajectory representations, achieving a success rate of 60.2% [23][24]. Evaluation Mechanisms - The bBoN Judge shows higher accuracy in task evaluation compared to WebJudge, indicating its effectiveness in selecting the best execution results from multiple attempts [25][27]. - The alignment of the bBoN Judge with human preferences is noted, with a 92.8% accuracy in task selection, suggesting its potential as a reliable evaluation tool for CUA tasks [28][29].
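The Behavior Best-of-N idea above (run several agents in parallel, compress each trajectory into a behavior narrative, and have a judge pick the best) can be sketched as follows. The functions `run_agent`, `narrate`, and `judge` are hypothetical stand-ins, not the Agent S3 API: a real system would roll out full computer-use agents and compare narratives with an LLM judge.

```python
import concurrent.futures

def run_agent(task: str, seed: int) -> list[str]:
    """Stand-in rollout: returns a trajectory as a list of UI actions,
    varied by seed to mimic stochastic agent runs."""
    steps = ["open_browser", "type_query", "click_result"]
    return steps[: 1 + seed % 3]

def narrate(trajectory: list[str]) -> str:
    """Compress a raw trajectory into a short 'behavior narrative'."""
    return " -> ".join(trajectory)

def judge(task: str, narratives: list[str]) -> int:
    """Stand-in judge: prefers the narrative covering the most steps."""
    return max(range(len(narratives)), key=lambda i: narratives[i].count("->"))

def best_of_n(task: str, n: int = 4) -> str:
    """Run n rollouts in parallel and return the judge's pick."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        trajs = list(pool.map(lambda s: run_agent(task, s), range(n)))
    narratives = [narrate(t) for t in trajs]
    return narratives[judge(task, narratives)]

print(best_of_n("search for the weather"))
```

The key design point is that selection happens over compact narratives rather than raw trajectories, which is what makes a judge over many parallel runs tractable.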
Andrew Ng's Deep Learning Course CS230 Gets a Fall Update, Adding a GPT-5 Module
机器之心· 2025-10-04 03:38
Core Viewpoint - The updated CS230 Deep Learning course at Stanford, taught by Andrew Ng, emphasizes the importance of artificial intelligence, likening it to electricity, and introduces new content reflecting the latest advancements in AI, particularly focusing on the GPT-5 model [1][4]. Course Structure and Content - The course adopts a flipped classroom model where students must watch Coursera's deeplearning.ai videos before attending in-person classes [3]. - Since its inception in 2017, the course has maintained a similar core framework but has integrated updates relevant to recent AI developments, including a new chapter on GPT-5 [4]. - The course enhances the discussion on generative models and incorporates popular technologies like RAG and AI Agents, using GPT-5 for case studies [6]. - CS230 aims to provide comprehensive knowledge in deep learning, covering both theoretical foundations and practical skills necessary for building and applying deep learning models [10][12]. Key Topics Covered - The course covers a wide range of topics, including: - Basics of neural networks and deep learning [20]. - Optimization techniques such as regularization, Adam optimizer, hyperparameter tuning, Dropout, and Batch Normalization [20]. - Strategies for constructing machine learning projects from conception to successful deployment [20]. - In-depth understanding of Convolutional Neural Networks (CNN) and their applications in image classification and detection [20]. - Mastery of Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks for sequence tasks [20]. - Exploration of advanced topics like Generative Adversarial Networks (GANs) and deep reinforcement learning [20]. - Insights from industry and academia, along with practical career development advice in AI [20]. Course Schedule - The 2025 fall course will run for approximately 10 weeks, starting at the end of September [15]. 
- Weekly topics include introductions to deep learning, neural network basics, CNNs, RNNs, optimization algorithms, generative models, and advanced topics related to GPT-5 [16].
Insta360's Latest Panoramic Survey: Challenges, Methods, and the Future of Panoramic Vision
机器之心· 2025-10-04 03:38
Core Insights - The article discusses the transition from perspective vision to panoramic vision, highlighting the "perspective-panorama gap" as a central theme for understanding the challenges and opportunities in this field [6][19]. - It emphasizes the need for a systematic upgrade across data, models, and applications to enhance the usability of panoramic vision technologies [16][19]. Research Background and Motivation - The paper titled "One Flight Over the Gap: A Survey from Perspective to Panoramic Vision" aims to systematically analyze the differences between perspective and panoramic vision, covering over 300 papers and 20 representative tasks [4][19]. - The article provides a comprehensive overview of the challenges faced in panoramic vision, which are categorized into three main gaps: geometric distortion, non-uniform sampling, and boundary continuity [6][9]. Strategies Overview - Four main strategies are identified for adapting tasks to panoramic vision: 1. **Geometric Distortion**: Issues arise when spherical images are projected onto a plane, leading to shape distortion [7]. 2. **Non-uniform Sampling**: Pixel density varies significantly across different regions, affecting resolution [7]. 3. **Boundary Continuity**: The separation of boundaries in 2D images can lead to learning continuity issues [7]. - The article outlines a cross-method comparison to clarify the applicability of different strategies to various tasks [9][15]. Task Toolbox - The article lists over 20 tasks categorized into four main areas: enhancement and assessment, understanding, multi-modal, and generation, along with representative methods and key papers for each task [12][15]. - It highlights the rapid emergence of new paradigms such as diffusion and generative models, particularly in text-to-image/video and novel view synthesis [15]. 
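The "non-uniform sampling" gap above can be made concrete with a small calculation (an illustration, not code from the survey): in an equirectangular panorama every pixel row spans the same image height but a shrinking band of the sphere, so rows near the poles oversample heavily. The standard cos(latitude) solid-angle factor below is commonly used to area-weight panoramic losses and metrics.

```python
import math

def row_weights(height: int) -> list[float]:
    """cos(latitude) area weight for each row of an equirectangular image."""
    weights = []
    for r in range(height):
        # Latitude of the row center, from -pi/2 (top) to pi/2 (bottom).
        lat = math.pi * (r + 0.5) / height - math.pi / 2
        weights.append(math.cos(lat))
    return weights

w = row_weights(8)
print(round(w[0], 3), round(w[4], 3))  # pole row vs. near-equator row
```

A pole row carries roughly a fifth of the spherical area of a near-equator row, which is why uniform per-pixel losses trained on perspective data transfer poorly to panoramas.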
Future Directions - To transition from "usable" to "user-friendly," advancements must be made in three main areas: data, model paradigms, and downstream applications [16][21]. - Key challenges include: 1. **Data Bottlenecks**: Lack of large-scale, diverse, and high-quality 360° datasets limits general training and reproducible evaluation [21]. 2. **Model Paradigms**: The need for robust models that can adapt from perspective to panoramic vision while maintaining performance across various tasks [21]. 3. **Downstream Applications**: Applications in spatial intelligence, XR, 3D reconstruction, and various industry sectors require effective deployment and compliance [21][22].
Another New Reasoning Paradigm: Treating the LLM Itself as an "Improvement Operator" to Push Past Long-CoT Limits
机器之心· 2025-10-03 03:39
机器之心 Report | 机器之心 Editorial Team

Reasoning training drives large language models (LLMs) to generate long chains of thought (long CoT), which helps them explore solution strategies and self-check. While this improves accuracy, it also increases context length, token/compute cost, and answer latency.

Hence the question: can current models use their metacognitive abilities to offer other operating points on this Pareto frontier, for example improving accuracy while reducing context length and/or latency?

Motivated by this question, researchers from Meta Superintelligence Labs, University College London, Mila, Anthropic, and other institutions set out to explore it. At an abstract level, they treat the LLM as an improvement operator over its own "thoughts", enabling a family of possible strategies.

The researchers study one such family, Parallel-Distill-Refine (PDR), which proceeds as follows: (i) generate diverse drafts in parallel; (ii) distill them into a bounded text workspace; (iii) refine on top of this workspace, with the output seeding the next round. Crucially, by adjusting the degree of parallelism, PDR controls context length (and thus compute cost), and context length is no longer conflated with the total number of generated tokens. Depending on how current models are applied in PDR instances, ...
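The three PDR steps above can be sketched as a control loop. Here `generate`, `distill`, and `refine` are hypothetical stand-ins for LLM calls, operating on strings so the flow is runnable; the point of the sketch is that the workspace, not the full draft history, is all that crosses rounds.

```python
def generate(task: str, workspace: str, i: int) -> str:
    """Stand-in for one parallel draft conditioned on the workspace."""
    return f"draft{i}({task}|{workspace})"

def distill(drafts: list[str], limit: int) -> str:
    """Stand-in for compressing drafts into a bounded text workspace."""
    return ";".join(drafts)[:limit]

def refine(task: str, workspace: str) -> str:
    """Stand-in for refining a final answer on top of the workspace."""
    return f"answer({task}|{workspace})"

def pdr(task: str, parallelism: int = 3, rounds: int = 2,
        workspace_limit: int = 64) -> str:
    workspace = ""
    for _ in range(rounds):
        # (i) diverse drafts in parallel, conditioned on the workspace
        drafts = [generate(task, workspace, i) for i in range(parallelism)]
        # (ii) distill into a bounded workspace that seeds the next round
        workspace = distill(drafts, workspace_limit)
    # (iii) refine on top of the final workspace
    return refine(task, workspace)

print(pdr("2+2").startswith("answer(2+2|"))
```

Because `workspace_limit` caps what each round sees, context length is controlled by `parallelism` and the cap, independently of the total tokens generated across rounds, mirroring the decoupling the paper emphasizes.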
NeurIPS 2025 Spotlight | HKU Proposes TreeSynth: Million-Scale Datasets from a Single Sentence
机器之心· 2025-10-03 03:39
Core Insights - TreeSynth is a novel data synthesis method inspired by decision trees, addressing the challenge of generating diverse and high-quality training data from scratch [6][7][25] - The method ensures systematic coverage of the data space, overcoming limitations of traditional data synthesis approaches [4][25] Methodology - TreeSynth employs a two-phase workflow: data space partitioning and subspace data synthesis [8][12] - In the first phase, the data space is divided into mutually exclusive subspaces using pivot samples and core criteria [9][12] - The second phase involves generating samples within each atomic subspace based on the path description from the root to the leaf node [13][14] Performance and Validation - Experimental results show that TreeSynth consistently outperforms baseline methods in various benchmarks, achieving significant performance improvements [19][23] - For instance, accuracy on the GSM8K dataset increased from 45.2% to 55.8% using the LLaMA3.1-8B model [19] - TreeSynth also demonstrated a 45% increase in data diversity compared to baseline methods, with improved distribution in the embedding space [23] Future Directions - TreeSynth opens new avenues for synthesizing diverse and comprehensive training datasets, with potential for scalability in large data scenarios [26][27] - Future exploration may focus on optimizing tree depth and partitioning criteria, as well as adapting to complex real-world scenarios [28]
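The two-phase workflow above can be sketched as follows. This is stand-in logic, not the HKU implementation: phase 1 recursively splits the data space into mutually exclusive subspaces along hypothetical criteria, and phase 2 synthesizes samples per leaf, conditioned on the root-to-leaf path description (which a real system would hand to an LLM as a generation prompt).

```python
def partition(space: str, criteria: list[str]) -> list[list[str]]:
    """Phase 1: return the root-to-leaf path (list of constraints)
    for every leaf of the criteria tree."""
    if not criteria:
        return [[space]]
    name, values = criteria[0].split(":")
    leaves = []
    for v in values.split(","):
        for path in partition(space, criteria[1:]):
            leaves.append([f"{name}={v}"] + path)
    return leaves

def synthesize(path: list[str], k: int) -> list[str]:
    """Phase 2: stand-in for LLM generation conditioned on a subspace."""
    desc = " & ".join(path)
    return [f"sample{i}[{desc}]" for i in range(k)]

criteria = ["topic:algebra,geometry", "difficulty:easy,hard"]
leaves = partition("math", criteria)
data = [s for path in leaves for s in synthesize(path, 2)]
print(len(leaves), len(data))
```

Because the subspaces are mutually exclusive and jointly cover every criteria combination, coverage of the data space is systematic by construction, which is the property the method claims over free-form sampling.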
Meta's Internal Turmoil Continues: FAIR's Freedom Is Gone, and LeCun Is Considering Resigning
机器之心· 2025-10-03 03:39
机器之心 Report | Editor: +0

Meta's internal infighting has a new chapter, and this time the protagonist is the FAIR lab.

According to The Information, two people familiar with the matter say Meta recently imposed a new policy on FAIR: all research must pass an additional internal review before public release.

The policy has caused an uproar inside FAIR. Many employees feel the change severely curtails the academic freedom they previously enjoyed, namely the right to share research freely outside Meta.

An open research culture has long been the cornerstone of FAIR's ability to attract top talent. But as Meta overhauls its AI business, the company is asking FAIR to do more in service of internal products while cutting back on external research sharing that might benefit competitors.

These changes have deeply troubled FAIR co-founder Yann LeCun. According to people familiar with the matter, he even told colleagues privately in September that he might step down as chief scientist.

LeCun's discontent had been brewing for months: he had grown increasingly disappointed with the internal state of Meta Superintelligence Labs (MSL), the newly created organization overseeing all of Meta's AI efforts. In July, MSL appointed OpenAI researcher Shengjia Zhao as chief scientist. One person familiar with the matter said LeCun was quite irritated by the outside perception that he had been "demoted" ...
Just In: Anthropic's New CTO Takes Office as an AI Infrastructure Battle with Meta and OpenAI Looms
机器之心· 2025-10-03 00:24
机器之心 Report | 机器之心 Editorial Team

Just now, Anthropic welcomed a new chief technology officer (CTO): former Stripe CTO Rahul Patil.

Reportedly, Rahul Patil joined the company earlier this week, succeeding co-founder Sam McCandlish, who moves to the role of chief architect.

On social media, Patil expressed his excitement about joining Anthropic and his hopes for what lies ahead. He said he is delighted to take up a new mission and calling: the possibilities of AI are endless, and realizing them will be an extraordinary journey of discovery that demands hard work. More importantly, it will require deliberate decisions every day to navigate this enormous transformation safely and ensure that responsible AI ultimately wins.

He said he is grateful to join Anthropic's humble, smart, hardworking, and responsible team, which has sparked countless imaginations around the world, and he thanked everyone who built Stripe alongside him through more than five years of profound change.

As CTO, Rahul Patil will be responsible for compute, infrastructure, inference, and other engineering work. Sam McCandlish, as chief architect, will continue working on pretraining and large-scale model training, extending his previous work. Both will report to Ant ...
The World's Most Valuable Startup Is Born: OpenAI's Valuation Hits a Record $500 Billion
机器之心· 2025-10-03 00:24
机器之心 Report | 机器之心 Editorial Team

A few days ago, OpenAI released Sora 2, a new generation of video model that surpasses previous systems in physical accuracy, realism, and controllability, and adds synchronized dialogue and sound effects. Altman called it a "ChatGPT for creativity" moment.

| Company | Valuation ($B) | Country |
| --- | --- | --- |
| OpenAI | 500 | US |
| SpaceX | 400 | US |
| ByteDance | 220 | China |
| Anthropic | 183 | US |
| Ant Group | 150 | China |
| Reliance Retail | 100 | India |
| Databricks | 100 | US |
| Shein | 66 | China |
...
Sora 2 Fumbles Finger Counting, and Altman Is Among the First "Victims", Memed by AI into the World's Most Miserable Worker
机器之心· 2025-10-02 06:19
机器之心 Report | Editor: 杨文

A major public-embarrassment moment for Altman: Sora 2, powerful as it is, still can't count on its fingers.

X user @fofrAI tested Sora 2 with the prompt: "a man counts out loud from 1 to 10, using his fingers and holding them up as he goes." In the generated video, the man counts correctly aloud, but the fingers he holds up do not fully match the numbers.

This is not the first time the blogger has tested video generation models with this kind of prompt. Back in May, he ran the same prompt on Veo 3; Veo 3 not only got the fingers wrong but also counted only up to 3. He later refined the prompt: "a man counts out loud from 1 to 10, '1, 2, 3, 4, 6, 7, 8, 9, 10', he counts using his fingers and holds them up as he goes." It still failed:

At the start of the video, the man's performance ...