Muon

腾讯研究院AI速递 20250617
腾讯研究院· 2025-06-16 14:55
Group 1
- Keller Jordan joined OpenAI on the strength of a blog post about the Muon optimizer, which may be used for GPT-5 training [1]
- Muon is an optimizer for the hidden layers of neural networks; it uses Newton-Schulz iteration to orthogonalize the update matrix and trains faster than AdamW [1]
- Keller criticizes the optimizer literature for lacking practical validation and advocates proving new methods in competitive training tasks [1]

Group 2
- Google's AI roadmap acknowledges that the current Transformer attention mechanism cannot achieve infinite context, necessitating fundamental innovation at the core-architecture level [2]
- Gemini is set to become Google's "unified thread," connecting all services and moving toward "proactive AI," with support for multimodal capabilities and agent functions [2]
- Google is restructuring its AI organization, folding research and product teams into DeepMind to accelerate innovation, with Gemini 2.5 Pro marking a significant turning point [2]

Group 3
- Microsoft showcased 700 real-world AI agent and Copilot application cases across industries including finance, healthcare, education, and retail [3]
- Companies using AI agents report large efficiency gains: Wells Fargo cut response time from 10 minutes to 30 seconds, and KPMG reduced compliance workload by 50% [3]
- Microsoft Copilot has produced notable productivity gains, with Michelin reporting a 10x productivity increase and 84% of BCI users seeing a 10-20% efficiency boost [3]

Group 4
- Midjourney has entered video generation, showing a video model with detailed, realistic output, though it lacks the audio features of Veo 3 [4][5]
- Midjourney is taking an open approach, inviting users to rate videos to improve the model and promising to consider user suggestions on pricing [5]
- The Midjourney V7 image model continues to receive updates, adding voice generation, draft mode, and conversation mode, with rendering speed improved by 40%, cutting fast mode from 36 seconds to 22 seconds [5]

Group 5
- GenSpark launched an AI browser that builds AI capabilities into every webpage, offering features such as price comparison, shopping assistance, and video-content summarization [6]
- The browser supports an "autonomous mode" in which it automatically browses, organizes information, creates podcasts, and accesses paywalled sites to collect data [6]
- It includes an MCP store with over 700 tools for automation workflows, features ad blocking, and is currently available only on Mac [6]

Group 6
- MIT student Alex Kachkine used AI algorithms to restore old paintings, cutting the traditional 9-month process to just 3.5 hours, with the research published in Nature [7]
- The new method applies an AI-generated two-layer "mask" film to the painting's surface, repairing 5,612 damaged areas and filling in 57,314 colors, a 66-fold efficiency gain [7]
- The restoration film can be removed with chemicals without damaging the original artwork, and the method works better the more areas are missing, potentially allowing more heavily damaged artworks to be restored [7]

Group 7
- Trump's "whole-of-government AI plan" may have leaked on GitHub; the ai.gov website is set to launch on July 4 to promote AI across the federal government [8]
- The plan, led by Thomas Shedd, includes chatbots, a super API, and real-time monitoring tools, using Amazon Bedrock for AI models [8]
- Experts and commenters have raised concerns about security risks, code vulnerabilities, and whether outdated government systems can adapt, criticizing the plan's vague definitions and potential superficiality [8]

Group 8
- XPeng Motors shared progress on its autonomous-driving base model at the AI conference CVPR; it is building a cloud-based model with 72 billion parameters [10]
- XPeng validated that scaling laws hold for autonomous-driving VLA models, using a "cloud-based model + reinforcement learning" strategy to handle long-tail scenarios, processing over 20 million video segments [10]
- The company has built a "cloud model factory" with 10 EFLOPS of computing power, has processed over 400,000 hours of video data, and has devised a token-compression method that reduces vehicle-side processing by 70% [10]

Group 9
- a16z partners believe AI is reshaping consumer paradigms: "task completion" is replacing "relationship building" as the main product line, and current AI tools show strong monetization potential, with users paying up to $200 per month [11]
- A true "AI + social" product has yet to emerge; current platforms merely embed AI-generated content into old structures, and platforms must be fundamentally rethought to create new ways of connecting [11]
- In the AI era, speed has overtaken traditional moats as the primary competitive advantage, including distribution and iteration speed, requiring companies to maintain "dynamic leadership" rather than "static barriers" to survive long-term [11]

Group 10
- NVIDIA CEO Jensen Huang publicly criticized Anthropic CEO Dario Amodei's prediction that half of entry-level white-collar jobs will be replaced by AI within five years [12]
- Huang questioned Anthropic's "exclusive mindset," arguing that AI development should be open and transparent rather than closed and controlled: "don't lock yourself away to develop AI and then tell us it's safe" [12]
- Anthropic responded that Dario never claimed "only Anthropic can build safe AI"; the exchange reflects two views of AI governance: Amodei emphasizes caution and ethical frameworks, while Huang believes open competition ensures safety [12]
A single blog post landed an OpenAI offer; Muon's author charges that nearly all optimizer papers are "fake"
36Kr· 2025-06-16 12:46
Core Insights
- A blog post by researcher Keller Jordan titled "Muon: An optimizer for hidden layers in neural networks" led to his offer from OpenAI, and the techniques it describes may have been used in training GPT-5 [1][4][5]
- The post's reception underscores that publishing at top conferences does not equate to having real impact, challenging traditional academic norms [6][11]

Group 1: Blog Impact and Reception
- Keller Jordan's blog post gained attention for its practical results, outperforming the previously dominant optimizer AdamW [5][14]
- Hyperbolic CEO Yuchen Jin highlighted the misconception in academia that publishing at top-tier conferences is the ultimate goal, advocating for real-world impact instead [6][11]
- The blog's success illustrates a shift in the AI research landscape, where practical performance can outweigh formal academic credentials [22][24]

Group 2: Muon Optimizer Performance
- Muon achieved significant speedups, such as reducing CIFAR-10 training time from 3.3 to 2.6 A100-seconds [14]
- In the NanoGPT task, Muon reached the target validation loss 1.35x faster, maintaining its advantage at larger parameter scales [14]
- When training a 1.5-billion-parameter transformer on the HellaSwag task, Muon reached GPT-2 XL performance in just 10 hours, versus 13.3 hours for AdamW [14][20]

Group 3: Design and Methodology
- Muon's core procedure computes updates with SGD-momentum, then applies a Newton-Schulz iteration to approximately orthogonalize the update matrix [20][22]
- This approach replaces the original update matrix with a "semi-orthogonal matrix," enhancing its effectiveness [22]
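The SGD-momentum plus Newton-Schulz procedure described above can be sketched in a few lines of NumPy. This is a hedged illustration, not Keller Jordan's actual implementation: the function names, hyperparameters, step count, momentum form, and the use of the standard cubic Newton-Schulz polynomial (the published Muon reportedly tunes a higher-order polynomial) are all assumptions chosen for clarity.

```python
import numpy as np

def newton_schulz_orthogonalize(grad, steps=10):
    """Approximate the semi-orthogonal factor of `grad` (the U V^T of its SVD)
    with the cubic Newton-Schulz iteration  X <- 1.5*X - 0.5*(X X^T) X.
    Dividing by the Frobenius norm puts all singular values in (0, 1], inside
    the iteration's convergence region, so each step pushes them toward 1."""
    X = grad / (np.linalg.norm(grad) + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # iterate on the wide orientation so X @ X.T is the small matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = 1.5 * X - 0.5 * (A @ X)
    return X.T if transposed else X

def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    """One hypothetical Muon-style update for a weight matrix: accumulate
    SGD-momentum, orthogonalize the resulting update, then apply it."""
    momentum_buf = beta * momentum_buf + grad          # plain momentum (assumed form)
    update = newton_schulz_orthogonalize(momentum_buf)  # replace update with its
    return param - lr * update, momentum_buf            # nearest semi-orthogonal matrix
```

A usage note: because the orthogonalized update has all singular values near 1, every direction in the update matrix contributes at roughly equal magnitude, which is the intuition behind applying Muon only to 2-D hidden-layer weights rather than embeddings or scalar parameters.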
With just one blog post, the Muon author caught OpenAI's eye
机器之心· 2025-06-16 04:04
Keller Jordan, a key member of OpenAI's deep learning team, opened OpenAI's door with a single blog post. The post, titled "Muon: An optimizer for hidden layers in neural networks," was published in December 2024, and Keller Jordan joined OpenAI at almost exactly that time.

"Many PhDs (including my past self) fall into the trap of believing that publishing at top conferences is the ultimate goal," said Yuchen Jin, CEO of AI cloud provider Hyperbolic. But today, publishing a paper no longer equates directly to academic impact.

In the post, Keller Jordan proposes and builds Muon, an optimizer for the hidden layers of neural networks that substantially speeds up training while preserving the accuracy of the network (including Transformers and CNNs).

As for why he published only a blog post rather than a formal arXiv paper, Keller Jordan explained: being able to publish a paper about a new optimizer, full of good-looking results, has no connection whatsoever to whether that optimizer actually works. "I only believe in speedruns."

For a long time ...