机器之心
After losing three co-founders, the crisis at Mira's company continues: two more are set to leave
机器之心· 2026-01-16 08:13
Editor | Zhang Qian. After Sam Altman's "palace intrigue" saga at OpenAI, his old partner Mira has had a week dramatic enough for a TV series of her own. Counting PyTorch luminary Andrew Tulloch, who departed earlier, Thinking Machines Lab has now lost three co-founders: the venture is barely underway, and its core team has already scattered. Today the story kept developing: two more technical leads at the lab, infrastructure engineer Ian O'Connell and model-architecture researcher Lia Guy, were reported to be leaving, with the latter confirmed to be returning to OpenAI as well. Multiple outlets have described the episode as an OpenAI talent "raid" on Thinking Machines Lab; according to Wired, the poaching effort had been in preparation inside OpenAI for weeks. Yesterday we reported on major personnel changes at Thinking Machines Lab, the company founded by former OpenAI CTO Mira Murati: co-founder and CTO Barret Zoph was fired, and fellow co-founder Luke Metz and founding team member Sam Schoenholz left with him, all three returning to OpenAI. There are also some entanglements here whose truth is hard to verify. People familiar with the matter ...
Beyond quantization: a new survey deconstructs system-level KV cache optimization through a "time-space-structure" lens
机器之心· 2026-01-16 08:13
As LLMs evolve toward 1M-token contexts, the KV cache (key-value cache) has become the core bottleneck constraining inference-serving efficiency. The nature of autoregressive generation forces the model to store the key-value states of past tokens (the KV cache) to avoid recomputation, but the cache's GPU-memory footprint grows with context length, creating a significant memory bottleneck. Over the past two years, work on KV cache optimization has exploded, with scheduling, migration, compression, and other strategies emerging in quick succession. Existing surveys, however, focus mainly on the overall efficiency of LLM inference or serving, and most treat the KV cache as just one sub-module, discussed only briefly. Recently, researchers from the University of Melbourne and Huazhong University of Science and Technology published an in-depth survey that, from an MLSys perspective, systematically organizes and analyzes KV cache optimization methods through a novel "time-space-structure" view of system behavior, and compiles the related resources into a continuously maintained Awesome repository so that researchers and practitioners can quickly locate and apply them. What is "sKis"? To provide a more focused scope, the authors first define the boundary of sKis in the survey: at the inference-serving stage, with the KV cache as the core optimization target, improving throughput and latency without model retraining or architectural modification ...
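To make the bottleneck concrete, here is a minimal NumPy sketch of a per-token KV cache in attention decoding. All names (`KVCache`, `d_model`) and the toy dimensions are our own illustration, not from the survey; it only shows why cache memory grows linearly with context length while per-step compute stays linear rather than quadratic.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # toy hidden size, our assumption

class KVCache:
    """Stores key/value states of past tokens so each decode step
    only computes attention for the newest token."""
    def __init__(self):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def append(self, k, v):
        # One new (K, V) row per generated token: memory grows with context.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

    def attend(self, q):
        # Attention of the newest query over all cached tokens:
        # softmax(q . K^T / sqrt(d)) . V
        scores = self.keys @ q / np.sqrt(d_model)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.values

cache = KVCache()
for step in range(16):                    # 16 decode steps
    k, v, q = rng.normal(size=(3, d_model))  # stand-ins for projected states
    cache.append(k[None, :], v[None, :])
    out = cache.attend(q)                 # O(t) work per step, not O(t^2) total recompute

print(cache.keys.shape)
```

At 1M-token contexts this per-token growth is exactly what scheduling, migration, and compression strategies in the survey try to tame.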
Meituan ships another new model: with eight Thinkers working at once, do they add up to a Zhuge Liang?
机器之心· 2026-01-16 08:13
Core Insights
- The article discusses the latest advancements in AI models, specifically focusing on Meituan's LongCat-Flash-Thinking-2601, which features 560 billion parameters and is built on an innovative MoE architecture [1][41][62]
- The model introduces a Heavy Thinking Mode that allows for simultaneous multi-path reasoning, enhancing the reliability and comprehensiveness of conclusions [4][48][62]
- LongCat-Flash-Thinking-2601 demonstrates significant improvements in agent capabilities, achieving top performance in various benchmark tests and showing enhanced generalization in out-of-distribution (OOD) scenarios [6][62]

Model Features
- LongCat-Flash-Thinking-2601 employs a Heavy Thinking Mode that activates eight independent thinkers to explore different reasoning paths, thereby reducing errors and improving answer quality [4][48][50]
- The model's architecture supports parallel thinking and iterative summarization, allowing for a broader and deeper exploration of complex problems [41][50]
- A new evaluation method for agent model generalization has been introduced, which generates complex tasks based on given keywords, enhancing the model's adaptability to unknown scenarios [8][10][11]

Performance Testing
- Real-world testing of the model showed its capability in logical reasoning tasks, where it effectively utilized the Heavy Thinking Mode to arrive at reliable answers through collaborative reasoning [12][15][16]
- The model's programming abilities were tested by generating games like Flappy Bird and Conway's Game of Life, showcasing its versatility despite the high computational cost of using multiple thinkers [26][32]
- In a comparative analysis with Claude 4.5 Opus, LongCat-Flash-Thinking-2601 achieved a 100% standard coverage rate, outperforming its competitor in handling complex tool dependencies [38][62]

Technological Innovations
- The model incorporates advanced techniques such as environment scaling and multi-environment reinforcement learning, which enhance its training and performance in diverse scenarios [41][51][53]
- LongCat's training process includes the introduction of noise to improve robustness, allowing the model to perform well in real-world conditions that are often imperfect [60][62]
- The upcoming LongCat ZigZag Attention mechanism aims to support a context of up to 1 million tokens, further expanding the model's capabilities [63]

Development Timeline
- Meituan's AI model development has been rapid, with consistent updates since its initial launch in September 2025, focusing on enhancing response speed, logical reasoning, and multi-modal capabilities [65][67]
- The company aims to create a model that can effectively solve real-world problems, aspiring towards a future where "model as a service" becomes a reality [68]
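The "eight thinkers" idea can be approximated by self-consistency-style majority voting: sample several independent reasoning paths and keep the most common answer. The sketch below is our own illustration of that aggregation pattern; the noisy `one_thinker` is a hypothetical stand-in, not LongCat's actual Heavy Thinking implementation.

```python
import random
from collections import Counter

def one_thinker(question, seed):
    """Hypothetical stand-in for one sampled reasoning path: a noisy
    reasoner that reaches the right answer about 70% of the time."""
    rng = random.Random(seed)
    return "42" if rng.random() < 0.7 else str(rng.randint(0, 9))

def heavy_thinking(question, n_thinkers=8):
    """Run n independent 'thinkers' and aggregate by majority vote."""
    answers = [one_thinker(question, seed) for seed in range(n_thinkers)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_thinkers   # answer plus agreement ratio

answer, agreement = heavy_thinking("What is 6 * 7?")
print(answer, agreement)
```

The agreement ratio hints at why multi-path reasoning improves reliability: independent errors rarely agree on the same wrong answer, so the majority answer is usually the correct one even when individual paths fail.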
Geoffrey Hinton just became the second scientist whose citations have passed one million
机器之心· 2026-01-16 01:55
Core Viewpoint
- Geoffrey Hinton has officially become the second computer scientist in history to surpass 1 million citations on Google Scholar, marking a significant milestone in his academic career and contributions to artificial intelligence [1][3].

Group 1: Academic Achievements
- Hinton's citation count currently stands at 1,000,083, with an h-index of 192, indicating his substantial impact in the field of computer science and artificial intelligence [2].
- He is renowned for his work on backpropagation, which addressed the training challenges of multilayer neural networks, laying the groundwork for the deep learning revolution [10].
- Hinton, along with Yoshua Bengio and Yann LeCun, received the Turing Award in 2018, recognizing their pivotal contributions to the field of deep learning [13].

Group 2: Key Contributions
- Hinton's notable innovations include the Boltzmann Machine, Restricted Boltzmann Machine, Deep Belief Network, Dropout technique, t-SNE for data visualization, Capsule Networks, and Knowledge Distillation, among others [14].
- His collaboration on AlexNet, which won the ImageNet competition in 2012, is considered a landmark moment that demonstrated the power of deep learning [16].
- The paper "Deep Learning," co-authored by Hinton, has garnered over 100,000 citations, summarizing the evolution and principles of deep learning [16].

Group 3: Personal Background and Career
- Born into an academic family, Hinton's early life was marked by high expectations, which shaped his relentless pursuit of knowledge [5][8].
- He moved to Canada in the 1980s, where he established a long-term academic career at the University of Toronto, contributing significantly to the development of AI in Canada [9].
- Hinton's later years have seen him express concerns about the potential risks of AI, emphasizing the need for caution in its development [20].

Group 4: Legacy and Impact
- Hinton's citation milestone reflects not only his individual achievements but also the collaborative efforts of his students, Alex Krizhevsky and Ilya Sutskever, who have also made significant contributions to AI [29].
- The historical context of Hinton's work illustrates the broader narrative of humanity's quest to understand intelligence, highlighting the transformative impact of his research on modern AI [31].
Tencent upgrades AngelSlim: the first speculative-sampling training framework to unify LLM, VLM, and speech multimodality, boosting inference speed by up to 1.8x
机器之心· 2026-01-16 01:55
Core Insights
- The article discusses the challenges of high inference costs and delays in the large model application landscape, highlighting the need for cost reduction and efficiency improvements in the industry [2]
- Speculative sampling is introduced as a novel inference acceleration paradigm that offers near-lossless speedup, gaining popularity in the industry [2]
- Tencent's upgraded AngelSlim training framework leverages speculative sampling to enhance performance across various modalities, achieving significant inference speed improvements [2]

Group 1: AngelSlim and Speculative Sampling
- Speculative sampling utilizes a lightweight draft model to generate multiple candidate tokens, which are then verified by a larger model, effectively parallelizing the decoding process and reducing latency [4]
- AngelSlim integrates various compression algorithms, including quantization and speculative sampling, to support multi-modal model training, achieving acceleration rates of 1.4 to 1.9 times [4][6]
- The framework emphasizes deployment readiness, allowing models trained with AngelSlim to be seamlessly integrated into existing frameworks like vLLM and Sglang [7]

Group 2: Key Features of AngelSlim
- AngelSlim supports full-modal speculative sampling training, enabling shared core algorithms and engineering capabilities across different modalities [6]
- The data processing module provides a stable and reusable data foundation for training across multiple modalities, including data resampling and preprocessing [12][13]
- The model module features a unified TargetModel interface, allowing easy integration of new model architectures without modifying core algorithms [18]

Group 3: Training Components and Performance
- The training module is designed for both online and offline training modes, catering to different model sizes and memory constraints [20]
- The training process includes training-time testing, allowing the model to learn from its own predictions during training [21]
- AngelSlim's trained models have demonstrated acceleration performance in various tasks, achieving speedups of 1.4 to 1.9 times under specific conditions [25]

Group 4: Future Plans
- Future developments will focus on enhancing speculative sampling capabilities through tool and algorithm advancements, including offline hidden states generation and deeper integration of multi-modal features [30]
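The draft-then-verify loop behind speculative sampling can be sketched in a few lines. The toy `draft_model` and `target_model` below are hypothetical stand-ins over integer tokens, not AngelSlim's API, and real systems verify all draft tokens in a single batched forward pass of the target model rather than one call per token.

```python
def draft_model(prefix, k):
    """Hypothetical cheap model: guesses the next k tokens by counting up."""
    return [(prefix[-1] + i) % 10 for i in range(1, k + 1)]

def target_model(prefix):
    """Hypothetical target model's greedy next token. It also counts up,
    except it emits 0 after a 4, so it sometimes disagrees with the draft."""
    return (prefix[-1] + 1) % 10 if prefix[-1] != 4 else 0

def speculative_step(prefix, k=4):
    """One decode step: accept the draft's longest agreeing prefix,
    then take the target's token at the first mismatch."""
    draft = draft_model(prefix, k)
    accepted = []
    for tok in draft:
        verified = target_model(prefix + accepted)
        if tok != verified:
            accepted.append(verified)  # target wins at the first mismatch
            break
        accepted.append(tok)
    return accepted

print(speculative_step([3]))  # draft disagrees after the 4
print(speculative_step([0]))  # draft fully accepted: 4 tokens in one step
```

When the draft agrees with the target, several tokens are emitted per target-model pass, which is where the near-lossless 1.4 to 1.9x speedups come from; when it disagrees, the output still matches what the target alone would have produced.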
Behind DeepSeek's back-to-back papers lies a research relay
机器之心· 2026-01-16 00:42
Editors | Zhang Qian, Chen Chen. Halfway through January 2026, DeepSeek V4 still has not arrived, but its outline is becoming ever clearer. DeepSeek recently published two papers in quick succession: one tackles how information flows stably, the other how knowledge is retrieved efficiently. When the first paper (mHC) came out, readers opening it said they were baffled, admitted they could not follow it, and asked AI assistants to explain it to them in every way possible. Browsing the online discussion, we found that the most thorough way to understand it is to go back to the research lineage and trace how researchers have passed the baton over the years. The same goes for the second paper (Conditional Memory). So we dug through analyses from various researchers, and noticed something interesting: much of the work from DeepSeek and ByteDance's Seed team forms a "relay". mHC makes major improvements on the Seed team's HC (Hyper-Connections), while Conditional Memory cites several Seed works, including OverEncoding and UltraMem. Sorting out the relationships among these works should not only deepen our understanding of the DeepSeek papers but also reveal the directions in which large-model architecture innovation is breaking through. In this article, we combine our own ...
Wireless RF machine-learning inference with just a single mixer, published in Science Advances!
机器之心· 2026-01-16 00:42
The authors of this paper include Zhihui Gao and Prof. Tingjun Chen of Duke University and Prof. Dirk Englund's team at MIT. Zhihui Gao is a PhD student in the Department of Electrical and Computer Engineering at Duke University; he received his bachelor's degree from the Department of Electronic Engineering at Fudan University, and his research interests lie in next-generation network systems, including cyber-physical systems and machine-learning acceleration.

Model-data disaggregated computing. When machine learning is deployed on edge devices, the model is typically stored on a cloud server (a 5G base station), while the model's inputs and outputs live on the edge device (for example, taking a photo with a camera and then recognizing the objects in it). In this setting, there are traditionally two ways to run inference:

Scheme 1: upload the model inputs to the cloud. Each user uploads their own inputs to the cloud, inference runs there, and the model outputs are downloaded back to each user. This consumes a great deal of bandwidth, especially at large user scale; moreover, uploading users' model inputs raises privacy-leakage concerns.

Scheme 2: broadcast the model down to the edge. The cloud server broadcasts the model to all users; each user stores the model and computes at the edge. This places heavy demands on edge compute, and storing the model also incurs edge storage read/write overhead.

In our work, we propose a third scheme, disaggregated computing: ...
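A back-of-envelope comparison makes the bandwidth tradeoff between the two traditional schemes concrete. The sizes below (a 500 MB model, 5 MB inputs) are our own illustrative assumptions, not figures from the paper.

```python
def scheme1_mb(n_users, input_mb, output_mb):
    """Scheme 1: upload inputs, download outputs.
    Traffic scales linearly with the number of users."""
    return n_users * (input_mb + output_mb)

def scheme2_mb(model_mb):
    """Scheme 2: broadcast the model once.
    Traffic is independent of user count, but every edge device
    must then store and run the model itself."""
    return model_mb

# Per-inference-round traffic under our assumed sizes:
for n_users in (10, 1_000, 100_000):
    print(n_users, scheme1_mb(n_users, input_mb=5, output_mb=0.01),
          scheme2_mb(500))
```

The crossover is the point of the comparison: at small user counts uploading inputs is cheaper, but as users grow, Scheme 1's traffic explodes while Scheme 2's stays flat at the model size, which is what motivates looking for a third option that avoids both the bandwidth blow-up and the edge compute/storage burden.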
Turmoil at Mira's company? CTO fired and takes a team back to OpenAI; Lilian Weng weighs in on Twitter
机器之心· 2026-01-15 09:17
Today was an unusual day for both Thinking Machines Lab and OpenAI. Thinking Machines Lab founder and CEO Mira Murati announced her parting of ways with co-founder and CTO Barret Zoph. At the same time, she announced the new CTO: Soumith Chintala, the father of PyTorch. This researcher, highly influential in modern AI infrastructure, left Meta in early November last year and has chosen to join Thinking Machines Lab. About an hour later, Fidji Simo, CEO of Applications at OpenAI, announced that Barret Zoph would return to OpenAI. Returning with him are another Thinking Machines Lab co-founder, Luke Metz, and founding team member Sam Schoenholz. The simultaneous departure of two co-founders from Thinking Machines Lab has sent shockwaves through the community. According to internal information obtained by some, the cause was personal misconduct by Barret Zoph, and Thinki ...
The technical breakthroughs of the general-purpose PixVerse R1: carrying the passcode to parallel worlds
机器之心· 2026-01-15 09:17
Core Viewpoint
- The article discusses the launch of PixVerse R1, a groundbreaking model in video generation that enables real-time, high-quality video creation, marking a significant advancement in the industry [1][3][38].

Group 1: Technological Breakthroughs
- PixVerse R1 is the first model worldwide to support real-time generation of 1080P-resolution videos, transitioning video generation from static output to real-time interaction [6][35].
- The model achieves a significant increase in computational efficiency, allowing for real-time generation within the human perception range, thus representing a generational leap in application-level capabilities [3][6].
- The Instantaneous Response Engine (IRE) is introduced, which drastically reduces inference time by compressing the sampling steps from over 50 to just 1-4, addressing the computational load effectively [9][11].

Group 2: Model Architecture
- The Omni model is a native end-to-end multimodal foundation that allows for the simultaneous processing of various data types, enhancing the model's versatility and efficiency [20][25].
- The model employs a unified token-flow architecture based on Transformer, enabling the joint processing of text, images, audio, and video, thus improving the model's understanding of multimodal data [21][25].
- The model's native-resolution feature ensures high-quality video generation without compromising the integrity of the visual content, addressing issues caused by traditional data preprocessing methods [22][23].

Group 3: Continuous Evolution
- PixVerse R1 introduces an autoregressive streaming generation mechanism that allows for theoretically infinite video generation, breaking the constraints of fixed-length outputs [29][32].
- The model incorporates a memory-enhanced attention module that captures and retains key features from the video, optimizing computational efficiency while maintaining long-term consistency [30][32].
- This architecture ensures that the generated content remains coherent and logically consistent regardless of the length of the video, thus establishing a robust foundation for a universal real-time world model [32][38].
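The streaming idea, unbounded-length generation from bounded state, can be sketched with a fixed-size memory window: each new frame is conditioned only on the most recent remembered frames, so per-step cost never grows with stream length. The toy frame update below is our own stand-in, not PixVerse R1's memory-enhanced attention.

```python
from collections import deque

class StreamingGenerator:
    """Toy autoregressive frame generator with a bounded memory window."""
    def __init__(self, memory_frames=8):
        # Only the most recent frames are kept as "memory", so compute and
        # state stay fixed no matter how long the stream runs.
        self.memory = deque(maxlen=memory_frames)

    def next_frame(self):
        # Toy update rule: condition on the mean of remembered frames.
        # (A real model would run attention over cached frame features.)
        context = sum(self.memory) / len(self.memory) if self.memory else 0.0
        frame = context + 1.0          # stand-in for one generation step
        self.memory.append(frame)
        return frame

gen = StreamingGenerator(memory_frames=4)
frames = [gen.next_frame() for _ in range(1000)]  # arbitrarily long stream
print(len(frames), len(gen.memory))
```

The tradeoff this sketch exposes is the one the memory-enhanced attention module addresses: a naive fixed window forgets everything outside it, so retaining the right key features in bounded memory is what makes long-range consistency possible.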
I just drank the bubble tea the Qianwen app ordered for me
机器之心· 2026-01-15 04:31
Core Insights
- The development of intelligent agents has accelerated significantly at the beginning of 2026, with notable advancements from companies like Anthropic and Alibaba [1][11]
- Anthropic's release of Cowork aims to revolutionize the workplace by integrating large models with intelligent agent capabilities for general users, not just programmers [1]
- Alibaba's Qianwen App has introduced a new AI Agent feature called "Task Assistant," which integrates with Alibaba's ecosystem to offer over 400 new functionalities for free [2][4]

Group 1
- The Qianwen App can automate tasks such as ordering food by simply stating preferences, streamlining the entire process from selection to payment [5][20]
- Users can consult the Task Assistant for shopping decisions, which can provide recommendations and direct links to payment [7][9]
- The Task Assistant has demonstrated its ability to handle complex tasks like multi-brand group purchases, significantly reducing the time and effort required for users [12][18]

Group 2
- The Task Assistant can create detailed travel plans, such as a two-day itinerary for a trip to Weihai, by analyzing user needs and sourcing information from various platforms [22][27]
- The assistant integrates with Alibaba's services, allowing users to navigate, book tickets, and manage travel logistics seamlessly [29]
- The interaction model has shifted from dialogue with a large model to task delegation to an intelligent agent, marking a significant evolution in user experience [31]

Group 3
- Qianwen's Task Assistant is built on a new universal agent system that enhances task execution efficiency and accuracy through a hierarchical planning approach [33]
- The system allows for continuous learning and improvement, enabling agents to refine their capabilities based on past experiences [35]
- The integration of AI coding capabilities allows the assistant to autonomously generate tools for less common tasks, enhancing its functionality [36]

Group 4
- The AI sector is entering a product explosion phase, with new offerings from various companies, including Anthropic and OpenAI, indicating a rapid evolution in intelligent agent applications [38]
- Qianwen's launch is compared to the introduction of the first iPhone, suggesting it could signify a transformative moment in the AI landscape [38]
- The shift from AI as a distant entity to a practical assistant in daily tasks represents a pivotal change in human-machine interaction [38]