机器之心
Wireless RF machine-learning inference with just a single mixer, published in Science Advances!
机器之心· 2026-01-16 00:42
The authors include Zhihui Gao and Prof. Tingjun Chen of Duke University, together with Prof. Dirk Englund's team at MIT. Zhihui Gao is a Ph.D. student in the Department of Electrical and Computer Engineering at Duke University. He received his bachelor's degree from the Department of Electronic Engineering at Fudan University. His research interests lie in next-generation networked systems, including cyber-physical systems and machine-learning acceleration.

Disaggregated model-data computing

When machine learning is deployed on edge devices, the model is typically stored on a cloud server (e.g., a 5G base station), while the model's inputs and outputs live on the edge devices (for example, taking a photo with a camera and then recognizing the objects in it). In this setting, there are traditionally two ways to run inference:

Scheme 1: upload the model input to the cloud. Each user uploads their own model input to the cloud, inference runs in the cloud, and the model output is downloaded back to each user. This consumes a large amount of bandwidth, especially at large user scales; moreover, uploading user inputs raises privacy concerns.

Scheme 2: broadcast the model to the edge. The cloud server broadcasts the model to all users; each user stores the model locally and computes at the edge. This places heavy demands on edge compute, and storing the model also incurs edge storage read/write overhead.

In our work, we propose a third scheme, disaggregated computing: 广 ...
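As a back-of-envelope illustration of the tradeoff between the two traditional schemes, total traffic can be compared directly. All sizes below are hypothetical placeholders, not figures from the paper:

```python
def scheme1_traffic(num_users: int, input_bytes: int, output_bytes: int) -> int:
    """Scheme 1: every user uploads an input and downloads an output,
    so total traffic grows linearly with the number of users."""
    return num_users * (input_bytes + output_bytes)

def scheme2_traffic(model_bytes: int) -> int:
    """Scheme 2: the model is broadcast once to all users, so traffic is
    fixed, but compute and storage shift onto the edge devices."""
    return model_bytes

# Hypothetical sizes: a 2 MB photo in, 4 KB of labels out, a 1 GB model.
users = 10_000
print(scheme1_traffic(users, 2 * 1024**2, 4 * 1024))  # scales with user count
print(scheme2_traffic(1 * 1024**3))                   # fixed broadcast cost
```

The crossover between the two cost curves is exactly what motivates a third, disaggregated scheme.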
Turmoil at Mira's company? CTO fired and returns to OpenAI with his team; Lilian Weng comments on X
机器之心· 2026-01-15 09:17
Today was an unusual day for both Thinking Machines Lab and OpenAI. Thinking Machines Lab founder and CEO Mira Murati announced a parting of ways with co-founder and CTO Barret Zoph. At the same time, she announced the new CTO: PyTorch creator Soumith Chintala. This influential researcher in modern AI infrastructure left Meta in early November last year and chose to join Thinking Machines Lab. About an hour later, Fidji Simo, CEO of Applications at OpenAI, announced that Barret Zoph will return to OpenAI, joined by fellow Thinking Machines Lab co-founder Luke Metz and founding team member Sam Schoenholz.

机器之心 Editorial Team

The simultaneous departure of two co-founders from Thinking Machines Lab sent shockwaves through the community. According to insider information, the split stems from personal misconduct by Barret Zoph; Thinki ...
The technical breakthroughs of the general-purpose PixVerse R1, carrying the key to a parallel world
机器之心· 2026-01-15 09:17
Core Viewpoint
- The article discusses the launch of PixVerse R1, a groundbreaking video-generation model that enables real-time, high-quality video creation, marking a significant advancement for the industry [1][3][38].

Group 1: Technological Breakthroughs
- PixVerse R1 is the first model worldwide to support real-time generation of 1080P video, moving video generation from static output to real-time interaction [6][35].
- The model achieves a significant increase in computational efficiency, allowing real-time generation within the range of human perception, representing a generational leap in application-level capability [3][6].
- The Instantaneous Response Engine (IRE) drastically reduces inference time by compressing the sampling steps from over 50 to just 1-4, effectively addressing the computational load [9][11].

Group 2: Model Architecture
- The Omni model is a native end-to-end multimodal foundation that processes multiple data types simultaneously, enhancing the model's versatility and efficiency [20][25].
- The model employs a unified Transformer-based token-flow architecture, enabling joint processing of text, images, audio, and video and improving the model's understanding of multimodal data [21][25].
- The model's native-resolution feature ensures high-quality video generation without compromising visual integrity, avoiding issues with traditional data-preprocessing methods [22][23].

Group 3: Continuous Evolution
- PixVerse R1 introduces an autoregressive streaming-generation mechanism that allows theoretically infinite video generation, breaking the constraints of fixed-length outputs [29][32].
- The model incorporates a memory-enhanced attention module that captures and retains key features from the video, optimizing computational efficiency while maintaining long-term consistency [30][32].
- This architecture keeps generated content coherent and logically consistent regardless of video length, establishing a robust foundation for a universal real-time world model [32][38].
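The step-count compression behind engines like IRE has a simple toy intuition (this is not PixVerse's actual engine): a diffusion or flow sampler pays one network evaluation per step, and if the learned trajectory is close to straight, one big step lands where many small steps would.

```python
def euler(x0: float, v: float, steps: int, t0: float = 0.0, t1: float = 1.0) -> float:
    """Integrate dx/dt = v from t0 to t1 with a fixed number of Euler steps.
    Each step stands in for one (expensive) network forward pass."""
    x, dt = x0, (t1 - t0) / steps
    for _ in range(steps):
        x += v * dt
    return x

many = euler(0.0, 2.0, steps=50)  # 50 network evaluations
few = euler(0.0, 2.0, steps=1)    # a single big jump
# For a perfectly straight trajectory both land at 2.0; real generative
# trajectories are curved, which is why few-step samplers must be learned
# rather than obtained by simply enlarging the Euler step.
```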
Just now, I drank the bubble tea the Qianwen app ordered for me
机器之心· 2026-01-15 04:31
Core Insights
- The development of intelligent agents has accelerated significantly at the beginning of 2026, with notable advancements from companies like Anthropic and Alibaba [1][11]
- Anthropic's release of Cowork aims to revolutionize the workplace by integrating large models with intelligent-agent capabilities for general users, not just programmers [1]
- Alibaba's Qianwen App has introduced a new AI-agent feature called "Task Assistant," which integrates with Alibaba's ecosystem to offer over 400 new functions for free [2][4]

Group 1
- The Qianwen App can automate tasks such as ordering food by simply stating preferences, streamlining the entire process from selection to payment [5][20]
- Users can consult the Task Assistant for shopping decisions; it can provide recommendations and direct links to payment [7][9]
- The Task Assistant has demonstrated its ability to handle complex tasks like multi-brand group purchases, significantly reducing the time and effort required from users [12][18]

Group 2
- The Task Assistant can create detailed travel plans, such as a two-day itinerary for a trip to Weihai, by analyzing user needs and sourcing information from various platforms [22][27]
- The assistant integrates with Alibaba's services, allowing users to navigate, book tickets, and manage travel logistics seamlessly [29]
- The interaction model has shifted from dialogue with a large model to task delegation to an intelligent agent, marking a significant evolution in user experience [31]

Group 3
- Qianwen's Task Assistant is built on a new universal agent system that enhances task-execution efficiency and accuracy through a hierarchical planning approach [33]
- The system supports continuous learning and improvement, enabling agents to refine their capabilities based on past experience [35]
- Integrated AI coding capabilities allow the assistant to autonomously generate tools for less common tasks, extending its functionality [36]

Group 4
- The AI sector is entering a product-explosion phase, with new offerings from companies including Anthropic and OpenAI, indicating rapid evolution in intelligent-agent applications [38]
- Qianwen's launch is compared to the introduction of the first iPhone, suggesting it could mark a transformative moment in the AI landscape [38]
- The shift from AI as a distant entity to a practical assistant in daily tasks represents a pivotal change in human-machine interaction [38]
A facial robot makes the cover of Science Robotics: using AI to teach a bionic facial robot to "speak"
机器之心· 2026-01-15 04:31
Yuhang Hu (胡宇航, online handle "U 航") holds a Ph.D. from Columbia University and is the founder of 首形科技. He has long focused on research in robotic self-learning, with results published in top international journals including Nature Machine Intelligence and Science Robotics. His work aims to give robots a "self-model": an internal representation of their own physical structure and motion that lets robots better understand themselves and adapt to changing morphologies, environments, and tasks. In bionic human-robot interaction, he proposed an integrated system for emotion understanding and expression that fuses speech, vision, and motion, giving robots more natural interaction abilities. Through self-supervised learning, his methods let robots continually improve interaction quality without human intervention, moving toward agents with lifelong-learning capability.

Paper: https://www.science.org/doi/10.1126/scirobotics.adx3017

Previously published papers:

On January 15, 2026, a breakthrough study from Columbia University's School of Engineering was published in Science Robotics and featured on the journal's cover. The study demonstrates a new robotic technology: a humanoid robot with a bionic facial structure that uses deep learning to produce realistic lip movements synchronized with speech and song. Following human speech, it can precisely ...
Unlocking any-step text-to-image generation: HKU and Adobe's new Self-E framework learns self-evaluation
机器之心· 2026-01-15 03:52
Core Viewpoint
- The article discusses the introduction of Self-E, a novel text-to-image generation framework that eliminates the need for pre-trained teacher models and allows any-step generation while maintaining high quality and semantic clarity [2][28].

Group 1: Introduction and Background
- Traditional diffusion models and flow matching have improved text-to-image generation but require numerous iterations, limiting their real-time application [2].
- Existing methods often rely on knowledge distillation, which incurs additional training costs and leaves a gap between "from scratch" training and "few-step, high-quality" generation [2][28].

Group 2: Self-E Framework
- Self-E represents a paradigm shift by focusing on "landing evaluation" rather than "trajectory matching," letting the model learn the quality of the final output rather than just the correctness of each step [7][28].
- The model operates in two modes, learning from real data and self-evaluating its generated samples, creating a self-feedback loop [12][13].

Group 3: Training Mechanism
- Self-E employs two complementary training signals, one from data and one from self-evaluation, enabling the model to learn local structure and assess its outputs simultaneously [14][19].
- Training involves a long-distance jump to a landing point, where the model uses its current local estimates to generate feedback on how to improve the output [17][19].

Group 4: Inference and Performance
- During inference, Self-E maintains semantic and structural quality with very few steps, and quality continues to improve as the step count increases [22][23].
- On the GenEval benchmark, Self-E outperforms other methods across all step counts, with a notable advantage in the few-step range: a +0.12 improvement in the 2-step setting over the best existing methods [24][25].

Group 5: Broader Implications
- Self-E's approach aligns pre-training and feedback learning, creating a closed-loop system similar to reinforcement learning that enhances the model's ability to generate high-quality outputs in fewer steps [26][29].
- The framework allows dynamic step selection based on the application context, making it suitable for both real-time feedback and high-quality offline rendering [28].
Hands-on with Quark's "Qianwen highlight shortcut commands": 7 unorthodox prompts worth saving
机器之心· 2026-01-15 03:52
Core Viewpoint
- The article discusses the challenges of using AI effectively to understand complex information, emphasizing that well-structured prompts are key to better AI responses [6][8].

Group 1: AI Interaction Challenges
- Many users struggle to communicate effectively with AI, leading to frustration and doubts about AI's intelligence [5][6].
- The quality of AI responses often depends on the clarity and structure of the user's prompts; refined instructions can significantly improve outcomes [6][10].

Group 2: Quark AI Browser Features
- The Quark AI Browser has introduced a feature called "Qianwen Highlighting" (千问划词), which lets users create custom shortcut commands for frequently used prompts, streamlining the interaction process [8][10].
- Users can set up specific commands for tasks like translation and content optimization, making it easier to get precise results without repetitive input [11][12].

Group 3: Practical Applications of AI Prompts
- The article highlights several effective prompt categories, such as unorthodox ("邪修") prompts that make the AI ask for the information it needs before attempting a task [15][16].
- A "Human Language Translator" prompt is suggested for simplifying complex academic papers, giving users clear explanations [25][27].
- The "Citation Source Finder" prompt helps quickly identify relevant research sources, significantly reducing time spent on literature review [30][33].

Group 4: Content Creation Enhancements
- Content creators can use tailored prompts for different platforms, matching tone and style to each audience to improve engagement [35][39].
- Specific prompts for platforms like Xiaohongshu and Weibo demonstrate how to adapt content for different social-media environments [39][42].

Group 5: Future of AI Browsers
- The Quark AI Browser aims to evolve into a comprehensive application that integrates multiple AI models and supports multimodal input, enhancing user experience and functionality [45][46].
- Its capabilities are designed to create a seamless workflow, letting users complete tasks more efficiently and effectively [48][51].
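Stripped of product detail, the "shortcut command" idea reduces to a small library of named prompt templates filled in with the highlighted text. A minimal sketch; the command names and templates below are illustrative, not Quark's actual ones:

```python
# Hypothetical shortcut library: each command maps to a reusable prompt
# template, and the user's highlighted text is substituted in.
SHORTCUTS = {
    "translate": "Translate the following into English, keeping technical terms intact:\n{text}",
    "plain_language": "Explain the following passage to a non-expert in plain language:\n{text}",
    "find_sources": "Suggest likely citation sources and search keywords for this claim:\n{text}",
}

def build_prompt(command: str, highlighted_text: str) -> str:
    """Expand a named shortcut into a full prompt for the model."""
    return SHORTCUTS[command].format(text=highlighted_text)

print(build_prompt("plain_language", "Attention is all you need."))
```

The win is that the instruction half of the prompt is written once and reused, so the user only supplies the selection.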
Confirmed! Tsinghua Yao Class alumnus Lijie Chen joins OpenAI full-time, keeping his Berkeley faculty post
机器之心· 2026-01-15 03:52
机器之心 Editorial Team

As verified by 机器之心, Lijie Chen (陈立杰), an alumnus of Tsinghua University's "Yao Class" and assistant professor at UC Berkeley, has officially joined OpenAI.

People familiar with the matter say Chen joins OpenAI full-time for research work. Meanwhile, his status at Berkeley is On Leave (unpaid leave): he retains his faculty position and has not resigned.

Chen is a top young scholar in theoretical computer science. He completed his undergraduate studies in Tsinghua's Yao Class and earned his Ph.D. from MIT, with outstanding achievements in computational complexity theory. As of now, his personal homepage and LinkedIn page have not been updated.

From IOI gold medal to Berkeley assistant professor

Chen attended Hangzhou Foreign Language School and was a well-known competitor in informatics olympiads (OI). In 2011 he won a gold medal at China's National Olympiad in Informatics (NOI); in 2013 he represented China at the 25th International Olympiad in Informatics (IOI), where he not only won a gold medal but achieved the top score worldwide.

After entering Tsinghua's Yao Class, Chen gradually shifted his focus from programming contests to theoretical computer science research. In 2016 he received Tsinghua's top undergraduate scholarship. At the scholarship defense, Chen declared a grand ambition: 「有生之 ...
In the agent era, why is a multimodal data lake a must-have?
机器之心· 2026-01-15 00:53
Core Viewpoint
- The year 2025 is anticipated to be remembered as the dawn of the AI industrial era, with many companies racing to invest in AI applications and agent development, but the true competition lies beyond application-level advances [1][4].

Group 1: AI Infrastructure and Data Management
- The foundation of AI applications is robust data infrastructure, which is crucial for building genuine competitive advantage [3][8].
- Companies need the capability to handle multimodal data, as the real benefit of the AI era lies not in merely possessing state-of-the-art models but in the ability to continuously manage and nurture them [9][18].
- The industry is entering the "second half" of AI, where the focus shifts to how AI should be used and how real progress is measured, requiring a change of mindset toward AI-first thinking [4][5].

Group 2: Multimodal Data Lakes
- Building multimodal data lakes is becoming essential for companies to compete in the agent race, turning previously dormant unstructured data into usable competitive assets [14][21].
- IDC predicts that by 2025, over 80% of enterprise data will be unstructured, highlighting the need to awaken this data to build competitive strength in the agent era [16][19].
- The transition from traditional data lakes to multimodal data lakes is critical, enabling companies to manage and use diverse data types effectively and to drive business intelligence and operational efficiency [12][22].

Group 3: Data Infrastructure Evolution
- The evolution of data infrastructure proceeds in three progressive stages: overcoming compute bottlenecks, integrating models into data pipelines, and implementing comprehensive data governance [30][31][33].
- The first stage breaks through compute limits by adopting heterogeneous architectures that support both CPU and GPU, so data can be processed quickly and efficiently [30].
- The second stage integrates pre-trained large models into data workflows, automatically converting multimodal data into formats usable by AI applications [31][32].
- The final stage aims for unified data governance, improving the management and activation of data assets while ensuring compliance and security [33][34].

Group 4: Strategic Recommendations for Companies
- Companies should transform their data infrastructure from a "storage center" into a "value center," ensuring data can be quickly accessed and understood by AI models [38][39].
- The focus should stay on practical business applications, avoiding compute spending that does not translate into business value [40][41].
- A modular, open data infrastructure is essential for adapting to future uncertainty, allowing smooth upgrades as technologies evolve [43][44][45].

Group 5: Industry Applications and Impact
- Multimodal data lakes have delivered significant improvements across industries, such as a 20-fold speedup in model training at a smart-driving company and a 90% efficiency gain in content production at a leading media company [51][59].
- These examples illustrate the necessity of multimodal data strategies for unlocking intelligent transformation across diverse sectors [52][56].
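The second stage, putting pre-trained models inside the data pipeline, can be sketched as a dispatcher that routes raw files by modality to model stubs that emit structured records. The handlers below are placeholders for real captioning, speech-recognition, and text-chunking models, not any vendor's actual pipeline:

```python
import os

def caption_image(path: str) -> dict:
    # Placeholder for a vision-language captioning model.
    return {"path": path, "type": "image", "caption": f"<caption of {path}>"}

def transcribe_audio(path: str) -> dict:
    # Placeholder for an automatic speech-recognition model.
    return {"path": path, "type": "audio", "transcript": f"<transcript of {path}>"}

def chunk_text(path: str) -> dict:
    # Placeholder for a text splitter feeding an embedding index.
    return {"path": path, "type": "text", "chunks": [f"<chunk of {path}>"]}

# Route each file by extension to the model stub for its modality.
HANDLERS = {".jpg": caption_image, ".png": caption_image,
            ".wav": transcribe_audio, ".mp3": transcribe_audio,
            ".txt": chunk_text}

def ingest(paths: list) -> list:
    """Convert raw multimodal files into structured, queryable records."""
    records = []
    for p in paths:
        handler = HANDLERS.get(os.path.splitext(p)[1].lower())
        if handler is not None:
            records.append(handler(p))
    return records

records = ingest(["photo.jpg", "call.wav", "notes.txt", "model.bin"])  # .bin is skipped
```

In a real deployment each handler would call a served model, and the records would land in the lake's metadata index rather than a Python list.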
Are large models growing a brain? Study finds the middle layers of LLMs spontaneously mirror human-brain evolution
机器之心· 2026-01-15 00:53
Core Insights
- The article discusses the emergence of a "Synergistic Core" structure in large language models (LLMs), similar to the organization of the human brain [1][2][17].
- The research indicates that this structure is not inherent to the Transformer architecture but develops through the learning process [18][19].

Model Analysis
- Researchers used the Partial Information Decomposition (PID) framework to analyze models including Gemma, Llama, Qwen, and DeepSeek, revealing strong synergistic processing in the middle layers while lower and upper layers exhibited redundancy [5][6][8].
- The study used cognitive tasks across six categories, with models generating responses whose activation values were analyzed [9][10].

Experimental Methodology
- The Integrated Information Decomposition framework was applied to quantify interactions between attention heads, yielding a Synergy-Redundancy Rank that indicates whether components aggregate signals independently or integrate them deeply [12][13].

Findings on Spatial Distribution
- The experiments revealed a consistent "inverted-U" curve in the distribution of synergy across model architectures, indicating a common organizational pattern [14].
- This pattern suggests that synergistic processing may be a computational necessity for advanced intelligence, paralleling the structure of the human brain [17].

Core Structure Characteristics
- The "Redundant Periphery" consists of early and late layers with low synergy that handle basic tasks, while the "Synergistic Core" in the middle layers shows high synergy crucial for advanced semantic integration and reasoning [21][23].
- The Synergistic Core exhibits high global efficiency for rapid information integration and is identified as a hallmark of the model's capabilities [23].

Validation of the Synergistic Core
- Ablation experiments showed that removing high-synergy nodes caused significant performance declines, confirming the Synergistic Core as a driver of model intelligence [25].
- Fine-tuning experiments showed that training focused on the Synergistic Core yielded larger performance gains than training on redundant nodes [27].

Implications for AI and Neuroscience
- Identifying the Synergistic Core can aid in designing more efficient compression algorithms and targeted parameter updates to accelerate training [29].
- The findings suggest convergent organizational patterns in large models and biological brains, offering insight into the nature of general intelligence [29].
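The synergy/redundancy distinction at the heart of this analysis can be illustrated with a classic toy case (a simplification, not the paper's full PID framework): for Z = X XOR Y, each input alone carries zero information about Z, yet together they determine it completely; that joint-only information is synergy.

```python
from itertools import product
from math import log2

def mutual_info(pairs):
    """I(A;B) in bits, estimated from a list of equiprobable (a, b) outcomes."""
    n = len(pairs)
    pa, pb, pab = {}, {}, {}
    for a, b in pairs:
        pa[a] = pa.get(a, 0) + 1 / n
        pb[b] = pb.get(b, 0) + 1 / n
        pab[(a, b)] = pab.get((a, b), 0) + 1 / n
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in pab.items())

# Z = X XOR Y over all four equally likely input combinations.
xor = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]

print(mutual_info([(x, z) for x, _, z in xor]))       # X alone says nothing about Z
print(mutual_info([((x, y), z) for x, y, z in xor]))  # jointly, X and Y determine Z
```

Conversely, if Y were a copy of X, each input alone would already carry everything about Z: pure redundancy, the regime the paper attributes to the early and late layers.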