机器之心 (Synced)
Moore Threads' angel investor: Forty observations on recent AI
机器之心· 2025-12-30 12:10
Core Viewpoint - The article discusses the emergence of the AI economy, highlighting its rapid development and the structural changes it brings to various industries and society as a whole [3][4].
Group 1: AI Economic Characteristics
- The AI industry is characterized by non-linear and non-uniform growth, with economic activities related to AI advancing at an unprecedented scale while traditional industrial activities maintain their usual pace [3].
- Industry leaders, such as Elon Musk and Jensen Huang, predict significant economic transformations due to AI, including a potential fivefold increase in global GDP to $500 trillion [4].
Group 2: Scaling Law and AI Development
- The Scaling Law is a foundational principle for the development of large AI models, with current research focusing on when and under what conditions it will converge [7].
- Key metrics indicate that the inference cost of large language models decreases by 90% every 12 months, and their capability doubles approximately every seven months [7].
Group 3: Digital Layer and Economic Impact
- The "digital layer" is proposed as a crucial infrastructure for the AI economy, consisting of personal AI assistants and specialized AI agents that enhance understanding of consumers and producers [10][16].
- This digital layer is expected to significantly reduce transaction costs and improve efficiency in economic activities by automating information collection, decision-making, and actions [17][18].
Group 4: Employment and Workforce Changes
- The emergence of AI employees is anticipated, with organizations likely to see changes in management, recruitment, and collaboration between human and AI workers [30].
- The shift towards a task-centered work system is expected to enhance economic efficiency by breaking down jobs into smaller, manageable tasks that AI can perform [26].
Group 5: Global Economic Dynamics
- The article suggests that the global distribution of GDP will change as AI capabilities become more uniform across countries, potentially altering traditional international divisions of labor [35].
- Countries will need to assess their energy, computing power, data, and algorithm capabilities to effectively integrate AI into their economies [38].
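The two rates quoted under Group 2 compound quickly. The sketch below simply projects them forward, assuming both rates stay constant (an extrapolation, not something the article claims) and treating "capability" as a single abstract multiplier; the function names are illustrative.

```python
def inference_cost_factor(months: float, drop_per_year: float = 0.9) -> float:
    """Inference cost relative to today, if it falls by `drop_per_year` every 12 months."""
    return (1.0 - drop_per_year) ** (months / 12.0)


def capability_factor(months: float, doubling_months: float = 7.0) -> float:
    """Capability relative to today, if it doubles every `doubling_months` months."""
    return 2.0 ** (months / doubling_months)


for months in (7, 12, 24):
    cost = inference_cost_factor(months)
    capability = capability_factor(months)
    print(f"{months:>2} months: cost x{cost:.3f}, capability x{capability:.2f}")
```

Under these assumptions, two years out the same workload costs about 1% of today's price while capability has grown roughly 10.8x.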
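Is 3D space too hard to understand? RoboTracer lets robots understand complex spatial instructions, reason about 3D spatial trajectories, and act precisely even in the open world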
机器之心· 2025-12-30 12:10
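The main authors of this article are from Beihang University, Peking University, the Beijing Academy of Artificial Intelligence (BAAI), and the Institute of Automation, Chinese Academy of Sciences. The first author is Enshen Zhou, a PhD student at Beihang University whose research focuses on embodied intelligence and multimodal large models. The co-first author and project lead is Cheng Chi, a researcher at BAAI. The corresponding authors are Professor Lu Sheng of Beihang University and Shanghang Zhang, researcher and assistant professor at the School of Computer Science, Peking University.

We hope embodied robots can truly enter the real world, and especially every household, to help us with daily tasks such as watering plants, tidying up, and cleaning. But home environments are not clean, uniform, and controllable like a lab: there are many kinds of objects, they are arranged haphazardly, and they can change at any moment, which makes it harder for robots to "see and act correctly" in the 3D physical world.

Imagine coming home from work and telling your home service robot: "Water each plant in order from left to right; stop the watering can 1–5 cm above each flower before pouring, so the watering is more even." (see figure below)

For a person this is natural, but for a robot the difficulty lies not in the watering itself but in the many spatial constraints implicit in the instruction: some are qualitative (from left to right, above) and some are quantitative (1–5 cm). In cluttered open-world scenes, getting a robot to follow these constraints reliably remains a challenge even for today's most advanced vision-language-action (VLA) models.

A direct way forward is to let the vision-language model (VLM) generate ...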
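To make the qualitative/quantitative distinction concrete, here is a toy sketch that encodes the watering instruction as explicit checks on 3D waypoints. It is not RoboTracer's approach: it assumes the flower positions are already known in a metric frame with x pointing right and z pointing up, and the function names and the 3 cm default clearance are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in meters; assumed x = right, z = up

MIN_CLEARANCE_M = 0.01  # "at least 1 cm above each flower"
MAX_CLEARANCE_M = 0.05  # "at most 5 cm above each flower"


def plan_watering_waypoints(flowers: List[Point], clearance_m: float = 0.03) -> List[Point]:
    """Hypothetical planner: one hover point per flower, visited left to right."""
    if not MIN_CLEARANCE_M <= clearance_m <= MAX_CLEARANCE_M:
        raise ValueError("clearance must be between 1 cm and 5 cm")
    ordered = sorted(flowers, key=lambda p: p[0])            # qualitative: left to right
    return [(x, y, z + clearance_m) for x, y, z in ordered]  # quantitative: 1-5 cm above


def satisfies_instruction(waypoints: List[Point], flowers: List[Point], tol: float = 1e-6) -> bool:
    """Check both constraint types against the known flower positions."""
    ordered = sorted(flowers, key=lambda p: p[0])
    if len(waypoints) != len(ordered):
        return False
    for (wx, wy, wz), (fx, fy, fz) in zip(waypoints, ordered):
        directly_above = abs(wx - fx) <= tol and abs(wy - fy) <= tol
        height_ok = MIN_CLEARANCE_M - tol <= wz - fz <= MAX_CLEARANCE_M + tol
        if not (directly_above and height_ok):
            return False
    return True


flowers = [(0.40, 0.10, 0.20), (-0.30, 0.12, 0.25), (0.05, 0.08, 0.18)]
plan = plan_watering_waypoints(flowers)
print(plan, satisfies_instruction(plan, flowers))
```

The hard part the article points to is everything this sketch assumes away: recovering metric 3D positions and the intended ordering from cluttered images and free-form language.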
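Can autoregressive causal attention decode in parallel too? SJTU and UCSD break through the LLM inference bottleneck, with model and code fully open-sourced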
机器之心· 2025-12-30 06:57
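In real-world deployment of large language models (LLMs), inference speed has always been the core bottleneck on efficiency. Traditional autoregressive (AR) decoding guarantees generation quality but must compute token by token in sequence, which is extremely slow; diffusion LLMs (dLLMs) support parallel decoding but suffer from high training costs, quality degradation, and KV-cache compatibility problems; speculative decoding requires an additional draft model, which greatly increases system complexity.

Recently, a team from the Hao AI Lab at UCSD and the Deng Lab at Shanghai Jiao Tong University proposed a breakthrough solution, Jacobi Forcing. Without restructuring the model architecture, it converts a standard AR model into a native causal parallel decoder, achieving up to a 4x wall-clock speedup and a 4.5x increase in tokens per forward pass on tasks such as coding and math while keeping generation quality close to that of the AR model, opening a new path toward efficient LLM inference.

Jacobi Forcing's core advantages: breaking the "trilemma" of parallel decoding. Jacobi Forcing's innovation lies in breaking the impossible triangle of "low cost, high speed, high quality", and its core advantages span three dimensions.

Paper: https://arxiv.org/pdf/2512.1468 ...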
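For readers unfamiliar with the decoding scheme the name refers to, here is plain Jacobi decoding on a toy deterministic "model". It guesses a block of n future tokens, then refines every position in parallel until the block stops changing; any such fixed point equals the greedy autoregressive output. This sketches only the classic baseline, not Jacobi Forcing itself, whose contribution per the article is turning an AR model into a native parallel decoder; `toy_next_token` is a hypothetical stand-in for an LLM's greedy prediction.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def toy_next_token(prefix: List[int]) -> int:
    """Hypothetical stand-in for an LLM's greedy (argmax) next-token prediction."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def ar_decode(f: NextToken, prompt: List[int], n: int) -> List[int]:
    """Autoregressive greedy decoding: n sequential steps, one token each."""
    out: List[int] = []
    for _ in range(n):
        out.append(f(prompt + out))
    return out


def jacobi_decode(f: NextToken, prompt: List[int], n: int) -> Tuple[List[int], int]:
    """Jacobi decoding: refine an n-token draft in parallel until it is a fixed point.

    Returns the decoded tokens and the number of parallel iterations used.
    """
    draft = [0] * n  # arbitrary initial guess
    for iteration in range(1, n + 2):
        # Every position is updated from the *previous* draft; in a real LLM this
        # whole list comes out of one causal forward pass over prompt + draft.
        new_draft = [f(prompt + draft[:i]) for i in range(n)]
        if new_draft == draft:
            return draft, iteration
        draft = new_draft
    return draft, n + 1  # unreachable: convergence is guaranteed within n + 1 rounds


prompt = [3, 1, 4]
reference = ar_decode(toy_next_token, prompt, 8)
decoded, rounds = jacobi_decode(toy_next_token, prompt, 8)
print(decoded == reference, rounds)
```

The speedup depends on finishing in far fewer rounds than tokens. Untrained models tend to need close to n rounds under plain Jacobi iteration, which is the gap training-based methods such as the one described above aim to close.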
Andrew Ng's year-end review: 2025 is the dawn of AI's industrial age
机器之心· 2025-12-30 06:57
Core Insights - 2025 is marked as a pivotal year in the AI industry, characterized by intense competition among AI giants, a talent war, and significant advancements in AI infrastructure and capabilities [6][10][13].
Group 1: AI Development and Learning
- The rapid advancement in AI has created unprecedented opportunities for software development, with a notable shortage of skilled AI engineers [6][22].
- Structured learning is essential for aspiring AI developers to avoid redundant efforts and to understand existing solutions in the industry [7][8].
- Practical experience is crucial; hands-on project work enhances understanding and sparks new ideas in AI development [8][14].
Group 2: AI Infrastructure and Investment
- The AI industry has seen capital expenditures surpassing $300 billion in 2025, primarily for building new data centers to handle AI tasks [26].
- Major companies are planning extensive infrastructure projects, with projected costs reaching up to $5.2 trillion by 2030 to meet anticipated demand for AI capabilities [26][31].
- Companies like OpenAI, Meta, Microsoft, and Amazon are investing heavily in data center capacities, with OpenAI planning to build 20 gigawatts of data center capacity globally [31].
Group 3: Talent Acquisition and Market Dynamics
- A fierce competition for top AI talent has led to unprecedented salary offers, with some companies offering compensation packages comparable to professional sports stars [22][26].
- Meta's aggressive recruitment strategy has included significant financial incentives to attract talent from competitors, reflecting the high market value of AI professionals [22][27].
- Despite concerns about an AI bubble, investments in AI infrastructure are contributing to economic growth, particularly in the U.S. [29].
Group 4: Advancements in AI Models
- The introduction of reasoning models has significantly improved the performance of large language models (LLMs), enhancing their capabilities in various tasks [20][21].
- AI agents are increasingly capable of automating complex coding tasks, with reports indicating that many companies are now relying on AI-generated code for senior-level tasks [33][39].
- The evolution of programming agents has led to a competitive landscape among AI companies, with advancements in code generation capabilities becoming a focal point [30][39].
Tsinghua's Zhu Jun team in Nature Machine Intelligence: a multimodal diffusion model enables real-time, comprehensive cardiovascular signal monitoring
机器之心· 2025-12-30 04:06
Core Viewpoint - The article discusses the challenges in obtaining high-quality cardiovascular signals for wearable health monitoring and introduces a new unified multimodal generation framework called UniCardio, which aims to enhance signal denoising, interpolation, and modality translation for AI-assisted medical applications [2][7].
Group 1: Background and Challenges
- Cardiovascular diseases are a leading cause of death, and signals like photoplethysmography (PPG), electrocardiography (ECG), and blood pressure (BP) provide different insights into the same physiological processes [3].
- There is a dilemma in monitoring: wearable signals are easy to obtain but prone to noise and interruptions, while high-quality signals require more invasive methods that are less practical for long-term use [3][4].
Group 2: Introduction of UniCardio
- UniCardio is designed to perform two core functions: signal restoration (denoising and interpolation of low-quality signals) and modality translation (synthesizing hard-to-obtain signals based on available ones) [7].
- The framework utilizes a unified diffusion model to learn the multimodal conditional distribution relationships among different cardiovascular signals [11].
Group 3: Methodology
- UniCardio employs a diffusion model that generates data from noise, using a unified noise mechanism for different modalities and gradually reconstructing target signals under conditional guidance [11].
- It incorporates modality-specific encoders and decoders to extract and restore physiologically meaningful waveform features, while task-specific attention masks are used to constrain information flow relevant to current tasks [13].
Group 4: Training Paradigm
- The framework introduces a continual learning paradigm that incrementally incorporates different tasks to ensure sufficient training samples and balance task contributions, addressing the issue of catastrophic forgetting [13].
- This approach facilitates knowledge transfer across tasks and modalities, enhancing performance in more complex generation tasks [13].
Group 5: Experimental Results
- UniCardio demonstrates consistent advantages in signal denoising, interpolation, and modality translation compared to task-specific baseline methods, highlighting the value of multimodal complementary information [15].
- In specific tasks, such as PPG and ECG interpolation, the introduction of multimodal conditions significantly reduces generation error and improves waveform recovery stability [16].
Group 6: Application and Validation
- The generated signals from UniCardio have been validated in downstream cardiovascular applications, showing superior performance in abnormal state detection and vital sign estimation compared to using noisy or interrupted signals [18].
- The results indicate that UniCardio-generated signals not only resemble real signals numerically but also maintain functional usability for downstream analyses [19].
Group 7: Interpretability and Clinical Relevance
- The framework provides a clinically friendly validation path, ensuring that generated signals retain recognizable diagnostic features for clinical experts [21].
- The observable intermediate states during the denoising process enhance the model's interpretability and credibility, making it suitable for integration into real medical workflows [23].
Group 8: Future Prospects
- UniCardio advances cardiovascular signal generation from single-task, single-modality approaches to a more unified and scalable framework, with potential applications extending to fields like neuroscience and psychology that rely on multimodal physiological signals [25].
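One way to picture "task-specific attention masks that constrain information flow" is a block mask over PPG, ECG, and BP tokens that changes with the task. The rule below (targets see conditions and themselves, conditions see only conditions, unused modalities see only themselves) is a hypothetical illustration in NumPy, not UniCardio's published masking scheme; attention uses Q = K = V = x purely for brevity.

```python
import numpy as np

MODALITIES = ("PPG", "ECG", "BP")


def task_attention_mask(conditions, target, tokens_per_modality):
    """Token-level boolean mask: mask[q, k] is True if query q may attend to key k.

    Hypothetical rule for illustration:
      - target tokens attend to condition tokens and to target tokens;
      - condition tokens attend only to condition tokens;
      - modalities unused by the task attend only to themselves.
    """
    m = len(MODALITIES)
    allowed = np.zeros((m, m), dtype=bool)
    for qi, q in enumerate(MODALITIES):
        for ki, k in enumerate(MODALITIES):
            if q == target:
                allowed[qi, ki] = (k in conditions) or (k == target)
            elif q in conditions:
                allowed[qi, ki] = k in conditions
            else:
                allowed[qi, ki] = qi == ki
    block = np.ones((tokens_per_modality, tokens_per_modality), dtype=int)
    return np.kron(allowed.astype(int), block).astype(bool)


def masked_self_attention(x, mask):
    """Softmax attention with disallowed query/key pairs set to -inf before the softmax."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


# Example task: translate PPG (condition) into ECG (target); BP is unused.
L = 4
rng = np.random.default_rng(0)
x = rng.normal(size=(len(MODALITIES) * L, 8))  # PPG tokens, then ECG, then BP
mask = task_attention_mask({"PPG"}, "ECG", L)
out, weights = masked_self_attention(x, mask)
print("ECG -> BP attention mass:", weights[L:2 * L, 2 * L:].sum())
```

Swapping the `conditions`/`target` arguments reuses the same network for denoising, interpolation, or a different translation direction, which is the appeal of handling tasks by masking.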
Admissions | HKUST (Guangzhou) Data Science and Analytics Thrust: applications for the 2026-27 PhD program are now open!
机器之心· 2025-12-30 04:06
Core Insights - The article emphasizes the importance of data science and analytics in solving real-world problems and advancing the field through a combination of statistics, machine learning, and optimization techniques [3][11].
Part One: Introduction to Data Science and Analytics
- The domain of data science and analytics aims to unify various techniques to advance the discipline and apply them to benefit society [3].
- The doctoral program in data science and analytics provides rigorous research training, enabling students to analyze data and make informed decisions [3].
Part Two: Interdisciplinary Research Directions
- The program covers various interdisciplinary research directions, including statistical learning, industrial analytics, machine learning, business modeling, high-performance data analysis, and data visualization [6].
Part Three: Talent Development
- The doctoral program is designed to equip students with skills to identify and solve practical research problems, proposing independent solutions related to data science and analytics [11].
- Students are expected to demonstrate mastery of the subject matter and contribute original, meaningful scientific insights [12].
Part Four: Training Model
- The program offers full-time and part-time study options, with a duration of 3-4 years for full-time students and 6 years for part-time students [13].
- The program requires a total of 21 credits, including core and elective courses, and degrees are awarded by the Hong Kong University of Science and Technology [13].
Part Five: Learning Outcomes
- Graduates will be able to identify relevance and insights in science and engineering, master various models and tools in data science, and demonstrate critical thinking and analytical skills [22].
- They will also be capable of conducting original research and effectively communicating their findings [23].
Part Six: Faculty
- The article includes a section on faculty, indicating the expertise available to students in the program [24].
Part Seven: Application Guide
- Applicants must hold a recognized bachelor's degree and meet specific English language requirements [27].
- The application timeline for the 2026-27 academic year is outlined, with deadlines for both domestic and international students [28][30].
Manus gets acquired, and Zhipu sets its listing for eight days from now
机器之心· 2025-12-30 04:06
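机器之心 Editorial Team

Big AI news keeps coming, one story after another.

News broke this morning that Manus had been acquired by Meta, and soon after, the other shoe also dropped in the race to become "the world's first listed large-model company".

On December 30, Beijing Zhipu Huazhang Technology Co., Ltd. (hereinafter "Zhipu") officially launched its Hong Kong IPO offering. The subscription period runs until January 5, 2026, and the company plans to list on the Main Board of the Hong Kong Stock Exchange on January 8, 2026, under stock code "2513".

Under the offering plan, Zhipu intends a global offering of 37,419,500 H shares, comprising 1,871,000 H shares in the Hong Kong public offering and 35,548,500 H shares in the international offering.

The IPO pricing and fundraising size were revealed at the same time: the offer price is set at HK$116.20 per share. After deducting offering expenses, the offering is expected to raise about HK$4.3 billion, implying an IPO market capitalization of more than HK$51.1 billion.

Public information shows that Zhipu's cumulative private-market funding has reached RMB 8.344 billion, with its latest valuation rising to RMB 24.377 billion. This means that in the crucial leap to going public, Zhipu's market value has nearly doubled; a "premium listing" of this magnitude is also a considerable market challenge.

The cornerstone investor lineup is equally impressive. According to the announcement, cornerstone investors plan to subscribe for a combined HK$2.98 billion, accounting for nearly seven ...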
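The offering figures above can be cross-checked with a few lines of arithmetic. This sketch uses only numbers quoted in the article (share counts converted from units of 万, i.e. 10,000); "gross proceeds" here is before fees, and the implied share count is only a lower bound because the article gives the market value as "more than" HK$51.1 billion.

```python
from decimal import Decimal

# Figures quoted in the article; share counts converted from units of 万 (10,000).
HK_PUBLIC_SHARES = 1_871_000          # 187.1 万 H shares
INTERNATIONAL_SHARES = 35_548_500     # 3554.85 万 H shares
GLOBAL_OFFER_SHARES = 37_419_500      # 3741.95 万 H shares
OFFER_PRICE_HKD = Decimal("116.20")
MIN_MARKET_CAP_HKD = Decimal("51.1e9")  # "more than HK$51.1 billion"

tranches_add_up = HK_PUBLIC_SHARES + INTERNATIONAL_SHARES == GLOBAL_OFFER_SHARES
gross_proceeds_hkd = GLOBAL_OFFER_SHARES * OFFER_PRICE_HKD   # before offering fees
implied_min_shares = MIN_MARKET_CAP_HKD / OFFER_PRICE_HKD    # lower bound on shares outstanding

print("tranches add up:", tranches_add_up)
print(f"gross proceeds: HK${gross_proceeds_hkd / Decimal('1e9'):.2f} billion")
print(f"implied shares outstanding: at least ~{implied_min_shares / Decimal('1e6'):.0f} million")
```

Gross proceeds come to about HK$4.35 billion, slightly above the roughly HK$4.3 billion net figure, consistent with fees being deducted; the market value implies roughly 440 million or more shares outstanding at the offer price.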
HKU and ByteDance propose JoVA: a joint video-audio generation model based on joint self-attention
机器之心· 2025-12-29 23:36
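About the authors: The first author, Xiaohu Huang, is a third-year PhD student at the University of Hong Kong supervised by Professor Kai Han. His research focuses on video-centric areas, including audio-video generation, video understanding, and video recognition.

Joint video-audio generation has recently drawn wide attention in both the open-source and closed-source communities, and generating content in which audio and video are aligned is a key research focus.

Recently, a research team from the University of Hong Kong and ByteDance proposed JoVA, a simple and effective framework that lets video and audio tokens interact directly across modalities within the attention module of a single Transformer. To address lip-speech synchronization when a person is speaking, JoVA introduces a mouth-area specific loss based on facial landmark detection.

Experiments show that with only about 1.9 million training samples, JoVA reaches leading performance in lip-sync accuracy, speech quality, and overall generation fidelity.

Project page: https://visual-ai.github.io/jova/
Paper: https://arxiv.org/abs/2512.13677

1. Research background and motivation

Current open-source solutions generally fall into two categories. One is "cascaded": generating the video first and then dubbing it, or generating speech first and then using it to drive video generation. This approach ...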
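A minimal way to picture video and audio tokens "interacting directly in one attention module" is single-head self-attention over the concatenated token sequence. The NumPy sketch below uses random weights and hypothetical token counts, and omits positional or modality embeddings and the mouth-area loss; it illustrates joint self-attention in general, not JoVA's actual architecture.

```python
import numpy as np


def joint_self_attention(video, audio, w_q, w_k, w_v):
    """Single-head self-attention over one concatenated [video; audio] sequence."""
    x = np.concatenate([video, audio], axis=0)          # (Nv + Na, d)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])             # every token can see every token
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    out = weights @ v
    n_video = video.shape[0]
    return out[:n_video], out[n_video:]                 # split back per modality


rng = np.random.default_rng(0)
d = 16
video_tokens = rng.normal(size=(6, d))  # hypothetical: 6 latent video tokens
audio_tokens = rng.normal(size=(4, d))  # hypothetical: 4 latent audio tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

video_out, audio_out = joint_self_attention(video_tokens, audio_tokens, w_q, w_k, w_v)
# Perturb only the video tokens: the audio outputs change too, because audio
# tokens attend to video tokens inside the same attention operation.
_, audio_out_shifted = joint_self_attention(video_tokens + 1.0, audio_tokens, w_q, w_k, w_v)
print(np.abs(audio_out - audio_out_shifted).max())
```

The design contrast with cascaded pipelines is that no separate cross-attention module or second model is needed: alignment signals flow both ways inside every attention layer.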
Breaking: Meta announces acquisition of AI agent startup Manus
机器之心· 2025-12-29 23:36
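机器之心 Editorial Team

Just now, Meta completed a major acquisition, bringing the AI agent startup Manus under its wing.

Specific details of the deal, including the acquisition price, have not yet been disclosed.

Since launching what it billed as the world's first general-purpose agent in March this year, Manus quickly went viral and became a major focus of the AI field. According to public information, in April Manus's parent company announced a US$75 million funding round at a valuation of nearly US$500 million, with investors including the well-known venture firm Benchmark among others.

Manus subsequently moved its headquarters and core R&D team to Singapore.

Now the other shoe has finally dropped: Manus has been acquired by Meta and is entering a new stage of development.

Meta Chief AI Officer Alexandr Wang said: "We're thrilled to have Manus join Meta and help us build amazing AI products! The Manus team is a world leader in exploring the untapped capabilities of current models and is dedicated to building powerful agents."

Manus founder and CEO Xiao Hong posted: "Today is a moment I will never forget. When we started Manus, few people believed general-purpose AI agents could succeed. We were told it was too early, the goal too ambitious, the challenge too hard, but we kept building. Through doubt, setbacks, and countless sleepless nights, we questioned ourselves ...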
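The Depth Anything of panoramic vision is here! Insta360 launches DAP, using 2 million data samples to reach new heights in all-scene 360° spatial intelligence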
机器之心· 2025-12-29 08:22
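With spatial intelligence advancing rapidly, panoramic views, thanks to their 360° surround coverage, have become a cornerstone of robot navigation, autonomous driving, and virtual reality. However, panoramic depth estimation has long been held back by two bottlenecks: data scarcity and poor model generalization.

Recently, researchers from the Insta360 research team, the University of California San Diego (UCSD), Wuhan University, and the University of California, Merced jointly released Depth Any Panoramas (DAP). It is the first panoramic metric-depth foundation model trained on a large-scale, diverse dataset. It not only unifies indoor and outdoor scenes but, through a data engine on the order of 2 million samples and a novel geometric-consistency design, also sets new records on multiple benchmarks and maintains strong results across a variety of open-world scenes.

The model also predicts very well on panoramas synthesized by models such as Gemini or DiT-360, producing depth maps with sharp edges and internally consistent geometry, which makes it an ideal geometric foundation for spatial AIGC pipelines. Beyond static images, DAP also performs very well on panoramic video streams, with excellent inter-frame consistency and stability.

Paper title: Depth Any Panoramas: A Foundation Mod ...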
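Metric depth on a panorama is directly useful because each pixel of an equirectangular image corresponds to a known viewing direction on the sphere, so a metric depth map converts straight into a 3D point cloud in meters. The sketch below assumes depth is Euclidean distance along each ray and a particular axis convention (longitude across columns, latitude down rows, y up, z forward at the image center); DAP's actual output convention may differ.

```python
import numpy as np


def equirect_depth_to_points(depth: np.ndarray) -> np.ndarray:
    """Back-project an equirectangular depth map of shape (H, W), in meters, to (H, W, 3) points.

    Assumed convention: longitude spans [-pi, pi) left to right, latitude spans
    +pi/2 (top row) to -pi/2 (bottom row), and depth is distance along each ray.
    """
    h, w = depth.shape
    # Pixel centers -> spherical angles.
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi   # (W,)
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi   # (H,)
    lon, lat = np.meshgrid(lon, lat)                        # each (H, W)
    # Unit ray directions: y up, z forward at lon = 0.
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )                                                       # (H, W, 3)
    return dirs * depth[..., None]


depth = np.full((64, 128), 2.5)  # toy input: everything 2.5 m away
points = equirect_depth_to_points(depth)
print(points.shape, np.linalg.norm(points, axis=-1).min())
```

With relative (scale-free) depth the resulting point cloud is only correct up to an unknown scale, which is why a metric-depth model matters for navigation and other downstream geometry.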