Tesla FSD crosses the US coast to coast for the first time: a Model 3 logs 10,000 miles with zero interventions, and Musk's prediction comes true
机器之心· 2026-01-01 04:33
On the last day of 2025, a man named David Moss pulled off a remarkable feat: the world's first fully autonomous drive from the US West Coast to the East Coast, which also made him the first person to log 10,000 consecutive miles on Tesla FSD. The entire journey involved zero interventions; even parking and charging at Tesla Supercharger stations were completed without a human takeover. (Image caption: the partner who completed the feat with him.)

Driving a 2025 Model 3 running FSD V14.2, he set out from the Tesla Diner in Los Angeles and, after 2 days and 20 hours and 2,732.4 miles (about 4,400 km), arrived in Myrtle Beach, South Carolina. He stresses that the trip was completed entirely on FSD and that the data can be publicly verified through the Whole Mars FSD database. You can track his FSD mileage at https://fsddb.com/profile/DavidMoss ; the data shows he has driven more than 10,000 miles in the Model 3 on FSD without ever driving the car himself, and it lists nearly complete stop records for roughly 30 Supercharger stations (the first entry reads "Dec 30, 2025 · 11:09 PM"; the rest of the table is truncated here) ...
Nvidia and AMD may start raising prices this month; the RTX 5090 could go from two thousand dollars to five thousand
机器之心· 2026-01-01 03:42
Core Viewpoint
- The price increase of GPUs by Nvidia and AMD is becoming a certainty, with adjustments expected to start in early 2026 [1][3]

Group 1: Price Increase Details
- Nvidia and AMD plan to gradually raise GPU prices in the coming months, with AMD starting in January and Nvidia in February [3]
- The hike will initially affect consumer-grade GPUs, such as Nvidia's GeForce RTX 50 series and AMD's Radeon RX 9000 series; the flagship RTX 5090 is expected to rise from its official price of $1,999 to around $5,000 this year [4][6]

Group 2: Cost Structure and Drivers
- The primary driver for the price increase is the rapid growth of memory costs within the GPU cost structure, with memory now accounting for over 80% of overall manufacturing cost [7][8]
- The procurement cost of the 16GB GDDR7 memory used in the RTX 5070 Ti surged from $65-80 in May 2025 to $210-260 by December 2025, making it difficult to hold current GPU prices [8]

Group 3: Impact on AI and Other Products
- The price increase will likely extend across all product lines, including GPUs used in AI data centers and servers, as new contracts signed in 2026 will reflect the higher memory prices [6][9]
- Nvidia's flagship AI GPU H200, priced between $30,000 and $40,000, is also expected to see further price increases this year due to rising memory costs [9]

Group 4: Market Reactions
- Asus has announced a price increase for some products starting January 5, citing rising DRAM and storage costs driven by AI demand [10]
- Dell has previously indicated a price increase of 30%, reflecting similar market conditions [14]
AAAI 2026 Oral | Giving multi-stream data a "private coach plus outside help": no panic when drift arrives
机器之心· 2026-01-01 03:42
The authors of this paper are En Yu, Jie Lu, Kun Wang, Xiaoyu Yang, and Guangquan Zhang, all from the Australian Artificial Intelligence Institute (AAII) at the University of Technology Sydney (UTS).

In real-world open, dynamic environments such as smart cities, social media, and the industrial Internet of Things, data is often generated concurrently in the form of multiple streams (Multistream). But the real world is not a perfect laboratory: these streams tend to be heterogeneous, their distributions shift in different ways, and they come with complex, asynchronous concept drift.

How can a model "specialize" in the characteristics of a single stream, "draw on many strengths" by exploiting correlations between streams, and adapt to distribution changes at the same time?

A research team at the University of Technology Sydney (UTS) proposes a new drift-aware collaborative-assistance mixture-of-experts learning framework, CAMEL (Collaborative Assistance Mixture of Experts Learning). CAMEL brings the mixture-of-experts (MoE) model into streaming learning: through a collaboration mechanism between "private experts" and "assistance experts", plus automated expert lifecycle management, it tackles the key problems of heterogeneous multi-stream learning (a minimal toy sketch of this expert layout follows this summary). The work has been accepted as an Oral paper at AAAI 2026.

01 Introduction

In real application scenarios, data is usually generated as continuous and unbounded data streams ...
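To make the "private expert + assistance expert" idea above concrete, here is a minimal, hypothetical sketch: each stream gets its own private expert, a small pool of assistance experts is shared across streams, and a softmax gate mixes their predictions. This is not CAMEL's actual algorithm (which adds drift-aware gating and automated expert lifecycle management); the class names, gating rule, and toy data are assumptions for illustration only.

```python
# Illustrative sketch only: a toy "private + assistance experts" layout for
# multi-stream learning, loosely inspired by the CAMEL description above.
# Names (LinearExpert, gated_predict) are hypothetical, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

class LinearExpert:
    """A tiny linear regressor updated online with SGD."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
    def predict(self, x):
        return float(self.w @ x)
    def update(self, x, y):
        err = self.predict(x) - y
        self.w -= self.lr * err * x   # one SGD step on squared error

def gated_predict(x, private, assistants, gate):
    """Combine one private expert with the shared assistance pool via softmax gating."""
    experts = [private] + assistants
    scores = np.array([gate[i] @ x for i in range(len(experts))])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return sum(w * e.predict(x) for w, e in zip(weights, experts)), weights

dim, n_streams = 4, 3
privates = [LinearExpert(dim) for _ in range(n_streams)]      # one per stream
assistants = [LinearExpert(dim) for _ in range(2)]            # shared across streams
gates = [rng.normal(size=(1 + len(assistants), dim)) for _ in range(n_streams)]

# Simulate interleaved samples from heterogeneous streams.
for t in range(300):
    s = t % n_streams
    x = rng.normal(size=dim)
    y = float((s + 1) * x[0] - x[1])          # each stream has its own concept
    y_hat, w = gated_predict(x, privates[s], assistants, gates[s])
    privates[s].update(x, y)                  # private expert tracks only its stream
    for a in assistants:                      # assistance experts learn from every stream
        a.update(x, y)
```

The design point the sketch tries to convey: the private expert protects per-stream specialization, while the shared assistants let knowledge flow across streams, and the gate decides per sample how much "outside help" to use.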
A new breakthrough for "video world models": AI generates 5 minutes of continuous video without the frames falling apart
机器之心· 2025-12-31 09:31
Core Insights
- The article discusses the rise of AI-generated video and the challenge of creating videos that not only look realistic but also adhere to the laws of the physical world, which is the focus of the "video world model" [2]
- The LongVie 2 framework is introduced as a solution for generating high-fidelity, controllable videos lasting up to 5 minutes, addressing the limitations of existing models [2][6]

Group 1: Challenges in Current Video Models
- Current video world models share a common problem: as generation length increases, controllability, visual fidelity, and temporal consistency all decline [6]
- Quality degradation in long video generation is nearly unavoidable, with visual degradation and logical inconsistencies becoming significant bottlenecks [2][12]

Group 2: LongVie 2 Framework
- LongVie 2 employs a three-stage progressive training strategy to enhance controllability, stability, and temporal consistency [9][14]
- Stage 1 focuses on dense & sparse multimodal control, combining dense signals (such as depth maps) and sparse signals (such as keypoint trajectories) to provide stable, interpretable world constraints [9]
- Stage 2 introduces degradation-aware training, in which the model learns to keep generation stable despite imperfect inputs, significantly improving long-term visual fidelity (see the sketch after this list) [13]
- Stage 3 incorporates historical context modeling, explicitly integrating information from previous segments to ensure smoother transitions and fewer semantic breaks [14]

Group 3: Performance Metrics
- LongVie 2 demonstrates stronger controllability than existing methods, reaching state-of-the-art (SOTA) levels on various metrics [21][29]
- Ablation studies validate the three-stage training approach, showing improvements in quality, controllability, and temporal consistency across multiple indicators [26]

Group 4: LongVGenBench
- The article introduces LongVGenBench, the first standardized benchmark dataset for controllable long-video generation, containing 100 high-resolution videos each over 1 minute in length [28]
- The benchmark aims to facilitate systematic research and fair evaluation in long-video generation [28]
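The "degradation-aware training" idea in Stage 2 can be illustrated with a minimal, hypothetical training-loop sketch: the conditioning history (previously generated frames) is deliberately corrupted during training so the model learns to stay stable on imperfect inputs. The `degrade` function, the toy `Generator`, and all noise parameters below are assumptions for illustration, not LongVie 2's actual implementation.

```python
# Hypothetical sketch of degradation-aware conditioning, assuming a generator
# that predicts the next video segment from a history of previous frames.
# degrade(), Generator, and the noise/drop parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def degrade(history, noise_std=0.1, drop_prob=0.2):
    """Corrupt the conditioning history the way long-rollout errors would:
    add noise and randomly blank out some frames."""
    noisy = history + rng.normal(0.0, noise_std, history.shape)
    mask = rng.random(history.shape[0]) > drop_prob        # keep ~80% of frames
    return noisy * mask[:, None, None]

class Generator:
    """Stand-in for a video model: predicts the next frame as a weighted
    average of the (possibly degraded) history. Real models are neural nets."""
    def __init__(self, horizon):
        self.weights = np.ones(horizon) / horizon
    def predict(self, history):
        return np.tensordot(self.weights, history, axes=1)
    def train_step(self, history, target, lr=0.01):
        pred = self.predict(history)
        grad = np.array([np.mean((pred - target) * h) for h in history])
        self.weights -= lr * grad                           # gradient step on MSE

horizon, H, W = 8, 16, 16
model = Generator(horizon)
for step in range(100):
    clean_history = rng.normal(size=(horizon, H, W))        # previous frames
    target = clean_history.mean(axis=0)                     # toy "next frame"
    # Key idea: condition on a *degraded* history so the model stays robust
    # when, at inference time, its own imperfect outputs are fed back in.
    model.train_step(degrade(clean_history), target)
```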
More than 2x faster than DeepEP! 无问芯穹's FUSCO breaks through the MoE communication bottleneck with "in-flight data reshaping", designed for the agent explosion
机器之心· 2025-12-31 09:31
As mainstream large models such as ChatGPT, Gemini, DeepSeek-V3, and Kimi-K2 adopt the mixture-of-experts architecture (Mixture-of-Experts, MoE) and expert parallelism (Expert Parallelism, EP), MoE has gradually become the dominant technology in industrial applications.

Because of their structural sparsity and expert-parallel execution, MoE models inherently introduce frequent and very large global distributed data exchanges. Yet today's mainstream communication libraries and solutions (such as DeepEP) are still built on the traditional design assumption that communication is decoupled from data layout, which makes it hard to handle the cross-device, non-contiguous, dynamically reordered data-access patterns found in real production. Under high concurrency, long contexts, and large expert configurations, DeepEP's performance is approaching its ceiling, directly constraining the continued deployment, stable scaling, and cost-effective operation of large MoE models (a toy simulation of the dispatch pattern that stresses this communication path follows this summary).

At the same time, new applications typified by coding agents and Cursor-style conversational IDEs have both sharply increased the volume of user requests and greatly lengthened the context of a single inference, each growing by more than an order of magnitude. Under the MoE architecture, this not only scales compute cost linearly but also significantly increases cross-expert communication and scheduling cost, pushing overall system pressure up by close to an order of magnitude, and this pressure is further amplified in large-scale serving scenarios ...
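For readers unfamiliar with why expert parallelism is communication-heavy, the sketch below simulates the token-dispatch step of MoE expert parallelism in a single process: each rank routes its tokens to the ranks hosting the selected experts, an all-to-all exchange whose layout changes every batch. This is a generic illustration of the pattern, not FUSCO's or DeepEP's API; the names (`route_tokens`, `dispatch`) and sizes are hypothetical.

```python
# Single-process simulation of MoE expert-parallel token dispatch (all-to-all).
# Generic illustration of the communication pattern discussed above; the
# function names and layout are assumptions, not the FUSCO/DeepEP interface.
import numpy as np

rng = np.random.default_rng(0)

n_ranks, experts_per_rank, top_k = 4, 2, 2
n_experts = n_ranks * experts_per_rank
tokens_per_rank, d_model = 6, 8

def route_tokens(hidden):
    """Pick top-k experts per token from random router logits."""
    logits = rng.normal(size=(hidden.shape[0], n_experts))
    return np.argsort(-logits, axis=1)[:, :top_k]          # (tokens, top_k)

def dispatch(all_hidden, all_expert_ids):
    """Group every (token, expert) pair by the destination rank that owns the
    expert -- this regrouping is what the all-to-all exchange must realize."""
    send_buffers = {dst: [] for dst in range(n_ranks)}
    for src in range(n_ranks):
        for t, experts in enumerate(all_expert_ids[src]):
            for e in experts:
                dst = int(e) // experts_per_rank            # rank hosting expert e
                send_buffers[dst].append((src, t, int(e), all_hidden[src][t]))
    return send_buffers

hidden = [rng.normal(size=(tokens_per_rank, d_model)) for _ in range(n_ranks)]
expert_ids = [route_tokens(h) for h in hidden]
buffers = dispatch(hidden, expert_ids)

for dst, items in buffers.items():
    print(f"rank {dst} receives {len(items)} (token, expert) pairs")
# Because routing changes every batch, the per-destination counts -- and hence
# the data layout each rank must pack and unpack -- are dynamic and non-contiguous.
```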
Just now: Zhihui Jun unveils the humanoid robot Q1, small enough to fit in a backpack
机器之心· 2025-12-31 08:11
Core Viewpoint
- The article discusses the launch of Q1, the world's first small-sized humanoid robot by Zhiyuan Robotics, which aims to redefine personal robotics by combining full-size robot capabilities with a compact design [1][4][24]

Group 1: Product Features and Innovations
- Q1 is designed to retain the capabilities of full-sized humanoid robots while significantly reducing research costs and the barriers to physical interaction [6][15]
- The robot features whole-body control (WBC), allowing it to coordinate multiple degrees of freedom for precise task execution [15][22]
- Q1 uses a modular hardware design, enabling easy replacement of parts and user customization through 3D printing [12][8]

Group 2: Market Position and Strategy
- Zhiyuan Robotics targets both academic research teams and the hardcore hobbyist market, providing open development tools and interfaces [8][24]
- The company's valuation has risen rapidly to 15 billion RMB within three years, and it has made strategic moves including acquiring a controlling stake in a listed company to pivot toward robotics [24][26]
- The launch of Q1 is expected to make humanoid robots more accessible to ordinary users, expanding the market for personal robotics [27][28]

Group 3: Technical Challenges and Achievements
- Developing small humanoid robots like Q1 poses significant challenges, including tight integration requirements and advanced manufacturing processes [20][21]
- Q1's QDD joints represent a breakthrough in miniaturization, achieving high torque density in a compact form and enhancing performance [18][22]
- The company has reached a milestone of 5,000 units produced, indicating strong demand and successful scaling of production [26]
A 7B diffusion language model hits 1000+ tokens/s on a single sample! SJTU and Huawei release LoPA
机器之心· 2025-12-31 08:11
Core Insights
- The article discusses a breakthrough for diffusion large language models (dLLMs): a new decoding algorithm called LoPA (Lookahead Parallel Decoding) that significantly enhances inference speed and parallelism [2][3][36]

Group 1: LoPA Algorithm Features
- LoPA achieves a high degree of parallelism, raising the tokens generated per step (TPF) from 3.1 to 10.1 and surpassing traditional methods [3][7]
- The algorithm is plug-and-play, requiring no retraining or fine-tuning of the model [8]
- It introduces a lookahead parallel decoding mechanism that actively explores different token-filling orders to avoid local optima (see the sketch after this list) [9]
- The accompanying LoPA-Dist system maximizes hardware utilization by supporting both CUDA and Ascend platforms [10]

Group 2: Performance Metrics
- LoPA has demonstrated single-sample throughput of 1073.9 tokens/s on the Huawei Ascend 910C platform, significantly outperforming baseline models [3][33]
- In experiments, LoPA integrated with D2F-Dream achieved a TPF of 10.1 on the GSM8K benchmark, drastically reducing the total number of inference steps [28][31]
- The results indicate that LoPA can convert algorithmic parallelism into substantial real-time acceleration, exceeding 1000 tokens/s on dedicated engines [34]

Group 3: System Design and Optimization
- The LoPA-Dist distributed inference system employs a new branch-parallelism strategy that can be combined with existing tensor-parallelism methods [25]
- It is optimized for different hardware platforms: LoPA-Dist-NV targets low-latency scenarios, while LoPA-Dist-Ascend targets high-throughput serving environments [26]

Group 4: Future Directions
- The team plans to explore applying LoPA to other dLLM architectures, such as SDAR, to further advance efficient generative models [36]
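The general idea of exploring multiple token-filling orders can be illustrated with a toy sketch: a masked/diffusion-style decoder fills several positions per step, and at each step a few candidate filling orders (branches) are scored one step ahead, keeping the most confident branch. This is a simplified stand-in for the concept described above, not the actual LoPA algorithm; the toy "model" below just returns random confidences, and all names are hypothetical.

```python
# Toy illustration of "lookahead" parallel decoding for a masked/diffusion-style
# LM. Simplified stand-in for the idea above, not the actual LoPA algorithm.
import numpy as np

rng = np.random.default_rng(0)
MASK = -1

def model_confidences(tokens):
    """Stand-in for a dLLM forward pass: per-position confidence and predicted token."""
    conf = rng.random(len(tokens))
    pred = rng.integers(0, 1000, size=len(tokens))
    return conf, pred

def fill(tokens, positions, pred):
    out = tokens.copy()
    out[positions] = pred[positions]
    return out

def decode(seq_len=32, tokens_per_step=4, n_branches=3):
    tokens = np.full(seq_len, MASK)
    steps = 0
    while (tokens == MASK).any():
        conf, pred = model_confidences(tokens)
        masked = np.flatnonzero(tokens == MASK)
        # Propose several candidate position sets (different filling orders).
        branches = []
        for _ in range(n_branches):
            cand = rng.choice(masked, size=min(tokens_per_step, len(masked)),
                              replace=False)
            trial = fill(tokens, cand, pred)
            la_conf, _ = model_confidences(trial)      # one-step lookahead
            branches.append((la_conf[trial != MASK].mean(), trial))
        # Keep the branch whose lookahead confidence is highest.
        tokens = max(branches, key=lambda b: b[0])[1]
        steps += 1
    return tokens, steps

out, steps = decode()
print(f"decoded {len(out)} tokens in {steps} steps "
      f"(~{len(out)/steps:.1f} tokens per forward pass)")
```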
Reshaping voice security! SJTU and VUI Labs (宇生月伴) develop a high-performance, highly generalizable speech anti-spoofing foundation model
机器之心· 2025-12-31 04:09
Against the backdrop of rapidly evolving generative AI, synthetic speech has become realistic enough to be indistinguishable from real speech, and the accompanying risks of voice fraud and information forgery keep growing. As a countermeasure, speech anti-spoofing (deepfake detection) has become a research focus in information security.

However, current speech anti-spoofing models face a severe "generalization challenge": many models that perform well on specific laboratory datasets suffer dramatic drops in detection performance when confronted with generation algorithms never seen in the real world. This "generalization bottleneck" severely limits the practical value of anti-spoofing technology in complex, ever-changing real-world scenarios.

To tackle this problem, the Auditory Cognition and Computational Acoustics Lab at Shanghai Jiao Tong University and VUI Labs (宇生月伴) have jointly published new research proposing a data-centric paradigm. The study digs into the underlying relationship between training-data distribution and model generalization, and through systematic empirical study and strategy optimization builds a speech anti-spoofing foundation model that is both high-performing and highly generalizable.

From this perspective, the paper uses systematic empirical analysis to explore two core questions, the first being a scaling law:

Paper title: A Data-Centric Approach to Generalizable Speech Deepfake Detection
Paper link: https://arxiv.org/pdf/2512.18210

Core perspective: from single-source construction to multi-source aggregation ...
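The "from single-source construction to multi-source aggregation" idea can be pictured with a small, hypothetical data-pipeline sketch: instead of training a detector on one spoofing corpus, samples are pooled from several corpora that cover different generators. The corpus names, sizes, labels, and sampling scheme below are invented placeholders, not the paper's actual data recipe.

```python
# Hypothetical sketch of multi-source training-data aggregation for a speech
# deepfake detector. Corpus names, sizes, and the sampling scheme are invented
# placeholders for illustration; the paper's actual setup may differ.
import random

random.seed(0)

# Each source corpus covers different (hypothetical) speech generators.
SOURCES = {
    "corpus_A": {"size": 40_000, "generators": ["tts_x", "vc_y"]},
    "corpus_B": {"size": 15_000, "generators": ["tts_z"]},
    "corpus_C": {"size": 5_000,  "generators": ["vocoder_w", "tts_q"]},
}

def sample_batch(batch_size=32):
    """Draw a balanced batch across sources rather than proportional to size,
    so small corpora (and the generators only they contain) are not drowned out."""
    per_source = batch_size // len(SOURCES)
    batch = []
    for name, meta in SOURCES.items():
        for _ in range(per_source):
            idx = random.randrange(meta["size"])
            gen = random.choice(meta["generators"])
            batch.append({"source": name, "index": idx, "generator": gen,
                          "label": random.choice(["bonafide", "spoof"])})
    return batch

batch = sample_batch()
counts = {}
for item in batch:
    counts[item["source"]] = counts.get(item["source"], 0) + 1
print(counts)   # roughly equal counts per source, e.g. {'corpus_A': 10, ...}
```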
Seeing far, thinking clearly — the "AI China" Machine Heart (机器之心) 2025 Annual Selection is officially announced
机器之心· 2025-12-31 04:09
Core Insights
- The article emphasizes the rapid evolution of large models in 2025, highlighting advances in model architecture, training paradigms, and inference strategies that push the boundaries of the technology [3]
- It notes the emergence of next-generation models such as GPT-5 and Gemini 3, which enhance core capabilities in understanding, generation, and reasoning, making the contours of general intelligence clearer [4]
- The article stresses the importance of identifying AI technologies that provide long-term value, focusing on their ability to reshape production methods and establish foundational capabilities over time [4][5]

Industry Developments
- The domestic AI landscape in 2025 is described as vibrant, with Chinese large models closing the gap with international leaders and even surpassing them in certain areas, while also accelerating in open source, engineering, and application adaptation [4]
- The article presents the "AI China" Machine Heart 2025 Annual Selection, which records the advancements of Chinese artificial intelligence and outlines a promising future for technological innovation [6]

Rankings and Recognitions
- The article announces the top 10 companies/institutions with the strongest AI technical capabilities in 2025 [7]
- It lists the top 20 leading AI enterprises, showcasing the key players in the industry [11][13]
- The best large models and large-model products are also recognized, with a detailed list of the top 20 in each category [16][20]
NUS professor Yang You takes a deep look at the bottlenecks of intelligence growth: is this perhaps how we will reach AGI?
机器之心· 2025-12-31 04:09
Core Insights
- The essence of intelligence growth is not architectural change but how computational power is converted into intelligence [6][7]
- The current paradigm (Transformer + massive computational power) faces a bottleneck in fully utilizing the increasing computational resources, leading to diminishing returns on pre-training (see the scaling sketch after this list) [6][8]
- Future directions should focus on breakthroughs in foundational paradigms rather than mere engineering optimizations [8][9]

Group 1: Current State of Intelligence
- There is no clear definition of intelligence, and even top experts struggle to define AGI (Artificial General Intelligence) [15][16]
- The core of intelligence is seen as prediction and creation, with significant advances still needed to approach AGI [17][18]

Group 2: Bottlenecks in Intelligent Development
- The main bottleneck in intelligence growth is the inefficiency of converting computational power into usable intelligence [19][20]
- Pre-training is the largest contributor to model intelligence and consumes the most computational resources [20][21]
- Current model architectures, particularly Transformers, cannot fully leverage the continuous growth of computational power [33]

Group 3: Future Directions
- Higher-precision computing and more advanced optimizers are needed to enhance model intelligence [45]
- Exploring scalable model architectures and loss functions is crucial for better utilization of computational resources [45]
- The industry must find ways to "consume" more energy per unit of time and convert it into intelligence effectively [42][45]
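To make "diminishing returns on pre-training" concrete, the relation below is the widely cited power-law form of neural scaling laws. It is included here as general background, not as a formula from the interview, and the exponent value mentioned is only illustrative.

```latex
% Illustrative power-law scaling of pre-training loss with compute C
% (general background, not a formula from the interview).
\[
  L(C) \;=\; L_\infty + \frac{A}{C^{\alpha}}, \qquad 0 < \alpha < 1,
\]
% Each 10x increase in compute shrinks only the reducible part of the loss
% by a factor of 10^{-\alpha} (roughly 2x for \alpha \approx 0.3), so equal
% multiplicative investments in compute buy ever-smaller absolute gains:
\[
  L(10C) - L_\infty \;=\; 10^{-\alpha}\,\bigl(L(C) - L_\infty\bigr).
\]
```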