With the AI5 Chip Done, Musk's Fully In-House Supercomputer Dojo 3 Is Back
机器之心· 2026-01-21 04:15
Core Viewpoint
- Elon Musk announced significant progress on the AI5 chip design and the restart of the Dojo 3 project, which is crucial for Tesla's AI and autonomous-driving initiatives [1][4].

Group 1: Dojo Project Overview
- The Dojo project was first introduced at Tesla's AI Day in 2021, with the goal of building a supercomputer for machine-learning training on data from Tesla vehicles [1].
- Dojo officially entered production in July 2023 but was later halted in August 2025, because maintaining two different chip systems proved inefficient [4][5].
- Musk clarified that pausing Dojo was a strategic decision to concentrate resources on the more critical AI5 chip, which is essential to Tesla's future projects [4][5].

Group 2: AI5 Chip Significance
- The AI5 chip is vital for Tesla's Full Self-Driving (FSD), Cybercab, and Optimus projects; Musk has said its success is crucial to the company's future [5][10].
- The AI5 chip is expected to deliver a 50-fold improvement over the previous AI4 chip, with production targeted for 2027 [5][11].

Group 3: Dojo 3 Developments
- Dojo 3 aims to integrate 512 AI5 or AI6 chips onto a single motherboard, sharply reducing complexity and cost while preserving high parallel-computing capability [8][9].
- The new architecture lets the same chip handle both training and inference, in line with Tesla's strategy of reducing reliance on external GPU suppliers such as NVIDIA [9][10].
- Dojo 3 is expected to accelerate iteration of Tesla's FSD neural-network models and support training of the Optimus robot's control systems [10].

Group 4: Strategic Partnerships
- Tesla has signed a $16.5 billion agreement with Samsung Electronics to produce AI6 chips, which will bolster the scalability of the Dojo 3 project [12].
Overturning 150 Years of Mathematical Intuition: Mathematicians Burn Out a Few Laptops to Crack a Problem in Geometric Topology
机器之心· 2026-01-21 04:15
Compiled by 机器之心. This is a victory for the combination of mathematical theory and computing power.

Imagine that our sky were permanently covered by a thick, opaque layer of cloud, so we could see no stars and never view our planet from above. Could we still discover that the Earth is round? The answer is yes. By measuring specific distances and angles on the ground, we can determine that the Earth is a sphere rather than a plane or a donut shape, even without satellite photographs.

Mathematicians have found that this often holds for more general two-dimensional surfaces as well: a relatively small amount of local information about a surface suffices to infer its overall shape; the local uniquely determines the global.

In certain exceptional cases, however, this limited local information can correspond to more than one surface. For the past 150 years, mathematicians have worked to catalogue these special cases: local measurement data that would ordinarily pin down a single surface but in fact describes several. Yet the only exceptions they could find were not neat, closed surfaces like the sphere or the donut; instead, these surfaces either extend infinitely in some direction or possess some kind of "edge".

No one could find a closed surface that broke the rule, and it seemed no such exception existed; perhaps surfaces of this kind could always be uniquely determined from ordinary local information. Now, mathematicians have finally found the long-sought exception. In a paper published last October, three researchers, including TU Berlin's Alexa ...
One Year After R1, DeepSeek's Model 1 Quietly Appears
机器之心· 2026-01-21 00:32
Core Insights
- DeepSeek officially released the DeepSeek-R1 model on January 20, 2025, opening a new era for open-source LLMs; DeepSeek-R1 remains the most-praised model on the Hugging Face platform [2].
- A new model named Model1 has surfaced in DeepSeek's FlashMLA code repository, drawing significant attention from the online community [5].
- Analysis suggests Model1 is likely the internal development code name, or the first engineering version, of DeepSeek's next flagship model, DeepSeek-V4 [9].

Technical Details
- Model1's core architecture returns to a 512-dimensional standard, indicating a likely optimization for alignment with NVIDIA's next-generation Blackwell (SM100) architecture [9].
- Compared with the V3 series, Model1 introduces a token-level sparse MLA as a major operator evolution, along with new mechanisms such as Value Vector Position Awareness (VVPA) and Engram [11][12].
- Benchmarks show the still-unoptimized sparse MLA operator reaching 350 TFLOPS on the B200, while the dense MLA reaches 660 TFLOPS on the H800 (SM90a) [10].

Architectural Changes
- The shift from the previous V3.2 model's non-symmetric MLA design to a standardized 512-dimensional configuration in Model1 suggests a strategic change in DeepSeek's architectural approach [9].
- The codebase includes optimizations specific to the Blackwell GPU architecture, indicating a focus on computational efficiency [9].
- FP8 KV-cache mixed precision in the sparse operators is intended to reduce memory pressure and improve speed in long-context scenarios [12].
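The idea behind token-level sparse attention can be sketched in a few lines: each query attends only to its top-k highest-scoring keys rather than the full sequence. This is a toy NumPy illustration of the general technique, not DeepSeek's actual MLA kernel (which fuses latent projections and runs as an optimized GPU operator); all names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(q, k, v, topk=4):
    """Toy token-level sparse attention: each query attends only to its
    top-k highest-scoring keys instead of the whole sequence."""
    d = q.shape[-1]
    # Toy version computes the full score matrix; real kernels never
    # materialize it -- that is where the speedup comes from.
    scores = q @ k.T / np.sqrt(d)                       # (Tq, Tk)
    idx = np.argpartition(-scores, topk - 1, axis=-1)[:, :topk]
    masked = np.full_like(scores, -np.inf)              # -inf => zero weight
    np.put_along_axis(masked, idx,
                      np.take_along_axis(scores, idx, axis=-1), axis=-1)
    w = softmax(masked, axis=-1)                        # support limited to top-k
    return w @ v

rng = np.random.default_rng(0)
T, d = 8, 16
q, k, v = rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, d))
out = topk_sparse_attention(q, k, v, topk=4)
print(out.shape)  # (8, 16)
```

With `topk` equal to the sequence length, the function reduces to ordinary dense attention, which is a handy sanity check.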
AAAI 2026 Oral | Farewell to Attention and Heat Conduction! PKU and Tsinghua Propose WaveFormer, the First Wave-Equation Model for Vision
机器之心· 2026-01-21 00:32
"Global interaction" is nearly synonymous with self-attention: every token can talk to every other token. It works well, but the cost is equally plain: complexity grows quadratically with the number of tokens, and high resolutions quickly become unaffordable. Most existing methods start either from "similarity matching" (attention) or from "diffusion/conduction" (heat-equation-style methods). But the heat equation is fundamentally a strong low-pass filter: as propagation time increases, high-frequency details (edges, textures) vanish rapidly, leaving over-smoothed features.

Can we find a physical modeling approach that achieves global interaction while faithfully preserving high-frequency detail?

A research team from Peking University and Tsinghua University offers an answer: the wave equation. Treat the feature map as a spatial signal and let semantics evolve under an underdamped wave equation over a "propagation time" corresponding to network depth. Low-frequency global structure and high-frequency edges and textures then no longer trade off against each other; they can coexist under controlled wave propagation. In the AAAI 2026 Oral paper "WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation", the researchers are the first to treat visual feature maps as spatial signals evolving over wave-propagation time, inspired by the underdamped ...
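The heat-versus-wave contrast above can be made precise in Fourier space. As a sketch using the textbook equations (the paper's exact formulation may differ in parameterization):

```latex
% Heat equation: each spatial frequency k decays at rate alpha*|k|^2,
% so high frequencies die fastest -- a strong low-pass filter.
\partial_t u = \alpha \,\Delta u
\quad\Longrightarrow\quad
\hat u_k(t) = \hat u_k(0)\, e^{-\alpha \|k\|^2 t}

% Underdamped wave equation: in the regime c^2\|k\|^2 > \gamma^2/4,
% every mode decays at the same frequency-independent rate gamma/2.
\partial_t^2 u + \gamma\,\partial_t u = c^2 \Delta u
\quad\Longrightarrow\quad
\hat u_k(t) \propto e^{-\gamma t/2}\cos(\omega_k t + \phi_k),
\qquad
\omega_k = \sqrt{c^2\|k\|^2 - \tfrac{\gamma^2}{4}}
```

The key difference is in the decay rates: under heat flow, the damping of a mode grows with \(\|k\|^2\), so edges and textures disappear first; under underdamped wave propagation, high-frequency modes merely oscillate faster while decaying no faster than low-frequency ones, which is why detail can survive global propagation.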
Musk Really Did Just Open-Source the 𝕏 Recommendation Algorithm, and Its Core Is Also a Transformer
机器之心· 2026-01-20 11:24
Just now, the 𝕏 platform (formerly Twitter) announced new open-source news: the brand-new recommendation algorithm has been open-sourced, driven by the same Transformer architecture as xAI's Grok model. The model ranks posts for the "For You" feed by predicting user actions (likes, replies, reposts, and so on).

Editor: 冷猫

As everyone knows, the recommendation algorithm is the lifeline of a social media platform; it has become central to how platforms retain users and grow marketing revenue. When Musk declared on 𝕏 a little over a week ago that the recommendation algorithm "will be open-sourced in 7 days", it sounded almost unbelievable.

Yet Musk kept his word: although slightly later than the promised 7 days, the recommendation algorithm has indeed been fully open-sourced. One hopes the stated commitment to refresh it every 4 weeks will be honored over the long term.

After the release, Musk said: "We know this algorithm is clumsy and needs a lot of improvement, but at least you can watch us try to make it better in real time and transparently. No other social media company does this."

Still, Musk's decision to open-source the 𝕏 recommendation algorithm may have other motives. According to Reuters, in July 2025 Paris prosecutors investigated the platform on suspicion of algorithmic bias and fraudulent data extraction, which Musk called a "politically motivated criminal investigation" that threatened its users' freedom of speech. In December, the EU fined 𝕏 €120 million ...
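"Rank posts by predicting user actions" typically means combining per-action probabilities into one score. The sketch below illustrates that general pattern only; the feature set, action list, and weights here are hypothetical and are not taken from the open-sourced 𝕏 repository.

```python
from dataclasses import dataclass

@dataclass
class ActionProbs:
    """Model-predicted probabilities of each engagement action on a post."""
    like: float
    reply: float
    repost: float

# Hypothetical weights -- the real open-source code defines its own.
WEIGHTS = {"like": 1.0, "reply": 13.5, "repost": 2.0}

def score(p: ActionProbs) -> float:
    """Collapse predicted engagement probabilities into a single ranking score."""
    return (WEIGHTS["like"] * p.like
            + WEIGHTS["reply"] * p.reply
            + WEIGHTS["repost"] * p.repost)

posts = {
    "post_a": ActionProbs(like=0.30, reply=0.01, repost=0.05),
    "post_b": ActionProbs(like=0.10, reply=0.04, repost=0.02),
}
ranked = sorted(posts, key=lambda pid: score(posts[pid]), reverse=True)
print(ranked)  # ['post_b', 'post_a']
```

Note how a large reply weight lets a post with fewer predicted likes outrank one with more, which is exactly the kind of behavior that becomes auditable once the weights are public.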
Just Now, MiniMax Moves In to Take Over Your Desktop
机器之心· 2026-01-20 11:24
Core Viewpoint
- The article argues that 2026 is set to be a pivotal year for AI agents, with intense market competition, highlighted by the launch of MiniMax Agent 2.0, which aims to enhance productivity through advanced AI capabilities [1][2].

Group 1: MiniMax Agent 2.0 Features
- MiniMax Agent 2.0 is introduced as an "AI-native Workspace," significantly restructuring the product to deliver a more integrated and efficient user experience [2][5].
- The new desktop application interacts seamlessly with local files and cloud tasks, freeing users from repetitive work such as switching between windows and manual data entry [2][9].
- The introduction of "Expert Agents" improves the system's reliability and expertise, raising output quality from a score of around 70 to potentially 95 or 100 [3][5].

Group 2: User Experience and Performance
- The Expert Agents feature is currently free to try on both desktop and web, making the advanced functionality easy to access [4].
- Hands-on tests show MiniMax Agent completing complex tasks such as summarizing news and analyzing technical documents, demonstrating its efficiency and capability [11][18].
- Its ability to process multiple documents and generate presentations in a fraction of the time previously required illustrates significant productivity gains [18][20].

Group 3: Technological Advancements
- Continuous model upgrades, including Lightning Attention and the M2 architecture, have strengthened the agent's ability to handle complex tasks [32][33].
- Integrating these models into MiniMax's internal processes has created a feedback loop that continuously refines the system based on real-world usage [32][33].
- The shift in interaction logic, from users adapting to agents to agents adapting to users, marks a significant evolution in how AI can assist with high-complexity tasks [33].
When Jensen Huang Defines Storage as the "Runtime Memory of AI", How Does Infrastructure Achieve a Species-Level Evolution?
机器之心· 2026-01-20 10:19
Core Insights
- The article discusses the unprecedented demand for DRAM and storage driven by AI computing, highlighting a significant structural shortage in the global memory market [2][4].
- XSKY, now a leader in China's object-storage market, is addressing the challenges of AI infrastructure with its AIMesh product strategy, which aims to transform data centers into AI factories [5][10].

Group 1: Market Dynamics
- Agreements between OpenAI and major suppliers such as Samsung and SK Hynix are projected to consume roughly 40% of total global DRAM wafer capacity [2].
- Major tech companies, including Microsoft and Google, are actively negotiating for more DRAM and high-bandwidth memory (HBM) to meet their AI needs [2].
- NVIDIA CEO Jensen Huang predicts that AI-related data storage will become one of the largest markets globally, requiring a fundamental restructuring of storage technology [3][4].

Group 2: XSKY's Strategic Positioning
- XSKY has grown more than 50% over the past three years and raised its all-flash storage share to 35% [8].
- The company has established 280 superclusters of over 10 PB capacity, demonstrating its ability to handle large-scale storage demands [8].
- XSKY's AIMesh strategy focuses on a neutral, open data foundation that efficiently transforms proprietary data into intelligence [10][36].

Group 3: Technological Innovations
- AIMesh aims to overcome three major efficiency barriers in AI: the IO wall, the gravity wall, and the memory wall [14][30].
- MeshFS, a parallel file system developed by XSKY, addresses the IO wall by significantly increasing read and write bandwidth [18][22].
- MeshSpace provides a global unstructured-data platform that allows seamless data flow and management across different storage types, improving operational efficiency [25][29].

Group 4: Future Outlook
- XSKY emphasizes a stable data foundation to support rapid advances in computing power, adhering to its "data evergreen" philosophy [36][41].
- The company aims to guard enterprise data assets while accelerating businesses' AI journeys, ensuring proprietary data is effectively turned into competitive advantage [38][41].
Starting from Plane Geometry: How Formal Verification Drives a Leap in MLLM Reasoning
机器之心· 2026-01-20 10:19
On the road to artificial general intelligence (AGI), multimodal large language models (MLLMs) have shown astonishing ability in visual understanding and text generation, yet they still face a hard-to-cross gap: how, in complex mathematical and geometric reasoning, can they overcome their inherent hallucinations and logical breaks? Existing "outcome-oriented" training often masks the fragility of the reasoning process, so models frequently "guess the right answer" while "getting the reasoning wrong". This black-box style of learning makes it hard for models to acquire truly robust reasoning ability.

Facing this challenge, a team from Shanghai Jiao Tong University, Fudan University, The Chinese University of Hong Kong (Shenzhen), the Shanghai AI Laboratory, and other research institutions proposes a new systematic solution: "Formal Enhance Informal Reasoning". Its core insight is that extremely rigorous, verifiable in-domain formal logic can serve as a powerful supervision signal that regularizes and guides the model's reasoning in informal settings. Going further, the team found that the logical discipline acquired in this rigorous mathematical environment is not confined to geometry problems; it acts as a general key that unlocks out-of-distribution (OOD) generalization on general mathematics and even broader reasoning tasks.

Based on this idea, the team went through three stages of exploration, building a complete closed loop from the data layer to the model layer: ...
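To make "verifiable formal supervision" concrete: in a proof assistant, a reasoning step is either machine-checked or rejected outright, leaving no room for a plausible-but-wrong chain of thought. A toy geometry-flavored illustration in Lean 4 (assuming Mathlib for the `linarith` tactic; the team's actual formal system and checker are not specified in this excerpt):

```lean
import Mathlib.Tactic

-- Hypothetical illustration: from the triangle angle sum (in degrees)
-- and two known angles, the third is forced. An informal model might
-- "guess" 60 for the wrong reason; the checker only accepts a valid proof.
theorem third_angle (a b c : ℝ)
    (hsum : a + b + c = 180)
    (ha : a = 60) (hb : b = 60) : c = 60 := by
  linarith
```

A verifier of this kind can label each step of a model's geometric derivation as sound or unsound, which is the kind of process-level signal that outcome-only training lacks.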
Beating GPT and Gemini: 模思智能 (MOSI AI), a Startup Incubated by Fudan × 创智, Releases New Speech Models
机器之心· 2026-01-20 10:19
Core Viewpoint
- The article highlights the breakthrough capabilities of the MOSS-Transcribe-Diarize model developed by MOSI AI, which excels at multi-speaker automatic speech recognition (ASR) and outperforms existing models such as GPT-4o and Gemini in complex audio environments [1][2][9].

Group 1: Model Capabilities
- MOSS-Transcribe-Diarize handles overlapping speech and chaotic dialogue scenarios effectively, with a significant improvement in transcription accuracy [1][5].
- The model supports a long context window of 128K, allowing it to process audio inputs of up to 90 minutes and demonstrating robustness in complex environments [1][9].
- It achieves state-of-the-art (SOTA) performance across benchmarks including the AISHELL-4, Podcast, and Movies datasets, particularly excelling under challenging audio conditions [2][16][19].

Group 2: Technical Innovations
- The model employs a unified end-to-end multimodal architecture that integrates speech recognition, speaker attribution, and timestamp prediction, addressing the classic SATS (Speaker Attribution and Timestamped Speech) challenge [8][12].
- Training combines real-world dialogue audio with synthetic data, improving robustness to overlapping speech and acoustic variation [13][14].
- The architecture directly outputs text with speaker labels and precise timestamps, using semantic information to improve accuracy [12][14].

Group 3: Competitive Advantage
- In benchmark tests, MOSS-Transcribe-Diarize significantly outperformed competitors such as GPT-4o and Gemini 3 Pro on metrics including character error rate (CER) and optimal-permutation character error rate (cpCER), particularly on long audio inputs [16][19].
- The model maintains speaker consistency in long dialogues, reducing the performance degradation caused by speaker-attribution errors [16].
- It performs well across scenarios including real-world meetings, podcasts, and complex film dialogue, proving its versatility and effectiveness [19][21].

Group 4: Future Directions
- MOSI AI aims to continue advancing multimodal intelligence, focusing on enabling AI to understand complex real-world contexts and achieve natural, coherent, and reliable interactions [24].
- The company's strategic vision is to develop technologies for real-time dialogue interaction and robust speech understanding, positioning itself as a leader in the AI field [24].
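CER, the headline metric in these benchmarks, is simply character-level Levenshtein (edit) distance divided by the reference length. A minimal sketch for illustration (not the benchmarks' official scoring code, which additionally handles text normalization, and in the cpCER case searches over speaker permutations):

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: Levenshtein distance / reference length,
    computed with a rolling one-row dynamic-programming table."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))              # dp[j] = distance(ref[:0], hyp[:j])
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i           # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,       # deletion
                        dp[j - 1] + 1,   # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # (mis)match
            prev = cur
    return dp[n] / max(m, 1)

print(cer("会议纪要", "会意纪要"))  # 0.25: one substitution over four characters
```

Note that CER can exceed 1.0 when the hypothesis is much longer than the reference, which is why long-audio transcription with many hallucinated insertions scores so poorly on this metric.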
EmbodiChain Open-Sourced: Automatically Training Embodied AI Models on 100% Generative Data
机器之心· 2026-01-20 07:16
Core Insights
- The article discusses the limitations of traditional data-collection methods in robotics and emphasizes the need for innovative approaches to generate high-quality interactive data that adheres to physical laws [2][3].
- It introduces the concept of an "Efficiency Law," which posits that model performance is directly related to the rate of data generation, highlighting the necessary shift from data scarcity to data abundance in embodied intelligence [5][8].
- The launch of EmbodiChain is presented as a foundational step toward a generative simulation world model (GS-World), which aims to automate data generation and advance the learning paradigm for embodied intelligence [13][19].

Data Collection Paradigms
- The scarcity and high cost of 3D-calibrated data for robotics have made data-collection paradigms a focal point of industry research [2].
- The industry is moving toward more cost-effective and convenient collection methods, transitioning from expensive teleoperation devices to innovative solutions that require minimal human intervention [2].
- Digitizing human skills is key to bridging the gap between human experience and robotic actions [2].

Challenges in Embodied Intelligence
- Current physical data-collection methods cannot match the scale required for training large language models (LLMs), a significant barrier to advancing embodied intelligence [3].
- The slow rate of data generation is the bottleneck: even large model parameter counts cannot compensate if the model is not adequately fed with data [8].

Efficiency Law and Data Generation
- The "Efficiency Law" suggests that the relationship between model performance and data-generation rate is crucial to the evolution of intelligence [17].
- In the era of embodied intelligence, data must be generated incrementally, requiring the ability to create data rather than merely clean existing datasets [7][14].

EmbodiChain and GS-World
- EmbodiChain is introduced as a data and model platform that aims to revolutionize the learning paradigm for embodied intelligence by enabling high-speed, automated data generation [13][15].
- EmbodiChain targets three core scientific challenges: automating data production, bridging the Sim2Real gap, and overcoming the "IO wall" in data generation [16].

Comparison of Approaches
- The GS-World approach, which focuses on generating physically accurate models, is contrasted with the video-generation route, which has shown weaknesses in maintaining long-term temporal consistency [24][25].
- A 3D, interactive, and physically rigorous world model is essential for effectively training robots [30].

Results and Future Vision
- Results from training the Sim2Real-VLA model on generated data alone show superior performance compared to traditional methods, demonstrating the potential of the approach [28][38].
- The vision for GS-World extends beyond current capabilities, aiming at a self-sustaining infrastructure for embodied-intelligence research that alleviates the constraints of data scarcity [34][35].