量子位
Google's strongest model goes paid, and gets called too expensive after DeepSeek's open-source release
量子位· 2025-12-05 05:33
Jay, from Aofeisi. QbitAI | Official account QbitAI. Sam Altman may have to sound the red alert again. Just now, Google dropped another bombshell: Gemini 3 Deep Think is officially live! The reasoning ability of Google's newest, strongest model is genuinely wild. It can easily turn a sketch into a realistic 3D scene, not only reproducing the structure faithfully but also handling openwork patterns, light, and shadow flawlessly. Some netizens have even used it for visual art, a human and an AI wandering a virtual universe together, oblivious to heaven and earth. After seeing these demos, Altman will probably have to grit his teeth and offer up another "happy for u". (doge) A few sentences are enough to build a 3D domino stress-relief game that runs remarkably smoothly. Ultra users can start using it today via the "Deep Think" option in the Gemini chat box. The surging Gemini tops the leaderboards once again. Giving its rivals no room to breathe: right after Gemini 3 Pro pinned OpenAI to the ground, Google threw out another bombshell, Gemini 3 Deep Think. Compared with previous models, the new version shows large gains on complex math, scientific reasoning, and logic problems, aiming to crack the math, science, and logic problems that even the strongest models struggle with. Specifically, in "Deep Think" mode, Gemini enables iterative reasoning, polishing code over multiple rounds to produce more refined programs, so that in visual ...
Right after Ilya's prediction, NEO, the world's first native multimodal architecture, arrives: vision and language welded together for good
量子位· 2025-12-05 05:33
Core Insights
- The AI industry is undergoing a paradigm shift, moving away from merely scaling models toward smarter architectures, as highlighted by Ilya Sutskever's statement that the era of scaling laws is over [1][2][20].
- A new native multimodal architecture called NEO has emerged from a Chinese research team; it is the first scalable open-source model to integrate visual and language understanding at a fundamental level [4][19].

Group 1: Current State of Multimodal Models
- The mainstream approach to multimodal models has relied on modular architectures that simply concatenate pre-trained visual and language components, leading to inefficiencies and limits on understanding [6][8].
- Existing modular models face three significant technical gaps (efficiency, capability, and fusion) that hinder their performance on complex tasks requiring deep semantic understanding [14][15][17].

Group 2: NEO's Innovations
- NEO introduces a unified model that inherently integrates visual and language processing, eliminating the distinction between visual and language modules [19].
- The architecture features three core innovations: Native Patch Embedding for high-fidelity visual representation, Native-RoPE for adaptive spatial encoding, and Native Multi-Head Attention for richer interaction between visual and language tokens [22][24][29][33].

Group 3: Performance and Efficiency
- NEO demonstrates remarkable data efficiency, achieving competitive performance with only 3.9 million image-text pairs for training, one-tenth of what other leading models require [39].
- Across benchmark tests, NEO has outperformed other models, showing superior performance on visual understanding and multimodal tasks [41][42].

Group 4: Implications for the Industry
- NEO's architecture not only enhances performance but also lowers the barriers to deploying multimodal AI on edge devices, making advanced visual perception accessible beyond cloud-based systems [43][45][50].
- The open-sourcing of the NEO models signals a shift in the AI community toward more efficient, unified architectures, potentially setting a new standard for multimodal technology [48][49].

Group 5: Future Directions
- NEO's design philosophy aims to bridge the semantic gap between visual and language processing, paving the way for future advances such as video understanding and 3D spatial perception [46][51].
- NEO represents a significant contribution from a Chinese team to the global AI landscape, underscoring the importance of architectural innovation over mere scaling [53][54].
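The "native" fusion described in the NEO summary (raw image patches and text tokens embedded into one shared sequence, with a single attention stack over both and no separate vision encoder) can be sketched minimally. The following is a hypothetical single-head NumPy illustration of the general idea only, not NEO's actual architecture or API; every name, dimension, and the single shared attention layer are assumptions for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, PATCH, VOCAB = 16, 8, 100  # model dim, patch size, toy vocabulary

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def patch_embed(img, W):
    # "Native patch embedding" (illustrative): cut the image into
    # PATCH x PATCH tiles and project each tile's raw pixels directly
    # into the model dimension, with no pre-trained vision tower.
    H, Wd, C = img.shape
    tiles = img.reshape(H // PATCH, PATCH, Wd // PATCH, PATCH, C)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(-1, PATCH * PATCH * C)
    return tiles @ W                                     # (num_patches, D)

W_patch = rng.standard_normal((PATCH * PATCH * 3, D)) * 0.02
E_text = rng.standard_normal((VOCAB, D)) * 0.02
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))

img = rng.standard_normal((16, 16, 3))   # 2x2 grid of 8x8 patches -> 4 visual tokens
ids = np.array([1, 5, 7])                # 3 text tokens
x = np.concatenate([patch_embed(img, W_patch), E_text[ids]])  # one mixed sequence

# One shared attention pass: every text token can attend to every visual
# token and vice versa, with no modality-specific adapter in between.
att = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(D)) @ (x @ Wv)
print(att.shape)  # (7, 16)
```

The point of the sketch is structural: there is one token sequence and one set of attention weights, so "fusion" happens inside the attention itself rather than at a bolted-on interface between two pre-trained models.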
Huawei's new architecture cuts the Transformer's main artery! Any model's reasoning ability surges on the spot
量子位· 2025-12-05 02:13
Core Viewpoint
- The article discusses the limitations of the traditional Transformer architecture, particularly its Attention mechanism, and introduces a new architecture called Nexus, which employs a Higher-Order Attention Mechanism to enhance reasoning capabilities on complex tasks [1][2][4][7].

Group 1: Limitations of the Traditional Transformer
- The traditional Attention mechanism struggles with complex mathematical problems and multi-step logical reasoning, leading to inaccurate outputs [2][6].
- The core issue lies in the static generation of Query (Q) and Key (K), which limits the model's ability to capture complex relationships [15][14].

Group 2: Introduction of Nexus
- Huawei's Noah's Ark Lab developed Nexus, which addresses the limitations of the traditional Attention mechanism by using higher-order attention to model complex relationships effectively [7][8].
- Experimental results indicate that models using Nexus show significant improvements on reasoning tasks without adding parameters [10][35].

Group 3: Innovations in the Nexus Architecture
- Nexus makes the generation of Q and K an attention operation itself, allowing tokens to aggregate contextual information before Q and K are computed [17][18].
- The architecture employs a recursive framework that supports multi-hop reasoning, enabling the construction of higher-order relationships [23][27].
- Nexus maintains parameter efficiency through weight sharing, ensuring that the added complexity does not increase the parameter count [29][31].

Group 4: Performance Improvements
- In experiments on the Pythia series, Nexus consistently outperformed the original Transformer across reasoning datasets, with notable gains on tasks requiring multi-step reasoning [36][39].
- For instance, the 70M model's accuracy on the SciQ dataset improved from 61.5% to 68.5%, a 7-percentage-point gain [39].

Group 5: Application and Future Directions
- Nexus is plug-and-play, allowing easy integration into larger models without extensive retraining to enhance their reasoning abilities [41][44].
- The team plans to explore Nexus in visual Transformers and multimodal models, indicating potential beyond language tasks [45][46].
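The mechanism summarized above (Q and K derived from an already context-aggregated representation, recursively, with weights shared across levels so the parameter count does not grow) can be illustrated with a small NumPy sketch. This is a hypothetical rendering of the general idea as described in the summary, not Huawei's code; the function names, the single-head form, and the choice to take values from the original tokens are all assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention over a single sequence.
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale) @ v

def higher_order_attention(x, Wq, Wk, Wv, order=2):
    # order=1 reduces to standard attention. Each extra order first lets
    # tokens aggregate context with an inner attention pass, then derives
    # Q and K from that contextualized state. The same Wq/Wk/Wv are reused
    # at every level (weight sharing), so no parameters are added.
    h = x
    for _ in range(order - 1):
        h = attention(h @ Wq, h @ Wk, h @ Wv)   # contextualize before Q/K
    return attention(h @ Wq, h @ Wk, x @ Wv)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))                  # 5 tokens, dim 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = higher_order_attention(x, Wq, Wk, Wv, order=2)
print(out.shape)  # (5, 8)
```

Note how the recursion gives a form of multi-hop behavior: at order 2, the scores between two tokens already depend on what each token attended to in the inner pass, which is the kind of higher-order relationship the summary describes.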
Next Wednesday! QbitAI's big event is almost here | MEET2026
量子位· 2025-12-05 02:13
MEET Organizing Committee, from Aofeisi. QbitAI | Official account QbitAI. Hurry, there is really only one week left! The annual feast the AI community absolutely cannot miss is almost here: the MEET2026 Intelligent Future Conference. The program can already be previewed, and the guest list alone shows how heavyweight it is: from academia, Tsinghua University's Zhang Yaqin and Sun Maosong and BAAI's Wang Zhongyuan; from domestic industry, Baidu, Xiaomi, SenseTime, and others; from abroad, Google Cloud, Amazon Web Services, Qualcomm, and more. The topics are just as rich, spanning large language models, multimodality, embodied intelligence, autonomous driving, cloud computing, and concrete applications, covering virtually every facet of today's mainstream AI. And if you want the most forward-looking perspectives, MEET is a conference you cannot afford to miss; you are guaranteed to come away with insights and inspiration. Tempted already? Then act on it; the offline registration channel is right here. So what other highlights does the MEET2026 Intelligent Future Conference have? Read on. Highlight 1: a heavyweight GenAI dialogue plus a frontier Agent roundtable, digging into the year's hottest topics. Are people still asking this year whether AI will replace humans? The anxiety may have faded, because AI has started doing things itself. Robotaxis are no longer a concept on slides but are actually carrying passengers on the street; Agents are no longer just writing code and answering emails, but can autonomously ...
QbitAI is hiring editors and writers
量子位· 2025-12-05 02:13
Editorial Department, from Aofeisi. QbitAI | Official account QbitAI. The AI boom is still surging, but if you don't yet know how to take part in it... then why not join QbitAI? We are a content platform centered on tracking new developments in AI. After eight years of accumulation, we now have top-tier influence, broad and well-recognized industry resources, and the best vantage point for observing and learning at the forefront of the era. We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them. All positions are full-time, based in Zhongguancun, Beijing. Positions at every capability level are open for all roles; you are welcome to apply based on your own background and experience.
- Editor-in-chief: capable of and experienced in topic selection and leading a team;
- Lead writer: able to produce original, in-depth articles;
- Editor: loves to express, enjoys digging up information, and can explain new AI developments to everyone in plain language.
Responsibilities include: following new developments in the AI infrastructure layer, including but not limited to chips, AI Infra, and cloud computing, plus the moves of key players; producing accessible interpretations of frontier papers, open-source communities, and technical reports from conferences (Hot Chips, NeurIPS, MLSys); participating in key interviews, talking with industry experts and technical leaders, and writing case studies of AI cloud deployment. The directions include an AI Finance & Business track and an AI Industry track; the AI Industry track focuses on infrastructure-layer innovation, including chips, ...
A 305.5 billion yuan market cap! Moore Threads rings the bell as China's first general-purpose GPU stock
量子位· 2025-12-05 02:13
Core Viewpoint
- The successful IPO of Moore Threads makes it the first domestic general-purpose GPU company to be listed, with an opening price of approximately 650 yuan, up 469% from the issue price of 114.28 yuan, and a market capitalization exceeding 305.5 billion yuan [1][6].

Group 1: IPO Details
- Moore Threads' IPO application was accepted on June 30 and passed review on September 26, setting the fastest approval record on the Sci-Tech Innovation Board at just 88 days [2].
- The company raised a total of 8 billion yuan through its initial public offering, a new record for the largest new-stock fundraising in A-shares this year [5].
- Total share capital after the IPO is 470 million shares, with 70 million shares newly issued [4].

Group 2: Financial Performance
- In the first three quarters of this year, Moore Threads reported revenue of 780 million yuan, a year-on-year increase of 182% [10].
- The net loss for the same period narrowed to 720 million yuan, down from 890 million yuan a year earlier [11].
- The revenue structure has shifted significantly, with AI computing products becoming the main revenue driver, contributing 94.85% of total revenue in the first half of this year [14].

Group 3: Investment and Development
- The company has attracted significant investment from notable institutions such as China Mobile, Sequoia Capital, and various state-owned enterprises, indicating strong market confidence [2].
- The funds raised will primarily go to research and development, with 2.5 billion yuan earmarked for a new generation of integrated AI training and inference chips, and similar amounts for graphics chips and AI SoC chips [8][10].

Group 4: Company Background
- Moore Threads was founded in June 2020 with a registered capital of 330 million yuan and is controlled by Zhang Jianzhong, who holds 44.07% of the shares [16].
- Zhang Jianzhong previously served as general manager of NVIDIA China and has over 15 years of experience in the GPU industry, helping establish a complete GPU ecosystem in China [17][18].
- The company has developed a unified system architecture called MUSA, integrating capabilities such as AI computing acceleration and graphics rendering into a single chip [21][23].
Winter 2025: what makes Shanghai the "world's premier battlefield for embodied intelligence"?
量子位· 2025-12-05 02:13
Core Viewpoint
- The article emphasizes the rapid advancement of China's embodied intelligence industry, particularly in Shanghai, as it prepares for the GDPS 2025 Global Developer Pioneer Conference, seen as a pivotal moment for the industry's transition from digital to physical applications [1][3].

Group 1: Shanghai's Role in Embodied Intelligence
- Shanghai is positioned as a model "service-oriented government," providing not just funding but also pathways and resources for developers [3].
- The city has opened more than a hundred core scenarios in high-end manufacturing, healthcare, and urban governance for companies to test and deploy embodied intelligence solutions [4][6].

Group 2: Supportive Policies and Infrastructure
- Shanghai has introduced a "computing power voucher" policy, offering up to 40 million yuan per year to help companies access high-end computing resources [7][8].
- The government is also providing up to 5 million yuan annually to support the construction of a common knowledge base for physical-world interactions, addressing the industry's data-isolation problem [9][10].

Group 3: Industry Evolution and Technological Breakthroughs
- The physical proximity of suppliers in Zhangjiang Robot Valley has cut the hardware iteration cycle from months to weeks or even days, fostering an ecosystem for embodied intelligence [11][13].
- Competitive "unicorns" are emerging in the industry, marking a shift from conceptual visions to practical engineering [15].

Group 4: Notable Achievements in Robotics
- Zhiyuan Robotics set a Guinness World Record for endurance with its A2 robot, which walked 106.286 kilometers autonomously, demonstrating advanced energy management and SLAM algorithms [16][17].
- Fourier Intelligence's GR-2 robot has become a benchmark for safety and sensitivity in rehabilitation, showcasing its ability to respond to human pain and touch [18][19].

Group 5: Future Prospects and Challenges
- The GDPS event on December 12 is framed as a comprehensive demonstration of what embodied intelligence teams can do, focusing on real-world applications rather than theoretical presentations [35][36].
- The establishment of a complete embodied intelligence ecosystem in Shanghai represents a significant milestone for the industry, enabling intelligent systems that can learn and adapt in real-world environments [43][44].
Jensen Huang on America's top podcast: I worry every day that NVIDIA will go under
量子位· 2025-12-04 09:55
Core Insights
- The conversation highlights a fundamental shift in AI from "retrieval" to "reasoning," in which AI generates answers from learned knowledge structures rather than simply retrieving pre-stored data [6][7][9].
- Huang emphasized that AI's core mechanism has become a process of learning plus on-the-fly logical reasoning, likening data centers to new factories producing intelligence tokens [9][13].
- The discussion also touched on the energy cost of AI expansion; Huang noted that chip efficiency gains are crucial to meeting growing demand without exhausting global energy resources [14][16].

Group 1: AI Evolution
- The transition from "retrieval" to "reasoning" represents a significant change in how AI operates, moving from searching for answers to generating them from learned knowledge [6][7].
- Huang described deep learning as a massive neural network learning from vast numbers of input-output examples, functioning as a universal function approximator [11][12].
- He introduced the idea of data centers as "AI factories," where energy and data are the inputs and intelligence tokens the outputs, marking a new era of manufacturing [13].

Group 2: Impact on the Workforce
- Addressing concerns about AI replacing jobs, Huang suggested that tasks may change but jobs will not disappear; people will instead focus more on problem-solving and decision-making [16][17].
- Future programming will be done in natural language, dramatically lowering the technical barrier so that everyone can become a programmer [18][19].
- Huang acknowledged that the future internet may be filled with AI-generated content, but believes that as long as the information is verified, it can enhance knowledge acquisition [19].

Group 3: Technological Advancements
- Traditional Moore's Law is slowing, but in AI, accelerated computing is giving the law a rebirth in a new form [20][21].
- Huang explained the difference between CPUs and GPUs, noting that GPUs suit AI because they can handle massive parallel computation [22][24].
- The cost of AI computing has fallen by a factor of 100,000 over the past decade, akin to a revitalized Moore's Law [24].

Group 4: Company History and Challenges
- Huang recounted a critical moment in NVIDIA's history when the company was just 30 days from bankruptcy, underscoring the importance of honesty and transparency in business [33][34].
- Early struggles included a major technical error that nearly derailed the company, but a candid conversation with Sega's CEO produced the lifeline that saved NVIDIA [34][36].
- Huang's commitment to innovation, even in the face of skepticism, has been a driving force behind NVIDIA's success [30][32].
The "Doubao phone" has already doubled in price on the second-hand market...
量子位· 2025-12-04 09:55
Xifeng, from Aofeisi. QbitAI | Official account QbitAI. The "Doubao phone" just went on sale, and it is so hot that the entire first batch of 30,000 units was snapped up instantly. Quite a few people are even reselling it at a premium on second-hand markets, with markups of 1,500 yuan, some outright doubling the price. The key point: this is despite the official statement that many of its features are still far from polished... The device in question is the nubia M153, an engineering sample running a technical preview of the Doubao phone assistant. Not long ago, ByteDance's Doubao team announced a partnership with ZTE to build an AI phone, and the first-generation product went on sale shortly after. The nubia M153 is priced at 3,499 yuan and is being sold in small quantities only to industry insiders who want to try the Doubao phone assistant. Officials state plainly that, on the software side, "the feature completeness of a mature phone product cannot yet be guaranteed"; for example, imaging and other features may lag behind mainstream flagship phones. The nubia M153's software will be updated roughly every two weeks. Hands-on reports from all sides are already plentiful, covering scenarios such as cross-app command operations. Many developers and industry insiders say "Doubao has sketched out the prototype of a phone for the AI era." Some users also report that, measured by actual operation time, the Doubao assistant still executes slightly slower than doing things by hand, but its advantage is a marked reduction in the user's filtering and decision-making costs. Beyond that, supply-chain sources told Lanjing News that this "Doubao phone" is indeed a market test, and that after selling out, no additional materials were procured ...
Large models diagnosed as "visually illiterate"! Multiple universities jointly propose MILO to give them spatial imagination
量子位· 2025-12-04 09:55
Core Insights
- The article discusses the limitations of multimodal large language models (MLLMs) in spatial reasoning: they cannot effectively understand or visualize spatial concepts, a phenomenon termed "visual illiteracy" [2][3].

Group 1: Challenges in Spatial Reasoning
- Spatial reasoning is a core cognitive ability by which humans understand three-dimensional structure, and it poses a significant challenge for MLLMs in practical applications [2].
- Current methods rely mainly on "language description tuning," which fails to give models a genuinely visual understanding of spatial concepts [2][3].

Group 2: Introduction of MILO
- A research team has proposed MILO (Implicit Spatial World Modeling) to address the spatial reasoning challenges faced by MLLMs by integrating visual generative feedback with symbolic reasoning [4].
- MILO employs a two-phase training process: first, visual generative tuning, in which the model learns spatial transformations through visual outputs; second, language tuning on spatial instruction data [5].

Group 3: Enhancements in Geometric Perception
- To further enhance geometric perception, the team introduced RePE (Relative Positional Encoding), which captures relative transformations between adjacent frames instead of relying on a global coordinate system, improving generalization and adaptability across datasets [8][9].

Group 4: The GeoGen Dataset
- The team constructed the GeoGen dataset, comprising approximately 2,241 videos and 267,000 "observation-action-result" triplets, to support geometric-perception generation [10].
- The dataset draws on diverse sources, including scanned 3D scenes and internet videos, covering a wide range of realistic scenarios [11].

Group 5: Validation of MILO
- MILO's effectiveness was validated across multiple baseline models and five categories of spatial understanding tasks, achieving the best performance on 3D scene understanding and spatial reasoning tasks [12][16].
- Notably, MILO improved accuracy by 3.2% on the ScanRefer task and reached an average accuracy of 61.7% on the VSI-Bench spatial reasoning benchmark, surpassing the VG-LLM baseline by 2.2% [16].
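The relative-pose idea behind RePE as summarized above (encode each frame by its transform relative to the previous frame rather than by absolute world coordinates, so the encoding does not depend on the choice of global origin) can be shown with a tiny NumPy sketch. The function name and the use of 4x4 camera-to-world matrices are assumptions for illustration, not the paper's API.

```python
import numpy as np

def relative_transforms(poses):
    # poses: (N, 4, 4) camera-to-world matrices for N consecutive frames.
    # Each output is the frame-to-frame motion inv(T_i) @ T_{i+1}.
    rel = []
    for a, b in zip(poses[:-1], poses[1:]):
        rel.append(np.linalg.inv(a) @ b)
    return np.stack(rel)

# Two pose sequences that differ only by a global rigid shift of the world
# frame yield identical relative encodings:
base = np.eye(4)
step = np.eye(4); step[:3, 3] = [1.0, 0.0, 0.0]      # move 1 unit along x
poses = np.stack([base, step @ base])

shift = np.eye(4); shift[:3, 3] = [5.0, -2.0, 3.0]   # arbitrary world offset
shifted = np.stack([shift @ p for p in poses])

r1 = relative_transforms(poses)
r2 = relative_transforms(shifted)
print(np.allclose(r1, r2))  # True
```

This invariance is exactly why a relative encoding generalizes better across datasets: inv(shift @ a) @ (shift @ b) = inv(a) @ b, so any global re-anchoring of the scene cancels out, and only the frame-to-frame motion survives.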