量子位
Google's most powerful model launches behind a paywall, criticized as too expensive after DeepSeek's open-source release
量子位· 2025-12-05 05:33
Core Viewpoint
- Google has launched its latest model, Gemini 3 Deep Think, which significantly enhances reasoning capabilities, particularly in complex mathematics, science, and logic problems [2][9].

Group 1: Model Features and Performance
- Gemini 3 Deep Think allows for iterative reasoning, enabling multi-round refinement of code to produce more detailed results in visualization, prototyping, and experimentation [9].
- The model achieved an accuracy of 41.0% on the Humanity's Last Exam benchmark, outperforming GPT-5 Pro by 10 percentage points [10].
- On the ARC-AGI-2 benchmark, code execution accuracy reached 45.1%, surpassing Gemini 3 Pro by 14% and GPT-5.1 by nearly 30% [11].
- The underlying technology of this model is derived from Gemini 2.5 Deep Think, which has previously won gold medals in prestigious competitions like IMO and ICPC [14].

Group 2: User Reception and Pricing Concerns
- Gemini 3 Deep Think is currently available only to Ultra members at a monthly fee of $249.9 (approximately 1,800 RMB), leading to dissatisfaction among Pro users who feel excluded [18].
- Users expressed frustration over the lack of trial access or pay-per-use options, questioning the value of the Ultra membership [21][22].
- The pricing strategy has resulted in a lukewarm reception for Gemini 3 Deep Think, with many users criticizing the high cost and lack of transparency regarding its value [24][27].

Group 3: Competitive Landscape
- The recently updated DeepSeek-V3.2 has shown competitive reasoning capabilities, closely matching Gemini 3 Pro and winning accolades in major competitions, which poses a challenge to Gemini 3 Deep Think [25].
- DeepSeek is open-source, contrasting with Google's closed-source approach and high pricing, which has contributed to user dissatisfaction [26][27].
Right after Ilya's prediction, NEO arrives as the world's first native multimodal architecture: vision and language welded together for good
量子位· 2025-12-05 05:33
Core Insights
- The AI industry is experiencing a paradigm shift, moving away from merely scaling models to focusing on smarter architectures, as highlighted by Ilya Sutskever's statement that the era of scaling laws is over [1][2][20].
- A new native multimodal architecture called NEO has emerged from a Chinese research team; it is the first scalable open-source model that integrates visual and language understanding at a fundamental level [4][19].

Group 1: Current State of Multimodal Models
- The mainstream approach to multimodal models has relied on modular architectures that simply concatenate pre-trained visual and language components, leading to inefficiencies and limitations in understanding [6][8].
- Existing modular models face three significant technical gaps: efficiency, capability, and fusion, which hinder their performance in complex tasks requiring deep semantic understanding [14][15][17].

Group 2: NEO's Innovations
- NEO introduces a unified model that inherently integrates visual and language processing, eliminating the distinction between visual and language modules [19].
- The architecture features three core innovations: Native Patch Embedding for high-fidelity visual representation, Native-RoPE for adaptive spatial encoding, and Native Multi-Head Attention for enhanced interaction between visual and language tokens [22][24][29][33].

Group 3: Performance and Efficiency
- NEO demonstrates remarkable data efficiency, achieving competitive performance with only 3.9 million image-text pairs for training, about one-tenth of what other leading models require [39].
- In various benchmark tests, NEO has outperformed other models, showcasing superior performance in tasks related to visual understanding and multimodal capabilities [41][42].

Group 4: Implications for the Industry
- NEO's architecture not only enhances performance but also lowers the barriers for deploying multimodal AI on edge devices, making advanced visual perception capabilities accessible beyond cloud-based systems [43][45][50].
- The open-sourcing of NEO models signals a shift in the AI community towards more efficient and unified architectures, potentially setting a new standard for multimodal technology [48][49].

Group 5: Future Directions
- NEO's design philosophy aims to bridge the semantic gap between visual and language processing, paving the way for future advancements in AI, including video understanding and 3D spatial perception [46][51].
- The emergence of NEO represents a significant contribution from a Chinese team to the global AI landscape, emphasizing the importance of architectural innovation over mere scaling [53][54].
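The summary names Native-RoPE as NEO's adaptive spatial encoding but gives no formula. As a reference point, the standard rotary position embedding (RoPE) that such native variants build on rotates each pair of feature channels by a position-dependent angle. The numpy sketch below is illustrative only; the function name, shapes, and `base` constant are assumptions, not NEO's actual implementation:

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Minimal rotary position embedding (RoPE) sketch.

    x: (seq, dim) token vectors, dim even.
    positions: (seq,) integer positions -- a native multimodal
    variant could feed 2-D patch coordinates here instead.
    """
    seq, dim = x.shape
    half = dim // 2
    # Per-channel-pair rotation frequencies, as in standard RoPE.
    freqs = base ** (-np.arange(half) / half)        # (half,)
    angles = positions[:, None] * freqs[None, :]     # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

The rotation encodes position without adding parameters and preserves each token's norm, which is one reason RoPE-style encodings adapt well across modalities.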
Huawei's new architecture cuts the Transformer's main artery! Any model's reasoning ability surges in place
量子位· 2025-12-05 02:13
Core Viewpoint
- The article discusses the limitations of the traditional Transformer architecture, particularly its Attention mechanism, and introduces a new architecture called Nexus, which employs a Higher-Order Attention Mechanism to enhance reasoning capabilities in complex tasks [1][2][4][7].

Group 1: Limitations of the Traditional Transformer
- The traditional Attention mechanism struggles with complex mathematical problems and multi-step logical reasoning, leading to inaccurate outputs [2][6].
- The core issue lies in the static nature of Query (Q) and Key (K) generation, which limits the model's ability to capture complex relationships [15][14].

Group 2: Introduction of Nexus
- Huawei's Noah's Ark Lab has developed Nexus, which addresses the limitations of the traditional Attention mechanism by using higher-order attention to model complex relationships effectively [7][8].
- Experimental results indicate that models using Nexus show significant improvements on reasoning tasks without increasing parameter count [10][35].

Group 3: Innovations in the Nexus Architecture
- Nexus makes the generation of Q and K an attention operation itself, allowing tokens to aggregate contextual information before Q and K are computed [17][18].
- The architecture employs a recursive framework that supports multi-hop reasoning, enabling the construction of higher-order relationships [23][27].
- Nexus maintains parameter efficiency through weight-sharing strategies, ensuring that the model's added complexity does not increase the parameter count [29][31].

Group 4: Performance Improvements
- In experiments with the Pythia series models, Nexus consistently outperformed the original Transformer across various reasoning datasets, with notable improvements on tasks requiring multi-step reasoning [36][39].
- For instance, the accuracy of the 70M model on the SciQ dataset improved from 61.5% to 68.5%, a 7-percentage-point increase [39].

Group 5: Application and Future Directions
- Nexus demonstrates plug-and-play capability, allowing easy integration into larger models without extensive retraining, thus enhancing their reasoning abilities [41][44].
- The team plans to explore Nexus's applications in visual Transformers and multimodal models, indicating its potential beyond language tasks [45][46].
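The mechanism described above, deriving Q and K through an attention operation rather than a static projection, can be sketched in a few lines. This is a minimal single-head numpy illustration of the general idea, not Huawei's actual Nexus formulation; the two-pass structure and the reuse of `w_q`/`w_k` across both passes (echoing the weight-sharing the summary mentions) are assumptions made for the sketch:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Standard scaled dot-product attention (single head)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def higher_order_attention(x, w_q, w_k, w_v):
    """Sketch: let each token aggregate context via an inner
    attention pass, then derive Q and K from the context-aware
    representations instead of from the raw token vectors."""
    # Inner pass: tokens attend to each other to gather context.
    ctx = attention(x @ w_q, x @ w_k, x)
    # Outer pass: Q and K come from context-aware vectors;
    # values still come from the raw tokens.
    return attention(ctx @ w_q, ctx @ w_k, x @ w_v)
```

Because the inner pass reuses the outer projections rather than introducing new ones, the sketch adds computation (a second attention pass) but no parameters, matching the parameter-efficiency claim above.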
Next Wednesday! QbitAI's big event is almost here | MEET2026
量子位· 2025-12-05 02:13
MEET Organizing Committee, from Aofeisi. QbitAI | WeChat official account QbitAI

Hurry, there is really only one week left! The AI world's absolutely-can't-miss annual event is almost here: the MEET2026 Intelligent Future Conference.

The program can already be previewed, and the guest list alone shows how heavyweight it is, including academic leaders such as Zhang Yaqin and Sun Maosong of Tsinghua University and Wang Zhongyuan of the Beijing Academy of Artificial Intelligence (BAAI); from domestic industry, Baidu, Xiaomi, and SenseTime; and from overseas, Google Cloud, Amazon Web Services, and Qualcomm.

The topics are just as rich, ranging from large language models to multimodality, from embodied intelligence to autonomous driving, and from cloud computing to concrete applications, covering virtually every aspect of today's mainstream AI. And if you are after the most forward-looking perspectives, MEET is likewise not to be missed; you are guaranteed to come away with insights and inspiration.

Tempted already? Then act on it. The offline registration channel is right here: So what other highlights does the MEET2026 Intelligent Future Conference have? Read on.

Highlight 1: A heavyweight GenAI dialogue plus a frontier Agent roundtable, digging into the year's hottest topics. Is anyone still asking this year whether AI will replace humans? The anxiety may have faded, because AI has started doing things itself. Robotaxis are no longer just a concept in slide decks but are actually carrying passengers on the street; Agents no longer just write code and answer emails, but can autonomously ...
QbitAI is hiring editors and writers
量子位· 2025-12-05 02:13
Core Viewpoint
- The article emphasizes the ongoing AI boom and invites individuals to join "Quantum Bit" (QbitAI), which focuses on tracking AI advancements and has established itself as a leading content platform in the industry [1].

Group 1: Job Opportunities
- The company is hiring for three main directions: AI Industry, AI Finance, and AI Product, with positions available for both experienced professionals and fresh graduates [2][4].
- Positions are open at various levels, including editors, lead writers, and chief editors, with a focus on matching roles to individual capabilities [6].

Group 2: Job Responsibilities
- **AI Industry Direction**: Responsibilities include tracking innovations in infrastructure, such as chips, AI infrastructure, and cloud computing, as well as interpreting technical reports from conferences [6][7].
- **AI Finance Direction**: Focuses on venture capital, financial reports, and capital movements within the AI industry, requiring strong analytical skills and a passion for interviews [11].
- **AI Product Direction**: Involves evaluating AI applications and hardware, tracking product launches, and engaging with entrepreneurs and product experts in the AI space [11].

Group 3: Benefits and Work Environment
- Employees will have the opportunity to engage with cutting-edge AI technologies, enhance their work efficiency through new tools, and build personal influence in the AI field [6].
- The company offers competitive salaries and comprehensive benefits, including social insurance, meal allowances, and performance bonuses, along with a dynamic and open team culture [6].

Group 4: Company Growth and Reach
- By 2025, Quantum Bit aims to have over 2.4 million subscribers on WeChat and more than 7 million users across platforms, with a daily reading volume exceeding 2 million [12].
- The company is recognized as the top new-media outlet in the AI and frontier technology sectors according to third-party data platforms [12].
A 305.5 billion yuan market cap! Moore Threads rings the bell as China's first general-purpose GPU stock
量子位· 2025-12-05 02:13
Core Viewpoint
- The successful IPO of Moore Threads marks the listing of the first domestic general-purpose GPU company, with an opening price of approximately 650 yuan, a 469% increase over the issuance price of 114.28 yuan, and a market capitalization exceeding 305.5 billion yuan [1][6].

Group 1: IPO Details
- Moore Threads' IPO application was accepted on June 30 and passed review on September 26, setting the fastest IPO approval record on the Sci-Tech Innovation Board at just 88 days [2].
- The company raised a total of 8 billion yuan through its initial public offering, a new record for the highest fundraising amount among A-share new listings this year [5].
- The total share capital after the IPO is 470 million shares, with 70 million shares newly issued [4].

Group 2: Financial Performance
- In the first three quarters of this year, Moore Threads reported revenue of 780 million yuan, a year-on-year increase of 182% [10].
- The net loss for the same period narrowed to 720 million yuan, down from 890 million yuan a year earlier [11].
- The revenue structure has shifted significantly, with AI computing products becoming the main revenue driver, contributing 94.85% of total revenue in the first half of this year [14].

Group 3: Investment and Development
- The company has attracted significant investment from notable institutions such as China Mobile, Sequoia Capital, and various state-owned enterprises, indicating strong market confidence [2].
- The funds raised will primarily be allocated to research and development, with 2.5 billion yuan earmarked for a new generation of integrated AI training-and-inference chips, and similar amounts for graphics chips and AI SoC chips [8][10].

Group 4: Company Background
- Moore Threads was founded in June 2020 with a registered capital of 330 million yuan and is controlled by Zhang Jianzhong, who holds 44.07% of the shares [16].
- Zhang Jianzhong previously served as general manager of NVIDIA China and has over 15 years of experience in the GPU industry, working to establish a complete GPU ecosystem in China [17][18].
- The company has developed a unified system architecture called MUSA, integrating capabilities such as AI compute acceleration and graphics rendering into a single chip [21][23].
In the winter of 2025, why is Shanghai called the "world's number one battlefield for embodied intelligence"?
量子位· 2025-12-05 02:13
Core Viewpoint
- The article emphasizes the rapid advancement of China's embodied intelligence industry, particularly in Shanghai, as it prepares for the GDPS 2025 Global Developer Pioneer Conference, seen as a pivotal moment for the industry's transition from digital to physical applications [1][3].

Group 1: Shanghai's Role in Embodied Intelligence
- Shanghai is positioned as a model "service-oriented government", providing not just funding but also pathways and resources for developers [3].
- The city has opened up over a hundred core scenarios in high-end manufacturing, healthcare, and urban governance for companies to test and implement embodied intelligence solutions [4][6].

Group 2: Supportive Policies and Infrastructure
- Shanghai has introduced a "computing power voucher" policy, offering up to 40 million yuan per year to support companies in accessing high-end computing resources [7][8].
- The government is also providing up to 5 million yuan annually to support the construction of a common knowledge base for physical-world interactions, addressing the industry's data-isolation problem [9][10].

Group 3: Industry Evolution and Technological Breakthroughs
- The physical proximity of component suppliers in the Zhangjiang Robot Valley has cut the hardware iteration cycle from months to weeks or even days, fostering an ecosystem for embodied intelligence [11][13].
- The article highlights the emergence of competitive "unicorns" in the industry, marking a shift from conceptual visions to practical engineering solutions [15].

Group 4: Notable Achievements in Robotics
- Zhiyuan Robotics set a Guinness World Record for endurance with its A2 robot, which walked 106.286 kilometers autonomously, demonstrating advanced energy management and SLAM algorithms [16][17].
- Fourier Intelligence's GR-2 robot has become a benchmark for safety and sensitivity in rehabilitation, showcasing its ability to understand human pain and touch [18][19].

Group 5: Future Prospects and Challenges
- The GDPS event on December 12 is framed as a comprehensive demonstration of embodied intelligence teams' capabilities, focusing on real-world applications rather than theoretical presentations [35][36].
- The article concludes that the establishment of a complete embodied intelligence ecosystem in Shanghai represents a significant milestone, enabling the development of intelligent systems that can learn and adapt in real-world environments [43][44].
Jensen Huang on America's top podcast: I worry every day that NVIDIA will go under
量子位· 2025-12-04 09:55
Core Insights
- The conversation highlights a fundamental shift in AI from "retrieval" to "reasoning", where AI generates answers based on learned knowledge structures rather than simply retrieving pre-stored data [6][7][9].
- Huang emphasized that AI's core mechanism has transformed into a process of learning and on-the-fly logical reasoning, likening data centers to new factories producing intelligent tokens [9][13].
- The discussion also touched on the challenge of energy consumption as AI expands, with Huang noting that chip efficiency improvements are crucial to meeting growing demand without exhausting global energy resources [14][16].

Group 1: AI Evolution
- The transition from "retrieval" to "reasoning" represents a significant change in how AI operates, moving from searching for answers to generating them based on learned knowledge [6][7].
- Huang described deep learning as a process in which a massive neural network learns from vast amounts of input-output examples, functioning as a universal function approximator [11][12].
- The concept of data centers as "AI factories" was introduced, where energy and data are inputs and intelligent tokens are outputs, marking a new era of manufacturing [13].

Group 2: Impact on the Workforce
- Huang addressed concerns about AI replacing jobs, suggesting that while tasks will change, jobs will not disappear; instead, people will focus more on problem-solving and decision-making [16][17].
- The future of programming will involve natural language, significantly lowering the technical barrier and allowing everyone to become a programmer [18][19].
- Huang acknowledged the potential for a future internet filled with AI-generated content, but believes that as long as the information is verified, it can enhance knowledge acquisition [19].

Group 3: Technological Advancements
- Traditional Moore's Law is slowing down, but in AI, accelerated computing is allowing the law to be reborn in a new form [20][21].
- Huang explained the difference between CPUs and GPUs, noting that GPUs are better suited to AI because of their ability to handle massive parallel computations [22][24].
- The cost of AI computing has decreased by a factor of 100,000 over the past decade, akin to a revitalized Moore's Law [24].

Group 4: Company History and Challenges
- Huang recounted a critical moment in NVIDIA's history when the company was just 30 days away from bankruptcy, highlighting the importance of honesty and transparency in business [33][34].
- The early struggles included a significant technical error that nearly derailed the company, but a candid conversation with Sega's CEO led to a lifeline that saved NVIDIA [34][36].
- Huang's commitment to innovation, even in the face of skepticism, has been a driving force behind NVIDIA's success [30][32].
The "Doubao Phone" has even doubled in price on the secondhand market…
量子位· 2025-12-04 09:55
Core Viewpoint
- The "Doubao Phone" quickly sold out its initial stock of 30,000 units, indicating strong market demand despite official acknowledgment that its software functionality is incomplete [1][12][14].

Group 1: Product Launch and Market Response
- The Doubao Phone, featuring the Doubao Assistant technology, was launched in collaboration with ZTE, with the first product priced at 3,499 yuan [11][12].
- The initial batch sold out rapidly, with reports of resales at a premium: prices rose by 1,500 yuan, or in some cases even doubled [2].
- The product is positioned as a limited edition, with no plans for additional material procurement after the initial stock [17].

Group 2: Features and Capabilities
- The Doubao Assistant is designed as a system-level AI assistant, allowing users to interact through various methods, including voice commands and a dedicated AI button [20][21].
- Users can perform tasks such as obtaining information about images, conducting price comparisons across shopping apps, and managing daily tasks like setting alarms and summarizing calls [23][50].
- The assistant has memory capabilities, retaining and recalling user preferences and past interactions, which enhances its utility over time [51][56].

Group 3: Technical Aspects and Privacy Concerns
- The Doubao Assistant requires user authorization to access certain system-level permissions necessary for its operation [64][66].
- The official response to privacy concerns emphasizes that no user screen content is stored in the cloud, and all operations are conducted locally without retaining data for model training [66][67].
- The collaboration between Doubao and ZTE is clarified: Doubao leads product definition while ZTE handles hardware engineering [69].
Large models diagnosed as "visually illiterate"! Multiple universities jointly propose MILO to give them spatial imagination
量子位· 2025-12-04 09:55
Core Insights
- The article discusses the limitations of multimodal large language models (MLLMs) in spatial reasoning, highlighting their inability to effectively understand and visualize spatial concepts, a phenomenon termed "visual illiteracy" [2][3].

Group 1: Challenges in Spatial Reasoning
- Spatial reasoning is identified as a core cognitive ability for humans to understand three-dimensional structures, and it poses a significant challenge for MLLMs in practical applications [2].
- Current methods rely primarily on "language description tuning", which fails to give models a true visual understanding of spatial concepts [2][3].

Group 2: Introduction of MILO
- A research team has proposed MILO (implicit spatial world modeling) to address the spatial reasoning challenges faced by MLLMs by integrating visual generative feedback with symbolic reasoning [4].
- MILO employs a two-phase training process: in the first phase, visual generative tuning teaches the model spatial transformations through visual outputs; in the second, language tuning uses spatial instruction data [5].

Group 3: Enhancements in Geometric Perception
- To further enhance geometric perception, the team introduced RePE (Relative Positional Encoding), which captures relative transformations between adjacent frames instead of relying on a global coordinate system, improving generalization and adaptability across datasets [8][9].

Group 4: The GeoGen Dataset
- The research team constructed the GeoGen dataset, comprising approximately 2,241 videos and 267,000 "observation-action-result" triplets, aimed at enhancing geometric-perception generation [10].
- The dataset draws on diverse sources, including scanned 3D scenes and internet videos, ensuring a wide range of realistic scenarios [11].

Group 5: Validation of MILO
- MILO's effectiveness was validated across multiple baseline models and five categories of spatial understanding tasks, achieving top performance on 3D scene understanding and spatial reasoning tasks [12][16].
- Notably, MILO improved accuracy by 3.2% on the ScanRefer task and achieved an average accuracy of 61.7% on the VSI-Bench spatial reasoning benchmark, surpassing the baseline VG-LLM by 2.2% [16].
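The key property behind a relative encoding like RePE, that frame-to-frame transforms are unchanged when the whole trajectory is moved in the global frame, can be illustrated with camera poses. The sketch below uses 4x4 homogeneous pose matrices; it is a generic geometric illustration rather than the paper's actual RePE computation, and the function name is invented for the example:

```python
import numpy as np

def relative_poses(world_poses):
    """Convert absolute per-frame poses (4x4 homogeneous matrices
    in a global coordinate frame) into frame-to-frame relative
    transforms, the kind of quantity a relative encoding consumes."""
    rel = []
    for prev, curr in zip(world_poses[:-1], world_poses[1:]):
        # Relative motion from frame t to t+1; any rigid transform
        # applied to the whole trajectory cancels out here.
        rel.append(np.linalg.inv(prev) @ curr)
    return rel
```

Because each relative transform cancels any common rigid motion applied on the left, an encoding built on them does not depend on a dataset's global coordinate convention, which matches the generalization motivation given above.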