A Renowned Mathematician Quits to Join an AI Startup: His Boss Is a Post-00s Chinese Woman
量子位· 2025-12-06 01:30
Core Viewpoint
- A prominent mathematician, Ken Ono, has left academia to join Axiom, a Silicon Valley AI startup founded by his former student, Carina Letong Hong, a 24-year-old math prodigy [2][4][6].

Group 1: Ken Ono's Transition
- Ken Ono, recognized as a leading scholar in number theory, made the radical decision to leave his lifelong academic career to become a "founding mathematician" at Axiom [5][10].
- His role involves pushing the limits of AI models by designing complex mathematical problems that require a deep understanding of mathematical principles [10][12].
- Initially skeptical of AI's capabilities, Ono changed his view after attending a workshop where he saw models advancing rapidly in his own areas of expertise [14][21].

Group 2: Axiom's Ambitions
- Axiom aims to develop AI that can solve real mathematical problems for quantitative trading and hedge fund firms, focusing on formal mathematical proofs [27][28].
- The company reached a $300 million valuation with no products or users, attracting significant investment from top venture capital firms [37][38].
- Axiom recently made headlines by solving Erdős problems 124 and 481, showcasing its potential to the mathematical community [29][33].

Group 3: Carina Letong Hong's Background
- Carina Letong Hong, the founder of Axiom, has an impressive academic record, completing dual degrees in mathematics and physics at MIT in just three years and winning multiple prestigious awards [40][44][47].
- She was shaped by her experiences in competitive mathematics and is strongly committed to tackling difficult mathematical challenges [43][51].
- Her leadership and vision have positioned Axiom as a promising player at the intersection of mathematics and AI, earning her a spot on Forbes' 30 Under 30 in AI [51][53].
Google's New Architecture Breaks Through the Transformer's Ultra-Long-Context Bottleneck! Hinton's Pointed Question: Any Regrets About Going Open?
量子位· 2025-12-05 09:33
Core Insights
- Google has recently made significant advances in AI, particularly in addressing the Transformer architecture's limitations in long-context processing [5][7][32].
- The new models, Titans and MIRAS, aim to combine the speed of RNNs with the performance of Transformers, expanding context windows to 2 million tokens during inference [2][11][14].

Group 1: New Architectures
- Titans is a new architecture incorporating a neural long-term memory module that dynamically updates weights during inference, enhancing the model's ability to retain and process information [14][15].
- MIRAS is the theoretical framework behind Titans, focused on integrating new and old information efficiently without losing critical concepts [22][28].

Group 2: Memory Mechanisms
- Titans introduces "Memory as Context" (MAC), which lets the model use long-term memory as additional context for the attention mechanism, improving its ability to summarize and understand large amounts of information [16][18].
- The model selectively updates long-term memory based on "surprise metrics," prioritizing significant new inputs while maintaining efficiency [19][20][21].

Group 3: Performance Comparison
- Experimental results indicate that models based on Titans and MIRAS outperform state-of-the-art linear recurrent models and comparable Transformer baselines, even with fewer parameters [27][32].
- The architecture's capacity for extremely long contexts positions it as a strong competitor to large models like GPT-4 [32].

Group 4: Future of AI Models
- Exploration beyond Transformers continues, but the Transformer architecture remains foundational in the era of large models [33].
- Google's decision to publicly share its Transformer research has had a profoundly positive impact on the AI community, as noted by industry leaders [34].
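The surprise-gated memory write described in Group 2 can be made concrete with a toy associative memory. This is an illustrative sketch under simplified assumptions (a linear key-value memory and prediction error as the "surprise metric"), not the Titans implementation:

```python
import numpy as np

def surprise_update(memory, key, value, lr=0.5, threshold=0.1):
    """Toy surprise-gated memory write (illustrative, not Google's Titans code).

    memory: (d, d) linear associative memory mapping keys to values.
    The 'surprise' here is the reconstruction error of the new (key, value)
    pair under the current memory; only sufficiently surprising inputs
    trigger a gradient-style update, as the summary describes.
    """
    pred = memory @ key                  # what the memory currently recalls
    err = value - pred
    surprise = np.linalg.norm(err)       # surprise metric: prediction error
    if surprise > threshold:
        # one gradient step on ||value - memory @ key||^2
        memory = memory + lr * np.outer(err, key)
    return memory, surprise

d = 4
mem = np.zeros((d, d))
k = np.eye(d)[0]                         # a one-hot key
v = np.array([1.0, 2.0, 0.0, 0.0])
mem, s1 = surprise_update(mem, k, v)     # novel pair: large surprise, memory written
mem, s2 = surprise_update(mem, k, v)     # repeated pair: lower surprise
print(s1 > s2)  # True: repetition is less surprising
```

The gate is the key idea: unsurprising inputs leave the memory untouched, which is what keeps inference-time weight updates cheap at 2-million-token scale.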
Office Is in Trouble! Alibaba's Qianwen Packs the Whole "Office Suite" into the Chat Box
量子位· 2025-12-05 09:33
Core Insights
- The article covers the recent upgrade of Alibaba's Qianwen app, which enhances document generation, intelligent formatting, online editing, and multi-format conversion, all integrated into a single platform [1][4].

Group 1: PPT Creation Capabilities
- The upgrade significantly improves PPT creation by integrating the entire workflow, from research, outline generation, and content creation to editing and exporting, within one app [6][17].
- Users can upload documents, use photo recognition, voice commands, and one-sentence instructions to create presentations, with the app automatically extracting key points and providing ready-made templates [6][16].
- Generated content can be edited directly in the app, letting users modify titles and text in place, which improves usability over traditional workflows [13][14].

Group 2: Document Editing Features
- Qianwen now offers a one-stop solution for document editing, generating structured, well-formatted Word documents from simple commands [19][20].
- The app can analyze a topic along multiple dimensions, such as core viewpoints and market trends, and compile the results into editable Word documents [21][22].
- Users can perform a range of editing tasks, including modifying instructions, adjusting formatting, and rewriting content, entirely within the app [26][35].

Group 3: User Experience and Accessibility
- The streamlined experience makes it easier for students and professionals to create presentations and reports directly from mobile devices [17][18].
- The design reduces the need for specialized skills, broadening the app's audience [18].
- Integrated conversion among Word, PPT, PDF, and Excel formats within a single app improves operational efficiency [35].
GPT-5 Proposes New Quantum Physics Ideas from Scratch; a Physicist Wrote Them into a Paper Now Published in Physics Letters B
量子位· 2025-12-05 08:04
Core Viewpoint
- The article discusses a theoretical physics paper by Stephen Hsu that is notable for being primarily conceived by AI, specifically GPT-5, marking a significant development in collaboration between AI and human researchers [2][3].

Summary by Sections

AI Contribution to Physics Research
- Hsu's paper in Physics Letters B explores whether quantum evolution is strictly linear, questioning the compatibility of nonlinear modifications with relativistic requirements [5][6].
- The paper concludes that most nonlinear modifications cannot coexist with relativity because of issues with locality and foliation independence [6][9].

Methodology and Collaboration with AI
- In a supplementary article, Hsu details his collaboration with GPT-5, highlighting the pivotal moment when GPT-5 suggested using the Tomonaga-Schwinger framework to analyze the compatibility of nonlinear quantum mechanics with relativity [10].
- Hsu employed a "generator-verifier" method in which one AI model generates derivation steps while another verifies them, significantly reducing the likelihood of errors [12].

Challenges and Insights from AI Collaboration
- Hsu candidly describes the difficulties of working with large language models, noting that they can make simple computational errors and conceptually flawed leaps that may mislead researchers [13][14].
- He emphasizes that AI errors are not always easy to detect, citing an instance where the model suggested an incorrect method for proving conditions on the nonlinear terms, which took substantial effort to identify and correct [16][17].

Future Prospects of Human-AI Collaboration
- Hsu is optimistic about human-AI collaboration in the formal sciences, predicting that mixed collaboration will become standard in mathematics and physics as models improve in accuracy and contextual understanding [18].
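Hsu's generator-verifier division of labor, in which one model proposes derivation steps and an independent one checks each step before it is accepted, can be illustrated with a toy numeric stand-in. Everything here (function names, the deliberately broken arithmetic "derivation") is hypothetical, not Hsu's actual prompts or models:

```python
def generate_steps():
    """'Generator': proposes a chain of claimed-equal expressions.

    Each step is (description, value); a real generator would be an LLM
    emitting symbolic derivation steps. The third step below is
    deliberately invalid, mimicking the flawed leaps Hsu describes.
    """
    return [
        ("expand (a+b)^2 with a=3, b=4", (3 + 4) ** 2),
        ("rewrite as a^2 + 2ab + b^2", 3**2 + 2 * 3 * 4 + 4**2),
        ("'simplify' to a^2 + b^2 (invalid)", 3**2 + 4**2),
    ]

def verify(steps):
    """'Verifier': independently checks that each step preserves the value."""
    accepted = [steps[0]]
    for desc, value in steps[1:]:
        if value == accepted[-1][1]:
            accepted.append((desc, value))
        else:
            print(f"rejected step: {desc}")
            break
    return accepted

accepted = verify(generate_steps())
print(len(accepted))  # 2: the invalid third step is caught and rejected
```

The point of the split is that the verifier never trusts the generator's claim of equality; it recomputes, which is why the method catches the subtle errors Hsu warns about.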
UniX AI (优理奇机器人) Completes Two Rounds, Angel++++ and Angel+++++, Totaling 300 Million Yuan, with an "Algorithm-Hardware-Scenario" Trinity Accelerating Embodied-Intelligence Deployment
量子位· 2025-12-05 08:04
Core Viewpoint
- UniX AI has completed two financing rounds totaling 300 million yuan, indicating strong market recognition of its distinctive value in embodied intelligence, which integrates algorithms, hardware, and scenarios [1].

Group 1: Financing and Market Recognition
- UniX AI completed its fifth financing round within six months, attracting investments from a range of institutions and existing shareholders, underscoring the company's market appeal [1].
- The company has built a strong capital alliance alongside top-tier technology and engineering teams, validated products, and a clear commercialization path, supported by government initiatives [13].

Group 2: Product Development and Market Application
- The company follows a "scenario-driven" development path, continuously validating its products in real commercial environments, which in turn improves its algorithm models and technology iteration [3].
- Since mass production began in 2025, UniX AI has delivered over 100 units per month and holds more than 1,000 orders, covering high-value scenarios such as hotels, property management, security, retail, and dining [5].

Group 3: Technological Advancements
- UniX AI has built a complete technology stack spanning perception, decision-making, and control, significantly improving robot adaptability and reliability in unstructured environments [6].
- The company has established a rapid iteration loop from model training to real-world deployment, focusing on the integration of algorithms, real environments, and engineering [7].

Group 4: Educational and Research Initiatives
- The company is building a research and education ecosystem, launching standardized robotic-arm products for universities and research institutions to extend its influence in the technology ecosystem [9].
- Its product UniOpenArmX, unveiled at the IROS 2025 conference, is designed to be teachable, programmable, and reproducible, providing efficient infrastructure for research and education [9].

Group 5: Future Directions
- Embodied intelligence is transitioning from demonstration to validation and scaling, with the CEO emphasizing the need to unify algorithm, hardware, and scenario capabilities [11].
- UniX AI aims to advance along three paths: productization, internationalization, and ecosystem development, working to make embodied intelligence part of social infrastructure [12].
Video Models Can Reason Too: Sora 2's Reasoning Ability Surpasses GPT-5
量子位· 2025-12-05 08:04
Submitted by the DeepWisdom team
量子位 | WeChat official account QbitAI

Can a video model solve reasoning problems by generating video? The answer is yes. On spatial tasks in particular (such as navigating mazes), video models are more capable and more stable than text-and-image models.

The DeepWisdom research team proposes that video generation models can do more than draw pictures: they can reason. By generating consecutive video frames they carry out spatio-temporal planning, and on complex spatial tasks this ability even surpasses top multimodal large models such as GPT-5 and Gemini 2.5 Pro.

[Benchmark table truncated during extraction; it reported EM, SR, PR, and SD metrics for each method across the Base, Irreg, Trap, 3D, and Soko maze settings.]
Beihang Leads a 300-Page Survey of Code Intelligence: From Foundation Models to Agents, the Full Code LLM Landscape in One Read
量子位· 2025-12-05 05:33
Core Insights
- The article covers a comprehensive survey of the code intelligence field, detailing the evolution of programming paradigms and the development of foundation models, tasks, training methodologies, and industrial applications [1][3].

Group 1: Evolution of Programming Paradigms
- The survey traces a clear evolutionary path from manual coding to AI-assisted collaborative development, in which developers increasingly express intent in natural language for models to implement [4][6].
- This paradigm shift is more profound than any previous tool upgrade, marking a critical transition in how software is written [7][8].

Group 2: Code Foundation Models
- The survey constructs an overall blueprint for code foundation models, comparing the training pipelines of general LLMs and code-specific models, and identifying core data sources, such as GitHub code, issue discussions, and API documentation, that encode engineering knowledge [10][12].
- The evolution of model architectures, from CodeBERT and CodeT5 to current designs, reflects ongoing adaptation to the demands of code tasks [11].

Group 3: Code Tasks and Benchmarks
- The evaluation landscape for code models has been fragmented; the survey organizes tasks by granularity, from function-level to engineering-level, with corresponding benchmarks [14][18].
- HumanEval and MBPP serve as basic indicators but reflect only foundational capability; more complex tasks are needed to assess real project understanding [15][16].

Group 4: Model Alignment and Enhancement
- The survey summarizes methods for alignment and capability enhancement, aimed at making models genuinely understand engineering rather than merely generate code-like text [19][20].
- Repo-level training, which teaches models module dependencies and project organization, is crucial for stable performance in real scenarios [22].

Group 5: Software Engineering Agents
- The potential of code intelligence expands when models act as agents in the software engineering process, moving beyond code generation to continuous decision-making and real-time use of feedback [27][28].
- The current bottleneck for these agents is not model capability but effectively leveraging environmental signals such as test results and tool feedback [28].

Group 6: Security and Governance
- The survey categorizes risks into data security, model security, and execution security, and covers governance measures such as data auditing and static/dynamic testing [34][35].

Group 7: Training Methodologies
- The latter part of the survey distills practical training experience into a systematic methodology for building large code models, a useful reference for teams preparing to do so [36][40].

Group 8: Accelerating Applications
- Code models are increasingly integrated into key software engineering processes such as IDE plugins, collaborative coding, and automated testing [41][42].
- Software engineering is likely to evolve toward intent-driven, collaborative coding, with models playing an ever larger role [43].
Google's Most Powerful Model Launches Behind a Paywall, Criticized as Too Expensive After DeepSeek's Open-Source Release
量子位· 2025-12-05 05:33
Core Viewpoint
- Google has launched its latest model, Gemini 3 Deep Think, which significantly enhances reasoning, particularly on complex mathematics, science, and logic problems [2][9].

Group 1: Model Features and Performance
- Gemini 3 Deep Think supports iterative reasoning, enabling multi-round refinement of code and producing more detailed results in visualization, prototyping, and experimentation [9].
- The model scored 41.0% on the Humanity's Last Exam benchmark, outperforming GPT-5 Pro by 10 percentage points [10].
- On the ARC-AGI-2 benchmark, its code-execution accuracy reached 45.1%, surpassing Gemini 3 Pro by 14% and GPT-5.1 by nearly 30% [11].
- The underlying technology derives from Gemini 2.5 Deep Think, which previously won gold medals in prestigious competitions such as the IMO and ICPC [14].

Group 2: User Reception and Pricing Concerns
- Gemini 3 Deep Think is available only to Ultra members at a monthly fee of $249.9 (roughly 1,800 RMB), leaving Pro users feeling excluded [18].
- Users are frustrated by the lack of trial access or pay-per-use options and question the value of the Ultra membership [21][22].
- The pricing strategy has led to a lukewarm reception, with many criticizing the high cost and the lack of transparency about what the tier delivers [24][27].

Group 3: Competitive Landscape
- The recently updated DeepSeek-V3.2 shows reasoning capability closely matching Gemini 3 Pro and has won accolades in major competitions, posing a direct challenge to Gemini 3 Deep Think [25].
- DeepSeek is open source, contrasting with Google's closed-source approach and high pricing, which has fueled user dissatisfaction [26][27].
Right After Ilya's Prediction, NEO Arrives as the World's First Native Multimodal Architecture: Vision and Language Welded Together for Good
量子位· 2025-12-05 05:33
Core Insights
- The AI industry is undergoing a paradigm shift, moving from merely scaling models to building smarter architectures, as highlighted by Ilya Sutskever's statement that the era of scaling laws is over [1][2][20].
- A new native multimodal architecture called NEO has emerged from a Chinese research team; it is the first scalable open-source model to integrate visual and language understanding at a fundamental level [4][19].

Group 1: Current State of Multimodal Models
- Mainstream multimodal models rely on modular architectures that simply concatenate pre-trained visual and language components, leading to inefficiencies and limits on understanding [6][8].
- Existing modular models face three significant technical gaps, in efficiency, capability, and fusion, which hinder performance on complex tasks requiring deep semantic understanding [14][15][17].

Group 2: NEO's Innovations
- NEO is a unified model that inherently integrates visual and language processing, eliminating the distinction between visual and language modules [19].
- The architecture features three core innovations: Native Patch Embedding for high-fidelity visual representation, Native-RoPE for adaptive spatial encoding, and Native Multi-Head Attention for richer interaction between visual and language tokens [22][24][29][33].

Group 3: Performance and Efficiency
- NEO demonstrates remarkable data efficiency, achieving competitive performance with only 3.9 million image-text pairs for training, one-tenth of what other leading models require [39].
- Across benchmark tests, NEO has outperformed comparable models on visual understanding and multimodal tasks [41][42].

Group 4: Implications for the Industry
- NEO's architecture lowers the barrier to deploying multimodal AI on edge devices, making advanced visual perception accessible beyond cloud-based systems [43][45][50].
- Open-sourcing the NEO models signals a shift in the AI community toward more efficient, unified architectures, potentially setting a new standard for multimodal technology [48][49].

Group 5: Future Directions
- NEO's design philosophy aims to bridge the semantic gap between visual and language processing, paving the way for advances such as video understanding and 3D spatial perception [46][51].
- NEO represents a significant contribution from a Chinese team to the global AI landscape, emphasizing architectural innovation over mere scaling [53][54].
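The "Native Patch Embedding" idea summarized in Group 2, projecting image patches straight into the shared token space rather than routing them through a separate pretrained vision encoder, can be sketched in a few lines. The patch size, dimensions, and random projection here are illustrative assumptions, not NEO's published configuration:

```python
import numpy as np

def patch_embed(image, patch=4, d_model=8, rng=None):
    """Toy 'native patch embedding': non-overlapping image patches are
    flattened and linearly projected into the same token space the
    language stream would use, with no separate vision encoder.
    Shapes and the projection matrix are illustrative assumptions.
    """
    rng = rng or np.random.default_rng(0)
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    # cut into non-overlapping patches and flatten each to a vector
    patches = (image.reshape(H // patch, patch, W // patch, patch, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * C))
    W_proj = rng.standard_normal((patches.shape[1], d_model))
    return patches @ W_proj           # (num_patches, d_model) visual "tokens"

img = np.zeros((8, 8, 3))             # toy 8x8 RGB image
tokens = patch_embed(img)
print(tokens.shape)  # (4, 8): four 4x4 patches, each now a d_model-dim token
```

Because the visual tokens land directly in the language model's embedding space, attention can mix them with text tokens with no adapter in between, which is the "fusion gap" the article says modular designs fail to close.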
Huawei's New Architecture Cuts the Transformer's Main Artery! Reasoning Ability of Any Model Soars on the Spot
量子位· 2025-12-05 02:13
Core Viewpoint
- The article examines the limitations of the traditional Transformer architecture, particularly its attention mechanism, and introduces a new architecture called Nexus that employs a Higher-Order Attention Mechanism to strengthen reasoning on complex tasks [1][2][4][7].

Group 1: Limitations of the Traditional Transformer
- The traditional attention mechanism struggles with complex mathematical problems and multi-step logical reasoning, producing inaccurate outputs [2][6].
- The core issue is the static way Query (Q) and Key (K) are generated, which limits the model's ability to capture complex relationships [15][14].

Group 2: Introduction of Nexus
- Huawei's Noah's Ark Lab developed Nexus to address these limitations, using higher-order attention to model complex relationships effectively [7][8].
- Experimental results indicate that models using Nexus improve significantly on reasoning tasks without adding parameters [10][35].

Group 3: Innovations in the Nexus Architecture
- Nexus makes the generation of Q and K an attention operation in its own right, allowing tokens to aggregate contextual information before Q and K are computed [17][18].
- A recursive framework supports multi-hop reasoning, enabling the construction of higher-order relationships [23][27].
- Weight-sharing strategies keep Nexus parameter-efficient, so the added architectural complexity does not increase the parameter count [29][31].

Group 4: Performance Improvements
- In experiments on the Pythia model series, Nexus consistently outperformed the original Transformer across reasoning datasets, with notable gains on tasks requiring multi-step reasoning [36][39].
- For instance, the 70M model's accuracy on the SciQ dataset improved from 61.5% to 68.5%, a 7-percentage-point gain [39].

Group 5: Application and Future Directions
- Nexus is plug-and-play: it can be integrated into larger models without extensive retraining to enhance their reasoning abilities [41][44].
- The team plans to explore Nexus in vision Transformers and multimodal models, indicating potential beyond language tasks [45][46].
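The core Nexus idea as summarized above, making the generation of Q and K itself an attention operation so that tokens aggregate context before the outer scores are computed, can be sketched as a toy second-order attention pass. The weight shapes and sharing scheme are illustrative guesses, not Huawei's released architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Standard scaled dot-product attention over (tokens, dim) arrays."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def higher_order_attention(x, rng=None):
    """Toy second-order attention in the spirit of the Nexus description:
    Q and K are themselves outputs of an inner attention pass, so each
    token aggregates context *before* the outer attention is computed.
    Reusing Wq/Wk in the inner pass mimics the weight sharing the
    article credits for keeping parameter count flat. Illustrative only.
    """
    rng = rng or np.random.default_rng(0)
    d = x.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    # inner pass: context-aware Q and K (no new weight matrices introduced)
    q = attention(x @ Wq, x @ Wk, x)
    k = attention(x @ Wk, x @ Wq, x)
    # outer pass: standard attention on the refined Q/K
    return attention(q, k, x @ Wv)

x = np.ones((5, 16))                  # 5 toy tokens of dimension 16
out = higher_order_attention(x)
print(out.shape)  # (5, 16): same shape as plain attention, extra hop inside
```

The output shape matches ordinary attention, which is consistent with the plug-and-play claim: the block can substitute for a standard attention layer without changing the surrounding model.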