量子位

A New Paradigm for Large-Model Reasoning and Learning: The ExGRPO Framework, from Blind Problem-Grinding to Smart Review
 量子位· 2025-10-23 05:18
Contributed by the ExGRPO team; 量子位 | WeChat official account QbitAI

During reinforcement learning, large models finally know which experiences are most valuable.

A research team from the Shanghai AI Laboratory, the University of Macau, Nanjing University, and The Chinese University of Hong Kong recently proposed ExGRPO, an experience management and learning framework: by systematically identifying, storing, filtering, and learning from valuable experiences, it lets large models go more steadily, faster, and further in optimizing their reasoning ability.

Experiments show that, compared with conventional on-policy RLVR (reinforcement learning with verifiable rewards) methods, ExGRPO brings consistent performance gains across benchmarks. The improvement is especially pronounced on highly challenging tasks such as AIME competition math problems, demonstrating ExGRPO's effectiveness on hard reasoning problems. The study also reveals some interesting phenomena, such as a snowball effect.

Before going further, though, one core question needs answering: for the next step in large-model reasoning, why do we need "experience-driven" training?

Since early 2025, the dominant technical route for strengthening large-model reasoning has been Reinforcement Learning from Verifiable Rewards. In short, the model keeps "grinding problems" like a student (generating reasoning steps), which are then graded by a "marking teacher" ...
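The excerpt above describes ExGRPO only at a high level (identify, store, filter, and replay valuable experiences on top of RLVR). As a rough illustration of what "experience management" can look like in code, here is a minimal, hypothetical sketch of a replay buffer that keeps only verified rollouts and prefers medium-difficulty prompts when replaying; the class and field names are invented for illustration and are not taken from the ExGRPO paper.

```python
import random
from dataclasses import dataclass

@dataclass
class Rollout:
    """One verified reasoning trace (hypothetical fields, not ExGRPO's schema)."""
    prompt_id: str
    trace: str
    reward: float      # 1.0 if the verifiable checker accepted the answer, else 0.0
    pass_rate: float   # empirical accuracy of the policy on this prompt (difficulty proxy)

class ExperienceBuffer:
    """Toy buffer: store only correct rollouts, prefer medium-difficulty prompts on replay."""

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self.items: list[Rollout] = []

    def add(self, rollout: Rollout) -> None:
        # "Identify": only successful, verifiable experiences are worth storing.
        if rollout.reward <= 0.0:
            return
        self.items.append(rollout)
        if len(self.items) > self.capacity:
            self.items.pop(0)  # drop the oldest experience

    @staticmethod
    def _value(r: Rollout) -> float:
        # "Filter": prompts the policy solves about half the time are treated as most
        # informative; trivially easy (pass_rate ~ 1) or hopeless (~ 0) prompts score low.
        return 1.0 - abs(r.pass_rate - 0.5) * 2.0

    def sample(self, k: int) -> list[Rollout]:
        # "Replay": sample past experiences with probability proportional to their value.
        if not self.items:
            return []
        weights = [max(self._value(r), 1e-3) for r in self.items]
        return random.choices(self.items, weights=weights, k=min(k, len(self.items)))

# Example: mix replayed experiences with fresh on-policy rollouts in each RLVR update.
buffer = ExperienceBuffer()
buffer.add(Rollout("aime-12", "step 1 ... answer = 42", reward=1.0, pass_rate=0.4))
replayed = buffer.sample(k=8)
```

How the real framework weights, refreshes, and mixes replayed experience with on-policy data is exactly what the paper studies; the sketch only fixes the vocabulary.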
Direct Hiring at a Top Conference: Big Tech Tries a New Playbook at ICCV, and Tencent ("the Goose Factory") Plays It Best
 量子位· 2025-10-23 05:18
Core Viewpoint
- The article highlights the increasing trend of major tech companies, particularly Tencent, actively recruiting talent at academic conferences like ICCV 2025, showcasing their technological advancements while simultaneously seeking to attract top talent in the AI field [6][30][36].

Group 1: Recruitment Strategies
- Major tech companies are shifting from merely showcasing research achievements at conferences to directly recruiting talent, as evidenced by Tencent's significant presence at ICCV 2025 [6][30].
- Tencent's exhibition at ICCV was the second largest, featuring numerous core business leaders engaging with students and discussing technical routes and job opportunities [8][11].
- The event allowed students to interact directly with business leaders, providing a unique opportunity to discuss job openings and gain insights into Tencent's various AI initiatives [34][36].

Group 2: Technological Showcase
- Tencent showcased a wide array of AI technologies across its business units, including advancements in 3D generation and video synthesis, with multiple academic papers accepted at the conference [13][21].
- The company presented its latest innovations, such as the Hunyuan 3D generation project and real-time digital human solutions, attracting significant attention from attendees [15][20].
- The presence of Tencent's top researchers at the conference facilitated engaging discussions, enhancing the visibility of their technological capabilities [27][29].

Group 3: Market Position and Future Outlook
- Tencent's investment in AI research is substantial, with R&D spending reaching 39.16 billion RMB in the first half of 2025, up 21% and 17% year on year in the first and second quarters respectively [43].
- The company is positioned to leverage its extensive user base and diverse business applications to convert technological advancements into market opportunities [45][46].
- The recruitment efforts at conferences like ICCV are part of a broader strategy to secure top talent, which is essential for maintaining a competitive edge in the rapidly evolving AI landscape [39][40].
Chinese Models Win Over Silicon Valley: Airbnb's Co-founder and CEO Marvels That They're Fast, Good, and Cheap, and Even Turned Down a ChatGPT Partnership
 量子位· 2025-10-23 03:52
Core Insights
- The article highlights the growing recognition of Chinese AI models in the global market, particularly emphasizing their cost-effectiveness, efficiency, and stability compared to Western counterparts [4][6][10].

Group 1: Airbnb's Adoption of Chinese AI Models
- Airbnb's CEO Brian Chesky publicly stated that the company heavily relies on Alibaba's Qwen model due to its speed and affordability, while expressing concerns about the readiness of OpenAI's tools for integration [2][11].
- The implementation of an AI customer service agent at Airbnb led to a 15% reduction in reliance on human customer service, with average problem resolution time decreasing from nearly three hours to just six seconds [8][10].
- The success of Qwen in this context illustrates the competitive edge of Chinese AI models in real-world applications [4][10].

Group 2: Performance of Chinese AI Models
- Kimi K2 has been praised for outperforming leading closed-source models like GPT-5 and Claude Sonnet-4.5, being five times faster and achieving 50% higher accuracy [16][20].
- Kimi K2's rapid processing time of just two minutes with over 60% accuracy contrasts sharply with GPT-5's ten-minute runtime and less than 40% accuracy [20].
- DeepSeek has gained significant attention, with its R1 model being recognized on the cover of Nature, marking a notable achievement in the field of large language models [22][29].

Group 3: Innovations and Global Impact
- The introduction of new attention mechanisms and open-source models by DeepSeek has sparked widespread interest and discussion within the industry [24][26].
- The article concludes that Chinese models are paving a unique path in the global AI competition, characterized by their openness and rapid development [29][30].
It's Gone Crazy: Meta's Layoffs Reach Tian Yuandong, and His Whole Team Goes With Him
 量子位· 2025-10-23 03:52
Core Viewpoint
- The recent layoffs at Meta AI, led by the new Chief AI Officer Alexandr Wang, are not merely organizational streamlining but indicate a significant shift in the company's AI strategy, impacting prominent figures like Tian Yuandong, who has been with Meta for over a decade [1][6].

Group 1: Tian Yuandong's Background and Contributions
- Tian Yuandong has a strong academic background with degrees from Shanghai Jiao Tong University and a PhD from Carnegie Mellon University, specializing in robotics [7][8].
- He joined Facebook (now Meta) in 2014 and has made significant contributions to AI, including the development of the Go AI "Dark Forest," which achieved a level comparable to top amateur players before AlphaGo [9][12].
- His research focus shifted towards AI interpretability and foundational principles; he declined an invitation from OpenAI to work on language models in order to concentrate on understanding how neural networks operate [13].

Group 2: Recent Developments and Innovations
- Recently, Tian Yuandong led a team focused on planning and reasoning within AI, publishing a paper on the role of key hyperparameters in "Grokking" and the effectiveness of optimizers like Muon [14][15].
- His innovative work includes memory-efficient training methods like GaLore, which compresses the memory required for pre-training a 7B model to under 24GB, enabling training on consumer-grade GPUs (a toy sketch of the underlying idea follows this summary) [16].
- The Dualformer model integrates "fast thinking" and "slow thinking" processes, allowing dynamic responses to simple and complex problems, while the Coconut paradigm compresses reasoning trajectories into a continuous latent space [16].

Group 3: Industry Reactions and Future Prospects
- Following the layoffs, companies like OpenAI and various startups quickly expressed interest in recruiting Tian Yuandong and his team members, indicating a competitive job market in the AI sector [4][6].
- Tian Yuandong's workplace experiences may also feed his creative work; he is a science fiction author whose first novel was slated for publication in 2024 [17][20].
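GaLore is only name-dropped above. For readers unfamiliar with it, the sketch below illustrates the general gradient low-rank projection idea it builds on (project the gradient into a low-rank subspace, keep the optimizer state there, then project the update back). This is a toy rendition written from the public description, not the released GaLore code; the rank, refresh interval, and Adam constants are arbitrary.

```python
import torch

def galore_like_step(W, G, state, rank=4, lr=1e-3, beta1=0.9, beta2=0.999,
                     eps=1e-8, refresh_every=200):
    """One toy optimizer step using gradient low-rank projection (GaLore-style idea).

    W: (m, n) weight matrix, G: (m, n) gradient. Optimizer state lives in the
    (rank, n) projected space, so the Adam moments cost O(rank * n) instead of O(m * n).
    """
    # Periodically refresh the projector from the gradient's top singular vectors.
    if state.get("P") is None or state["step"] % refresh_every == 0:
        U, _, _ = torch.linalg.svd(G, full_matrices=False)
        state["P"] = U[:, :rank]                    # (m, rank) projector
        state["m"] = torch.zeros(rank, G.shape[1])  # first moment, low-rank
        state["v"] = torch.zeros(rank, G.shape[1])  # second moment, low-rank
    P = state["P"]

    R = P.T @ G                                     # project the gradient: (rank, n)
    state["step"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * R
    state["v"] = beta2 * state["v"] + (1 - beta2) * R * R
    m_hat = state["m"] / (1 - beta1 ** state["step"])
    v_hat = state["v"] / (1 - beta2 ** state["step"])
    update_lowrank = m_hat / (v_hat.sqrt() + eps)   # Adam step inside the subspace

    W -= lr * (P @ update_lowrank)                  # project the update back to (m, n)
    return W

# Usage on a single toy weight matrix (stand-in gradients, not a real model).
torch.manual_seed(0)
W = torch.randn(64, 32)
state = {"step": 0, "P": None}
for _ in range(10):
    G = torch.randn(64, 32)
    W = galore_like_step(W, G, state)
```

The memory saving comes from keeping the Adam moments at rank x n instead of m x n; the actual method adds details (per-layer ranks, scale factors, projection refresh rules) that this toy omits.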
One Instruction Misleads an Intelligent Model: Beihang University and Collaborators Debut a 3D Semantic Attack Framework, With Success Rates Surging 119%
 量子位· 2025-10-23 03:52
Core Viewpoint
- The article discusses the security-alignment issues of artificial intelligence models, focusing on the newly proposed InSUR framework for generating adversarial samples that are independent of any specific task or model [1][2].

Group 1: InSUR Framework Overview
- The InSUR framework is based on instruction uncertainty reduction, allowing adversarial samples that mislead both known and unknown models to be generated from a single instruction [2][4].
- The framework integrates a 3D generation approach, achieving the first generation of natural 3D adversarial objects from a single instruction and validating the effectiveness of the introduced sampling technique, ResAdv-DDIM [6][8].

Group 2: Challenges in Semantic Adversarial Sample Generation
- Existing methods for generating semantic adversarial samples face three main challenges: referring diversity, description incompleteness, and boundary ambiguity [14][21].
- InSUR addresses these challenges through a combination of residual-driven stable attack directions, rule encoding of the generation process, and semantic hierarchical abstraction for evaluation [8][12].

Group 3: Sampling Method and Task Modeling
- The ResAdv-DDIM sampling method stabilizes the attack direction by predicting a rough outline of the final target during the denoising process, which enhances the robustness and transferability of adversarial samples (a generic sketch of attack-guided denoising follows this summary) [12][16].
- Task modeling incorporates task-goal embedding strategies, enabling effective generation of both 2D and 3D semantic adversarial samples [22][27].

Group 4: Evaluation and Results
- Evaluation of the InSUR framework shows significant improvements in attack success rate (ASR) across various models and tasks, with an average ASR improvement of at least 1.19x and a minimum improvement of 1.08x, while maintaining low perceptual loss (LPIPS) [40][41].
- The framework's design is decoupled from specific models and tasks, demonstrating scalability and effectiveness in generating high-fidelity adversarial test scenarios for safety-critical systems [45][46].
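The summary says ResAdv-DDIM steers the denoising process using a rough prediction of the final adversarial target. The paper's actual algorithm is not reproduced here; the snippet below is a generic, hypothetical sketch of adversarial guidance inside a DDIM loop (classifier-guidance-style steering applied to the predicted clean sample), written only to make "attacking during denoising" concrete. `eps_model`, `attack_loss`, and the noise schedule are toy stand-ins, not APIs from the InSUR work.

```python
import torch

@torch.no_grad()
def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update given a noise prediction eps."""
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    return alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps

def adversarial_ddim_sample(eps_model, attack_loss, x_T, alpha_bars, guide_scale=1.0):
    """Toy 'attack while denoising' loop: nudge each step toward lower attack loss."""
    x = x_T
    T = len(alpha_bars) - 1
    for t in range(T, 0, -1):
        with torch.no_grad():
            eps = eps_model(x, t)
        # The predicted clean sample acts as a rough outline of the final output.
        x0_pred = (x - (1 - alpha_bars[t]).sqrt() * eps) / alpha_bars[t].sqrt()

        # Gradient of the attacker's objective w.r.t. that rough prediction.
        x0_req = x0_pred.detach().requires_grad_(True)
        grad = torch.autograd.grad(attack_loss(x0_req), x0_req)[0]

        # Steer the noise estimate so the trajectory drifts toward the adversarial goal.
        eps_guided = eps + guide_scale * (1 - alpha_bars[t]).sqrt() * grad
        x = ddim_step(x, eps_guided, alpha_bars[t], alpha_bars[t - 1])
    return x

# Toy stand-ins so the sketch runs end to end (not real models or losses).
def toy_eps_model(x, t):
    return torch.zeros_like(x)

def toy_attack_loss(x0):
    return ((x0 - 1.0) ** 2).mean()   # pretend the attacker wants x0 pushed toward 1

steps = 50
betas = torch.linspace(1e-4, 0.02, steps + 1)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)
x_adv = adversarial_ddim_sample(toy_eps_model, toy_attack_loss, torch.randn(1, 3, 8, 8), alpha_bars)
```

The actual ResAdv-DDIM residual formulation, and how it achieves model-agnostic transferability, is specific to the paper and is not captured by this generic guidance sketch.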
Better Reasoning Without Touching the Model? An ICLR Submission Proposes OTV, a New Test-Time Scaling Paradigm
 量子位· 2025-10-23 00:08
Core Insights
- The article discusses the challenges faced by large language models, including hallucinations, logical errors, and reasoning flaws, prompting researchers to explore new methods to enhance output reliability [1].
- A novel approach called One-Token Verification (OTV) is introduced, which allows models to monitor their reasoning process in real time without altering the original model structure or parameters [2].

Summary by Sections

Current Mainstream Paradigms
- LoRA fine-tuning is highlighted as a popular parameter-efficient tuning method that avoids full-parameter training and is easy to deploy, but it often relies on detailed supervised data and can lead to "forgetting effects" [3].
- Quality screening of generated results can enhance output credibility but tends to be reactive, making it difficult to correct the model's reasoning in real time, and it offers no insight into the internal reasoning process [4].

Parallel Thinking Framework
- The article introduces the concept of Parallel Thinking, which lets language models generate multiple reasoning paths simultaneously and then filter them through a specific mechanism [5].
- OTV builds on this framework by focusing on efficiently selecting correct reasoning paths at low cost rather than on generating more paths [5].

OTV Mechanism
- OTV employs an internal verifier that analyzes the reasoning process using a lightweight role vector implemented via LoRA, running in parallel with the original model [9].
- The internal verifier uses the key-value cache (KV cache) of the Transformer architecture to capture rich information about the model's internal dynamics during reasoning [9].
- A special token, referred to as the "Token of Truth" (ToT), is inserted during the verification phase to assess the correctness of the reasoning path (a toy sketch of this selection-and-pruning pattern follows this summary) [9].

Training and Efficiency
- OTV's internal verifier is designed to be lightweight, with a training scheme that assigns heuristic pseudo-labels based on the correctness of the final answer [10].
- The training process is highly parallelized, allowing scoring predictions for all positions simultaneously, making its cost comparable to ordinary LoRA fine-tuning [10].

Experimental Validation
- OTV was systematically evaluated on various open-source models, demonstrating superior accuracy and a preference for shorter, more accurate reasoning paths compared to baseline methods [14].
- The results indicate that OTV can read the internal reasoning state and output quality, significantly outperforming general methods that rely solely on output text [15].

Dynamic Control of Computational Costs
- OTV enables models to control computational expense dynamically by eliminating low-quality paths in real time based on confidence scores, reducing computational load by nearly 90% while maintaining near-optimal accuracy [17].

Future Prospects
- The OTV framework opens avenues for deeper integration with the original model and for a three-state scheme that includes an "uncertain" state, enhancing selective prediction [25][26].
- The approach could also be extended to other model architectures, optimizing KV-cache structures to further improve reasoning efficiency and representation utilization [26].
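To make the mechanism above more concrete, here is a small, self-contained sketch of the general pattern the summary describes: several reasoning paths are produced in parallel, a lightweight verifier head scores each path from internal features collected at an appended verification token, and low-confidence paths are pruned early. Everything here (the feature shape, the MLP verifier, the threshold) is an illustrative stand-in rather than the OTV implementation; in the paper the verifier reportedly reads the Transformer KV cache via a LoRA-injected role vector, which this toy does not reproduce.

```python
import torch
import torch.nn as nn

class TinyVerifier(nn.Module):
    """Stand-in for OTV's internal verifier: maps per-path features to P(correct)."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.GELU(), nn.Linear(hidden, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_paths, feat_dim) features taken at the appended verification
        # token of each path (here random placeholders for real internal statistics).
        return torch.sigmoid(self.net(feats)).squeeze(-1)  # (num_paths,)

def select_and_prune(path_texts, path_feats, verifier, keep_threshold=0.3):
    """Score parallel reasoning paths, drop low-confidence ones, return the best."""
    scores = verifier(path_feats)                 # confidence per path
    keep = scores >= keep_threshold               # early pruning saves compute
    if keep.any():
        kept_scores = scores.masked_fill(~keep, float("-inf"))
    else:
        kept_scores = scores                      # fall back if every path is weak
    best = int(torch.argmax(kept_scores))
    return path_texts[best], scores

# Toy usage with 4 parallel paths and fabricated features.
torch.manual_seed(0)
paths = [f"reasoning path {i} ... answer" for i in range(4)]
feats = torch.randn(4, 128)                       # placeholder per-path features
verifier = TinyVerifier(feat_dim=128)
best_path, confidences = select_and_prune(paths, feats, verifier)
```

Training such a head would, as described above, use the correctness of each path's final answer as a heuristic pseudo-label, for example with a binary cross-entropy loss on the predicted scores.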
Meta AI Lays Off 600, With Alexandr Wang Wielding the Knife and LeCun's Team Hit Hardest
 量子位· 2025-10-23 00:08
Core Viewpoint
- Meta is undergoing significant layoffs in its AI division, with 600 employees being cut, particularly affecting the FAIR lab and AI product departments, while the newly established TBD Lab remains unaffected and continues to hire [1][2][5].

Group 1: Layoffs and Organizational Changes
- The layoffs are part of a restructuring effort led by the new Chief AI Officer, Alexandr Wang, who aims to create a more agile operational model within Meta AI [5][7].
- Employees were informed of their status by Wednesday morning Pacific Time, indicating a swift decision-making process [6].
- Wang's internal memo emphasized the need for fewer discussions in decision-making and encouraged affected employees to apply for other positions within the company [8].

Group 2: Leadership and Research Direction
- CEO Mark Zuckerberg has expressed deep concern over the lack of breakthroughs or performance improvements from Meta AI, which drove the decision to cut staff [8].
- Yann LeCun, head of the FAIR lab, has distanced himself from the Llama project and expressed frustration over new policies requiring additional review of research papers, which he views as a threat to academic freedom [9][10][11].

Group 3: Talent Acquisition and Future Outlook
- TBD Lab is actively recruiting, having recently hired key personnel from Thinking Machines and OpenAI, indicating a strategic focus on building a strong team for future AI development [2].
- Despite the layoffs, Wang remains optimistic about the models being trained and about the overall push toward superintelligence [8].
Cooling Chips with Lasers: The Moore's Law Ceiling Can't Hold Any Longer
 量子位· 2025-10-23 00:08
Luyu, from Aofeisi; 量子位 | WeChat official account QbitAI

Air cooling and liquid cooling are out: chip cooling has a new method, lasers.

Heat can now not only be "moved" around, it can be made to "disappear" outright. Sounds like it defies basic physics? Yet it is becoming possible.

The photonic cooling approach newly proposed by startup Maxwell Labs converts heat directly into light and removes it from inside the chip.

Ordinarily a material absorbs far more energy than it emits, and that energy difference tends to heat it up; conversely, if it can be made to absorb low-energy light and emit higher-energy light, the material cools down. In physics this phenomenon is called anti-Stokes cooling: under a narrow-band laser shining on a suitable material, ions efficiently absorb the incident light and, coupling to the material's lattice vibrations (phonons), are triggered to emit light of higher energy. The emitted light has to escape quickly, though; otherwise it is reabsorbed and the temperature climbs back up.

Using this principle, Maxwell Labs integrates the effect into a thin-film, chip-scale photonic cold plate to realize photonic cooling of chips. Instead of cooling the entire chip as traditional methods do, photonic cooling precisely targets the chip's hot spots, raising efficiency by several levels. Once the technology fully matures, not only could the usable power of a single chip rise sharply and the heat-dissipation problem of 3D chip stacking be solved, it could even help in building large-scale data centers.

Back to the point; the details are as follows: ...
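To make the "absorb low-energy light, emit higher-energy light" argument quantitative, the standard textbook energy balance for anti-Stokes fluorescence cooling can be written as below. This is the generic relation, not a figure from Maxwell Labs, and the symbols are the conventional ones rather than anything taken from the article.

```latex
% Generic anti-Stokes (laser) cooling energy balance.
% \lambda_p : pump-laser wavelength, \lambda_f : mean fluorescence wavelength (\lambda_f < \lambda_p),
% \eta_{ext} : external quantum efficiency (absorbed photons that are re-emitted and escape).
\[
  \eta_{\mathrm{cool}} = \eta_{\mathrm{ext}}\,\frac{\lambda_p}{\lambda_f} - 1,
  \qquad
  P_{\mathrm{cool}} = \eta_{\mathrm{cool}}\,P_{\mathrm{abs}} .
\]
% Net cooling requires \eta_{cool} > 0, i.e. \eta_{ext} > \lambda_f / \lambda_p: each escaping photon
% carries away h c (1/\lambda_f - 1/\lambda_p) more energy than was pumped in, and that extra
% energy is drawn from lattice phonons, that is, from heat. This is also why the article stresses
% letting the emitted light escape quickly: reabsorbed photons return their energy to the chip.
```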
Ask an LLM to Throw a Stone, and It Goes and Builds a Catapult
 量子位· 2025-10-22 15:27
Core Insights
- The article discusses a new research platform called BesiegeField, developed by researchers from CUHK (Shenzhen), which allows large language models (LLMs) to design and build functional machines from scratch [2][39].
- The platform enables LLMs to learn mechanical design through reinforcement learning, evolving their designs based on feedback from physical simulation [10][33].

Group 1: Mechanism of Design
- The research introduces a method called Compositional Machine Design, which reduces complex designs to discrete assembly problems over standard parts [4][5].
- A structured representation, similar to XML, is employed so that the model can understand and modify designs (an illustrative, hypothetical serialization follows this summary) [6][7].
- The platform runs on Linux clusters, allowing hundreds of mechanical experiments to run simultaneously and providing comprehensive physical feedback such as speed, force, and energy changes [9][10].

Group 2: Collaborative AI Workflow
- To address the limitations of single models, the research team developed an Agentic Workflow that allows multiple AIs to collaborate on design tasks [23][28].
- The workflow defines distinct roles, including a Meta-Designer, Designer, Inspector, Active Env Querier, and Refiner, which collectively drive the design process [28][31].
- The hierarchical design strategy significantly outperforms single-agent or simple iterative-editing approaches in tasks such as building a catapult and a car [31].

Group 3: Self-Evolution and Learning
- Introducing reinforcement learning through a strategy called RLVR allows models to self-evolve by using simulation feedback as reward signals [33][34].
- The results show that as iterations increase, the models improve their design capabilities and achieve better task performance [35][37].
- Combining a cold-start strategy with RL yields the best scores on both the catapult and car tasks, demonstrating that LLMs can sharpen mechanical-design skills through feedback [38].

Group 4: Future Implications
- BesiegeField represents a new paradigm for structural creation, enabling AI to design not just static machines but dynamic structures capable of movement and collaboration [39][40].
- The platform turns complex mechanical design into a structured language-generation task, allowing models to grasp mechanical principles and structural collaboration [40].
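The summary mentions an XML-like structured representation of machines built from standard parts. As an illustration only (the tag names, attributes, and part types below are invented, not BesiegeField's actual schema), a hypothetical assembly could be serialized roughly like this:

```python
import xml.etree.ElementTree as ET

def make_block(machine, block_type, block_id, position, rotation=(0, 0, 0), parent=None):
    """Append one standard part to a hypothetical XML machine description."""
    block = ET.SubElement(machine, "Block", type=block_type, id=str(block_id))
    ET.SubElement(block, "Position", x=str(position[0]), y=str(position[1]), z=str(position[2]))
    ET.SubElement(block, "Rotation", x=str(rotation[0]), y=str(rotation[1]), z=str(rotation[2]))
    if parent is not None:
        # Discrete assembly: every part attaches to an existing part.
        ET.SubElement(block, "AttachedTo", id=str(parent))
    return block

# A minimal catapult-like assembly: base, hinge, throwing arm, counterweight.
machine = ET.Element("Machine", name="toy_catapult")
make_block(machine, "WoodenBlock", 0, (0, 0, 0))                                  # base
make_block(machine, "Hinge",       1, (0, 1, 0), parent=0)                        # pivot
make_block(machine, "WoodenPole",  2, (0, 1, 1), rotation=(45, 0, 0), parent=1)   # arm
make_block(machine, "Ballast",     3, (0, 1, -1), parent=2)                       # counterweight

print(ET.tostring(machine, encoding="unicode"))
```

Because the machine is plain, typed text, an LLM can propose a design, have it simulated for physical feedback, and then edit individual parts, which is exactly the loop the agentic workflow and RLVR training described above operate on.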
Fresh Off the Nobel Prize and Onto the Cover of Nature: Google's "Quantum Echoes" Algorithm Computes 13,000x Faster, with Results That Can Be Repeatedly Verified
 量子位· 2025-10-22 15:27
Mengchen, from Aofeisi; 量子位 | WeChat official account QbitAI

The Google quantum team, fresh off the Nobel Prize in Physics, has landed on the cover of Nature again: it proposes a new algorithm, "Quantum Echoes," whose results can be repeated and verified, addressing the long-standing difficulty of confirming quantum-computing results.

A computation that the classical supercomputer Frontier would need 3.2 years to complete was finished by the quantum computer in just 2.1 hours, a speedup of 13,000x.

The paper has just appeared in Nature. Newly minted Nobel laureate Michel Devoret, now chief hardware scientist at the Google Quantum AI lab, took part, along with researchers from Princeton, UC Berkeley, MIT, and other leading institutions; more than 200 authors in total contributed to the study.

In a separate study (to be posted to arXiv later), the new algorithm was validated in probing interactions between atoms and particles and in determining molecular structure. The quantum computer's results matched those of conventional nuclear magnetic resonance (NMR) and also revealed information that ordinarily cannot be obtained from NMR.

Just as the telescope and the microscope opened the door to new worlds, this experiment is a key step toward a "quantum-scope" able to measure natural phenomena that could not be observed before. Quantum-computing-enhanced NMR is expected to become a powerful tool for drug discovery, helping determine how candidate drugs bind to their targets; in materials science it can also be used ...
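As a quick plausibility check on the quoted numbers (my own back-of-the-envelope arithmetic, using only the 3.2 years and 2.1 hours cited above and 1 year of roughly 8,766 hours):

```latex
% Back-of-the-envelope check of the reported 13,000x speedup.
\[
  \frac{3.2\,\mathrm{yr} \times 8766\,\mathrm{h/yr}}{2.1\,\mathrm{h}}
  \;\approx\; \frac{2.8 \times 10^{4}\,\mathrm{h}}{2.1\,\mathrm{h}}
  \;\approx\; 1.3 \times 10^{4},
\]
% consistent with the quoted factor of about 13,000.
```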










