Danqi Chen's New Work: A Third Path for Reinforcement Learning on Large Models, with an 8B Model Surpassing GPT-4o
量子位· 2025-09-28 04:56
Core Viewpoint - The article discusses a new method called RLMT (Reinforcement Learning with Model-rewarded Thinking) that combines the advantages of RLHF (Reinforcement Learning from Human Feedback) and RLVR (Reinforcement Learning with Verifiable Rewards), enabling an 8 billion parameter model to outperform GPT-4o and rival Claude-3.7-Sonnet [1][4][11]. Group 1: Methodology and Performance - RLMT requires the model to generate a Chain of Thought (CoT) before producing an answer, which is then evaluated by a reward model trained on human preferences [5][17]. - The method can be applied directly to base models without supervised fine-tuning (SFT), significantly reducing post-training costs [6][22]. - In benchmark tests, the L3.1-8B-RLMT model achieved an average score of 84.3, surpassing larger models like GPT-4o and Claude-3.7-Sonnet [7]. Group 2: Training Process - The training process involves generating a reasoning trajectory based on user prompts, followed by scoring the final answer using a reward model [14]. - Two training approaches are highlighted: Warm-start (using SFT data) and Zero (direct training without SFT), both leading to improved performance [21][19]. - The RLMT method shifts the model's reasoning style toward human-like thought processes, resulting in higher-quality dialogue and writing [19]. Group 3: Implications and Future Directions - The introduction of RLMT sets a new baseline for general reinforcement learning, emphasizing the importance of defining preferences in the post-training era [8]. - The results indicate that smaller models can outperform larger models, suggesting a shift in focus toward efficiency in model training [22]. - The research team, led by Danqi Chen, aims to further explore natural language understanding and reasoning capabilities in future studies [24][25].
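The training loop summarized above (sample a chain of thought plus an answer, score only the final answer with a preference-trained reward model, and reinforce high-reward behavior) can be sketched in miniature. Everything below is an illustrative assumption rather than the paper's implementation: the two-option "policy", the stub reward model, and the REINFORCE-style update with a mean baseline stand in for an LLM, a learned reward model, and a full policy-gradient optimizer.

```python
import random

random.seed(0)

# Toy policy: a distribution over two "reasoning styles". RLMT's key idea is
# that the model first emits a chain of thought (CoT) and only the final
# answer is scored by a preference-trained reward model.
policy = {"plan_then_answer": 0.5, "answer_directly": 0.5}

def reward_model(style: str) -> float:
    # Stub for a preference-trained reward model: it happens to prefer
    # answers produced after explicit planning (a hypothetical preference).
    return 1.0 if style == "plan_then_answer" else 0.2

def rlmt_step(policy, lr=0.1, n_samples=32):
    # Sample trajectories, score them, and shift probability mass toward
    # higher-reward behaviors (REINFORCE with a mean-reward baseline).
    styles = list(policy)
    samples = random.choices(styles, weights=[policy[s] for s in styles], k=n_samples)
    rewards = [reward_model(s) for s in samples]
    baseline = sum(rewards) / len(rewards)
    for s, r in zip(samples, rewards):
        policy[s] += lr * (r - baseline) / n_samples
    # Clip and renormalize to keep a valid probability distribution.
    for s in styles:
        policy[s] = max(policy[s], 1e-6)
    total = sum(policy.values())
    for s in styles:
        policy[s] /= total
    return policy

for _ in range(50):
    rlmt_step(policy)
```

After a few dozen steps the policy concentrates on the planning style, mirroring how RLMT steers the model toward CoT-first behavior because the reward model scores its downstream answers higher.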
Altman and a Founding Figure of Quantum Computing Discuss GPT-8
量子位· 2025-09-28 03:39
Core Viewpoint - The dialogue between Sam Altman and David Deutsch highlights the ongoing debate about whether AI can evolve into a conscious superintelligence, with differing opinions on the definitions and standards of AGI (Artificial General Intelligence) and ASI (Artificial Superintelligence) [3][8]. Group 1: Discussion on AI and Consciousness - Altman believes that future iterations of AI, such as GPT-8, could potentially understand complex concepts like quantum gravity and explain their reasoning process, challenging Deutsch's skepticism about AI achieving consciousness [22]. - Deutsch argues that while AI can perform impressive tasks, it lacks the intrinsic qualities of human intelligence, such as intuition and the ability to create original ideas, which are essential for true AGI [11][12][18]. Group 2: Perspectives on Human Intelligence - The conversation emphasizes that human intelligence is characterized by the ability to narrate one's own story and actively choose motivations, contrasting with the mechanical processing of information seen in current AI systems [19][21]. - The notion that there is no definitive test for AGI is discussed, suggesting that existing methods cannot adequately measure the capabilities of a truly general intelligence [15][16]. Group 3: Contributions of David Deutsch - David Deutsch is recognized as a foundational figure in quantum computing and information theory, having proposed significant theoretical frameworks that underpin the field [23][24]. - His work includes the development of the Deutsch-Jozsa algorithm, which demonstrated the exponential speedup of quantum algorithms compared to classical ones, laying the groundwork for future advancements in quantum computing [26].
DeepMind Is First to Propose CoF: Video Models Have Their Own Chain of Thought
量子位· 2025-09-28 03:39
Core Viewpoint - DeepMind introduces the concept of Chain-of-Frames (CoF) for video models, paralleling the Chain-of-Thought (CoT) in language models, suggesting a shift towards general-purpose visual understanding capabilities in machine vision [1][3][28]. Group 1: Introduction of CoF - The CoF concept arises from the curiosity of whether video generation models can achieve general-purpose capabilities similar to large language models (LLMs) without specialized training [6][7]. - The goal is to validate the hypothesis that video models can perform various visual tasks using a single underlying logic based on vast data [7][8]. Group 2: Capabilities of Veo 3 - Veo 3 demonstrates four progressive capabilities: 1. It can handle many classic visual tasks without specialized training, showcasing perceptual abilities [10][11]. 2. It can establish rules of the visual world, indicating modeling capabilities [13][14]. 3. It can perform creative modifications and simulations, reflecting operational abilities [16]. 4. It can achieve cross-temporal visual reasoning, embodying the CoF concept [18][21]. Group 3: Performance Analysis - Analysis of 62 qualitative tasks and 7 quantitative tasks revealed that Veo 3 can solve many tasks it has not been specifically trained for, indicating its general potential [23]. - The performance of Veo 3 shows significant improvement over its predecessor, Veo 2, suggesting rapid development in video model capabilities [24][25]. Group 4: Future Outlook - DeepMind predicts that general-purpose models like Veo 3 will eventually replace specialized models in the video domain, similar to the evolution seen in LLMs [25][26]. - The cost of video generation is currently higher than specialized models, but it is expected to decrease over time, paralleling trends observed in LLMs [25][26].
AI-Native Products Don't Mean Making Every Feature AI-Driven; Retaining Traditional Features Makes the User Experience More Complete | A Conversation with XiaoKa Health
量子位· 2025-09-27 09:58
Core Insights - The article discusses the competitive landscape of AI health management products, highlighting the emergence of various products tailored for health management in everyday life, with a focus on user experience and efficiency [3][4]. - It emphasizes the differentiation strategies of AI-native products like XiaoKa Health, which positions itself as a personal AI nutritionist, utilizing AI for food recognition and personalized nutrition plans [3][8]. Market Overview - The AI health management market is characterized by high consumer engagement and a fragmented landscape, where product differentiation is relatively weak, primarily focusing on "recording + personalized plan customization" [3][4]. - XiaoKa Health has over 1 million users and offers three core functionalities: AI food image recognition for calorie and nutrient calculation, semantic recognition of dietary and exercise behaviors, and personalized nutrition plans [8][41]. Product Features and Innovations - XiaoKa Health enhances user data recording efficiency through AI technologies, allowing users to upload food images for quick calorie and nutrient information, significantly reducing the time required for manual entry [11][12]. - The product also features a semantic recording function, enabling users to describe their health data in one sentence, which is then automatically logged into the app, streamlining the recording process [12][20]. User Engagement and Retention - The company focuses on increasing user stickiness through personalized feedback and emotional support, allowing users to customize their AI assistant's persona, which enhances user interaction and retention [32][33]. - Continuous user feedback is prioritized, with the company actively engaging with users to refine product features and ensure a high-quality user experience [58][60]. 
Competitive Strategy - XiaoKa Health aims to maintain a competitive edge by deepening its core functionalities and exploring innovative combinations of AI technologies, such as integrating water intake tracking with calorie recording [51][52]. - The company believes that while larger firms may replicate their products, their agility and ability to respond quickly to user needs provide a significant advantage in the market [53][54]. Future Directions - The vision for XiaoKa Health is to evolve into a comprehensive personal AI nutritionist that understands users deeply and provides continuous support throughout their health journey [54][56]. - The company is exploring new paradigms for data interaction and aims to achieve seamless, "no-sense" recording through smart devices, enhancing user convenience [35][36].
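The "one-sentence" semantic recording feature described above can be illustrated with a toy parser that turns a free-text description into structured diet and exercise entries. The patterns, categories, and calorie tables below are illustrative assumptions, not XiaoKa Health's actual recognition pipeline, which presumably relies on an LLM rather than regular expressions.

```python
import re

# Hypothetical lookup tables: kcal per food item, kcal per km of exercise.
FOOD = {"apple": 95, "rice": 200}
EXERCISE = {"ran": 60, "walked": 40}

def parse_entry(sentence: str):
    """Parse one free-text sentence into (category, item, kcal) entries."""
    entries = []
    # Exercise: match a verb followed by a distance, e.g. "ran 3 km".
    for verb, kcal_per_km in EXERCISE.items():
        m = re.search(rf"{verb}\s+(\d+)\s*km", sentence)
        if m:
            entries.append(("exercise", verb, int(m.group(1)) * kcal_per_km))
    # Diet: simple keyword spotting against the food table.
    for food, kcal in FOOD.items():
        if food in sentence:
            entries.append(("diet", food, kcal))
    return entries

log = parse_entry("ran 3 km this morning and ate an apple")
```

Even this crude sketch shows why one-sentence logging cuts recording time: the user supplies a single utterance, and the system fans it out into the separate structured records the app stores.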
Storming the Tokyo Game Show: the Game Show Has Gone AI, Too
量子位· 2025-09-27 07:00
Core Viewpoint - The article highlights the significant presence and influence of Chinese companies at the Tokyo Game Show (TGS), showcasing advancements in AI technology and its integration into the gaming industry [1][36]. Group 1: Chinese Companies at TGS - Major Chinese gaming companies such as NetEase, Tencent, and others have established impressive exhibition spaces, attracting numerous players [2][8]. - AI companies are also making their mark at TGS, demonstrating their capabilities and innovations in the gaming sector [8][10]. Group 2: AI Technology Showcase - Alibaba's booth prominently featured its open-source models, including Tongyi Qianwen and Tongyi Wanxiang, offering a range of commercial solutions from IaaS to SaaS [11][12]. - The Model Studio platform and AI development platform PAI were highlighted as part of Alibaba's offerings, indicating a strong push for AI integration in gaming [13][15]. Group 3: 3D Generation Technology - Tencent Cloud emphasized its cloud computing capabilities for game security and operations, while also discussing the potential of mixed reality 3D technology [21][22]. - VAST's Tripo, a leading open-source 3D generation project, is gaining attention from game developers both domestically and internationally [26][27]. Group 4: AI Applications in Gaming - HakkoAI, an AI gaming companion, showcased its ability to understand and interact with various games, outperforming several top general models in specific gaming scenarios [34]. - The integration of AI in gaming is creating new possibilities and enhancing player experiences, indicating a growing trend in the industry [36].
Making RAG Truly Read Between the Lines! A New Framework Introduces Lexical Diversity and Sets New SOTA on Multiple Benchmarks
量子位· 2025-09-27 07:00
Core Insights - The article discusses the introduction of the Lexical Diversity-aware RAG (DRAG) framework, which enhances the accuracy of Retrieval-Augmented Generation (RAG) models by 10.6% and sets new state-of-the-art (SOTA) results in multiple benchmarks [1][2][16]. Group 1: Framework and Innovations - The DRAG framework systematically incorporates lexical diversity into the retrieval and generation processes of RAG, providing a lightweight, general, and easily extensible solution [1][5]. - The research team from Beihang University, Peking University, and Zhongguancun Laboratory highlights the importance of lexical diversity, which has been largely overlooked in existing RAG methods [4][5]. - Two key innovations are introduced: 1. Diversity-sensitive Relevance Analyzer (DRA), which dissects query semantics and employs differentiated strategies for various components, leading to a more granular relevance scoring [9]. 2. Risk-guided Sparse Calibration (RSC), which monitors the "misleading risk" of each generated token and calibrates decoding as necessary, ensuring the generation phase is not disturbed by irrelevant information [11][14]. Group 2: Performance and Results - The DRAG framework has shown significant performance improvements across various open-domain question-answering benchmarks, with notable accuracy increases in PopQA and TriviaQA by 4.9% and 4.4%, respectively, and a 10.6% increase in HotpotQA and 2WikiMultiHopQA [16]. - The method also outperforms existing models in long-answer generation metrics such as str-em and QA-F1, demonstrating strong generalization capabilities across different model sizes, including Llama2-7B and Llama2-13B [18][16]. Group 3: Lexical Diversity Challenges - The article identifies lexical diversity as a critical yet often neglected issue in RAG methods, where different expressions of the same question can confuse retrieval models, leading to incorrect answers [5][8]. 
- The framework addresses this by allowing semantic flexibility for variable components while ensuring strict matching for invariant components, thus improving the relevance of retrieved documents [12]. Group 4: Future Directions - The research team plans to expand the application of the DRAG framework to more specialized scenarios, aiming to enhance the understanding of complex human language expressions in large models [5].
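The DRA idea above (strict matching for invariant query components, semantic flexibility for variable ones) can be roughly illustrated with a toy relevance scorer. The split into components, the token-overlap scoring, and the `alpha` weighting are all assumptions made for illustration, not the paper's actual formulation.

```python
# Toy DRA-style relevance scoring: invariant components (e.g. named entities)
# must match the document exactly, while variable components (paraphrasable
# words) are scored by soft lexical overlap.
def relevance(query_invariant, query_variable, doc_tokens, alpha=0.7):
    doc = set(doc_tokens)
    # Invariant components: strict matching; a missing entity zeroes the score.
    if not all(tok in doc for tok in query_invariant):
        return 0.0
    # Variable components: soft overlap tolerates rephrasings.
    if not query_variable:
        return 1.0
    soft = len(set(query_variable) & doc) / len(set(query_variable))
    return alpha + (1 - alpha) * soft

score = relevance(["einstein"], ["born", "year"],
                  ["einstein", "was", "born", "in", "1879"])
```

The asymmetry is the point: a document that rephrases the variable part of a query still scores well, but one that drops the invariant entity is rejected outright, which is how differentiated strategies per component yield more granular relevance scores than a single uniform similarity.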
The AI Company Valued at 84 Billion Yuan, Joined by Lilian Weng and Danqi Chen, Releases Its Second Paper
量子位· 2025-09-27 04:46
Core Viewpoint - The article discusses the recent research paper by Thinking Machines, led by Jeremy Bernstein, focusing on "Modular Manifolds" to enhance the stability and efficiency of neural network training through a unified framework for different layers/modules [1][2]. Group 1: Research Motivation and Challenges - The research aims to address fundamental challenges in neural network training, particularly issues related to tensor values (weights, activations, gradients) that can lead to instability, gradient explosion/vanishing, and low training efficiency [2]. - The author proposes a new optimization approach called Modular Manifolds, which applies constraints not only to individual weight tensors but also views the entire network as a composite manifold structure [2][8]. Group 2: Importance of Manifold Constraints - The necessity for manifold constraints arises from the instability encountered during the training of large models, where extreme values of weights, activations, or gradients can lead to issues like overflow, disappearance, and slow convergence [8]. - Normalization methods have been the gold standard for addressing these issues, but there has been little focus on normalizing the weight matrices themselves [8][9]. Group 3: Benefits of Weight Normalization - Normalizing weight matrices can lead to more stable training, easier adjustments, predictable behavior, and greater resistance to external disturbances [9][10]. Group 4: Research Process Overview - The research process includes several steps, starting with a basic example of training a parameter vector constrained to a unit sphere [11]. - The author discusses the challenges of using standard optimization methods like Adam or SGD, which may lead to updates that exit the constraint space [12][13]. 
Group 5: Manifold Optimization Techniques - The manifold optimization approach involves projecting gradients onto the tangent space, updating parameters, and then retracting the updated vector back onto the manifold [14]. - The choice of manifold constraints and measurement of lengths can lead to the creation of various optimization algorithms [16]. Group 6: Extension to Matrix Parameters - The research extends the concept from vector parameters to matrix parameters, particularly for the weight matrices in Transformers, which can have thousands of dimensions [17]. - The Stiefel manifold is proposed for matrix parameters, ensuring orthogonality of column vectors and a condition number of 1, which aids in numerical stability [18][20]. Group 7: Experimental Validation - A small-scale experiment was conducted on the CIFAR-10 dataset, comparing the manifold Muon algorithm with AdamW, showing that the former slightly outperformed the latter in training/testing accuracy, although it was slower in execution time [23][24]. Group 8: Modular Manifolds Concept - The concept of Modular Manifolds is introduced, treating each layer or module of the neural network as a separate manifold with its own defined norms and optimization methods [26][27]. - These individual manifolds can be combined into a larger manifold space, where a global mechanism constrains the overall update process while allowing local updates [29][30]. Group 9: Future Implications - The proposed methodology emphasizes the design coupling of the entire model training process, suggesting that successful application on large Transformers or LLMs could significantly enhance training efficiency and stability [31][32]. - The company has already achieved a valuation exceeding $12 billion, indicating strong market expectations for its research outcomes [52].
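The project-update-retract loop described in Group 5 can be made concrete on the warm-up case from Group 4, a parameter vector constrained to the unit sphere. The learning rate and the linear objective below are illustrative choices, not the post's exact experiment.

```python
import numpy as np

# One manifold-optimization step on the unit sphere: project the gradient
# onto the tangent space at w, take a step, then retract back onto the
# manifold by renormalizing.
def sphere_step(w, grad, lr=0.1):
    w = w / np.linalg.norm(w)               # ensure we start on the manifold
    tangent = grad - np.dot(grad, w) * w    # remove the radial component
    w_new = w - lr * tangent                # raw update in the tangent space
    return w_new / np.linalg.norm(w_new)    # retraction: back to ||w|| = 1

# Example: minimize f(w) = w . c on the sphere; the optimum is w = -c/||c||.
c = np.array([1.0, 0.0])
w = np.array([0.0, 1.0])
for _ in range(200):
    w = sphere_step(w, c)   # gradient of w . c is simply c
```

This is exactly the failure mode Adam or SGD would hit without the projection and retraction: a raw update `w - lr * c` leaves the constraint set, whereas every iterate here stays on the sphere by construction.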
The Industry's First High-Quality Native 3D Part Generation Model Is Here! From Tencent's Hunyuan Team
量子位· 2025-09-27 04:46
Contributed by the Tencent Hunyuan 3D team | 量子位 (QbitAI). The industry's first high-quality native 3D part generation model is here, from Tencent's Hunyuan 3D team. Existing 3D generation algorithms usually produce monolithic 3D models, while downstream applications typically require semantically decomposable 3D shapes; that is, each part of a 3D object must be generated separately, as demonstrated in the accompanying video. Broadly, part-based 3D generation has two main application scenarios: 1) Video game production pipelines: many in-game assets must be bound to different game logic based on semantic information. For example, a car model should decompose into a body and four rollable wheels so that the wheels can spin independently, which makes part separation essential. At the same time, the downstream stages of 3D geometry generation, such as low-poly retopology and UV unwrapping, struggle with highly complex geometry; splitting complex geometry into simple small parts is a divide-and-conquer strategy that greatly reduces the difficulty for those downstream algorithms. 2) 3D printing: this is also good news for the 3D printing industry, since users can print the parts one by one and then assemble them like building blocks. Finally, X-Part decomposes the overall shape into individual parts. Figure 1: the overall Hunyuan3D-Part part-separation pipeline. Its technical highlights: 1) It proposes P3-SAM, the industry's first native 3D segmentation model, which leverages large-scale ...
The First Reasoning Embodied Model, Built by Google DeepMind! It Autonomously Understands, Plans, and Executes Complex Tasks, Breaks the One-Robot-One-Training Paradigm, and Even Transfers Skills Between Robots Zero-Shot
量子位· 2025-09-27 04:46
Core Viewpoint - Google DeepMind has launched the Gemini Robotics 1.5 series, marking a significant milestone in the development of general AI for real-world applications, featuring embodied reasoning capabilities that allow robots to "think before acting" [1][9]. Group 1: Model Composition - The Gemini Robotics 1.5 series consists of two main models: GR 1.5 for action execution and GR-ER 1.5 for embodied reasoning [2][8]. - GR-ER 1.5 is the world's first embodied model with simulated reasoning capabilities [3]. Group 2: Functional Capabilities - The combination of GR-ER 1.5 and GR 1.5 enables robots to perform complex multi-step tasks, such as sorting clothes by color or packing luggage based on weather conditions [5][6]. - GR 1.5 can adapt to various robot hardware, allowing a single model to operate across different platforms without the need for separate training [16][18]. Group 3: Motion Transfer Mechanism - The innovative "Motion Transfer" mechanism allows skills learned on one robot to be transferred to another, enhancing cross-platform functionality [21][48]. - This mechanism abstracts different robot actions into a unified semantic space, enabling seamless skill sharing across diverse hardware [56]. Group 4: Safety and Explainability - The GR 1.5 series enhances safety by allowing robots to self-correct during tasks and recognize potential risks, ensuring safe operation in human environments [34][36]. - The embodied reasoning model provides transparency in the robot's decision-making process, improving interpretability and trust [55][58]. Group 5: Performance Metrics - In benchmark tests, GR 1.5 outperformed previous models in various dimensions, including instruction generalization and task completion rates, achieving nearly 80% in long-sequence tasks [61][62]. - The model demonstrated unprecedented zero-shot transfer capabilities in cross-robot migration tests [63]. 
Group 6: Future Developments - The GR 1.5 series represents a shift from executing single commands to genuinely understanding and solving physical tasks [69]. - Currently, developers can access GR-ER 1.5 through Google AI Studio, while GR 1.5 is available to select partners [71].
"Fine-Grained" Alignment for Large Models: Truthfulness Up 25.8%, Setting a New SOTA! Token-Level Precision Editing, Training-Free and Plug-and-Play
量子位· 2025-09-27 04:46
Core Insights - The article discusses a new method called Token-Aware Editing (TAE) that enhances the alignment capabilities of large language models (LLMs), achieving a 25.8% improvement in truthfulness metrics on the TruthfulQA task, setting a new performance benchmark [1][15]. Group 1: Methodology - TAE is a token-aware reasoning representation editing method that addresses the limitations of traditional representation editing techniques, requiring no training and being plug-and-play applicable across various scenarios such as dialogue systems and content moderation [1][3]. - Existing methods often overlook the misalignment differences between tokens, leading to biased alignment directions and inflexible editing strengths [4][6]. - TAE consists of two main modules: Mutual Information-guided Graph Aggregation (MIG) and Misalignment-aware Adaptive Intervention (MAI) [8][10]. Group 2: Module Details - MIG enhances the representation capability of activation values to find more accurate editing directions by addressing information loss and local understanding limitations inherent in traditional methods [10]. - MAI calculates adaptive editing strengths for each token based on its misalignment risk, allowing for differentiated intervention levels that prevent over-correction of safe tokens and under-correction of dangerous tokens [11][12]. Group 3: Experimental Results - TAE significantly outperformed existing methods in various metrics, achieving a True*Info score of 87.8% on the TruthfulQA dataset, surpassing the previous best method (SEA) by 14.6 percentage points and the original baseline by 25.8 percentage points [14][15]. - In toxicity reduction tasks, TAE reduced the toxicity probability from a baseline of 0.41 to 0.05, a nearly 90% decrease, outperforming all specialized de-toxification baseline methods [16]. 
- TAE also demonstrated substantial improvements in fairness tasks, lowering stereotype scores from a baseline of 64.8% to 50.3%, approaching the ideal unbiased state [16]. Group 4: Broader Implications - The TAE method shows significant gains across various model types and sizes, including Llama2-7B-Chat, Llama2-13B-Chat, Alpaca-7B, and Mistral-7B, indicating its versatility and effectiveness in enhancing model alignment [17].
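The MAI mechanism described above, adaptive editing strengths computed per token from misalignment risk, can be sketched as a simple activation edit. The names, shapes, and the linear risk-to-strength mapping below are assumptions for illustration, not the paper's API: each token's hidden state is shifted along a "truthful direction" in proportion to that token's estimated risk, so safe tokens are barely touched while risky tokens receive a stronger correction.

```python
import numpy as np

def token_aware_edit(hidden, direction, risks, max_strength=2.0):
    # hidden: (seq_len, d) activations; direction: (d,) editing direction;
    # risks: (seq_len,) per-token misalignment risk in [0, 1].
    direction = direction / np.linalg.norm(direction)
    strengths = max_strength * np.asarray(risks)[:, None]  # adaptive per token
    return hidden + strengths * direction

hidden = np.zeros((3, 4))                        # three tokens, 4-dim states
direction = np.array([1.0, 0.0, 0.0, 0.0])       # assumed "truthful" direction
risks = [0.0, 0.5, 1.0]                          # safe, borderline, risky token
edited = token_aware_edit(hidden, direction, risks)
```

Contrast this with traditional representation editing, which applies one global strength to every token: here the zero-risk token is left untouched, avoiding the over-correction of safe tokens and under-correction of dangerous ones that TAE is designed to fix.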