量子位
Meta Superintelligence Labs' new paper mired in controversy! Accused of ignoring a large body of prior research
量子位· 2025-09-12 00:59
Core Viewpoint - Meta Superintelligence Labs (MSL) faces controversy over its second paper, titled "Language Self-Play For Data-Free Training," which has been criticized for neglecting prior research and lacking innovation [2][25]. Summary by Sections Overview of the Paper - The core idea of the paper is a method called Language Self-Play (LSP) that enables large language models to improve themselves without additional training data [3][4]. - LSP addresses large language models' heavy reliance on extensive, high-quality training data, which is inherently limited [4]. Methodology - LSP frames learning as a game in which the same language model plays two opposing roles, allowing for data-free training [5]. - In this adversarial process, the challenger generates increasingly difficult questions or commands to lower the expected reward of the solver, who must understand and answer them to maximize its own reward, forming a minimax game [7]. - Unlike traditional adversarial training, LSP lets a single language model act as both "challenger" and "solver," switching roles via a special "Challenger Prompt" [8]. Implementation and Challenges - The research applies the reinforcement learning technique GRPO to convert the game into a model training process [9]. - A reward mechanism is established in which the challenger's questions target the solver's weaknesses, driving continuous improvement [10]. - The zero-sum variant of the method is termed Language Self-Play Zero (LSP-Zero) [11]. - However, LSP-Zero can sometimes degenerate, with the model generating meaningless content that scores high through reward hacking [12]. Enhancements - To mitigate this issue, the researchers incorporated a self-quality reward (RQ) into the LSP algorithm, steering the game toward high-quality interactions for sustainable training [13].
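The reward bookkeeping described above can be sketched in a few lines. This is a toy illustration under assumptions, not Meta's code: `task_reward` stands in for the solver's task score and `quality_reward` for the paper's self-quality term, and the actual LSP algorithm optimizes these rewards with GRPO rather than computing them directly.

```python
def lsp_rewards(task_reward: float, quality_reward: float, use_quality: bool):
    """Return (challenger_reward, solver_reward) for one interaction.

    LSP-Zero is purely zero-sum: the challenger gains exactly what the
    solver loses (a minimax game). LSP adds a shared quality term (RQ)
    to both roles, which breaks the zero-sum structure and counteracts
    the reward-hacking degeneration described in the article.
    """
    solver = task_reward
    challenger = -task_reward      # challenger profits when the solver fails
    if use_quality:                # LSP: reward high-quality interactions
        solver += quality_reward
        challenger += quality_reward
    return challenger, solver

# LSP-Zero: rewards cancel exactly
assert lsp_rewards(1.0, 0.5, use_quality=False) == (-1.0, 1.0)
# LSP: the shared quality bonus makes degenerate play unattractive for both roles
assert lsp_rewards(1.0, 0.5, use_quality=True) == (-0.5, 1.5)
```

A degenerate exchange that hacks the task reward but scores low on quality now hurts both roles, which is the intuition behind moving from LSP-Zero to LSP.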
Experimental Results - Experiment 1 compared LSP and LSP-Zero with a traditional data-driven model, showing that LSP methods performed comparably to data-driven approaches and significantly outperformed the original model [18]. - In a dialogue and open instruction dataset, LSP's performance exceeded that of GRPO [18]. - Experiment 2 further trained a model using LSP-Zero and LSP, resulting in an increase in overall win rates from 40.9% to 43.1% [21]. - LSP demonstrated particularly notable improvements on the Vicuna dataset, indicating its effectiveness in continuing to unlock model potential post data-driven training [22][24]. Criticism and Response - Critics argue that MSL's work overlooks significant prior research, with various researchers having conducted similar studies without proper citation [25][26]. - The paper has been described as potentially rehashing older work, raising questions about its originality [30]. - As of now, MSL and the authors have not responded to these criticisms [31].
Shunyu Yao leaves OpenAI and begins the "second half"
量子位· 2025-09-12 00:59
Core Viewpoint - The article discusses the career transition of Shunyu Yao, a prominent researcher from OpenAI, as he embarks on a new phase in the AI field, focusing on personal AI and the evolving landscape of AI development, which is now entering its "second half" [2][47]. Group 1: Background and Achievements - Shunyu Yao, a 29-year-old researcher, has an impressive academic background, including graduating from Tsinghua University and obtaining a PhD from Princeton, where he focused on natural language processing and reinforcement learning [4][22]. - His notable contributions to AI include the development of frameworks like Tree of Thoughts, SWE-bench, and ReAct, which enhance the reasoning and decision-making capabilities of language models [6][36]. Group 2: Career Transition - Yao's departure from OpenAI has been confirmed through various channels, and he is rumored to be considering entrepreneurship or joining another tech giant [3][51]. - His recent work emphasizes the shift in AI development from model-centric approaches to defining meaningful tasks and evaluating AI systems' performance in real-world scenarios [47][48]. Group 3: Philosophical Insights - Yao's approach to research is characterized by a cross-disciplinary perspective, drawing inspiration from various fields, which he believes is essential for innovation in AI [9][20]. - He advocates for the importance of language as a medium for reasoning and decision-making in AI, highlighting its role in enabling agents to generalize across different contexts [28][30].
Who would have thought: university students have started using AI to raise pigs
量子位· 2025-09-11 10:19
Core Insights - The article highlights the increasing integration of AI tools in the daily lives of university students in China, showcasing their diverse applications in academic and personal contexts [1][3][5]. Group 1: AI Usage Among University Students - 70% of university students are using the Quark app, with high usage rates observed not only in major cities but also in key provinces [3][4]. - The top five popular AI applications among students include AI search, AI question answering, AI scanning, AI writing, and AI summarization [4]. - 28.8% of university students use Quark to generate campaign PPTs for class committee elections, with a total of 420,000 PPT requests related to student elections and club interviews in early September [4][7]. Group 2: Deep Interaction with AI - The penetration rate of AI among Quark's university users has reached 80%, indicating a shift towards more complex and professional inquiries [5]. - Medical students frequently use Quark for searching complex professional questions, with over 50% of them engaging with the app for academic purposes [5][6]. - The most searched topics in academic searches are in the fields of medicine, economics, and social issues [6]. Group 3: Diverse Applications Beyond Academics - University students are also using AI for personal matters, such as fortune-telling and dream interpretation [8]. - Examples include a veterinary student using AI to assess pig breeding practices and an enology student identifying grape varieties through image recognition [8]. Group 4: AI in College Entrance Examination Context - Quark's AI features for college entrance examination planning are particularly valued by students, with one freshman sharing her experience of using the app to generate her application strategy [10][11]. 
- Concerns were raised about the potential for uniformity in college applications due to AI assistance, but Quark's product manager clarified that the tool requires personalized input from users [12][13]. - Quark has shown adaptability by correcting previous misinformation regarding college programs, demonstrating a commitment to product optimization [14][16].
DeepDiver-V2 is here: Huawei's latest open-source native multi-agent system, with stunning "team-battle" deep-research results
量子位· 2025-09-11 10:19
Core Insights - The article discusses Huawei's latest release, DeepDiver-V2, a native multi-agent system designed for deep research, which utilizes a "teamwork" approach for task execution and information sharing [1][2]. Group 1: System Architecture and Functionality - DeepDiver-V2 employs a multi-agent system (MAS) architecture, featuring a Planner for task decomposition and multiple Executors for parallel processing of sub-tasks, enhancing efficiency [1][7]. - The system is capable of generating high-quality deep research reports, achieving an average report length of 24.6K tokens, significantly surpassing competitors like OpenAI's DeepResearch [4][2]. - The architecture allows for specialized roles among Executors, including Information Seekers for data collection and Writers for long-text generation, improving overall output quality [12][21]. Group 2: Performance Metrics - In benchmark tests, DeepDiver-V2-38B scored 34.6 in BrowseComp-zh, outperforming WebSailor-72B and other models, while DeepDiver-V2-7B also exceeded similar models [5][4]. - The system's performance is sensitive to the capabilities of Executors, indicating that their effectiveness is crucial for overall system performance [19][21]. Group 3: Training and Optimization - The training process involves multi-stage optimization, including supervised fine-tuning and rejection sampling techniques, which enhance the model's collaborative capabilities [15][16]. - The training data has been expanded to include more challenging and long-form writing tasks, contributing to the improved performance of DeepDiver-V2 [16][27]. Group 4: Future Implications - The transition from a single model to a multi-agent system represents a new paradigm in AI search, with potential applications in enterprise research, scientific literature reviews, and professional data analysis [27][28].
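The Planner/Executor pattern described above can be sketched schematically. This is an illustrative stand-in, not Huawei's DeepDiver-V2 implementation: the `plan` and `execute` functions are placeholders for LLM calls, and the role names ("seeker" for information collection, "writer" for long-form drafting) mirror the specialization the article describes.

```python
# Minimal Planner/Executor sketch: a Planner decomposes the task,
# role-specialized Executors run the sub-tasks in parallel.
from concurrent.futures import ThreadPoolExecutor

def plan(task: str) -> list[dict]:
    # Planner: decompose the research task into role-tagged sub-tasks.
    return [
        {"role": "seeker", "goal": f"collect sources for: {task}"},
        {"role": "seeker", "goal": f"collect statistics for: {task}"},
        {"role": "writer", "goal": f"draft report section on: {task}"},
    ]

def execute(subtask: dict) -> str:
    # Executor: handle one sub-task; in the real system, Seekers retrieve
    # information and Writers generate long-form text for the final report.
    return f"[{subtask['role']}] done: {subtask['goal']}"

def run(task: str) -> list[str]:
    subtasks = plan(task)
    with ThreadPoolExecutor() as pool:   # sub-tasks execute in parallel
        return list(pool.map(execute, subtasks))

results = run("long-context model survey")
assert len(results) == 3 and results[0].startswith("[seeker]")
```

The design point the article makes, that overall quality is bottlenecked by Executor capability, shows up here too: the Planner only routes work, so the value of the output lives entirely in `execute`.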
How does a central state-owned enterprise build a super intelligent agent? A conversation with China Telecom's Tianyi AI: a self-developed model as the foundation, autonomous planning as a must, and the ability to adapt to every industry
量子位· 2025-09-11 10:19
Core Viewpoint - The article discusses the launch of China Telecom's Tianyi AI "Star Super Intelligent Agent," which ranks first among state-owned enterprises in the DBC Deben Consulting 2025 Enterprise-level AI Agent list, highlighting its capabilities and market potential [1][4]. Group 1: Overview of Star Super Intelligent Agent - The Star Super Intelligent Agent is based on China Telecom's self-developed "Star Big Model" technology, designed for industrial intelligent upgrades [2][8]. - It supports multimodal understanding, including voice, vision, and text, and can generate images and videos from text, showcasing its rich capabilities [11][12]. - The agent emphasizes enhanced complex reasoning and memory capabilities, making it suitable for various real-world applications such as customer service and financial operations [13][14]. Group 2: Market Trends and Development - The article notes a surge in interest in intelligent agents, driven by government initiatives promoting AI integration across industries [4][43]. - There are ongoing discussions about the practical applications and effectiveness of intelligent agents in real-world scenarios, with a focus on their ability to automate complex tasks [5][6]. Group 3: Technical Insights - The Star Super Intelligent Agent framework is designed to be highly customizable, allowing businesses to integrate it into their existing systems effectively [16][17]. - It operates through a four-module architecture: perception and understanding, cognition and decision-making, memory and knowledge, and action and execution, enabling it to perform tasks similarly to humans [27][29]. Group 4: Industry Applications and Case Studies - Successful implementation examples include the development of an intelligent customer service system that automates complaint processing, demonstrating the agent's ability to integrate with existing business systems [36][54]. 
- The article emphasizes that sectors with high IT integration, such as customer service and marketing, are prime candidates for the rapid deployment of intelligent agents [52]. Group 5: Competitive Landscape - The market is characterized by various players, including large model manufacturers, tech giants, startups, and state-owned enterprises, each focusing on different aspects of intelligent agent development [53][54]. - China Telecom's unique advantage lies in its extensive local service teams and existing digital infrastructure, allowing for scalable and effective deployment of intelligent agents across industries [54][56].
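The four-module loop described above (perception and understanding, cognition and decision-making, memory and knowledge, action and execution) can be sketched as a simple agent skeleton. Everything here is an illustrative placeholder under assumptions, not China Telecom's implementation; the customer-service routing rule stands in for the model-driven decision step.

```python
# Schematic four-module agent loop: perceive -> decide -> remember -> act.
class Agent:
    def __init__(self):
        self.memory = []                        # memory & knowledge module

    def perceive(self, user_input: str) -> str:
        return user_input.strip().lower()       # perception & understanding

    def decide(self, observation: str) -> str:  # cognition & decision-making
        if "refund" in observation:
            return "open_ticket"
        return "answer_faq"

    def act(self, action: str) -> str:          # action & execution
        return f"executed:{action}"

    def step(self, user_input: str) -> str:
        obs = self.perceive(user_input)
        action = self.decide(obs)
        self.memory.append((obs, action))       # persist context across turns
        return self.act(action)

agent = Agent()
assert agent.step("I want a REFUND ") == "executed:open_ticket"
assert len(agent.memory) == 1
```

In the intelligent customer-service example from the article, the act step would call into existing business systems (e.g. a complaint-ticketing backend) rather than return a string.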
Cracking AI's overthinking problem! New Meituan research activates efficient reasoning in LRMs through "verifiable" process rewards
量子位· 2025-09-11 10:19
Contributed by Meituan's Search & Recommendation Agentic System X (AsX) team — 量子位 | WeChat official account QbitAI

LRMs develop strong CoT reasoning ability through the simple yet effective RLVR paradigm, but the accompanying verbose outputs not only significantly increase inference overhead but also hurt service throughput; this patience-draining phenomenon is known as the "overthinking" problem. To address this flaw, a research team from Meituan and other institutions proposes a verifiable stepwise reward mechanism (VSRM), which rewards "effective steps" in the CoT and penalizes "ineffective steps," achieving efficient reasoning while preserving performance as much as possible. Experiments on math tasks show that, across several common benchmarks, VSRM-based post-training substantially shortens output length for models of different scales, and in some cases even improves performance.

The essence of the overthinking problem: prior work characterized overthinking as a model's tendency to produce multiple different solutions to a single problem, especially a simple one. Building on this, the authors conducted an in-depth case study of responses produced by existing LRMs on MATH-500, for example: "Find the number of integer values of k in the closed interval [-500,500] for whic ..."
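The step-level reward idea can be sketched as follows. This is a toy sketch in the spirit of the mechanism described above, not Meituan's implementation: the `state_after_step` callback is a hypothetical stand-in for a verifier that checks what a reasoning step actually establishes, and a step counts as "effective" only if it reaches a verified state not seen before.

```python
# Toy step-level verifiable reward: +1 for steps that add verifiable
# progress, -1 for redundant steps (e.g. re-deriving the same result).
def vsrm_score(steps: list[str], state_after_step) -> float:
    reward, seen_states, state = 0.0, set(), None
    for step in steps:
        state = state_after_step(state, step)
        if state in seen_states:   # no new verifiable progress: penalize
            reward -= 1.0
        else:                      # effective step: reward
            seen_states.add(state)
            reward += 1.0
    return reward

# A trace that re-checks "x=2" gets penalized for the redundant step,
# which is exactly the overthinking pattern the article describes.
trace = ["set up equation", "solve: x=2", "re-check: x=2"]
score = vsrm_score(trace, lambda s, step: step.split(": ")[-1])
assert score == 1.0  # +1 +1 -1
```

Used as a process reward in post-training, a signal of this shape pushes the model toward shorter traces without rewarding it for dropping genuinely new steps.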
Domestic brain-inspired large model adapted to domestic 沐曦 (MetaX) GPUs! Over 100x speedup in long-sequence inference, matching mainstream models with only 2% of the data
量子位· 2025-09-11 10:19
Core Insights - The article discusses the development of SpikingBrain-1.0, a brain-inspired large model that aims to reduce the computational costs associated with long sequence reasoning [1][2]. Group 1: Model Architecture and Performance - SpikingBrain-1.0 leverages a brain-like information processing mechanism, achieving linear/near-linear complexity, which significantly enhances speed for long sequences. For instance, it shows a 26.5x speed improvement on a 1M length sequence compared to mainstream models [2][18]. - The model is designed to be compatible with domestic GPU clusters, indicating the feasibility of creating a new ecosystem for non-Transformer large models in China [2][28]. - The architecture includes SpikingBrain-7B and SpikingBrain-76B, which utilize a linear (mixed) model structure and a hybrid linear MoE model, respectively [10][14]. Group 2: Theoretical Foundations - The research team has established that complex endogenous dynamics in spiking neurons can mathematically equate to combinations of simpler spiking neurons, suggesting the potential for smaller networks to replace larger ones [5][6]. - A new approach based on "endogenous complexity" is proposed, aiming to integrate the rich dynamical characteristics of biological neurons into model development [7][8]. Group 3: Efficiency and Training - SpikingBrain-1.0 demonstrates significant training efficiency for long sequences, achieving comparable performance to many open-source Transformer models with only about 2% of the data [18]. - The model supports multi-card parallel inference and can handle up to 4M length sequences, with substantial acceleration in time-to-first-token (TTFT) compared to standard attention mechanisms [21][22]. Group 4: Future Directions - The team aims to further explore the relationship between endogenous dynamics of neurons and foundational AI operators, seeking to bridge neuroscience and artificial intelligence [28]. 
- The model is expected to provide significant efficiency advantages in scientific tasks involving long sequences, such as complex multi-agent simulations and molecular dynamics [28].
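The long-sequence advantage claimed above follows from the complexity classes involved, and a back-of-the-envelope comparison makes it concrete. The cost formulas below are standard first-order estimates, not measurements of SpikingBrain-1.0: per layer, full self-attention scales as O(n²·d) in sequence length n, while a linear recurrent mixer scales as O(n·d²).

```python
# Rough per-layer FLOP estimates (constants illustrative, not measured).
def attention_flops(n: int, d: int) -> int:
    return 2 * n * n * d      # QK^T scores + attention-weighted values

def linear_mixer_flops(n: int, d: int) -> int:
    return 2 * n * d * d      # per-token state update with a d x d map

n, d = 1_000_000, 4096        # 1M-token context, hidden size 4096
ratio = attention_flops(n, d) / linear_mixer_flops(n, d)
assert ratio == n / d         # advantage grows linearly once n >> d
```

At these (assumed) sizes the ratio is about 244x, the same order as the 26.5x-to-100x speedups reported for SpikingBrain-1.0 once real-world constants and memory traffic are accounted for; the key point is that the gap widens linearly with context length.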
The 2025 Artificial Intelligence Annual Awards launch! 3 dimensions, 5 award categories, in search of the leaders of the AI+ era
量子位· 2025-09-11 07:43
From the Organizing Committee, Aofeisi — 量子位 | WeChat official account QbitAI

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to more fellow travelers, we are officially opening registration for the "2025 Artificial Intelligence Annual Rankings." This is the 8th year of QbitAI's annual AI rankings. Over eight years, we have witnessed technological breakthroughs and real-world deployment, industry integration and reshaping, and wave after wave of companies, people, and products driving the era forward. In an age where AI is redefining everything, intelligent technology is no longer a standalone tool but a driving force behind the co-evolution of industry and society. Through this annual selection, we hope to discover and honor the explorers and practitioners who truly lead change and push boundaries. This year's selection spans three dimensions (companies, products, people) with five award categories. Companies are warmly invited to apply! Let us witness the stars of the year together and light the way forward.

Company rankings: 2025 AI Leading Company of the Year; 2025 AI Promising Startup of the Year. Product rankings: 2025 AI Outstanding Product of the Year; 2025 AI Outstanding Solution of the Year. People ranking: 2025 AI Focus Figure of the Year.

Detailed selection criteria and registration instructions follow. The Leading Company of the Year award will select the most comprehensively capable companies in China's AI field (eligibility requirements and selection criteria listed below). The Promising Startup of the Year award focuses on Chinese ...
Another big open-source move from Kimi! Middleware that updates a trillion parameters in 20 seconds is here
量子位· 2025-09-11 05:19
Core Viewpoint - The article discusses the introduction of a middleware called "checkpoint-engine" that enables the Kimi K2 model, which has one trillion parameters, to update its model weights in approximately 20 seconds across thousands of GPUs, marking a significant advancement in the efficiency of large language model training and inference [6][7]. Group 1: Middleware Functionality - The checkpoint-engine is designed to facilitate the updating of model weights during the inference process of large language models [6]. - It allows for both simultaneous broadcasting of updated weights to all nodes and point-to-point dynamic updates [2][24]. - The middleware supports a pipeline approach for parameter updates, minimizing memory usage by updating parameters one at a time [19][20]. Group 2: System Architecture - Kimi K2 employs a hybrid co-location architecture where the training and inference engines are deployed on the same set of nodes [8]. - During each reinforcement learning iteration, a centralized controller generates new training data using the inference engine and then instructs the training engine to update parameters based on this data [9]. - The system is optimized for high throughput, with each engine deeply optimized for performance [10]. Group 3: Parameter Update Process - The training engine's parameters are unloaded to DRAM, allowing for quick activation of the training engine with minimal data transfer [12]. - The checkpoint engine manages parameter states by first obtaining local parameter copies from the training engine and then broadcasting the complete parameter set to all checkpoint nodes [16][17]. - The inference engine retrieves only the necessary parameter slices from the checkpoint engine, streamlining the update process [18]. Group 4: Performance Optimization - The design sacrifices some data transfer efficiency for a simpler system architecture, which reduces the complexity of maintenance and testing [25][26]. 
- During the startup of the training engine, nodes selectively read parameters from disk to minimize expensive disk I/O operations [28]. - The checkpoint engine can independently restart in case of failures, enhancing system resilience [33].
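The pipelined, memory-bounded update described in Group 3 can be sketched as follows. This is an illustrative stand-in, not Kimi's checkpoint-engine API: the bucket names and the `send` transport are hypothetical, and a real system would overlap transfer stages rather than loop sequentially.

```python
# Sketch of streaming weight broadcast: instead of materializing all
# parameters at once, the checkpoint engine pushes one named bucket at a
# time to every inference node, keeping peak memory at ~one bucket.
def stream_update(named_buckets, nodes, send):
    """Broadcast parameter buckets to all nodes, one bucket in flight."""
    for name, bucket in named_buckets:   # pipeline: one bucket at a time
        for node in nodes:               # broadcast stage to every node
            send(node, name, bucket)
        # bucket goes out of scope here, bounding peak memory usage

received = {}
def fake_send(node, name, bucket):
    # Stand-in transport: record a checksum per node instead of moving data.
    received.setdefault(node, {})[name] = sum(bucket)

stream_update(
    [("layer0.w", [1, 2]), ("layer1.w", [3, 4])],
    nodes=["gpu0", "gpu1"],
    send=fake_send,
)
assert received["gpu1"] == {"layer0.w": 3, "layer1.w": 7}
```

This mirrors the trade-off the article names: broadcasting full buckets to every node wastes some transfer bandwidth versus point-to-point sharding, but the control flow stays simple enough to maintain and test.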
81-year-old Oracle founder rockets to world's richest! No wonder Musk can't let go of OpenAI
量子位· 2025-09-11 05:19
西风, from Aofeisi — 量子位 | WeChat official account QbitAI

Truly, no one saw this coming... Just yesterday after the U.S. market opened, shares of traditional database company Oracle surged as much as 43%; despite an intraday pullback, the stock still closed up nearly 36%, breaking multiple U.S. market records. The rally briefly made Oracle founder Larry Ellison, now 81, the richest person in the world. Ellison's net worth jumped by $100 billion overnight to a total of $393 billion, (briefly) surpassing Musk's $385 billion. More interestingly, what drove this wild surge was not a breakthrough in Oracle's traditionally strong database business. The credit once again goes to the red-hot AI trend, and specifically to OpenAI, the company Musk has loved, hated, and is still suing. Oracle disclosed a $300 billion compute purchase agreement with OpenAI, one of the largest cloud computing contracts in the world. People familiar with the matter say the agreement takes effect in 2027, with OpenAI planning to purchase in batches over roughly five years, paying as much as $60 billion per year. Part of the "Stargate" project: in fact, Oracle hinted at this deal in a June filing, saying it had reached a cloud services agreement expected to bring in more than $30 billion in annual revenue starting in 2027. For both companies, this contract is a high-stakes gamble.