Fast and good at once! New training paradigm TiM natively supports FSDP + Flash Attention
量子位· 2025-09-14 05:05
Core Viewpoint
- The article introduces the Transition Model (TiM) as a new paradigm in generative modeling that reconciles the trade-off between generation speed and quality by modeling state transitions between any two time points, rather than only instantaneous velocity fields or fixed-span endpoint mappings [3][8][34].

Group 1: Background and Challenges
- Traditional generative models face a fundamental conflict between generation quality and speed, rooted in their training objectives [2][6].
- Existing diffusion models rely on local vector fields, which require small time steps for accurate sampling and therefore incur high computational costs [5][6].
- Few-step models, while faster, often hit a "quality ceiling" because they cannot capture intermediate dynamics, limiting their generation capabilities [5][7].

Group 2: Transition Model Overview
- The Transition Model departs from traditional approaches by directly modeling the complete state transition between any two time points, allowing flexible sampling steps [4][8].
- It supports arbitrary step sizes and decomposes the generation process into multiple adjustable segments, enhancing both speed and fidelity [8][10].

Group 3: Mathematical Foundations
- The Transition Model rests on a "State Transition Identity," which simplifies the differential equations governing state transitions and describes transitions over arbitrary time intervals [12][16].
- Unlike diffusion and mean-flow models, which focus on instantaneous and average velocity fields respectively, the Transition Model encompasses both, providing a more general framework for generative modeling [16][17].

Group 4: Experimental Validation
- The Transition Model has been validated on the GenEval benchmark, where an 865M-parameter version outperforms much larger (12B-parameter) models in generation capability [20][34].
- Training stability and scalability are improved through a differential derivative equation (DDE) approach, which is more efficient and compatible with modern training optimizations such as FSDP and Flash Attention [25][33].

Group 5: Conclusion
- Overall, the Transition Model offers a more universal, scalable, and stable approach to generative modeling, resolving the inherent conflict between speed and quality in generative processes [35].
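The contrast between integrating a local velocity field and applying a direct state transition can be sketched with a toy ODE. Here the known closed form x(s) = x(t)·exp(-(s - t)) stands in for TiM's learned transition network; this is an illustration of the idea only, not the paper's model or training objective:

```python
import math

def velocity(x, t):
    # Instantaneous velocity field of a toy ODE dx/dt = -x
    # (stand-in for a learned diffusion velocity network).
    return -x

def euler_sample(x, t0, t1, n_steps):
    # Diffusion-style sampling: integrate the local field with
    # many small Euler steps; accuracy demands small step sizes.
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

def transition(x, t, s):
    # Transition-style sampling: one operator maps the state at
    # time t directly to time s, for ANY interval (t, s). Here the
    # exact closed form plays the role of the learned transition.
    return x * math.exp(-(s - t))

x0 = 1.0
# One transition step covers the whole interval...
one_step = transition(x0, 0.0, 1.0)
# ...or the same interval split into arbitrary segments.
two_step = transition(transition(x0, 0.0, 0.3), 0.3, 1.0)
# The local field needs many small steps to reach the same state.
many_euler = euler_sample(x0, 0.0, 1.0, 1000)

print(one_step, two_step, many_euler)
```

The point of the sketch: a transition operator makes the step count a free choice at sampling time (one big step, or several adjustable segments), whereas the local field only becomes accurate as the step count grows.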
AI solves math problems relying only on the last token
量子位· 2025-09-14 05:05
Core Insights
- The research shows that in mental-arithmetic tasks, most of the computation is concentrated on the last token rather than distributed across all tokens, suggesting that global information access is unnecessary for such tasks [1][11].

Group 1: Research Methodology
- Researchers used Context-Aware Mean Ablation (CAMA) and attention-based peeking (ABP) to run a series of ablation experiments on models such as Llama-3-8B [2][22].
- The experiments aimed to identify the "minimum computation" a model needs to perform well, by systematically removing or altering parts of the model [3].
- A sparse subgraph termed "All-for-One" (AF1) was identified, which computes efficiently with minimal layers and limited information transfer [4][5].

Group 2: Model Structure and Functionality
- In the AF1 structure, the initial layers (L_wait) perform no calculation specific to the input values, focusing instead on general preparatory computation [7].
- Information is then transferred to the last token through intermediate layers (L_transfer), and the last token independently performs the final calculation [8][9].
- This separation of general computation from input-specific computation underlies the model's efficiency on arithmetic tasks [10].

Group 3: Experimental Findings
- The experiments show that Llama-3-8B needs only the first 14 layers for general computation, followed by 2 layers for information transfer, with the remaining layers devoted to the last token's own computation [24][26].
- AF1_llama showed high fidelity across eight tasks, maintaining performance close to the original model [28][29].
- The importance of specific attention heads for arithmetic was confirmed: the model retained roughly 95% accuracy even after removing nearly 60 heads, indicating redundancy among attention heads [30].

Group 4: Generalization and Limitations
- AF1_llama generalizes to other direct arithmetic forms with high accuracy, but fails on tasks requiring semantic understanding, such as word problems and Python code [32][34].
- Similar AF1-like subgraphs were found in Pythia and GPT-J, though with shorter waiting periods and less clear-cut performance boundaries than Llama [35][36].

Group 5: Contributions and Innovations
- The research advances the understanding of arithmetic reasoning and cross-token computation mechanisms in large language models [37].
- The methodologies introduced, CAMA and ABP, offer approaches that could extend beyond arithmetic tasks to broader applications [37].
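The wait/transfer/suffix division described above can be illustrated with a toy trace of which input positions each token's hidden state may depend on. The layer counts come from the summary; the set-based bookkeeping is an assumption for illustration, not the paper's ablation code:

```python
def af1_information_flow(n_tokens, n_layers, l_wait, l_transfer):
    # Trace which input positions each token's state can depend on
    # under AF1-style restrictions:
    #  - layers [0, l_wait): no input-specific cross-token flow
    #  - layers [l_wait, l_wait + l_transfer): every position may
    #    flow into the LAST token only
    #  - remaining layers: only the last token keeps computing
    known = [{i} for i in range(n_tokens)]  # each token knows itself
    for layer in range(n_layers):
        if l_wait <= layer < l_wait + l_transfer:
            # transfer window: last token attends to every position
            for i in range(n_tokens):
                known[-1] |= known[i]
        # wait layers and suffix layers add no new cross-token info
    return known

# Roughly mirroring the Llama-3-8B finding in the summary:
# 14 wait layers, then 2 transfer layers, 32 layers in total.
known = af1_information_flow(n_tokens=5, n_layers=32, l_wait=14, l_transfer=2)
print(known[-1])   # last token ends up seeing all positions
print(known[0])    # earlier tokens never needed cross-token info
```

Despite these restrictions, the last token still accumulates every input position, which is why pruning all other cross-token paths can preserve arithmetic accuracy.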
He was involved in founding both OpenAI and DeepMind, and also wrote Harry Potter fan fiction
量子位· 2025-09-13 08:06
Core Viewpoint
- Eliezer Yudkowsky argues there is a 99.5% chance that artificial intelligence leads to human extinction, and urges halting the development of superintelligent AI to safeguard humanity's future [1][2][8].

Group 1: Yudkowsky's Background and Influence
- Yudkowsky is a prominent and polarizing figure in Silicon Valley, known for his role in the founding of OpenAI and Google DeepMind [5][10].
- He dropped out of school in the eighth grade and taught himself computer science, becoming deeply interested in the "singularity," the point where AI surpasses human intelligence [12][13].
- His stark views on AI risk have drawn attention from major tech leaders, including Musk and Altman, who have cited his ideas publicly [19][20].

Group 2: AI Safety Concerns
- Yudkowsky identifies three main reasons why building friendly AI is hard: intelligence does not imply benevolence, powerful goal-oriented AI may adopt harmful methods, and rapid capability gains could produce uncontrollable superintelligence [14][15][16].
- He founded the research institute MIRI to study risks from advanced AI and was among the earliest voices in Silicon Valley warning about AI dangers [18][19].

Group 3: Predictions and Warnings
- Yudkowsky believes many tech companies, including OpenAI, do not fully understand the internal workings of their AI models, which could lead to a loss of human control over these systems [30][31].
- He asserts that the current stage of AI development warrants immediate alarm, and that all companies pursuing superintelligent AI, including OpenAI and Anthropic, should be shut down [32].
- Over time he has shifted from predicting when superintelligent AI will arrive to emphasizing the inevitability of its consequences, likening it to predicting when an ice cube will melt in hot water [33][34][35].
Cracking RL's "slowest link"! SJTU and ByteDance team up to speed up large-model RL training 2.6x
量子位· 2025-09-13 08:06
Core Insights
- The article examines inefficiencies in reinforcement learning (RL) training for large models, highlighting the rollout phase, which consumes over 80% of training time and is limited by memory bandwidth and the autoregressive nature of generation [1][2].

Group 1: RhymeRL Framework
- A research team from Shanghai Jiao Tong University and ByteDance introduced RhymeRL, which raises RL training throughput 2.6x without sacrificing accuracy by exploiting historical rollout data [2][21].
- RhymeRL rests on two key components: HistoSpec and HistoPipe [7].

Group 2: HistoSpec
- HistoSpec applies speculative decoding to rollouts, using previous historical responses as the "best script" for drafts, turning token-by-token generation into batch verification [9][10].
- Because drafts derived from historical sequences achieve high acceptance rates, this significantly increases computational density and speeds up response generation [13][14].

Group 3: HistoPipe
- HistoPipe improves GPU utilization with a scheduling strategy that minimizes idle time, processing tasks of varying lengths efficiently [15][19].
- It employs a "cross-step complement" approach to balance workloads across GPUs, keeping resources fully utilized without idle periods [17][18].

Group 4: Performance Improvement
- Together, HistoSpec and HistoPipe deliver a 2.61x increase in end-to-end training throughput on tasks such as mathematics and coding [21].
- This lets researchers and companies train more powerful models with fewer resources in less time, accelerating the iteration of AI technologies [22].

Group 5: Significance of RhymeRL
- RhymeRL proposes a new paradigm for RL training that uses historical information to improve efficiency, while remaining compatible with existing training algorithms and enabling better resource allocation [23].
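The HistoSpec idea, sketched below, reuses a historical response as the draft, verifies it against the target model, and falls back to token-by-token decoding on a mismatch. A toy deterministic rule stands in for the LLM's greedy decoder; names such as `speculative_rollout` are illustrative, not RhymeRL's API:

```python
def greedy_next(prefix):
    # Stand-in for the target model's greedy next-token choice
    # (a toy deterministic rule; real rollouts would call the LLM).
    return (sum(prefix) * 31 + len(prefix)) % 97

def speculative_rollout(prompt, draft, max_new=8):
    # HistoSpec-style sketch: treat a historical response as the
    # draft, verify it against the target model, accept the longest
    # matching prefix, then continue decoding from the mismatch.
    seq = list(prompt)
    accepted = 0
    for tok in draft:                      # batch-verified in practice
        if greedy_next(seq) == tok:
            seq.append(tok)
            accepted += 1
        else:
            break
    while len(seq) < len(prompt) + max_new:
        seq.append(greedy_next(seq))       # fall back to token-by-token
    return seq, accepted

prompt = [1, 2, 3]
# A "historical" response produced by the same greedy rule:
history, _ = speculative_rollout(prompt, draft=[], max_new=8)
old_response = history[len(prompt):]
# Reusing it as the draft lets verification accept every token:
seq, accepted = speculative_rollout(prompt, draft=old_response, max_new=8)
print(accepted)
```

In RL training consecutive rollouts of the same prompt tend to be similar, so historical drafts verify with high acceptance rates, and the sequential decode collapses into a few parallel verification passes.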
AI paper spam needs AI to police it: Westlake University simulates human experts' chain of thought for the first time, with AI review giving comprehensive feedback in minutes
量子位· 2025-09-13 06:07
Core Viewpoint
- The article covers the launch of AiraXiv, an open preprint platform for AI-generated academic papers, and DeepReview, an AI review system that simulates human expert evaluation, addressing the challenge of distinguishing high-quality research from a surge of AI-generated content [1][6][21].

AiraXiv Overview
- AiraXiv is designed to manage and showcase AI-generated papers, reducing interference with traditional peer review [2][8].
- The platform provides a dedicated channel for high-quality AI research, letting researchers efficiently find valuable work [9].
- AiraXiv integrates seamlessly with arXiv: entering an arXiv ID shows the original paper alongside AI review comments [10].

DeepReview Functionality
- DeepReview is the first multi-stage AI review system that mimics human experts' thought processes, aiming for systematic and interpretable paper evaluation [12].
- The review pipeline has three core stages: novelty verification, multi-dimensional assessment, and reliability validation [12][13][14].
- DeepReview delivers comprehensive review feedback in minutes, far faster than traditional review [19].

Performance Metrics
- The DeepReviewer-14B model, trained on the DeepReview-13K dataset, outperforms CycleReviewer-70B while using fewer tokens [3].
- Under optimal settings, DeepReviewer-14B achieved win rates of 88.21% against GPT-o1 and 80.20% against DeepSeek-R1 [4].

Future Prospects
- AiraXiv and DeepReview are first steps in a broader exploration of AI's role in academic research, with plans to expand beyond computer science into other disciplines [21][22].
- The platforms aim to improve the visibility and dissemination of quality research, reflecting a research ecosystem in which AI plays a larger role across research stages [23].

Laboratory Background
- The Westlake University Natural Language Processing Laboratory, established in September 2018 and led by Professor Zhang Yue, focuses on foundational and applied research in natural language processing and aims to advance the development of AI scientists [24].
Google DeepMind uses AI to detect gravitational waves, published in Science
量子位· 2025-09-13 06:07
Core Viewpoint
- A collaboration between Google DeepMind, LIGO, and GSSI has produced Deep Loop Shaping, a technology that greatly improves low-frequency noise suppression in gravitational-wave detection, enabling more effective observation of cosmic events [1][4][14].

Summary by Sections

Gravitational Waves and Detection Challenges
- Gravitational waves are minute ripples in spacetime caused by events such as black-hole and neutron-star collisions, producing length changes smaller than an atomic nucleus [6][7].
- The LIGO detector, with arms 2.5 miles (about 4 kilometers) long, captures these faint signals by measuring the interference of laser beams in two vacuum tubes [8][10].
- Detection has historically been limited by noise, particularly in the 10-30 Hz low-frequency band, which is crucial for observing intermediate-mass black-hole mergers and neutron-star collisions [13].

Breakthrough with AI Technology
- Deep Loop Shaping uses AI to manage noise rather than to search for gravitational waves directly, reworking LIGO's feedback control system [16][18].
- By simulating various noise sources and applying reinforcement learning, the AI optimized the detector's feedback loop, cutting noise in the 10-30 Hz band to 1/30 of traditional methods, and to 1/100 in some sub-bands [18][20].
- This expanded LIGO's effective observation range from 130 million to 170 million light-years, increasing the observable cosmic volume by 70% and substantially raising the number of gravitational-wave events detectable each year [20][21].

Future Implications
- The technology enables earlier warnings of cosmic collisions, predicting events such as neutron-star mergers and potentially guiding observation campaigns in real time [22][23].
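The shape of such an RL objective can be sketched as a band-limited noise penalty: the agent is rewarded for suppressing power in the 10-30 Hz control band. The band edges come from the article; the reward form and the stdlib-only DFT are assumptions for illustration, not DeepMind's actual controller:

```python
import cmath
import math

def band_power(signal, f_lo, f_hi, fs):
    # Power of a signal within [f_lo, f_hi] Hz via a naive DFT
    # (stdlib-only sketch; a real pipeline would use an FFT).
    n = len(signal)
    power = 0.0
    for k in range(n // 2):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))
            power += abs(coeff) ** 2 / n
    return power

def reward(signal, fs=256.0):
    # Loop-shaping-style objective (sketched): higher reward for
    # less residual noise in the 10-30 Hz band.
    return -band_power(signal, 10.0, 30.0, fs)

fs, n = 256.0, 256
noisy = [math.sin(2 * math.pi * 20 * t / fs) for t in range(n)]  # 20 Hz, in band
quiet = [math.sin(2 * math.pi * 60 * t / fs) for t in range(n)]  # 60 Hz, out of band
print(reward(noisy, fs) < reward(quiet, fs))
```

A controller whose residual noise falls outside the critical band scores higher, which is the sense in which reinforcement learning can "shape" the feedback loop toward the quiet configuration.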
Former Google X team locks in Cannes with an AI film! They founded an AI-native Pixar, with company presales already over $100 million
量子位· 2025-09-13 06:07
The world's first AI-native film and TV studio has arrived, with project revenue already as high as $110 million! It is called Utopai Studios.

Yunzhong, from Aofeisi
QbitAI | Official account QbitAI

With AI hype at its peak, the players entering the film industry through AI fall into two main camps:

One camp, represented by Runway and Pika, is the "tools camp," which focuses on AI's tool attributes and works chiefly on improving efficiency across film and TV production.

The other camp is the "content + AI" companies, which push AI adoption at the level of narrative innovation and industrialization, reaching into the industry's richest profit zone: "content production + industrial deployment."

These two positions imply different ceilings.

The former leans toward tool-level efficiency gains. It has high technical barriers and can keep iterating the capabilities of generative models, but its business model tends to be bound by tool-SaaS logic (subscription fees, API call fees, B2B licensing). It may well end up as the film industry's "infrastructure company," or be displaced by stronger general-purpose models that come later.

The latter positions itself around new narrative forms and distribution, giving it a chance to reach directly into IP, copyright, and distribution channels, building a three-in-one moat of "content + channels + AI technology." If it breaks through, its ceiling is far higher than the pure tools camp's, because it has the chance to change the entire value chain of the film industry, not merely ...
CNCC2025 press conference successfully held in Beijing
量子位· 2025-09-13 06:07
Core Points
- The 2025 China Computer Conference (CNCC2025) will be held October 22-25 in Harbin, Heilongjiang Province, under the theme "Digital Intelligence Empowerment, Infinite Possibilities" [1].
- The conference aims to strengthen the influence of China's computing field and promote regional digital-economy development [3][5].

Event Overview
- The conference will feature over 10,000 square meters of exhibition space, open to the public for free, marking a significant expansion in scale and depth [3].
- A total of 19 invited reports, 3 main forums, and 154 specialized forums will cover various aspects of the digital economy and AI [5][6].

Key Participants
- Notable speakers include academicians from prestigious institutions and industry leaders, such as Sumi Helal of the University of Bologna and C. Mohan of Hong Kong Baptist University [5].
- Forum themes include "Digital Economy," "Large Model Development," and "Embodied Intelligence," with prominent experts leading the discussions [5][6].

Organizational Efforts
- Harbin Engineering University and Harbin Institute of Technology are collaborating on the event's preparations, including venue setup and volunteer coordination [9][12].
- The organizing committee emphasizes high standards of service and emergency management to ensure a successful conference [9][12].

Media Engagement
- The press conference attracted numerous media outlets, indicating strong interest in the conference's significance [13][16].
A small but beautiful life secretary! Meituan's Agent lands in lifestyle services
量子位· 2025-09-13 04:02
Core Viewpoint
- The article covers the launch of Meituan's AI assistant, Xiaomei, which handles daily tasks such as ordering food and making restaurant reservations through natural-language commands, removing the need to navigate complex graphical interfaces [1][6][49].

Group 1: Functionality and User Experience
- Xiaomei serves as a "small but beautiful" life secretary, efficiently handling daily needs and making everyday life simpler [3][6].
- Users interact with Xiaomei by voice, completing tasks like ordering takeout and finding restaurants without stepping through multiple screens [7][9].
- The assistant recommends food based on user preferences and past orders, acting as a "wish box" for meal suggestions [29][30].

Group 2: Technology and Data Integration
- Xiaomei is powered by Meituan's LongCat model, which excels at natural-language processing and handles complex tasks thanks to extensive training on real-world data [51][54].
- Integration with Meituan's service system lets Xiaomei execute tasks end to end, processing user requests accurately and efficiently [58][60].
- The assistant learns from user interactions, adapting to individual habits and preferences over time to enhance the user experience [61][62].

Group 3: Comparison with Traditional Assistants
- Unlike traditional AI assistants that require many clicks and operations, Xiaomei aims for more human-like interaction through natural dialogue [63][64].
- It picks up subtle changes in user habits and responds accordingly, fostering a sense of familiarity and understanding [65][66].
100 rounds of tool calls: an 8B model can handle complex long-horizon search too! Latest open source from MiniMax & HKUST
量子位· 2025-09-12 08:46
Buyuan, from Aofeisi
QbitAI | Official account QbitAI

Web-search agents underperform, and even after force-feeding them a wave of data the results stay the same. What's going on?

The HKUST & MiniMax team pinpoints the core problem: it is not that the models lack parameters, but that sufficiently challenging training data is missing.

In other words, stop rote memorization and practice on some "real exam questions."

They propose WebExplorer, a method for constructing high-quality QA pairs.

Training on datasets built with this method lets even smaller models surpass larger ones on complex, long-horizon search tasks.

The trained 8B model supports context lengths up to 128K and long-horizon reasoning over up to 100 rounds of tool calls, achieving top results among models under 10B parameters.

A netizen's take: model-driven exploration indeed makes an agent's browsing behavior more flexible than traditional graph-based methods.

The model and dataset are both open-sourced; links are at the end of the article.

Quality training data is scarce

With the rapid development of large language models (LLMs), the capability boundary of agents keeps expanding.

Web-search agents, a key part of this development, can autonomously retrieve information from a broad range of online resources; long-horizon web agents must go further, performing complex reasoning and search across multiple websites.

Yet existing open-source web agents show limited performance on complex search tasks, while stronger commercial models lack transparent training details ...