The overlooked Rollout process: post-training performance bottleneck, or RL's ROI breakthrough?
机器之心· 2025-11-30 01:30
Group 1
- The Rollout process is a significant performance bottleneck in Reinforcement Learning (RL) post-training, consuming over 70% of the training time, and is crucial for improving training efficiency and effectiveness (see the code sketch after this summary) [1][5][6]
- Research indicates that Rollout is a major energy consumer in RL post-training, with studies showing it occupies 70% of the time in RL training processes [6][8]
- The quality of Rollout trajectories directly impacts the final results of RL training, with poor trajectories leading to local optima and high-quality trajectories enhancing model exploration and reasoning capabilities [8][9]

Group 2
- The shift in focus within the LLM field from pre-training scale competition to enhancing post-training capabilities highlights the importance of optimizing the Rollout phase [6][7]
- Rollout and inference share the same core technology but differ in objectives and computational patterns: Rollout aims to provide diverse and valuable trajectory samples for training [7][8]
- Recent efforts in the industry explore ways to improve both the computational efficiency and the quality of Rollout trajectories to achieve better RL post-training outcomes [9]
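To make the bottleneck concrete, here is a minimal, hypothetical sketch of one RL post-training iteration; the names (generate_trajectory, score, policy_update) are placeholders, not any specific framework's API. The structural point is that every gradient step must first wait for full trajectories to be sampled token by token, which is why Rollout dominates wall-clock time.

```python
import random
import time

def generate_trajectory(prompt: str, max_tokens: int = 512) -> list[str]:
    """Stand-in for autoregressive sampling; in real systems this is the slow phase."""
    return [f"token_{i}" for i in range(random.randint(64, max_tokens))]

def score(trajectory: list[str]) -> float:
    """Stand-in for a reward model or verifier."""
    return random.random()

def policy_update(batch: list[tuple[list[str], float]]) -> None:
    """Stand-in for the PPO/GRPO-style gradient step."""
    pass

prompts = [f"prompt_{i}" for i in range(8)]

t0 = time.time()
trajectories = [generate_trajectory(p) for p in prompts]   # Rollout phase
t1 = time.time()
batch = [(traj, score(traj)) for traj in trajectories]     # Reward phase
policy_update(batch)                                       # Training phase
t2 = time.time()
print(f"rollout: {t1 - t0:.3f}s, reward + update: {t2 - t1:.3f}s")
```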
In an AI-driven market, AI has finally become the shovel for the gold rush
机器之心· 2025-11-29 09:33
Core Insights
- The article discusses the intricate financial ecosystem surrounding AI companies, highlighting the capital circulation among major players like OpenAI, Oracle, and Nvidia [3][6].
- A significant contract between OpenAI and Oracle for $300 billion in computing resources is noted as one of the largest cloud service contracts in history, leading to substantial stock price increases for both companies [8][10].
- Nvidia's strategic partnership with OpenAI involves a $100 billion investment to build AI data centers, showcasing a cycle where investments return to the original investor through GPU orders [10][12].
- AMD's collaboration with OpenAI, involving a deal worth over $300 million in stock warrants, further illustrates the complex financial interactions within the AI sector [14].

Financial Dynamics
- The article emphasizes the rapid increase in market valuations, with Nvidia's market cap surpassing $5 trillion and OpenAI reaching a valuation of $500 billion [14].
- The capital circulation among these companies creates a perception of wealth, but the real value lies in the technological advancements and productivity improvements driven by AI [22][24].

Challenges for Investors
- Ordinary investors face significant challenges in understanding the complexities of the AI industry, including a lack of knowledge about AI technologies and tools [16][18].
- The article identifies three main challenges: cognitive gaps, lack of tools, and information lag, which create a barrier between professional investors and the general public [16][18].
- Despite these challenges, AI technology itself is seen as a potential tool to bridge the information gap, with new financial applications emerging to help investors navigate the AI landscape [18][19].

Emerging Roles
- New types of financial brokers are emerging, acting as "information asymmetry eliminators" by providing analysis tools that make complex industry relationships understandable for retail investors [19][22].
- The article concludes that while capital circulation creates apparent wealth, the true value comes from AI's ability to enhance productivity and solve real-world problems [22][24].
Outrage: ICLR wipes out rebuttals with one click, and researchers across the internet are furious
机器之心· 2025-11-29 09:33
Core Viewpoint
- The ICLR (International Conference on Learning Representations) has implemented a significant reset of its review process, causing widespread disruption and frustration within the AI research community [2][11].

Group 1: ICLR Review Process Changes
- ICLR announced that all Area Chairs (ACs) will be reassigned and that all review comments and scores will revert to their initial, pre-discussion state [2][11].
- The reset affects nearly 20,000 submissions and over 70,000 reviews, creating a massive additional workload for the new ACs, who must now reassess the papers [11].
- The decision has been met with criticism, as many authors feel they are being unfairly punished for the actions of a few, leading to a sense of collective frustration and helplessness [6][11].

Group 2: Community Reactions
- The AI community has reacted strongly, with discussions on platforms like Reddit highlighting the plight of authors whose hard work has been undone [10].
- Some authors note that the reset may inadvertently benefit those who previously faced unresponsive reviewers, since it levels the playing field [11].
- There is a growing sentiment that the publication mechanism at machine learning conferences is flawed, with the recent events exposing existing issues in the review process [11].
NeurIPS 2025 | DynaAct: Beyond DeepSeek R1, exploring another path for large model reasoning
机器之心· 2025-11-29 09:33
Core Insights
- The article discusses the emergence of a new paradigm in large model reasoning, shifting from train-time scaling to test-time scaling (TTS) and emphasizing the need for efficient inference rather than merely longer reasoning chains [3][10].
- The research team from Ant Group and the University of Hong Kong introduces DynaAct, a novel approach that focuses on dynamic action space optimization to enhance reasoning efficiency [4][7].

Group 1: DynaAct Overview
- DynaAct is based on the principle of action space optimization: it dynamically constructs a set of selectable actions at each reasoning step, allowing for more structured and efficient inference [7][11].
- The core idea of DynaAct is to recast action space learning as a set selection problem, using submodular optimization to obtain a linear-complexity selection algorithm (a minimal greedy-selection sketch follows this summary) [14].

Group 2: Methodology and Implementation
- DynaAct employs a submodular function with utility and diversity components, measuring the relevance of candidate actions to the current state and the redundancy of actions within the selected set [14].
- The implementation of DynaAct is supported by a high-performance Monte Carlo Tree Search (MCTS) framework, which significantly improves the efficiency of node expansion, rollout, and reward calculation [19].

Group 3: Performance and Results
- DynaAct outperforms traditional methods such as CoT, RAP, and rStar across six reasoning benchmarks, demonstrating the effectiveness of dynamic action spaces [21].
- Evaluation results indicate that DynaAct achieves a score of 70.22 on the MMLU benchmark, surpassing other models, and shows a stable test-time scaling trend as MCTS rollout iterations increase [22][25].

Group 4: Future Directions
- The research team plans to extend dynamic action spaces to multi-agent planning scenarios and to combine submodular optimization with reinforcement learning for adaptive reasoning strategies [26].
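For illustration, here is a minimal sketch of the greedy set-selection idea described above: choose a small action set that trades off utility (relevance to the current state) against diversity (low redundancy among the chosen actions). The paper's exact submodular objective and embeddings are not reproduced here, so the scoring functions below are assumptions.

```python
import numpy as np

def select_actions(state_emb: np.ndarray,
                   candidate_embs: np.ndarray,
                   k: int,
                   lam: float = 0.5) -> list[int]:
    """Greedy maximization of utility minus redundancy; roughly linear per selected action."""
    chosen: list[int] = []
    for _ in range(k):
        best_idx, best_gain = -1, -np.inf
        for i in range(len(candidate_embs)):
            if i in chosen:
                continue
            utility = float(candidate_embs[i] @ state_emb)                 # similarity to current state
            redundancy = max((float(candidate_embs[i] @ candidate_embs[j])
                              for j in chosen), default=0.0)               # overlap with already-chosen actions
            gain = utility - lam * redundancy
            if gain > best_gain:
                best_idx, best_gain = i, gain
        chosen.append(best_idx)
    return chosen

# Example: pick 3 actions out of 20 candidates in a 16-dimensional embedding space.
rng = np.random.default_rng(0)
state = rng.normal(size=16)
candidates = rng.normal(size=(20, 16))
print(select_actions(state, candidates, k=3))
```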
In 2026, can large models' uncertain "capability inflection point" deliver sustainable business growth?
机器之心· 2025-11-29 02:30
Core Insights
- The article discusses the contrasting predictions of AI companies regarding their business growth by 2026, highlighting the uncertainty over whether AI can translate into tangible revenue growth [1].

Group 1: AI's Potential for Business Growth
- Anthropic predicts that by mid-2026, AI models could autonomously work for a full 8-hour day, with at least one model expected to reach human expert levels in multiple industries by the end of 2026 [3].
- There is skepticism in the community regarding the success rates of AI models, with some arguing that a 50% success rate still necessitates human involvement for task completion and oversight [3][4].
- OpenAI's internal memo warns of a potential slowdown in growth, projecting revenue growth rates to drop to single digits (approximately 5-10%) by 2026, indicating a need for a "wartime" mentality among employees [4].

Group 2: Strategic Directions of Major AI Players
- Anthropic's revenue model is heavily reliant on enterprise clients and APIs, which may allow it to surpass OpenAI in annual recurring revenue (ARR) without needing to replicate OpenAI's consumer-scale business [4].
- Google faces criticism regarding the performance of its Gemini product compared to ChatGPT, particularly in consumer-facing applications [5].
- Discussions around Meta's Llama 5 suggest potential changes in its release strategy, which could impact the open-source ecosystem in 2026 [5].
- Domestic players like Alibaba and ByteDance are also under scrutiny, with Alibaba potentially leveraging AI to integrate its various business units, while ByteDance's cloud services are gaining significant market share [6].
Are world models approaching their own "ChatGPT moment"?
机器之心· 2025-11-29 01:49
Core Viewpoint
- The article discusses the emerging focus on "world models" in the AI field, highlighting their potential applications and the ongoing debates among experts over their definition, construction, and commercialization [1][3].

Definition of World Models
- Experts offered various definitions of world models, with key perspectives including:
  - A predictive model that forecasts the next state based on current conditions and action sequences, with applications in autonomous driving and embodied intelligence (a minimal interface sketch follows this summary) [4].
  - A framework for AI to predict and assess environmental states, evolving from simple game worlds to complex virtual environments [4].
  - An ambitious goal of building a 1:1 model of the world, while acknowledging the impracticality of such precision and emphasizing purpose-driven modeling instead [4].

Construction of World Models
- A central dilemma in developing world models is whether to prioritize model creation or data collection. Experts discussed:
  - The challenge of training models with limited data, particularly in autonomous driving, where most data is collected under ideal conditions [5].
  - The importance of high-quality data for specific applications to improve model performance [5].
  - A proposed iterative approach in which initial models generate data that can be used for further training [5].

Technical Implementation Paths
- There are notable disagreements among experts regarding the technical paths for world models:
  - Some advocate incorporating physical information into models, while others suggest a more pragmatic approach driven by specific needs [7].
  - Models may evolve toward purely generative forms as capabilities improve [7].

Architectural Debate: Diffusion vs. Autoregressive
- Experts shared their views on the suitability of diffusion versus autoregressive architectures for world models:
  - Diffusion models are seen as more aligned with the physical generation of content, reflecting how the brain decodes complex signals [8].
  - There is a trend toward integrating different architectures to improve model performance, recognizing the strengths of both diffusion and autoregressive methods [9].

Future of World Models
- The timeline for a "ChatGPT moment" for world models is uncertain, with estimates suggesting it may take around three years to reach significant breakthroughs [10].
- The current lack of high-quality long-video data poses a significant challenge, with existing models primarily generating short clips [10].
- The commercialization of world models faces challenges in defining value for both business-to-business (B2B) and business-to-consumer (B2C) applications [10][11].

Conclusion
- The roundtable discussion highlighted the vibrant and diverse nature of the world model field, emphasizing its potential for growth while acknowledging the challenges around data, computational power, and technical direction [13].
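As an illustration of the "predict the next state from the current state and an action sequence" definition above, here is a minimal, hypothetical world-model interface in PyTorch; the tiny MLP dynamics model is purely a placeholder for the video or latent-dynamics models the experts discuss.

```python
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    """A learned transition function: (state, action) -> predicted next state."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

    def rollout(self, state: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        """Autoregressively predict future states for a whole action sequence."""
        states = []
        for t in range(actions.shape[0]):
            state = self.forward(state, actions[t])
            states.append(state)
        return torch.stack(states)

model = TinyWorldModel(state_dim=8, action_dim=2)
future = model.rollout(torch.zeros(8), torch.zeros(5, 2))
print(future.shape)  # (5, 8): predicted states for a 5-step action sequence
```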
NeurIPS 2025 Oral | One token at zero cost: REG makes diffusion training converge 20x faster!
机器之心· 2025-11-29 01:49
Core Insights
- REG is a simple and effective method that significantly accelerates the training convergence of generative models by introducing a single class token, enhancing the performance of diffusion models [2][9][38]

Group 1: Methodology
- REG combines low-level latent representations with a high-level class token from a pre-trained visual model, so that noise addition and denoising are optimized jointly during training (a minimal sketch follows this summary) [9][14]
- The training process requires adding only one token, resulting in a computational overhead of less than 0.5% and no increase in inference cost [9][10][26]
- REG achieves a 63x and 23x acceleration in convergence speed compared to SiT-XL/2 and SiT-XL/2+REPA, respectively, on ImageNet 256×256 [10][17]

Group 2: Performance Metrics
- In terms of FID scores, REG outperforms REPA significantly, achieving a FID of 1.8 after 4 million training steps, while SiT-XL/2+REPA achieves a FID of 5.9 [17][19]
- REG reduces training time by 97.90% compared to SiT-XL/2 while reaching similar FID scores [24][25]
- The inference overhead of REG is minimal, with increases in parameters, FLOPs, and latency all below 0.5%, while FID improves by 56.46% compared to SiT-XL/2 + REPA [26][27]

Group 3: Ablation Studies
- Extensive ablation studies demonstrate the effectiveness of REG, showing that high-level global discriminative information significantly enhances generation quality [28][30]
- Using the DINOv2 class token yields the best generation quality, indicating the importance of high-level semantic guidance [30][31]

Group 4: Conclusion
- Overall, REG represents a highly efficient training paradigm that entangles high-level and low-level tokens, promoting an "understanding-generating" decoupling in generative models and achieving superior generation results without increasing inference costs [38]
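A minimal sketch of the core idea summarized above: append one high-level class token (e.g., from a frozen DINOv2-style encoder) to the sequence of noised image latent tokens, so a diffusion transformer denoises both jointly. The shapes and the flow-style noising step below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

batch, n_patches, dim = 4, 256, 768

image_latents = torch.randn(batch, n_patches, dim)   # low-level latent tokens (e.g., VAE patches)
class_token = torch.randn(batch, 1, dim)             # high-level token from a frozen encoder

# Concatenating the class token adds only one extra token (<0.5% overhead).
tokens = torch.cat([image_latents, class_token], dim=1)

t = torch.rand(batch, 1, 1)                          # diffusion/flow timestep in [0, 1]
noise = torch.randn(batch, n_patches + 1, dim)
noised = (1 - t) * tokens + t * noise                # simple linear interpolation noising

# A diffusion transformer would then be trained to reconstruct the clean tokens
# (or predict the velocity) for the full sequence, class token included.
print(noised.shape)  # torch.Size([4, 257, 768])
```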
AAAI 2026 | UESTC proposes OWL: mitigating object hallucination in multimodal large models via dual-path attention intervention
机器之心· 2025-11-28 08:05
Core Insights
- The article discusses the growing attention on mitigating object hallucination in large vision-language models (LVLMs) and introduces a novel framework called Owl, which employs a causal dual-path attention intervention to address the issue [2][4].

Group 1: Problem Identification
- Existing methods primarily intervene on visual or textual attention independently, neglecting the critical imbalance in cross-modal attention interaction [5].
- There is a lack of quantitative measures of cross-modal dependency during decoding, leaving intervention mechanisms coarse and without theoretical guidance [5].

Group 2: Proposed Solution
- The paper introduces a structural causal model that formalizes the decomposition of visual and textual attention into key mediating variables, highlighting how confounding factors distort attention and lead to hallucinations [4].
- A new metric, VTACR, quantifies the model's dependency on the visual and textual modalities at each decoding layer, providing a measurable signal for fine-grained attention intervention [7].

Group 3: Methodology
- The Owl framework applies a dual-path attention intervention, constructing a visual-enhancement path and a textual-enhancement path and using a contrastive decoding strategy to dynamically correct attention biases [8][10].
- During inference, the framework decomposes the language decoder's attention weights into visual and textual components and adjusts them according to the VTACR distribution, strengthening the focus on image tokens while moderating the influence of textual history (a minimal sketch follows this summary) [10].

Group 4: Experimental Results
- The Owl method was evaluated on three representative LVLMs (LLaVA-1.5, MiniGPT-4, and Shikra) against a range of baseline methods to ensure comprehensive assessment [12].
- On the CHAIR benchmark, Owl reduced sentence-level hallucination by 17.6% and instance-level hallucination by 21.4% on LLaVA-1.5 while generating longer texts, indicating that it mitigates hallucinations without sacrificing content richness [13].
- The method achieved comparable or better performance on five visual question answering (VQA) tasks, including a 7.6% improvement on the VizWiz task, suggesting that it may also enhance the model's understanding of complex visual scenes [14].
- Manual evaluations using GPT-4V showed improvements of 20.1% in correctness and 11.3% in detailedness for LLaVA-1.5, indicating that the generated content is not only more faithful to the images but also richer in information [16].

Group 5: Visual Evidence
- The paper presents typical hallucination cases in which Owl effectively suppresses errors, keeping generated results closely aligned with the actual image content [18].
- Visualizations show that Owl acts like a precise editor, suppressing "hallucination words" while prioritizing "correct words" during generation [18][19].
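To make the dual-path idea above concrete, here is a minimal, hypothetical PyTorch sketch: split one decoding step's attention mass into the share placed on visual versus textual tokens, compute a VTACR-like ratio, and re-weight toward the image tokens when they are under-attended. The actual Owl intervention and the paper's VTACR definition are more involved; every function and threshold below is an illustrative assumption.

```python
import torch

def vtacr(attn: torch.Tensor, visual_mask: torch.Tensor) -> torch.Tensor:
    """Per-head ratio of attention mass on visual tokens to mass on textual tokens."""
    visual = (attn * visual_mask).sum(dim=-1)
    textual = (attn * (~visual_mask)).sum(dim=-1)
    return visual / (textual + 1e-6)

def rebalance(attn: torch.Tensor, visual_mask: torch.Tensor, boost: float = 1.5) -> torch.Tensor:
    """If attention to image tokens is too low, boost it and renormalize each row."""
    ratio = vtacr(attn, visual_mask)                                    # shape: (heads,)
    scale = torch.where(ratio < 1.0, torch.tensor(boost), torch.tensor(1.0))
    adjusted = attn * torch.where(visual_mask, scale.unsqueeze(-1), torch.ones_like(attn))
    return adjusted / adjusted.sum(dim=-1, keepdim=True)

heads, seq = 8, 32
attn = torch.softmax(torch.randn(heads, seq), dim=-1)                  # one decoding step's attention
visual_mask = torch.zeros(seq, dtype=torch.bool)
visual_mask[:16] = True                                                # first 16 positions are image tokens
print(rebalance(attn, visual_mask).sum(dim=-1))                        # rows still sum to 1
```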
Just now, a mysterious model tops the video generation leaderboard. Is it another Chinese model?
机器之心· 2025-11-28 08:05
Core Viewpoint
- The article discusses the emergence of a new AI video model named Whisper Thunder (aka David), which has surpassed existing models on the Artificial Analysis video leaderboard, indicating a significant advancement in AI video generation technology [1].

Group 1: Model Performance
- Whisper Thunder ranks first on the Artificial Analysis leaderboard with an ELO score of 1,247, outperforming Veo 3 (1,226) and Kling 2.5 Turbo (1,225) [2].
- Generated videos have a fixed duration of 8 seconds and show noticeable motion dynamics [3].
- Users report that the model now appears less frequently in the arena, often requiring multiple refreshes to encounter [3].

Group 2: Model Origin and Characteristics
- There is speculation among users that Whisper Thunder may originate from China, based on its generation quality and aesthetic tendencies [4].
- The model has demonstrated impressive capabilities, although some users noted minor generation flaws, particularly in high-motion scenes [11][13].

Group 3: Example Prompts
- Several prompts illustrate the model's versatility, including scenes of construction, emotional anime performances, and serene landscapes, showcasing its ability to create diverse and engaging visual narratives [5][6][7][8][9][10][12].
Amazon Research Awards winners announced: Wang Jindong (王晋东) among 26 Chinese scholars selected
机器之心· 2025-11-28 04:11
Core Insights
- The Amazon Research Awards (ARA) announced 63 recipients from 41 universities across 8 countries, including 26 Chinese scholars, with funding aimed at multidisciplinary research topics [1][2].

AI Information Security
- Eight researchers received awards in AI information security, three of them Chinese scholars [3].
- Zhou Li from the University of California, Irvine, focuses on using LLMs for precise and analyst-friendly attack tracing in audit logs [4].
- Yu Meng from the University of Virginia studies weakly supervised RLHF, modeling ambiguity and uncertainty in human preferences [5].
- Ziming Zhao from Northeastern University specializes in system and software security, network security, and human-centered security research [6].

Amazon Ads
- Both awardees in the Amazon Ads research area are Chinese [8].
- Xiaojing Liao from the University of Illinois Urbana-Champaign investigates attack methods against large language models, focusing on interpretable vulnerability detection and remediation [10][11].
- Tianhao Wang from the University of Virginia works on differential privacy and machine learning privacy, designing practical algorithms [14].

AWS Agentic AI
- Thirty researchers were awarded in the Agentic AI category, including several Chinese scholars [16].
- Cong Chen from Dartmouth College aims to drive the global energy transition through engineering methods grounded in optimization, economics, and modern machine learning [19].
- Chunyang Chen from the Technical University of Munich focuses on the intersection of software engineering, human-computer interaction, and AI [21].

Trainium Development
- Twenty awardees are involved in research related to Amazon's Trainium AI chips, several of them Chinese researchers [49].
- Kuan Fang from the University of Minnesota works on NetGenius for autonomous configuration and intelligent operation of next-generation wireless networks [50].
- Shizhong Han from the Lieber Institute focuses on revealing the genetic basis of brain diseases and translating genetic discoveries into new treatments [55].

Think Big Initiative
- Three researchers, including one Chinese scholar, were awarded under the Think Big initiative, which supports transformative ideas in scientific research [85].
- Tianlong Chen from the University of North Carolina at Chapel Hill uses molecular dynamics to empower protein AI models [88].