机器之心
Microsoft's BitDistill compresses LLMs to 1.58 bits: 10x memory savings and a 2.65x CPU inference speedup
机器之心· 2025-10-20 07:48
Core Insights
- The article discusses the challenges of deploying large language models (LLMs) efficiently in downstream applications, particularly on resource-constrained devices like smartphones, due to high memory and computational costs [1][7]
- A new approach called BitDistill is introduced, which aims to compress existing pre-trained LLMs into a 1.58-bit BitNet model while minimizing performance loss and training costs [4][19]

Group 1: Challenges and Solutions
- LLMs face significant deployment challenges as their scale increases, leading to instability in training and performance degradation when quantized to lower-bit representations [2][10]
- Extreme low-bit LLMs such as BitNet aim to reduce memory usage and accelerate inference, but achieving accuracy comparable to high-precision models requires extensive pre-training [1][4]

Group 2: BitDistill Framework
- BitDistill consists of three key stages: model refinement, continued pre-training, and distillation-based fine-tuning [8][12] (the quantization and distillation losses are sketched in code after this summary)
- The first stage addresses activation-variance issues in low-bit models by introducing additional normalization layers to stabilize optimization [9][30]
- The second stage continues training on a small amount of pre-training data to adapt the model to the 1.58-bit representation before fine-tuning on specific tasks [11][32]
- The third stage employs knowledge distillation to align the performance of the quantized model with that of the full-precision teacher model [13][27]

Group 3: Experimental Results
- BitDistill demonstrates excellent scalability, achieving performance comparable to full-precision baselines while providing significant improvements in inference speed (approximately 2x) and memory usage (nearly 10x reduction) [19][20]
- Experiments on text classification and summarization tasks show that the 1.58-bit BitDistill model maintains high accuracy and quality, with strong performance across various model sizes [16][21]
- The method exhibits cross-architecture generality, maintaining stable performance even when applied to different pre-trained models [22]

Group 4: Ablation Studies
- Ablation studies indicate that each stage of the BitDistill process is crucial for achieving the desired balance between efficiency and accuracy, with the removal of any stage leading to significant performance drops [25][26]
- The combination of logits and attention distillation yields the best results, highlighting the importance of using multiple strategies to mitigate quantization challenges [27][29]
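The sketch below illustrates, in minimal PyTorch, the two ideas at the heart of such a pipeline: absmean ternary (1.58-bit) weight quantization in the style of BitNet, and a logits-distillation loss that pulls the quantized student toward the full-precision teacher. The function names, temperature, and loss weighting are illustrative assumptions, not the paper's released code.

```python
# A minimal sketch (not the paper's code) of absmean ternary quantization plus
# a logits-distillation loss; alpha and tau are illustrative hyperparameters.
import torch
import torch.nn.functional as F

def ternary_quantize(w: torch.Tensor) -> torch.Tensor:
    """Round weights to {-1, 0, +1} and rescale by their mean absolute value."""
    scale = w.abs().mean().clamp(min=1e-5)
    return torch.clamp((w / scale).round(), -1, 1) * scale

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, tau=2.0):
    """Task cross-entropy plus KL between student and full-precision teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return (1 - alpha) * ce + alpha * kl
```

In practice the attention-distillation term mentioned in the ablations would be added alongside the logits term; it is omitted here to keep the sketch short.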
Behind Xiaohongshu's RecSys 2025 Best Paper nomination: cracking the video watch-time prediction problem
机器之心· 2025-10-20 04:50
Core Insights
- The article highlights the impressive capabilities of Xiaohongshu's recommendation system, which gained recognition at the RecSys 2025 conference for its innovative research and technology [4][6][7]

Group 1: Xiaohongshu's Recognition
- Xiaohongshu's recommendation algorithm team received a "Best Paper Candidate" nomination at the prestigious RecSys 2025 conference for their paper on video watch time prediction [4][6]
- The conference is recognized as a leading academic event in the field of recommendation systems, attracting top scholars and industry experts from around the world [6][7]
- Xiaohongshu's technology and product became focal points at the conference, with many attendees praising its recommendation capabilities as industry-leading [9][10]

Group 2: Research and Methodology
- The paper, titled "Multi-Granularity Distribution Modeling for Video Watch Time Prediction via Exponential-Gaussian Mixture Network," addresses the critical problem of predicting user watch time, which is essential for enhancing user engagement [17][22]
- The research identifies complex user behavior patterns in watch time, highlighting the challenges of skewed distributions and diverse viewing habits [30][31]
- The proposed Exponential-Gaussian Mixture Network (EGMN) combines classic probability distributions to predict the complete probability distribution of watch time rather than a single value [33][35] (a minimal mixture-head sketch follows this summary)

Group 3: Performance and Validation
- EGMN demonstrated superior performance in offline experiments, achieving a 14.11% reduction in Mean Absolute Error (MAE) and a 7.76% increase in ranking consistency [39]
- Online A/B testing covering 15 million users over seven days showed significant improvements in key metrics, with a 19.94% decrease in KL divergence, indicating strong distribution-fitting capability [40][41]
- Ablation studies confirmed the effectiveness of EGMN's components, validating the contributions of both the exponential and Gaussian components to the model's performance [42]

Group 4: Future Directions
- The article emphasizes Xiaohongshu's commitment to a pragmatic approach to technology development, focusing on real user problems and continuous exploration of cutting-edge recommendation algorithms [46][47]
- The success at RecSys 2025 is seen as a starting point for further advances in recommendation systems, with the team actively recruiting talent to strengthen its research efforts [47]
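As a rough illustration of the distribution-modeling idea behind EGMN, the sketch below shows a mixture head that outputs the parameters of one exponential and several Gaussian components and is trained with the negative log-likelihood of observed watch times. The exact parameterization, number of components, and layer names are assumptions for illustration and are not taken from the paper.

```python
# Illustrative exponential-Gaussian mixture head for watch-time prediction;
# the parameterization and component count are assumptions, not EGMN's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpGaussMixtureHead(nn.Module):
    def __init__(self, hidden_dim: int, n_gauss: int = 3):
        super().__init__()
        # 2 params for the exponential (weight logit, rate) + 3 per Gaussian.
        self.params = nn.Linear(hidden_dim, 2 + 3 * n_gauss)
        self.n_gauss = n_gauss

    def forward(self, h: torch.Tensor):
        out = self.params(h)
        exp_logit, exp_rate = out[:, 0], F.softplus(out[:, 1]) + 1e-4
        g = out[:, 2:].view(-1, self.n_gauss, 3)
        logits = torch.cat([exp_logit.unsqueeze(1), g[..., 0]], dim=1)
        weights = F.softmax(logits, dim=1)              # mixture weights
        mu, sigma = g[..., 1], F.softplus(g[..., 2]) + 1e-4
        return weights, exp_rate, mu, sigma

def nll(watch_time, weights, exp_rate, mu, sigma):
    """Negative log-likelihood of observed watch time under the mixture."""
    t = watch_time.unsqueeze(1)
    exp_pdf = exp_rate.unsqueeze(1) * torch.exp(-exp_rate.unsqueeze(1) * t)
    gauss_pdf = torch.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * (2 * torch.pi) ** 0.5)
    pdf = torch.cat([exp_pdf, gauss_pdf], dim=1)
    return -torch.log((weights * pdf).sum(dim=1) + 1e-12).mean()
```

Predicting the full distribution rather than a point estimate is what lets such a head cope with the skewed, multi-modal watch-time behavior the paper describes.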
Lightweight, efficient, and plug-and-play: Video-RAG brings a new paradigm to long-video understanding
机器之心· 2025-10-20 04:50
Core Insights
- The article discusses the challenges faced by existing large vision-language models (LVLMs) in understanding long, complex video content, highlighting context-length limitations, cross-modal alignment difficulties, and high computational costs [2][5]
- A new framework called Video-RAG, proposed by researchers from Xiamen University, the University of Rochester, and Nanjing University, offers a lightweight and efficient solution for long-video understanding tasks without requiring model fine-tuning [2][21]

Challenges
- Current mainstream methods fall into two categories, both of which struggle with visual-semantic alignment over long time spans and often sacrifice efficiency for accuracy, making them impractical and hard to scale [5][6]
- Existing approaches such as LongVA and VideoAgent rely on large-scale data for fine-tuning and incur high costs due to frequent calls to commercial APIs [6]

Innovations
- Video-RAG leverages "retrieval" to bridge the gap between visual and language understanding, using a Retrieval-Augmented Generation (RAG) method that depends neither on model fine-tuning nor on expensive commercial models [9][21]
- The core idea is to extract text clues strongly aligned with the visual content of a video, retrieve them, and inject them into the existing LVLM input stream as additional semantic guidance [9]

Process Overview
1. **Query Decoupling**: User queries are automatically decomposed into multiple retrieval requests, allowing the system to search for relevant information in different modal databases while significantly reducing the initial computational load [10]
2. **Multi-modal Text Construction and Retrieval**: Three semantically aligned databases are constructed with open-source tools, ensuring that the retrieved texts are synchronized with the visuals and carry clear semantic labels [11]
3. **Information Fusion and Response Generation**: The retrieved text segments, the original query, and a few key video frames are fed into an existing LVLM for final inference, all without model fine-tuning, lowering deployment barriers and computational costs [12]

Technical Components
- **OCR Text Library**: Uses EasyOCR for frame text extraction, combined with Contriever encoding and FAISS vector indexing for rapid retrieval [13] (a retrieval-step sketch follows this summary)
- **Speech Transcription Library (ASR)**: Employs the Whisper model for audio content extraction and embedding [13]
- **Object Semantic Library (DET)**: Uses the APE model to detect objects and their spatial relationships in key frames, generating structured descriptive text [13]

Performance and Advantages
- Video-RAG lets LVLMs focus on the most relevant visual information after retrieval, effectively reducing the modality gap; the framework is lightweight, efficient, and high-performing [15]
- It is plug-and-play, compatible with any open-source LVLM without modifications to the model architecture or retraining [16]
- In benchmark tests, Video-RAG combined with a 72B-parameter open-source LVLM outperformed commercial closed-source models such as GPT-4o and Gemini 1.5, demonstrating remarkable competitiveness [18]

Outcomes and Significance
- The success of Video-RAG validates a promising direction: introducing high-quality, visually aligned auxiliary text to enhance cross-modal understanding while sidestepping context-window limitations [21]
- The framework mitigates "hallucination" and "attention dispersion" in long-video understanding and establishes a low-cost, highly scalable technical paradigm applicable to real-world scenarios such as education, security, and medical imaging analysis [21]
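The sketch below illustrates the retrieval step of such a pipeline: auxiliary texts from OCR, ASR, and object detection are embedded, indexed with FAISS, and the top-k hits for a query are prepended to the LVLM prompt. The placeholder encode() function stands in for a Contriever-style encoder, and all sample texts are invented; only the FAISS calls reflect the real library API (faiss-cpu must be installed).

```python
# Retrieval-step sketch: index OCR/ASR/DET text clues with FAISS and fetch the
# top-k clues for a query. encode() is a placeholder for a real encoder.
import numpy as np
import faiss

def encode(texts: list[str]) -> np.ndarray:
    """Placeholder encoder; swap in Contriever or any sentence-embedding model."""
    vecs = [np.random.default_rng(abs(hash(t)) % 2**32).standard_normal(768)
            for t in texts]
    return np.asarray(vecs, dtype="float32")

aux_texts = [
    "[OCR] slide at 04:12 reads 'Chapter 3: Thermodynamics'",
    "[ASR] the speaker explains the first law of thermodynamics",
    "[DET] a person stands left of the whiteboard, holding a marker",
]
emb = encode(aux_texts)
faiss.normalize_L2(emb)                      # cosine similarity via inner product
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

query = "What topic is written on the slide?"
q = encode([query])
faiss.normalize_L2(q)
scores, ids = index.search(q, 2)             # retrieve the top-2 aligned clues
retrieved = [aux_texts[i] for i in ids[0]]
prompt = "Context:\n" + "\n".join(retrieved) + f"\n\nQuestion: {query}"
```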
Running out of hard Codeforces problems? Saining Xie and colleagues built an AI problem setter that generates original programming problems
机器之心· 2025-10-20 04:50
Core Insights
- The article discusses the importance of training large language models (LLMs) to generate high-quality programming problems, a capability crucial for advancing toward artificial general intelligence (AGI) [1][3]

Group 1: Problem Creation and Evaluation
- Creating programming competition problems requires a deeper understanding of algorithms than merely solving them, since competition problems must meet strict standards that probe underlying algorithm design principles [2]
- The ability to generate better problems enables more rigorous benchmarks for competitive programming, as existing datasets often suffer from high false positive and false negative rates [2][21]
- The AutoCode framework, developed by the LiveCodeBench Pro team, automates the entire lifecycle of creating and evaluating competitive programming problems using LLMs [3][7]

Group 2: Framework Components
- The AutoCode framework consists of a Validator, a Generator, and a Checker, ensuring that inputs adhere to problem constraints and minimizing false negatives [8][10] (a toy version of this loop follows this summary)
- The Generator employs diverse strategies to create a wide range of inputs, aiming to reduce the false positive rate, while the Checker compares outputs against reference solutions [12][14]
- A dual verification protocol is introduced to ensure correctness without human intervention, significantly improving the quality of generated problems [29]

Group 3: Performance Metrics
- The AutoCode framework achieved a consistency rate of 91.1%, with a false positive rate of 3.7% and a false negative rate of 14.1%, a significant improvement over previous methods [21][22]
- On a harder benchmark of 720 recent Codeforces problems, AutoCode maintained 98.7% consistency, validating its effectiveness on modern, difficult problems [24]
- Ablation studies further confirmed the effectiveness of the framework's individual components [26]

Group 4: Novel Problem Generation
- The team established a new problem-generation framework built on robust test-case generation, introducing a dual verification protocol to ensure correctness [29]
- LLMs can generate solvable problems that they themselves cannot solve, indicating a strength in knowledge recombination rather than original innovation [34]
- The quality of generated problems is assessed by their difficulty and the difficulty increase relative to the seed problems, providing reliable indicators of problem quality [34][38]

Group 5: Conclusion
- The AutoCode framework represents a significant advance in using LLMs as problem setters for competitive programming, achieving state-of-the-art reliability in test-case generation and producing new, competition-quality problems [36]
- Despite its strengths in recombining algorithmic knowledge, the model struggles to introduce truly novel reasoning paradigms or flawless example designs [37]
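The toy loop below conveys how a Validator / Generator / Checker trio works together on a trivially simple problem (sum of an array): generated inputs are first validated against the constraints, and only then is a candidate solution judged against the reference. All names and the problem itself are illustrative; this is not the AutoCode implementation.

```python
# Toy Validator / Generator / Checker loop in the spirit of automated test-case
# pipelines; the problem (sum of an array) and all names are illustrative.
import random

def generator(seed: int) -> list[int]:
    """Produce a candidate test input, mixing random and edge-case sizes."""
    rng = random.Random(seed)
    n = rng.choice([1, 2, rng.randint(3, 10)])        # tiny sizes stress edge cases
    return [rng.randint(-100, 100) for _ in range(n)]

def validator(arr: list[int]) -> bool:
    """Reject inputs violating the constraints (1 <= n <= 10, |a_i| <= 100)."""
    return 1 <= len(arr) <= 10 and all(abs(x) <= 100 for x in arr)

def reference_solution(arr: list[int]) -> int:
    return sum(arr)

def candidate_solution(arr: list[int]) -> int:
    return sum(x for x in arr if x != 0)              # agrees with the reference

def checker(expected: int, got: int) -> bool:
    """Exact comparison; real checkers also handle multiple valid answers."""
    return expected == got

verdict = "accepted"
for seed in range(1000):
    test = generator(seed)
    if not validator(test):                           # invalid tests are discarded
        continue
    if not checker(reference_solution(test), candidate_solution(test)):
        verdict = "wrong answer"
        break
print(verdict)
```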
SIGGRAPH Asia 2025 | The OmniPart framework makes 3D content creation as simple as assembling building blocks
机器之心· 2025-10-20 04:50
Core Viewpoint
- The article introduces OmniPart, a novel framework for part-aware 3D generation that addresses the challenge of creating, editing, and combining 3D object components, improving both the quality and the efficiency of 3D content creation [2][23]

Summary by Sections

Introduction
- Researchers from the University of Hong Kong, VAST, Harbin Institute of Technology, and Zhejiang University developed OmniPart, which has been accepted for presentation at SIGGRAPH Asia 2025 [2]

Methodology
- OmniPart employs a two-stage "planning-generation" strategy, decoupling the complex generation task into controllable structure planning and spatially conditioned part synthesis [8][10] (a stubbed sketch of this interface follows this summary)

First Stage: Structure Planning
- The first stage plans the 3D object's component layout with an autoregressive Transformer that predicts bounding boxes from 2D images; users control the decomposition granularity through flexible 2D part masks [10][11]

Second Stage: Part Generation
- The second stage generates high-quality 3D parts according to the spatial blueprint from the first stage, efficiently fine-tuning a pre-trained 3D generator (TRELLIS) to ensure high consistency among parts [12][13]

Experimental Results
- OmniPart delivers better generation quality than existing methods such as Part123 and PartGen, excelling in geometric detail, semantic accuracy, and structural consistency [14][16]
- It is also markedly more efficient, completing end-to-end generation in roughly 0.75 minutes versus 15 minutes for Part123 and 5 minutes for PartGen [16]

Applications
- OmniPart supports various downstream applications, including mask-controlled generation, multi-granularity generation, material editing, and geometry processing, enhancing the editability and customization of 3D content [18][20][21]

Conclusion
- The OmniPart framework sets a new benchmark for quality and efficiency in part-level 3D content generation, paving the way for advances in game development, animation, and virtual reality [23]
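A minimal, heavily stubbed sketch of the two-stage "plan then generate" interface is shown below: stage one returns a 3D part layout, stage two fills each box with geometry. The data structures and stub functions are assumptions for illustration only; in the paper, stage two fine-tunes a pre-trained TRELLIS generator rather than returning placeholders.

```python
# Stubbed sketch of a two-stage plan-then-generate interface; all structures
# and functions are illustrative assumptions, not the released OmniPart API.
from dataclasses import dataclass

@dataclass
class PartBox:
    label: str
    min_xyz: tuple[float, float, float]   # axis-aligned 3D bounding box corners
    max_xyz: tuple[float, float, float]

def plan_structure(image, part_masks_2d) -> list[PartBox]:
    """Stage 1: an autoregressive planner maps 2D part masks to a 3D layout."""
    return [PartBox("seat", (-0.4, 0.0, -0.4), (0.4, 0.1, 0.4)),
            PartBox("leg", (-0.4, -0.5, -0.4), (-0.3, 0.0, -0.3))]

def generate_part(image, box: PartBox):
    """Stage 2: synthesize geometry for one part, conditioned on its box."""
    return {"label": box.label, "mesh": None}   # placeholder for generated geometry

def omnipart_pipeline(image, part_masks_2d):
    layout = plan_structure(image, part_masks_2d)            # controllable granularity
    return [generate_part(image, box) for box in layout]     # parts stay consistent
```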
Better performance without retraining: an HKU team proposes the GPC framework for robot "policy composition"
机器之心· 2025-10-19 09:17
Core Viewpoint
- The article introduces the General Policy Composition (GPC) framework, which provides a novel, training-free way to improve robot control policies by dynamically combining multiple pre-trained models at test time, overcoming the limitations of traditional training-based methods [2][5][7]

Summary by Sections

Improving Policy Performance
- GPC represents a paradigm shift: instead of relying on additional training, it improves policy performance by composing existing policies [6][15]

Innovative Theoretical Foundation
- The framework is built on two key theoretical findings:
  1. Functional-Level Improvement: convex combinations of the decision scores from multiple pre-trained policies can yield a more accurate combined score than any single policy [9] (see the score-composition sketch after this summary)
  2. System-Level Stability: improvements in single-step error propagate through the entire trajectory, leading to overall performance gains [10]

General "Policy Composer"
- GPC's core advantage is its plug-and-play nature, allowing various robot policies to be integrated seamlessly without retraining [14][15]

Heterogeneous Policy Flexibility
- GPC can flexibly combine policies across different architectures and modalities, balancing information from various conditions to produce stable and coherent action trajectories [17][19]

Weight Search for the Optimal Composition
- GPC's weight-search mechanism tailors the optimal weight configuration to each task, underscoring how much the weight distribution matters for maximizing the effectiveness of the composed policy [22][23]

Experimental Validation
- GPC demonstrated superior performance in both simulated and real-world environments, with success-rate improvements over single baselines of up to 7.55% on simulation tasks and 5-10% on real-world tasks [28][30]

Key Findings from Experiments
- Three core findings highlight GPC's versatility:
  1. GPC can achieve higher accuracy when combining policies of moderate individual accuracy [29]
  2. A weak policy in the mix can drag down overall performance, so the contributing policies must be chosen carefully [29]
  3. Performance is maximized when stronger policies are given greater weight in the composition [29]
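The sketch below shows the core operation in code: at each denoising step of a diffusion-style policy, the scores (noise predictions) of several frozen policies are combined convexly and used to update the action. The policy call signature, step size, and update rule are simplified assumptions; in GPC the weights would come from the task-specific weight search described above.

```python
# Score-composition sketch: convexly combine the score predictions of several
# frozen policies at each denoising step; simplified, not the GPC implementation.
import numpy as np

def composed_score(scores: list[np.ndarray], weights) -> np.ndarray:
    """Convex combination: non-negative weights that sum to one."""
    w = np.asarray(weights, dtype=np.float64)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
    return sum(wi * si for wi, si in zip(w, scores))

def compose_policies(policies, weights, obs, action_noise, n_steps=10):
    """Test-time denoising loop over frozen policies; no retraining involved."""
    a = action_noise
    for t in reversed(range(n_steps)):
        scores = [p(obs, a, t) for p in policies]       # each policy scores the action
        a = a + 0.1 * composed_score(scores, weights)   # simplified update step
    return a

# Toy usage with two constant "policies" and weights favoring the stronger one.
toy_policies = [lambda obs, a, t: np.ones(7), lambda obs, a, t: np.zeros(7)]
action = compose_policies(toy_policies, [0.7, 0.3], obs=None, action_noise=np.zeros(7))
```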
With long context windows and the rise of agents, is RAG dead?
机器之心· 2025-10-19 09:17
Core Viewpoint
- The article discusses the evolving landscape of Retrieval-Augmented Generation (RAG) and claims of its obsolescence in light of advances in context engineering and agent capabilities, arguing that RAG is not dead but transforming into a more sophisticated retrieval paradigm [2][5][21]

Group 1: RAG's Evolution and Current Status
- Since 2022, RAG has been the standard way to work around LLM input-length limits, acting as an external knowledge base [3][4]
- The emergence of long context windows and agent capabilities is challenging RAG's traditional role, fueling debates about its relevance [5][6]
- RAG is evolving into "agentic retrieval," in which AI agents play a central role in advanced retrieval systems that go beyond basic chunk retrieval [8][21]

Group 2: Stages of RAG Development
- The first stage is basic "Top-k" retrieval: documents are split into chunks, and the chunks most relevant to the user query are retrieved [10][11] (a Top-k retrieval sketch follows this summary)
- The second stage introduces lightweight agents for automatic routing, letting the system intelligently select the appropriate retrieval method for each query [15]
- The third stage expands to composite retrieval APIs, enabling the system to handle multiple document formats efficiently [17][19]

Group 3: RAG's Future and Integration with Agents
- The ultimate goal is a fully agent-driven knowledge system that can make intelligent decisions at every stage of the retrieval process [18][21]
- RAG is being redefined as a powerful component within an agent's toolbox, rather than the default architecture for every application [54]
- The future landscape will likely combine multiple technologies tailored to specific application scenarios, which makes understanding the strengths and weaknesses of each paradigm essential [52][54]
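For concreteness, the sketch below shows stage-one Top-k chunk retrieval plus a toy router hinting at stage-two agentic routing. The hash-seeded embed() function is a stand-in for a real sentence encoder, and the routing rule is deliberately naive; everything here is illustrative rather than a reference implementation.

```python
# Stage-one Top-k retrieval plus a toy stage-two router; embed() is a
# hash-seeded stand-in for a real sentence encoder, and all chunks are invented.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    vecs = [np.random.default_rng(abs(hash(t)) % 2**32).standard_normal(384)
            for t in texts]
    return np.asarray(vecs, dtype="float32")

chunks = [
    "Q3 revenue grew 12% year over year.",
    "The API rate limit is 60 requests per minute.",
    "Employees accrue 20 vacation days annually.",
]
chunk_emb = embed(chunks)

def top_k(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (cosine similarity)."""
    q = embed([query])[0]
    sims = chunk_emb @ q / (np.linalg.norm(chunk_emb, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-sims)[:k]]

def route(query: str) -> str:
    """Toy router: a lightweight agent would choose the retrieval strategy here."""
    return "structured_lookup" if "how many" in query.lower() else "vector_search"

question = "What is the API rate limit?"
if route(question) == "vector_search":
    context = top_k(question)               # chunks passed to the LLM as context
```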
Meta spent 400,000 GPU hours on one experiment just to pin down the scaling law of reinforcement learning
机器之心· 2025-10-19 09:17
Core Insights
- The article discusses advances in scaling reinforcement learning (RL), emphasizing the need for a systematic approach to understanding how RL algorithms and their compute requirements scale [2][3][4]

Group 1: Research Background
- Recent progress in RL has largely come from isolated studies of specific algorithms or models; the field lacks a comprehensive scaling theory, which limits broader research participation [3]
- The study aims to establish a scientific foundation for RL scaling by borrowing concepts from the well-developed scaling laws of pre-training [3][4]

Group 2: Proposed Framework
- A predictive framework is introduced to characterize the relationship between RL performance and compute, using a sigmoid-like saturation curve to link expected reward with training compute [5][7] (a curve-fitting sketch follows this summary)
- The framework lets researchers extrapolate large-scale performance from smaller experiments, making it possible to assess the scalability of RL methods without exhausting computational budgets [7]

Group 3: ScaleRL Development
- ScaleRL is designed on the basis of a systematic empirical study covering more than 400,000 GPU hours, exploring various design choices on an 8B-parameter model [8]
- Three key principles emerged: performance ceilings vary by method; methods that perform well at small scale may underperform at larger scale; and many techniques thought to raise peak performance primarily affect computational efficiency [10][11]

Group 4: Algorithmic Choices
- ScaleRL integrates existing methods rather than introducing new algorithms, combining an asynchronous Pipeline-RL setup, a length-interruption mechanism, and particular loss functions to achieve predictable scaling [11][36]
- Leave-one-out experiments validate these design choices, showing that ScaleRL consistently outperforms existing RL configurations in both performance and efficiency [38]

Group 5: Predictive Performance Insights
- The research investigates which scaling dimensions (context length, batch size, number of generations per prompt, or model size) yield the most reliable performance gains under fixed or growing compute budgets [39]
- Results indicate that larger batch sizes stabilize the performance ceiling and avoid premature stagnation, while longer generations can raise the ceiling [42][47]

Group 6: Conclusion and Recommendations
- The findings establish a rigorous, quantifiable methodology for predicting the scalability of new RL algorithms, a significant contribution to RL for large language models [11][50]
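A minimal sketch of the compute-extrapolation idea: fit a saturating, sigmoid-like curve to small-scale (compute, reward) measurements and read off the predicted reward and ceiling at a larger budget. The functional form below (asymptote A, midpoint C_mid, slope B, starting reward R0) and the synthetic data points are assumptions chosen to illustrate the workflow, not the paper's exact parameterization.

```python
# Fit a saturating curve R(C) = R0 + (A - R0) / (1 + (C_mid / C)**B) to
# small-scale measurements, then extrapolate; data and form are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def reward_curve(compute, A, C_mid, B, R0):
    """Expected reward vs. training compute: rises from R0 toward asymptote A."""
    return R0 + (A - R0) / (1.0 + (C_mid / compute) ** B)

# Synthetic small-scale runs: compute in GPU-hours, mean reward on a benchmark.
compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4])
reward = np.array([0.26, 0.33, 0.45, 0.55, 0.61])

params, _ = curve_fit(
    reward_curve, compute, reward,
    p0=[0.7, 1e3, 1.0, 0.2],
    bounds=([0.0, 1.0, 0.0, 0.0], [1.0, 1e6, 5.0, 1.0]),
)
A, C_mid, B, R0 = params
print(f"estimated performance ceiling A = {A:.3f}")
print(f"extrapolated reward at 4e5 GPU-hours = {reward_curve(4e5, *params):.3f}")
```

Comparing fitted ceilings and midpoints across methods is what lets small-budget runs rank candidate RL recipes before committing to a large-scale run.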
OpenAI "solves" 10 hard math problems? Hassabis calls it "embarrassing," LeCun weighs in sharply
机器之心· 2025-10-19 03:48
Core Viewpoint
- The article discusses the controversy surrounding OpenAI's claims about GPT-5's ability to solve open mathematical problems, claims that were later shown to be exaggerated and based on existing literature rather than original solutions [1][14][17]

Group 1: Events Leading to the Controversy
- OpenAI researcher Sebastien Bubeck tweeted that GPT-5 had "solved" Erdős Problem 339, which was incorrectly listed as unsolved in the official database [4][5]
- Other OpenAI researchers then claimed solutions to 10 problems and progress on 11 more, prompting widespread media excitement about GPT-5's mathematical reasoning abilities [8][14]
- The initial excitement was quickly countered by criticism from Google DeepMind CEO Demis Hassabis, who pointed out how the results had been misinterpreted [16][17]

Group 2: Clarifications and Apologies
- Thomas Bloom, who maintains the problem database, clarified that the problems were marked as unsolved because he was unaware of existing solutions, not because they were genuinely open [17]
- Bubeck later deleted his tweet and apologized for the misunderstanding, stressing that the value of AI here lies in literature search rather than in solving hard mathematical problems [18][19]

Group 3: Broader Implications and Perspectives
- The incident highlights the tension between scientific rigor and the pressure for hype in the AI community, especially where funding and public perception are at stake [38][39]
- Terence Tao suggested that AI's most productive applications in mathematics may lie in accelerating mundane tasks such as literature review rather than in cracking the hardest problems [33][36]
An algorithm has been running for 80 years: are we only now truly understanding it?
机器之心· 2025-10-19 03:48
Core Insights
- The article discusses the significance of the simplex method, a mathematical optimization technique developed by George Dantzig in 1947 that has been used for resource allocation and logistics decisions for nearly 80 years [4][6][10]

Group 1: Historical Context
- George Dantzig, a prominent mathematician, created the simplex method after solving two open problems during his graduate studies, work that later became the foundation of his doctoral thesis [2][3]
- The U.S. military's interest in optimization problems after World War II drove the development of the simplex method as a way to allocate limited resources efficiently in complex scenarios [5][6]

Group 2: Theoretical Developments
- Despite its practical efficiency, the simplex method faced theoretical challenges: in 1972 mathematicians proved that its running time can grow exponentially with the number of constraints [7][10]
- Recent research by Sophie Huiberts and Eleon Bach addresses these theoretical concerns, showing that the feared exponential running time does not arise in practice [10][26]

Group 3: Methodological Insights
- Geometrically, the simplex method walks along the edges of a high-dimensional feasible region defined by the constraints, moving from vertex to vertex toward the corner that maximizes profit or minimizes cost [11][19][21] (a worked linear-programming example follows this summary)
- Introducing randomness into the algorithm, as established by earlier researchers, significantly improves its provable performance, guaranteeing polynomial rather than exponential running time [13][17][26]

Group 4: Future Directions
- The latest findings mark significant progress in understanding the simplex method, but the ultimate goal remains a method whose running time scales linearly with the number of constraints [28]
- Although the research has not yet yielded direct practical applications, it provides stronger mathematical support for the reliability of existing simplex-based software, easing concerns about hidden exponential blow-ups [30]
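As a worked example of the kind of linear program the simplex method was invented for, the snippet below maximizes profit for two products under three resource constraints using SciPy. linprog minimizes by convention, so the objective is negated; SciPy's default HiGHS backend includes simplex-type solvers among its methods. The numbers are a textbook-style toy instance.

```python
# Toy linear program: maximize profit for two products under three resource
# constraints; the optimum lies at a vertex of the feasible polytope.
from scipy.optimize import linprog

# Profits of 3 and 5 per unit; negate to turn maximization into minimization.
c = [-3, -5]
A_ub = [[1, 0],        # resource 1: at most 4 units of product 1
        [0, 2],        # resource 2: at most 12 -> product 2 limited to 6
        [3, 2]]        # resource 3: shared machine hours, at most 18
b_ub = [4, 12, 18]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("optimal production plan:", res.x)   # expected: [2, 6]
print("maximum profit:", -res.fun)         # expected: 36
```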