More accurate than GPT-5? AIME25 score soars to 99.9% and goes viral, a first for open-source models
36Kr · 2025-08-25 03:50
Core Insights
- DeepConf, developed by Meta AI and UC San Diego, lets large models monitor confidence in real time during inference, dynamically eliminating low-confidence paths and weighting high-confidence ones for improved accuracy and efficiency [1][8][9]
- On AIME 2025, DeepConf reached 99.9% accuracy using open-source models without external tools, while cutting token generation by 85% [2][4][19]

Performance Metrics
- DeepConf delivered an average accuracy improvement of roughly 10% across a range of models and datasets [10][19]
- It reduced token generation by up to 85% while maintaining high accuracy [10][21]

Methodology
- The core idea is to filter inference paths by confidence signals, balancing answer quality with efficiency [8][9]
- DeepConf runs in two modes: offline, which scores completed inference paths after generation, and online, which monitors confidence in real time and terminates low-quality paths as they are generated [14][31]

Voting Mechanism
- DeepConf uses confidence-weighted majority voting: each inference path's contribution to the final decision is weighted by its confidence level [29][30]
- The lowest-confidence paths are filtered out before voting, so that only high-confidence paths contribute to the final answer [15][30]

Implementation and Compatibility
- DeepConf is compatible with existing models without additional training or hyperparameter tuning, allowing deployment with minimal code [10][21]
- It can be integrated into vLLM in roughly 50 lines of code, making it accessible for a wide range of applications [10]

Research and Development
- The research team, led by Yichao Fu at UC San Diego, focuses on optimizing algorithms and systems for large language models (LLMs) to enhance their reasoning processes [47]
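The filter-then-vote mechanism described in the Voting Mechanism section can be sketched in a few lines of Python. This is a minimal illustration, not Meta's released code: the function name and the `keep_ratio` parameter are invented here, and per-path confidence scores are assumed to have been computed elsewhere (e.g., aggregated from token log-probabilities).

```python
from collections import defaultdict

def confidence_weighted_vote(paths, keep_ratio=0.9):
    """Illustrative filter-then-vote over inference paths.

    paths: list of (answer, confidence) pairs, one per sampled path.
    First drops the lowest-confidence paths, then sums each surviving
    path's confidence into its answer's score and returns the winner.
    """
    # Keep only the top keep_ratio fraction of paths by confidence.
    ranked = sorted(paths, key=lambda p: p[1], reverse=True)
    kept = ranked[:max(1, int(len(ranked) * keep_ratio))]

    # Confidence-weighted majority vote over the surviving paths.
    scores = defaultdict(float)
    for answer, conf in kept:
        scores[answer] += conf
    return max(scores, key=scores.get)

votes = [("42", 0.9), ("42", 0.8), ("41", 0.95), ("42", 0.7), ("13", 0.2)]
print(confidence_weighted_vote(votes, keep_ratio=0.8))  # → 42
```

Note how the single highest-confidence path ("41" at 0.95) does not win on its own: after the weakest path is filtered out, the three "42" paths accumulate more total confidence, which is the point of weighting votes rather than trusting any one trace.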
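The online mode's real-time termination can likewise be sketched with a sliding-window confidence signal. This is a toy simulation under stated assumptions: the `window` size, the `threshold`, and the mean-log-probability signal are illustrative stand-ins for DeepConf's actual group-confidence definition, and the per-token log-probabilities are assumed to come from the serving engine.

```python
def group_confidence(token_logprobs, window=5):
    """Mean log-probability over the most recent `window` tokens,
    an illustrative stand-in for a group confidence signal."""
    recent = token_logprobs[-window:]
    return sum(recent) / len(recent)

def generate_with_early_stop(step_logprobs, threshold=-2.0, window=5):
    """Simulate online filtering: consume token log-probs one at a
    time and stop the path once windowed confidence falls below the
    threshold. Returns (tokens_consumed, terminated_early)."""
    seen = []
    for lp in step_logprobs:
        seen.append(lp)
        if len(seen) >= window and group_confidence(seen, window) < threshold:
            return seen, True  # low-confidence path terminated early
    return seen, False  # path ran to completion

# A path that degrades: confident tokens followed by uncertain ones.
trace, stopped = generate_with_early_stop([-0.1] * 5 + [-5.0] * 5)
print(len(trace), stopped)  # stops after 7 of 10 tokens
```

In a real deployment the threshold would be calibrated (e.g., on a warmup set of paths) rather than hard-coded; the saved tokens from paths killed mid-generation are where the large efficiency gains come from.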