Core Viewpoint
- The article discusses the Deep Think with Confidence (DeepConf) method, which improves both the inference efficiency and the accuracy of large language models (LLMs) by using the model's internal confidence signals to dynamically filter out low-quality reasoning trajectories during inference [1][5].

Group 1: DeepConf Methodology
- DeepConf addresses the limitations of existing test-time scaling methods (such as plain majority voting over many sampled traces) by scoring each reasoning trajectory with the model's internal confidence signals and discarding low-quality ones, improving both inference efficiency and accuracy; a minimal sketch of the confidence computation appears after this summary [1][10].
- The method integrates into existing serving frameworks without additional model training or hyperparameter tuning, making it straightforward for developers to adopt [8][10].
- DeepConf operates in both an offline mode (filter and vote after all traces are generated) and an online mode (terminate weak traces early during generation), allowing flexibility depending on the use case; sketches of both modes follow below [8].

Group 2: Performance Metrics
- In offline mode, DeepConf@512 reached 99.9% accuracy with GPT-OSS-120B (on AIME 2025), well above the 97.0% achieved by plain majority voting [10].
- In online mode, DeepConf cuts the number of generated tokens by up to 84.7% compared with full parallel inference while simultaneously improving accuracy, effectively balancing performance and efficiency [10].

Group 3: Contributors and Research Background
- Jiawei Zhao, a research scientist at Meta FAIR and a Caltech PhD, focuses on optimization methods for LLMs and deep learning [5][6].
- Yichao Fu, a PhD student at UCSD, specializes in LLM inference optimization and has contributed to multiple research projects on improving LLM scheduling and breaking sequential dependencies in inference [8][10].
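The core signal DeepConf relies on is a per-token confidence derived from the decoder's log-probabilities, aggregated over a sliding window into "group" confidences. The snippet below is a minimal sketch of that computation, assuming the serving framework exposes the top-k log-probabilities at each decoding step; the function names, the 2048-token window, and the top-k averaging are illustrative choices based on the paper's description, not the authors' exact implementation.

```python
def token_confidence(topk_logprobs: list[float]) -> float:
    """Confidence of one decoding step: negative mean log-probability of
    the top-k candidate tokens (higher means the model is more certain)."""
    return -sum(topk_logprobs) / len(topk_logprobs)

def group_confidences(token_confs: list[float], window: int = 2048) -> list[float]:
    """Smooth token confidences with a sliding window so that a locally
    weak stretch of reasoning shows up as a low "group" confidence."""
    groups = []
    for i in range(len(token_confs)):
        span = token_confs[max(0, i - window + 1) : i + 1]
        groups.append(sum(span) / len(span))
    return groups

def lowest_group_confidence(token_confs: list[float], window: int = 2048) -> float:
    """Trace-level score used for filtering: the weakest window in the trace."""
    return min(group_confidences(token_confs, window))
```

Scoring a trace by its weakest window rather than its overall average means a single confused stretch is enough to flag the whole trajectory, which lets the filter catch traces that start strong but drift into low-quality reasoning.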
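Offline mode generates all traces first, then filters and votes. Below is a hedged sketch assuming each trace has already been reduced to a final answer plus a trace-level confidence (e.g., `lowest_group_confidence` above); the `keep_ratio` value and the weighting scheme are illustrative, not the released implementation.

```python
from collections import defaultdict

def confidence_filtered_vote(traces: list[tuple[str, float]],
                             keep_ratio: float = 0.1) -> str:
    """Keep only the most confident traces, then run a confidence-weighted
    majority vote over their final answers.

    `traces` is a list of (answer, trace_confidence) pairs; keep_ratio=0.1
    mimics an aggressive setting that keeps only the top 10% of traces.
    """
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    scores: defaultdict[str, float] = defaultdict(float)
    for answer, conf in kept:
        scores[answer] += conf  # each vote is weighted by trace confidence
    return max(scores, key=lambda a: scores[a])

# Example: two confident "42" traces outweigh a single confident "41" outlier.
print(confidence_filtered_vote(
    [("42", 1.8), ("42", 1.6), ("41", 2.1), ("42", 1.5)], keep_ratio=0.75))
# -> "42"
```

Weighting votes by confidence (rather than counting them equally) is what lets a smaller, filtered pool of traces beat plain majority voting over the full pool.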
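In online mode, DeepConf instead monitors confidence during decoding and aborts a trace as soon as it drops below a threshold calibrated on a small warmup set of complete traces. The sketch below is again illustrative: the warmup-percentile calibration follows the paper's described approach as I understand it, but the function names and the 90% keep rate are assumptions.

```python
def calibrate_threshold(warmup_trace_confs: list[float],
                        keep_percent: int = 90) -> float:
    """Pick a stop threshold from the warmup traces' lowest-group
    confidences, so that roughly keep_percent% of them would survive."""
    ranked = sorted(warmup_trace_confs, reverse=True)
    cutoff = max(0, int(len(ranked) * keep_percent / 100) - 1)
    return ranked[cutoff]

def should_stop(token_confs: list[float], threshold: float,
                window: int = 2048) -> bool:
    """Called after each decoded token: abort the trace once the current
    sliding-window group confidence falls below the calibrated threshold."""
    if len(token_confs) < window:
        return False  # not enough tokens yet for a full window
    current = sum(token_confs[-window:]) / window
    return current < threshold
```

Because weak traces are cut off mid-generation rather than run to completion, the reported token savings (up to 84.7%) stem from this early-exit path; the surviving traces then feed the same confidence-weighted vote as in the offline mode.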
Z Tech | In Conversation with a Meta FAIR Research Scientist: Dynamic Filtering with Confidence, Saying Goodbye to Inefficient Reasoning
Z Potentials·2025-09-05 02:27