Z Tech | A Conversation with a Meta FAIR Research Scientist: Dynamic Filtering with Confidence to End Inefficient Reasoning
Z Potentials· 2025-09-05 02:27
Core Viewpoint - The article discusses the emergence of the Deep Think with Confidence (DeepConf) method, which enhances the inference efficiency and performance of large language models (LLMs) by dynamically filtering low-quality inference trajectories using internal confidence signals during the inference process [1][5].

Group 1: DeepConf Methodology
- DeepConf addresses the limitations of existing methods by utilizing model internal confidence signals to filter out low-quality inference trajectories, thereby improving both inference efficiency and performance [1][10].
- The method can be seamlessly integrated into existing serving frameworks without requiring additional model training or hyperparameter tuning, making it user-friendly for developers [8][10].
- DeepConf operates in both offline and online modes, allowing for flexibility in application depending on the use case [8].

Group 2: Performance Metrics
- In offline mode, DeepConf@512 achieved 99.9% accuracy on the GPT-OSS-120B model, significantly surpassing the 97.0% accuracy of traditional majority voting [10].
- In online mode, DeepConf can reduce the number of generated tokens by up to 84.7% compared to full parallel inference while simultaneously improving accuracy, effectively balancing performance and efficiency [10].

Group 3: Contributors and Research Background
- Jiawei Zhao, a research scientist at Meta FAIR and a Caltech PhD, focuses on optimization methods for LLMs and deep learning [5][6].
- Yichao Fu, a PhD student at UCSD, specializes in LLM inference optimization and has contributed to multiple research projects aimed at improving LLM scheduling and breaking sequential dependencies in inference [8][10].
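To make the filtering idea above concrete, here is a minimal, hypothetical sketch of confidence-based trace filtering followed by a confidence-weighted majority vote, in the spirit of DeepConf's offline mode. The per-trace confidence used here (exponentiated mean token log-probability) and the `keep_ratio` parameter are illustrative assumptions, not the paper's exact formulation, which relies on the model's internal confidence signals.

```python
# Hypothetical sketch of DeepConf-style offline filtering + weighted voting.
# Assumptions (not from the article): confidence = exp(mean token logprob),
# and a fixed keep_ratio controls how many traces survive filtering.
import math
from collections import defaultdict

def trace_confidence(token_logprobs):
    """Map a trace's token log-probabilities to a confidence score in (0, 1]."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def deepconf_vote(traces, keep_ratio=0.5):
    """Drop low-confidence traces, then take a confidence-weighted
    majority vote over the remaining answers.

    `traces` is a list of (answer, token_logprobs) pairs.
    """
    scored = [(trace_confidence(lps), ans) for ans, lps in traces]
    scored.sort(reverse=True)                       # highest confidence first
    kept = scored[: max(1, int(len(scored) * keep_ratio))]
    votes = defaultdict(float)
    for conf, ans in kept:
        votes[ans] += conf                          # weight votes by confidence
    return max(votes, key=votes.get)

# Example: three confident traces answer "42", two low-confidence traces
# answer "41"; filtering and weighting select "42".
traces = [
    ("42", [-0.1, -0.2, -0.1]),
    ("42", [-0.2, -0.1, -0.3]),
    ("41", [-2.0, -1.5, -1.8]),
    ("42", [-0.3, -0.2, -0.2]),
    ("41", [-1.9, -2.1, -1.7]),
]
print(deepconf_vote(traces))  # → 42
```

The online mode described in the article would instead apply such a confidence check while a trace is still being generated, terminating it early, which is what yields the reported token savings.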
Xiaomi Granted Patent for a Memory Optimization Method
Jin Rong Jie· 2025-08-26 05:28
Group 1
- The core point of the article is that Beijing Xiaomi Mobile Software Co., Ltd. has been granted a patent for a "memory optimization method, device, and computer storage medium" with authorization announcement number CN113722080B, applied for in May 2020 [1]
- Beijing Xiaomi Mobile Software Co., Ltd. was established in 2012 and is primarily engaged in software and information technology services, with a registered capital of 148.8 million RMB [1]
- The company has invested in 4 enterprises, participated in 139 bidding projects, and holds 5,000 patent records along with 123 administrative licenses [1]
A New Breakthrough in Memory Compression Technology Improves AI Inference Efficiency
半导体芯闻· 2025-04-25 10:19
Source: compiled from EE Times.

ZeroPoint Technologies and Rebellions aim to develop an AI accelerator that lowers the cost and power consumption of AI inference. ZeroPoint Technologies' memory optimization technology is said to compress data quickly, increase data-center memory capacity, and improve AI inference performance per watt.

In April 2025, Swedish memory-optimization IP vendor ZeroPoint Technologies (hereafter ZeroPoint) announced a strategic partnership with Rebellions to jointly develop a next-generation memory-optimized AI accelerator for AI inference. The companies plan to release a new product in 2026, claiming it is "expected to achieve unprecedented tokens/second/watt performance levels."

As part of the collaboration, the two companies will use ZeroPoint's memory compression and memory management technologies to increase the memory bandwidth and capacity available to foundation-model inference workflows. ZeroPoint CEO Klas Moreau claims the company's hardware-based memory optimization engine is 1,000 times faster than existing software compression methods.

The value proposition of ZeroPoint's memory compression IP: first, compression and decompression; second, the compressed ...