Eagle3

Search documents
超大模型推理加速2.18倍!SGLang联合美团技术团队开源投机采样训练框架
量子位· 2025-07-26 09:01
Core Viewpoint - SpecForge is an open-source training framework designed for speculative sampling, specifically tailored for large models, achieving a 2.18x inference acceleration [1][15]. Group 1: SpecForge Overview - SpecForge is developed by the SGLang team in collaboration with Meituan's search recommendation platform and Cloudsway.AI [1]. - The framework is built to address the challenges posed by the increasing size of models, which often leads to lower inference efficiency [4][6]. - SpecForge integrates deeply with the SGLang inference engine, providing a seamless training and inference process for speculative sampling [5][7]. Group 2: Technical Features - The framework incorporates Eagle3, an advanced speculative sampling method that enhances inference speed by training a lightweight draft model to predict token distributions accurately [7]. - SpecForge supports various mainstream models, including complex MoE layers and Transformer variants, ensuring broad applicability [7]. - It features scalable distributed training through Fully Sharded Data Parallel (FSDP) and Tensor Parallelism (TP), optimizing resource utilization on GPU clusters [7][14]. Group 3: Training Modes and Efficiency - SpecForge offers two training modes: Online and Offline, allowing users to choose based on their specific needs and resource availability [10][17]. - The Training-Time Test (TTT) architecture enhances the robustness of the draft model, encapsulating complex processes to simplify implementation for users [9]. - The framework is designed with a focus on memory-efficient training, significantly reducing memory overhead even for trillion-parameter models [7]. Group 4: Experimental Validation - The effectiveness of SpecForge was validated through experiments on datasets like ShareGPT and UltraChat, demonstrating compatibility with the Eagle3 architecture [15]. - The draft models trained using SpecForge achieved a notable 2.18x inference acceleration on the MT-Bench benchmark [15]. Group 5: Future Developments - SpecForge's roadmap includes plans to support additional model architectures and integrate visual-language models (VLM) into the framework [22]. - The team aims to enhance training efficiency through improved parallel strategies and kernel optimizations [22].