Tencent upgrades AngelSlim: the first speculative-sampling training framework to unify LLM, VLM, and speech modalities, boosting inference speed by up to 1.8x
机器之心 · 2026-01-16 01:55
Core Insights
- The article addresses the high inference costs and latency that weigh on large-model applications, highlighting the industry's need for both cost reduction and efficiency gains [2]
- Speculative sampling is presented as an inference-acceleration paradigm that delivers near-lossless speedup and is gaining industry adoption [2]
- Tencent's upgraded AngelSlim training framework applies speculative sampling across modalities, lifting inference speed by 1.4 to 1.9 times [2]

Group 1: AngelSlim and Speculative Sampling
- Speculative sampling uses a lightweight draft model to propose multiple candidate tokens, which the larger target model then verifies in a single parallel forward pass, parallelizing decoding and reducing latency (a minimal sketch of this loop appears at the end of this summary) [4]
- AngelSlim integrates multiple compression algorithms, including quantization and speculative sampling, and supports multi-modal model training with acceleration of 1.4 to 1.9 times [4][6]
- The framework emphasizes deployment readiness: models trained with AngelSlim plug directly into existing serving frameworks such as vLLM and SGLang [7]

Group 2: Key Features of AngelSlim
- AngelSlim supports full-modal speculative-sampling training, sharing core algorithms and engineering capabilities across modalities [6]
- The data processing module provides a stable, reusable data foundation for training across modalities, covering data resampling and preprocessing [12][13]
- The model module exposes a unified TargetModel interface, so new model architectures can be integrated without modifying the core algorithms (an illustrative interface sketch follows below) [18]

Group 3: Training Components and Performance
- The training module supports both online and offline training modes, catering to different model sizes and memory budgets (see the configuration sketch below) [20]
- The training process includes training-time testing, letting the model learn from its own predictions during training (sketched below) [21]
- Models trained with AngelSlim have demonstrated acceleration across a range of tasks, achieving speedups of 1.4 to 1.9 times under the reported conditions [25]

Group 4: Future Plans
- Future work focuses on strengthening speculative sampling through tooling and algorithm advances, including offline hidden-states generation and deeper integration of multi-modal features [30]
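To make the Group 1 mechanism concrete, here is a minimal, framework-agnostic sketch of one speculative-decoding step in Python. The article does not publish AngelSlim's internals, so `draft_model`, `target_model`, and the greedy accept rule below are illustrative assumptions, not AngelSlim's actual API.

```python
import torch

def speculative_decode_step(draft_model, target_model, prefix, k=4):
    """One speculative-decoding step: a lightweight draft model proposes k
    candidate tokens, then the large target model scores all of them in a
    single parallel forward pass and keeps the longest agreeing prefix."""
    # Draft phase: autoregressively propose k candidates (greedy for simplicity).
    candidates, ctx = [], prefix
    for _ in range(k):
        next_logits = draft_model(ctx)            # shape: (vocab_size,)
        token = int(torch.argmax(next_logits))
        candidates.append(token)
        ctx = torch.cat([ctx, torch.tensor([token])])

    # Verify phase: ONE target forward over prefix + candidates scores every
    # candidate position at once -- this parallelism is the source of speedup.
    full = torch.cat([prefix, torch.tensor(candidates)])
    target_logits = target_model(full)            # shape: (len(full), vocab_size)

    # Accept candidates until the first disagreement, then substitute the
    # target model's own token so output matches target-only greedy decoding.
    accepted = []
    for i, token in enumerate(candidates):
        pos = len(prefix) + i - 1                 # logits row predicting position i
        target_token = int(torch.argmax(target_logits[pos]))
        accepted.append(target_token)
        if target_token != token:
            break
    return accepted

# Toy usage with random stand-in models (assumption: ids -> logits callables).
vocab_size = 32
draft_model = lambda ids: torch.randn(vocab_size)
target_model = lambda ids: torch.randn(len(ids), vocab_size)
print(speculative_decode_step(draft_model, target_model, torch.tensor([1, 2, 3])))
```

With sampling instead of greedy decoding, the accept rule becomes a probabilistic rejection test that preserves the target model's output distribution exactly; that is the "near-lossless" property the article mentions.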
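Group 2 mentions a unified TargetModel interface that lets new architectures plug in without touching the core algorithm. The article gives no signatures, so the abstract class below is one plausible shape, with hypothetical method names:

```python
from abc import ABC, abstractmethod
import torch

class TargetModel(ABC):
    """Illustrative unified target-model interface (method names are
    assumptions, not AngelSlim's published API). The speculative-sampling
    trainer depends only on this abstraction, so supporting a new
    architecture means writing one thin adapter."""

    @abstractmethod
    def hidden_states(self, input_ids: torch.Tensor) -> torch.Tensor:
        """Per-token hidden states the draft model is trained against."""

    @abstractmethod
    def lm_logits(self, hidden: torch.Tensor) -> torch.Tensor:
        """Project hidden states to vocabulary logits via the LM head."""

class HFCausalLMTarget(TargetModel):
    """Hypothetical adapter around a Hugging Face-style causal LM."""

    def __init__(self, model):
        self.model = model

    def hidden_states(self, input_ids):
        out = self.model(input_ids, output_hidden_states=True)
        return out.hidden_states[-1]      # last layer, shape (seq, hidden)

    def lm_logits(self, hidden):
        return self.model.lm_head(hidden)
```

Under this design the training code calls only `hidden_states` and `lm_logits`, so an LLM, VLM, or speech model differs only in its thin adapter.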
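Group 3's online and offline modes trade compute for memory: online runs the frozen target model inside the training loop, while offline precomputes and caches its hidden states (the article's future plans also mention offline hidden-states generation). A hypothetical configuration sketch; field names are assumptions, not AngelSlim's real options:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeculativeTrainConfig:
    """Hypothetical config illustrating the online/offline trade-off."""
    mode: str = "online"                  # "online" or "offline"
    # Online: the frozen target model runs a forward pass per batch, which
    # costs GPU memory but needs no preprocessing -- suits smaller targets.
    target_model_path: Optional[str] = None
    # Offline: hidden states are generated once and streamed from disk, so
    # the target never occupies training GPUs -- suits very large targets.
    hidden_states_cache: Optional[str] = None

    def validate(self) -> None:
        if self.mode == "online" and not self.target_model_path:
            raise ValueError("online mode needs a loadable target model")
        if self.mode == "offline" and not self.hidden_states_cache:
            raise ValueError("offline mode needs a precomputed cache path")

cfg = SpeculativeTrainConfig(mode="offline", hidden_states_cache="cache/hs")
cfg.validate()
```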
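Group 3 also notes training-time testing, where the draft model learns from its own predictions rather than only from teacher-forced inputs. One common way to realize this is to unroll the draft on its own outputs for a few steps and apply a loss at each unrolled position; the sketch below assumes that interpretation, not AngelSlim's exact recipe.

```python
import torch
import torch.nn.functional as F

def training_time_test_loss(draft_model, input_ids, labels, steps=3):
    """Sketch of 'training-time testing' (our interpretation): instead of
    teacher-forcing every position, unroll the draft model on its OWN
    predictions for a few steps and penalize each one, so train-time inputs
    match the distribution the model actually sees at inference time."""
    ctx, losses = input_ids, []
    for step in range(steps):
        logits = draft_model(ctx)                   # shape: (len(ctx), vocab)
        next_logits = logits[-1]                    # prediction for next token
        losses.append(F.cross_entropy(next_logits.unsqueeze(0),
                                      labels[step].unsqueeze(0)))
        # Feed back the model's own greedy prediction, not the gold label.
        pred = torch.argmax(next_logits, dim=-1, keepdim=True)
        ctx = torch.cat([ctx, pred])
    return torch.stack(losses).mean()

# Toy usage with a random stand-in draft model.
vocab = 32
draft_model = lambda ids: torch.randn(len(ids), vocab, requires_grad=True)
loss = training_time_test_loss(draft_model,
                               torch.tensor([1, 2, 3]),
                               torch.tensor([4, 5, 6]))
print(float(loss))
```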