Policy Learning Boosts LLM Inference Efficiency: MIT and Google Team Propose a New Asynchronous Parallel Generation Paradigm
机器之心· 2025-05-21 04:00
Core Insights
- The article covers research from MIT and Google on asynchronous generation in large language models (LLMs), which moves decoding from traditional strictly sequential generation to a more efficient parallel scheme [5][25].

Group 1: Asynchronous Generation Paradigm
- The asynchronous generation paradigm decodes semantically independent content blocks in parallel, achieving geometric-mean speedups of 1.21x to 1.93x over sequential decoding, with quality changes ranging from +2.2% to -7.1% (see the speedup-aggregation example after this summary) [4][21].
- The work introduces PASTA-LANG, a markup language designed specifically for asynchronous generation, built around three core tags: <promise/>, <async>, and <sync/> (an illustrative decoding sketch follows this summary) [8][10].

Group 2: PASTA System and Training
- The PASTA system is trained in two stages: supervised fine-tuning on a dataset annotated with PASTA-LANG tags, followed by preference optimization through policy learning (a hypothetical pipeline sketch follows this summary) [16][18].
- The resulting PASTA model improves both speed and output quality, adaptively choosing asynchronous generation strategies based on the characteristics of the content being generated [6][21].

Group 3: Performance and Scalability
- Experimental results indicate that PASTA trades off speed and quality effectively, delivering substantial speedups even when the decoding policy prioritizes quality [23].
- The method also scales: continued preference optimization keeps improving the model, pointing to a sustainable path for further efficiency gains [23][24].
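The three PASTA-LANG tag names suggest a decode-time protocol: the model emits <promise/> placeholders for semantically independent blocks, expands them inside an <async> region, and rejoins the main stream at <sync/>. The toy orchestrator below is only an illustrative sketch of that idea; the topic attribute, the generate() stub, and the scheduling policy are assumptions made for the example and are not the paper's actual implementation.

```python
# Illustrative sketch only: how a decoder *might* interpret PASTA-LANG's
# <promise/>, <async>, and <sync/> tags. All tag attributes and helper
# functions here are hypothetical.
import asyncio
import re

async def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that expands one promised block."""
    await asyncio.sleep(0.1)  # simulate decoding latency
    return f"[expansion of: {prompt}]"

async def decode_with_pasta_tags(annotated: str) -> str:
    """Expand every <promise topic="..."/> placeholder in parallel, join at <sync/>."""
    topics = re.findall(r'<promise topic="([^"]+)"/>', annotated)
    # <async> region: launch one decoding task per semantically independent block.
    tasks = [asyncio.create_task(generate(t)) for t in topics]
    # <sync/>: wait for all asynchronous blocks before assembling the final text.
    expansions = await asyncio.gather(*tasks)
    result = annotated
    for topic, text in zip(topics, expansions):
        result = result.replace(f'<promise topic="{topic}"/>', text, 1)
    return re.sub(r'</?async>|<sync/>', '', result).strip()

if __name__ == "__main__":
    tagged = ('<async>Pros: <promise topic="advantages of approach A"/> '
              'Cons: <promise topic="drawbacks of approach A"/></async><sync/> Overall summary.')
    print(asyncio.run(decode_with_pasta_tags(tagged)))
```

The point of the sketch is the data flow: independent blocks are promised up front, filled in concurrently, and only merged once everything the downstream text depends on is ready.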
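The article describes the training recipe only at a high level (SFT on PASTA-LANG-annotated data, then preference optimization via policy learning). The skeleton below is a minimal, hypothetical sketch of that pipeline; the reward weighting, the pair-construction rule, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two-stage recipe described in the summary:
# (1) supervised fine-tuning on PASTA-LANG-annotated data, then
# (2) preference optimization driven by a reward that trades decoding
#     speedup against answer quality.  The reward and pairing rule are assumed.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Candidate:
    annotation: str   # a PASTA-LANG-tagged response sampled from the SFT model
    speedup: float    # measured wall-clock speedup of asynchronous decoding
    quality: float    # task quality score of the final answer

def reward(c: Candidate, quality_weight: float = 0.5) -> float:
    """Assumed scalar reward balancing speedup against answer quality."""
    return (1.0 - quality_weight) * c.speedup + quality_weight * c.quality

def build_preference_pairs(samples: List[Candidate]) -> List[Tuple[str, str]]:
    """Rank sampled annotations by reward; pair best vs. worst as (chosen, rejected)."""
    ranked = sorted(samples, key=reward, reverse=True)
    return [(ranked[0].annotation, ranked[-1].annotation)]

def two_stage_training(sft_step: Callable[[str], None],
                       pref_step: Callable[[Tuple[str, str]], None],
                       annotated_corpus: List[str],
                       sample_candidates: Callable[[], List[Candidate]],
                       pref_rounds: int = 3) -> None:
    # Stage 1: supervised fine-tuning on PASTA-LANG-annotated examples.
    for example in annotated_corpus:
        sft_step(example)
    # Stage 2: iterative preference optimization on self-sampled annotations.
    for _ in range(pref_rounds):
        for pair in build_preference_pairs(sample_candidates()):
            pref_step(pair)
```

Iterating stage 2 in rounds is how the sketch reflects the summary's scalability point: continued preference optimization keeps refining which blocks the model chooses to generate asynchronously.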
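The 1.21x-1.93x figures are read here as geometric means of per-task speedups, which is the usual convention for aggregating speedups but is an assumption about the reporting; the helper below shows that aggregation on made-up per-task numbers.

```python
# Geometric-mean speedup aggregation (per-task numbers below are invented).
import math

def geometric_mean_speedup(speedups: list[float]) -> float:
    """Geometric mean: exp of the average log speedup across tasks."""
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

print(round(geometric_mean_speedup([1.05, 1.4, 1.8, 2.1]), 2))  # 1.54 for these made-up values
```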