Core Insights
- The article discusses DroPE, a new technique for handling long contexts in large models, developed by a research team led by Llion Jones, one of the core authors of the Transformer architecture [1][24].
- DroPE enables seamless zero-shot context extension without expensive long-context training, requiring less than 1% of the pre-training budget for model recalibration [2].

Group 1: Technology Overview
- DroPE can be understood as dropping positional embeddings in order to extend context [5].
- During pre-training, the technique uses RoPE (Rotary Positional Encoding) only as temporary scaffolding to keep training stable and efficient [12][13].
- At inference time, DroPE discards the positional embeddings and performs a brief recalibration at the original context length, unlocking the model's long-context extrapolation ability [15][16]; a minimal sketch of this two-phase recipe follows this summary.

Group 2: Performance Metrics
- Experiments spanned models from a 5M-parameter model through the SmolLM family (360M/1.7B) up to the 7B-parameter Llama2-7B, all showing significant improvements [17].
- On the LongBench benchmark, DroPE improved the average score of the base SmolLM more than tenfold [18].
- On the NIAH (needle-in-a-haystack) evaluation, the DroPE model reached a recall of 74.92%, significantly surpassing traditional RoPE-scaling methods [19].

Group 3: Comparative Analysis
- A comparison table shows DroPE outperforming other methods across tasks, with an average LongBench score of 30.52 [20].
- Even on the large-scale Llama2-7B, DroPE delivered exceptional long-context question answering and summarization while using only 0.5% of the pre-training budget for recalibration [20].

Group 4: Company Background
- The team behind DroPE, Sakana AI, was co-founded by Llion Jones and former Google senior scientist David Ha [24].
- Sakana AI previously gained attention for creating the first AI scientist capable of generating complete academic papers, which has given the company a prominent position in the AI landscape [26].
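To make the two-phase recipe concrete, here is a minimal PyTorch sketch of the idea as the summary describes it: pre-train attention with RoPE as scaffolding, then switch the positional embedding off and briefly recalibrate at the original context length. Every name here (`apply_rope`, `ToggleableRoPEAttention`, `recalibrate`) and every hyperparameter is a hypothetical illustration under those assumptions, not Sakana AI's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary positional embedding over (batch, heads, seq, head_dim)."""
    _, _, seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, device=x.device, dtype=torch.float32) / half)
    angles = torch.arange(seq, device=x.device, dtype=torch.float32)[:, None] * freqs
    cos, sin = angles.cos(), angles.sin()          # (seq, half), broadcast below
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class ToggleableRoPEAttention(nn.Module):
    """Causal self-attention whose positional encoding can be switched off.

    use_rope=True  -> ordinary RoPE attention (the pre-training phase).
    use_rope=False -> no positional embedding at all (the "dropped" phase).
    """

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.use_rope = True

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, s, self.n_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        if self.use_rope:                          # RoPE as training scaffolding
            q, k = apply_rope(q), apply_rope(k)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(b, s, -1))


def recalibrate(model: nn.Module, batches, steps: int, lr: float = 1e-5):
    """Brief fine-tuning at the ORIGINAL context length with RoPE switched off,
    standing in for the "<1% of the pre-training budget" step in the summary."""
    for m in model.modules():
        if isinstance(m, ToggleableRoPEAttention):
            m.use_rope = False                     # drop the positional embedding
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _, (inputs, targets) in zip(range(steps), batches):
        logits = model(inputs)                     # model: any language model
        loss = F.cross_entropy(                    # built on the module above
            logits.view(-1, logits.size(-1)), targets.view(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In this sketch the single `use_rope` flag carries the whole idea: the pre-trained weights are reused unchanged, and the short recalibration pass lets them adapt to attention scores that no longer carry rotary phase information, after which longer-than-training contexts can be fed in without any positional rescaling.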
Drop RoPE and AI understands long contexts better! Transformer co-author's team open-sources a new pre-training method for large models
QbitAI (量子位) · 2026-01-13 09:50