Pos2Distill
Cracking AI's inconsistent sensitivity to different context positions: a new framework takes the approach of "letting the one who tied the bell untie it"
量子位· 2025-10-26 04:01
Core Insights

- The article discusses the significant issue of positional bias in language models, which degrades their performance on complex reasoning and long-text understanding tasks [1][8]
- It introduces Pos2Distill, an innovative "position-to-position" distillation framework designed to transfer the model's strong capabilities at advantageous positions to disadvantaged ones, effectively mitigating positional bias [3][4]

Summary by Sections

Positional Bias Challenges

- Language models exhibit inconsistent sensitivity to different contextual positions, attending preferentially to specific positions in the input sequence, which hampers their performance on critical tasks [1]
- When comparing two candidate answers, models often favor the first option, compromising their fairness and reliability as evaluators [2]

Proposed Solution: Pos2Distill

- Pos2Distill leverages the model's own acquired knowledge to correct its systematic biases, addressing the performance imbalance caused by positional bias [5]
- The framework includes two specialized implementations: Pos2Distill-R1 for retrieval tasks and Pos2Distill-R2 for reasoning tasks, both showing improved consistency across all positions in long-text retrieval and reasoning [5][29]

Methodology

- The article outlines the distinct behaviors of positional bias in retrieval and reasoning tasks: retrieval bias manifests as "token shifting", while reasoning bias leads to "thought shifting" [10]
- Pos2Distill-R1 employs a Kullback-Leibler divergence loss to provide fine-grained correction signals for retrieval tasks, while Pos2Distill-R2 uses high-quality chain-of-thought responses generated at advantageous positions to guide reasoning trajectories [12][13]; an illustrative sketch of these two training signals follows at the end of this summary

Experimental Results

- Pos2Distill-R1 demonstrated robust and consistent performance, achieving an average accuracy of 56.7% across 20 evidence positions on the WebQ dataset, comparable to the best performance at the optimal "sink position" [22][23]
- Pos2Distill-R2 outperformed existing self-training methods, achieving exact-match scores of 42.8 on the MuSiQue dataset and 58.3 on the HotpotQA dataset, indicating strong cross-domain generalization [27][28]

Cross-Task Generalization

- Both systems generalize beyond their training task, with Pos2Distill-R1 strengthening contextual retrieval abilities and Pos2Distill-R2 improving contextual awareness on retrieval tasks [29][30]
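
To make the two training signals described in the Methodology section more concrete, below is a minimal, illustrative PyTorch sketch, not the authors' released code: a position-to-position KL term that pulls the model's token distribution at a disadvantaged evidence position toward the distribution it produces when the same evidence sits at an advantageous position (the Pos2Distill-R1 idea), and ordinary cross-entropy on chain-of-thought responses generated from the advantageous position (the Pos2Distill-R2 idea). All function names, tensor shapes, and hyperparameters here are assumptions made for illustration.

```python
# Illustrative sketch only: position-to-position distillation signals for a causal LM,
# assuming the same evidence document can be placed at an "advantageous" position
# (e.g., the start of the context) or a "disadvantaged" one (e.g., the middle).
import torch
import torch.nn.functional as F


def kl_position_distill_loss(student_logits, teacher_logits, answer_mask, temperature=1.0):
    """Pos2Distill-R1-style signal, as described in the summary: KL divergence between
    the model's distributions when evidence is at an advantageous position (teacher,
    gradients detached) and at a disadvantaged position (student), restricted to the
    answer tokens selected by answer_mask.

    student_logits, teacher_logits: [batch, seq_len, vocab]
    answer_mask: [batch, seq_len], 1.0 on positions whose predictions are supervised.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(-1)  # [batch, seq_len]
    kl = (kl * answer_mask).sum() / answer_mask.sum().clamp(min=1)
    return kl * temperature ** 2


def cot_distill_loss(student_logits, cot_labels):
    """Pos2Distill-R2-style signal: cross-entropy on chain-of-thought responses that
    were generated with evidence at the advantageous position, then replayed while the
    evidence sits at a disadvantaged position. cot_labels uses -100 on prompt tokens so
    only the CoT response is supervised; causal label shifting is omitted for brevity.
    """
    return F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        cot_labels.reshape(-1),
        ignore_index=-100,
    )
```

Under these assumptions, a training step would run the same model twice, once with the evidence at the advantageous position and once at the disadvantaged one, treat the first pass's distributions or chain-of-thought outputs as the teacher signal, and optimize only the second pass. This is one way to read the article's "the one who tied the bell must untie it" framing: the model's own knowledge, elicited at positions where it already works well, supplies the correction signal for the positions where it does not.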