Transferability Index (TI)


量子位 (QbitAI) · 2025-07-07 06:13

Core Viewpoint - The article examines whether gains in the mathematical reasoning of large language models (LLMs) carry over to other tasks. Its main finding is that models trained with reinforcement learning (RL) transfer these skills better than models trained with supervised fine-tuning (SFT) [4][11].

Group 1: Mathematical Reasoning and Transferability
- The research indicates that only RL-trained models effectively transfer mathematical reasoning skills to other tasks, while SFT models show limited or no transfer [4][11].
- A Transferability Index (TI) is introduced to quantify how far an improvement in mathematical reasoning carries over to other reasoning tasks and to non-reasoning tasks [8][9].
- A TI greater than 0 indicates positive transfer to the other tasks; a TI less than 0 indicates negative transfer [9].

Group 2: Experimental Findings
- The study evaluated more than 20 models on three task groups: mathematical reasoning, other reasoning tasks (such as medical reasoning), and non-reasoning tasks (such as common-sense dialogue) [7].
- Models fine-tuned with RL consistently score higher on the transferability metrics for both reasoning and non-reasoning tasks, while SFT models often show negative transfer on non-reasoning tasks [11].

Group 3: Model Representation and Performance
- PCA analysis shows that RL fine-tuned models shift only minimally in representation space, which indicates that they retain previously learned knowledge while improving in the target domain [15].
- RL models show lower KL divergence than SFT models on both reasoning and non-reasoning tasks, which suggests more stable and more precise representation updates [16][18].
- The findings suggest that RL is crucial for achieving transferable reasoning capabilities in LLMs, which the article frames as another win for reinforcement learning [19].
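The summary states how to read the sign of the Transferability Index but does not give its formula. The sketch below is one plausible reading, not the study's confirmed definition: it assumes TI is the relative gain on another task group divided by the relative gain on math, scaled by 100. The function names, the averaging over benchmarks, and the sample scores are all hypothetical.

```python
from statistics import mean


def relative_gain(base_scores, tuned_scores):
    """Mean relative change across benchmarks: (tuned - base) / base.

    Assumption: each benchmark is weighted equally and base scores are nonzero.
    """
    return mean((t - b) / b for b, t in zip(base_scores, tuned_scores))


def transferability_index(base_math, tuned_math, base_other, tuned_other):
    """Assumed TI: relative gain on the other task group per unit of
    relative gain on math, expressed as a percentage.

    TI > 0 means the math improvement came with a gain elsewhere
    (positive transfer); TI < 0 means it came with a loss (negative transfer).
    """
    math_gain = relative_gain(base_math, tuned_math)
    if math_gain <= 0:
        # The sign interpretation only holds when math actually improved.
        raise ValueError("TI assumes a positive gain on the math benchmarks")
    other_gain = relative_gain(base_other, tuned_other)
    return other_gain / math_gain * 100


# Hypothetical scores for illustration only (not from the study).
base_math, tuned_math = [40.0, 50.0], [60.0, 75.0]    # +50% on math
ti_positive = transferability_index(base_math, tuned_math, [50.0], [55.0])
ti_negative = transferability_index(base_math, tuned_math, [50.0], [45.0])
print(round(ti_positive, 2), round(ti_negative, 2))
```

Under this assumed definition, a model whose math gain comes with a smaller gain elsewhere gets a positive TI, and a model that improves on math while regressing elsewhere gets a negative TI, which matches the sign rule described in Group 1.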

Reinforcement learning (RL)

Supervised fine-tuning (SFT)

Transferability Index (TI)

Artificial Intelligence

Large models
