Model Souping
Another paper from Meta's Superintelligence Labs: mix a few models together, and performance goes straight to SOTA
机器之心· 2025-11-21 03:56
Core Insights
- The article discusses model souping: averaging the weights of multiple models that share an architecture to produce a new, stronger model. The approach is far more lightweight and cost-effective than training a large unified model, while still exploiting the complementary capabilities of the individual models [1][2].

Group 1: Model Souping Methodology
- Traditional model souping typically uses simple uniform averaging, combining the parameters of all candidate models with equal weights. The article introduces a systematic alternative, Soup of Category Experts (SoCE), which selects optimal candidate models based on the benchmark's category composition and applies non-uniform weighted averaging to maximize overall performance [2][5].
- SoCE builds on the observation that model performance across benchmark categories is often only weakly correlated. This lets SoCE select an expert model for each weakly correlated category cluster and combine the experts with optimized weights [8][11].

Group 2: Experimental Results
- The authors ran extensive experiments to evaluate SoCE across multiple dimensions. On the Berkeley Function Calling Leaderboard (BFCL), the 70-billion-parameter soup achieved 80.68% accuracy, a new state of the art (SOTA), improving on the previous best single model by 2.7% [14].
- At the 8-billion-parameter scale, SoCE reached 76.50% accuracy, surpassing the previous best 8B model by 5.7%. The optimal weight configuration was xLAM-2-8b-fc-r (0.7), ToolACE-2-8B (0.2), and watt-tool-8B (0.1) [16][18].
- A correlation heatmap illustrates the performance relationships among categories: multi-turn tasks correlate strongly with one another, while weak or negative correlations appear elsewhere [6][8].
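The non-uniform weighted averaging at the heart of SoCE can be sketched in a few lines. Everything below is illustrative, not the paper's implementation: toy single-parameter "checkpoints" stand in for full model state dicts, and the 8B recipe's weights (0.7 / 0.2 / 0.1) are reused purely as an example.

```python
def soup(state_dicts, weights):
    """Parameter-wise weighted average of same-architecture checkpoints.

    Uniform souping is the special case weights = [1/n] * n; SoCE instead
    uses a non-uniform weighting over selected category-expert models.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "soup weights should sum to 1"
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

# Toy stand-ins for three candidate checkpoints; real souping averages
# every tensor in the state dict, not a single scalar.
xlam, toolace, watt = {"layer.w": 1.0}, {"layer.w": 2.0}, {"layer.w": 3.0}
souped = soup([xlam, toolace, watt], [0.7, 0.2, 0.1])
# souped["layer.w"] is 0.7*1.0 + 0.2*2.0 + 0.1*3.0 ≈ 1.4
```

In practice the same loop runs over PyTorch state dicts, with `sum` replaced by tensor accumulation; the averaged dict is then loaded back into a model of the shared architecture.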
Group 3: Performance Improvement
- The analysis shows that the linear correlation between categories increases markedly after souping. Across 37 model-souping experiments, scores improved in over 20 categories, with net performance gains across all categories [22][23].
- SoCE reliably identifies specialized models for different categories, leading to substantial performance gains [25].
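The correlation analysis that motivates SoCE's expert selection can be illustrated with a toy Pearson computation. The category names, scores, and three-model setup below are invented for illustration only; they are not values from the paper.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

# Hypothetical per-category scores for three candidate models.
scores = {
    "multi_turn":  [0.62, 0.75, 0.58],
    "parallel":    [0.64, 0.74, 0.60],  # moves with multi_turn
    "irrelevance": [0.81, 0.55, 0.79],  # moves against multi_turn
}
r_pos = pearson(scores["multi_turn"], scores["parallel"])     # strong positive
r_neg = pearson(scores["multi_turn"], scores["irrelevance"])  # negative

# SoCE-style selection: within each weakly correlated category cluster,
# pick the model that scores best there as that cluster's expert.
expert = max(range(3), key=lambda i: scores["irrelevance"][i])
```

Strongly correlated categories (here `multi_turn` and `parallel`) form one cluster and share an expert, while a weakly or negatively correlated category gets its own; the selected experts are then combined with the non-uniform weights described above.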