Is Architecture Decoupling Necessary for Unified Multimodal Models? The New AIA Loss Says No
机器之心 · 2025-12-02 05:07
Core Insights
- The rapid development of unified understanding-and-generation models has been hampered by conflicts between visual understanding and visual generation tasks [2]
- Researchers from CUHK MMLab and Meituan expect unified models to eventually match single-task models in performance, but question whether the current trend of decoupling architectures is truly the right path [2][3]

Unified Model Intent
- The original intent of unified models is to boost single-task performance through a transparent, well-grounded process of interleaved text-and-image reasoning [3]
- Examples include generating intermediate images while navigating a maze, or drawing auxiliary lines during mathematical problem-solving [3]

Architecture Decoupling Issues
- Models such as BAGEL need a complex pipeline to achieve interleaved reasoning, which brings significant computational overhead and potential information loss [3]
- Despite the performance gains seen today, the researchers warn that these issues may become more pronounced as research progresses [3]

AIA Introduction
- To understand why architecture decoupling improves performance, and to find ways to raise model performance without it, CUHK MMLab and Meituan introduced AIA [5]

Research Findings
- Regardless of how a model is decoupled, the understanding and generation tasks exhibit a negative correlation at the same network layer (an illustrative probe is sketched at the end of this article) [8]
- This indicates that decoupling does not fundamentally resolve the conflict between the two tasks [8]

AIA Loss Design
- The AIA loss explicitly constrains the interaction patterns of the unified model during training, using the cross-modal interaction patterns of single-task models as the learning target (see the alignment-loss sketch at the end of this article) [10]

AIA Effectiveness
- Experiments on Emu3 and Janus-Pro show that AIA improves performance without additional tricks, narrowing the gap to more heavily decoupled models [12]

AIA Training Sensitivity
- The AIA loss converges stably across a wide range of loss weights, especially for Emu3, whose pre-training knowledge is weaker [17]
- Janus-Pro, with its stronger pre-training knowledge, is more sensitive to the AIA loss weight [17]

AIA Advantages
- The AIA loss mitigates the usual data-ratio tuning problem: a simple 1:1 ratio of generation to understanding data already yields better results, indicating a collaborative optimization effect [19]

Unified Model Training Path
- Dynamically allocating task weights during unified training may be the correct behavior for unified models, suggesting that task conflict is a natural characteristic to be managed rather than a problem to be avoided (an illustrative weighting scheme is sketched at the end of this article) [21]
- Another approach removes task-differentiation cues so that the model is forced to learn a truly unified representation space, though this increases training difficulty [22]

Future Outlook
- AIA is a first step toward analyzing the principles behind unified-model training, and the authors call for more researchers to explore this direction [24]
- The theory and architecture of unified models are still immature and require collaborative exploration [24]
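The negative per-layer correlation reported under Research Findings can be probed with a simple measurement, sketched below. The article does not specify the exact interaction metric, so the share of attention mass that text queries place on image keys is only an assumed proxy; `cross_modal_share`, the mask names, and the hook-based usage are hypothetical.

```python
# Illustrative probe (not the paper's exact metric): measure, per layer, the share of
# attention mass flowing from text queries to image keys, then correlate the per-layer
# profiles of the understanding task and the generation task.
import torch

def cross_modal_share(attn, text_mask, image_mask):
    """attn: [batch, heads, seq, seq] attention probabilities for one layer.
    Returns the average attention mass that text queries place on image keys."""
    t2i = attn[:, :, text_mask][:, :, :, image_mask]   # [B, H, n_text, n_img]
    return t2i.sum(dim=-1).mean().item()                # mass per text query, averaged

def per_layer_profile(attn_per_layer, text_mask, image_mask):
    """attn_per_layer: list of [B, H, S, S] tensors, one per transformer layer."""
    return torch.tensor([cross_modal_share(a, text_mask, image_mask)
                         for a in attn_per_layer])

def pearson(x, y):
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / (x.norm() * y.norm() + 1e-8)

# Usage sketch: `und_attn` / `gen_attn` would come from forward hooks on the same
# unified model, run on an understanding batch and a generation batch respectively.
# corr = pearson(per_layer_profile(und_attn, t_mask, i_mask),
#                per_layer_profile(gen_attn, t_mask, i_mask))
# A negative `corr` would mirror the reported observation that the two tasks pull the
# same layers toward opposite interaction patterns.
```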
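For the AIA Loss Design section, the following is a minimal sketch of what an attention-interaction alignment term could look like, assuming (as the article states) that the cross-modal interaction patterns of frozen single-task models serve as the learning target. The KL distance, head averaging, layer pairing, and the `lambda_aia` weight are illustrative choices, not the paper's published formulation.

```python
# Sketch of an attention-interaction alignment loss: the unified model (student) is
# pushed toward the attention patterns of a frozen single-task model (teacher).
import torch
import torch.nn.functional as F

def aia_alignment_loss(student_attn, teacher_attn):
    """student_attn / teacher_attn: lists of [B, H, S, S] attention probabilities,
    one per aligned layer, from the unified model and the frozen single-task model."""
    loss = 0.0
    for s, t in zip(student_attn, teacher_attn):
        s = s.mean(dim=1)                    # average over heads -> [B, S, S]
        t = t.mean(dim=1).detach()           # teacher pattern is a fixed target
        loss = loss + F.kl_div(torch.log(s + 1e-8), t, reduction="batchmean")
    return loss / len(student_attn)

# Total objective sketch: task losses plus the alignment term. The article reports that
# a wide range of weights works for Emu3, while Janus-Pro needs a more careful setting.
# total = loss_understanding + loss_generation + lambda_aia * aia_alignment_loss(s_attn, t_attn)
```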
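The "dynamic allocation of task weights" mentioned under Unified Model Training Path is only floated as a direction in the article; one established instantiation of learned per-task weights is homoscedastic-uncertainty weighting (Kendall et al., 2018), shown here purely as an illustration rather than the authors' method.

```python
# Learned per-task loss weights via homoscedastic-uncertainty weighting: each task gets a
# learnable log-variance, and its effective weight shifts automatically during training.
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    def __init__(self, num_tasks: int = 2):
        super().__init__()
        # One log-variance per task, optimized jointly with the model parameters.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        # losses: iterable of scalar task losses, e.g. [loss_understanding, loss_generation]
        total = 0.0
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total

# weighter = UncertaintyWeighting(num_tasks=2)
# total = weighter([loss_und, loss_gen])   # task weights adapt as training progresses
```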