ICLR 2026 | Imperial College London Proposes DyMo: Teaching Multimodal Models to "Select" and Overcome the Missing-Modality Problem
机器之心 (Synced) · 2026-03-09 02:00
Core Insights
- The article discusses advances in multimodal learning, focusing on the challenge of missing modalities in applications such as medical diagnosis and autonomous driving [3][39].
- It introduces DyMo, a new framework that dynamically selects and integrates reliable recovered modalities during inference, overcoming the traditional choice between discarding and imputing missing data [15][39].

Multimodal Learning and Its Challenges
- Multimodal learning is driving breakthroughs in fields such as medical imaging and human-computer interaction by integrating data types such as images, text, and tables [2].
- A major obstacle in real-world applications is the "missing modality" problem: some data inputs are incomplete, so critical information may be lost [3][7].

The Discarding-Imputation Dilemma
- Existing methods for handling missing modalities fall into two categories, each with inherent drawbacks: recovery-free methods, which ignore the missing data, and recovery-based methods, which attempt to reconstruct it [11][12].
- The article calls this the "discarding-imputation dilemma": discarding can lose important information, while imputation may introduce noise [3][12].

DyMo Framework
- DyMo addresses this dilemma by dynamically identifying and integrating reliable recovered modalities during the inference phase [15][39].
- The framework establishes a connection between information gain and task loss, and uses a reward function to guide the modality selection process [19][21].

Experimental Results
- DyMo was evaluated on multiple datasets, including PolyMNIST, MST, and CelebA, and showed significant performance improvements in scenarios with missing modalities [4][30].
- For example, on PolyMNIST, DyMo improved classification accuracy by 1.61% under missing-modality conditions [12][31].
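To make the dilemma and the selection idea under DyMo Framework concrete, here is a minimal Python sketch. It is not DyMo's algorithm: the late-fusion rule (summing per-modality class logits), the entropy-reduction reward, the greedy loop, and every name (`fuse`, `select_recovered`, the `image`/`text`/`table` modalities) are hypothetical stand-ins. Predictive entropy is used only as a label-free proxy for information gain, since the task loss itself cannot be computed at inference time without labels.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: each modality yields one logit per class.
Logits = Sequence[float]


def softmax(logits: Logits) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; lower means a more confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse(pool: Dict[str, Logits], names: Sequence[str]) -> List[float]:
    """Hypothetical late fusion: sum the chosen modalities' logits, then softmax."""
    num_classes = len(next(iter(pool.values())))
    summed = [0.0] * num_classes
    for name in names:
        for k, value in enumerate(pool[name]):
            summed[k] += value
    return softmax(summed)


def select_recovered(
    observed: Dict[str, Logits], recovered: Dict[str, Logits]
) -> Tuple[List[str], List[float]]:
    """Greedily add recovered modalities while the proxy reward stays positive.

    Reward (assumed, not DyMo's): entropy before adding a candidate minus
    entropy after, i.e. a label-free stand-in for information gain.
    """
    pool = {**observed, **recovered}
    chosen = list(observed)  # observed modalities are always kept
    remaining = list(recovered)  # recovered (imputed) candidates
    current = fuse(pool, chosen)
    while remaining:
        best_name: Optional[str] = None
        best_reward, best_probs = 0.0, current
        for name in remaining:
            probs = fuse(pool, chosen + [name])
            reward = entropy(current) - entropy(probs)
            if reward > best_reward:
                best_name, best_reward, best_probs = name, reward, probs
        if best_name is None:  # no candidate helps: discard the rest
            break
        chosen.append(best_name)
        remaining.remove(best_name)
        current = best_probs
    return chosen, current


if __name__ == "__main__":
    observed = {"image": [2.0, 0.5, 0.0]}
    recovered = {"text": [1.5, 0.0, 0.0], "table": [0.0, 0.0, 0.3]}
    pool = {**observed, **recovered}
    print("discard all:", entropy(fuse(pool, list(observed))))
    print("impute all: ", entropy(fuse(pool, list(pool))))
    chosen, probs = select_recovered(observed, recovered)
    print("dynamic:    ", chosen, entropy(probs))
```

Discarding corresponds to fusing only `observed`, and imputing everything corresponds to fusing all of `pool`; the greedy loop sits between the two, admitting a recovered modality only when it makes the fused prediction more certain. In this toy input it keeps `text` and drops `table`. A caveat on the proxy: entropy alone would also reward a confidently wrong imputation, so it should not be read as a substitute for the task-loss-based reward the article describes.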
Conclusion and Future Directions
- DyMo offers a new perspective on multimodal learning, shifting the focus from simply recovering every missing modality to determining which recovered modalities can be trusted [39].
- Future research directions include extending dynamic selection to the training phase and exploring applications beyond classification tasks [41].