Visual Instruction Tuning
NeurIPS 2025 | No More Full-Dataset Scans! Zhejiang University Proposes COIDO to Crack the "High-Cost" Problem in Multimodal Data Selection
机器之心 (Jiqizhixin) · 2025-12-13 08:31
Core Insights
- The article introduces COIDO (Coupled Importance-Diversity Optimization), a framework designed to optimize data selection for visual instruction tuning in multi-modal large language models (MLLMs) [4][9][23]
- COIDO aims to reduce the computational costs associated with data selection while ensuring high-quality data is retained, addressing the challenges of existing methods that often require full data traversal [12][23]

Group 1: Motivation and Background
- The rapid growth of datasets, such as LLaVA-665K, has led to significant computational overhead and redundancy when fine-tuning MLLMs on full datasets [8]
- Existing data selection methods face two main issues: high selection costs and the decoupling of importance and diversity in data selection [12][9]

Group 2: Methodology
- COIDO introduces a lightweight scoring mechanism that allows for training on a small sample (e.g., 20%) of the full dataset, enabling generalization without the need for full data traversal [14]
- The core innovation of COIDO is the coupled optimization of importance and diversity within a unified training framework, rather than treating them as separate phases [14]
- The importance loss is based on a reweighted cross-entropy loss, while the diversity loss utilizes spectral clustering to minimize variance among clusters, ensuring a diverse data selection [14][15]

Group 3: Experimental Results
- COIDO achieves state-of-the-art performance using only 20% of the data, reaching 98.2% of the performance of full data fine-tuning across various benchmarks [20][21]
- The framework demonstrates strong generalization and transferability, outperforming models trained from scratch on new datasets [21]

Group 4: Conclusion
- COIDO presents a novel paradigm for multi-modal data selection, challenging the notion that data selection must be costly and providing a pathway for efficient fine-tuning of MLLMs [23][24]
- The framework's low computational cost and high-quality data selection make it a valuable tool for researchers with limited resources [23]
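
The coupled objective described under Methodology (a reweighted cross-entropy term for importance plus a cluster-variance term for diversity) can be illustrated with a small Python sketch. This is not the authors' implementation. The summary does not give the exact formulas, so everything below is an assumption made for illustration: the function names, normalizing scores by their sum, measuring diversity as the variance of per-cluster mean scores, the trade-off weight `lam`, and ranking by score to keep a fixed fraction. Cluster assignments are taken as given (for example, precomputed by spectral clustering).

```python
from statistics import pvariance


def importance_loss(scores, per_sample_ce):
    """Reweighted cross-entropy (assumed form): each sample's CE loss is
    weighted by its score divided by the sum of all scores."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return sum((s / total) * ce for s, ce in zip(scores, per_sample_ce))


def diversity_loss(scores, cluster_ids):
    """Variance of per-cluster mean scores (assumed form). A low value means
    no single cluster receives disproportionately high scores."""
    by_cluster = {}
    for s, c in zip(scores, cluster_ids):
        by_cluster.setdefault(c, []).append(s)
    cluster_means = [sum(v) / len(v) for v in by_cluster.values()]
    return pvariance(cluster_means)


def coupled_loss(scores, per_sample_ce, cluster_ids, lam=1.0):
    """Single objective combining both terms; lam is a hypothetical weight."""
    return importance_loss(scores, per_sample_ce) + lam * diversity_loss(
        scores, cluster_ids
    )


def select_top_fraction(scores, fraction=0.2):
    """Return the indices of the highest-scoring samples, in ascending order."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

In a real pipeline the scores would come from the lightweight scorer and the two terms would be minimized jointly by gradient descent. The point of the sketch is only that both terms depend on the same score vector, so they can be optimized together in one training run instead of in separate phases.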