Few-Shot Learning
NeurIPS 2025 | No More Full-Dataset Scans! Zhejiang University Proposes COIDO to Tackle the High Cost of Multimodal Data Selection
Jiqizhixin (机器之心) · 2025-12-13 08:31
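The first author of this paper is Yichen Yan (闫熠辰), a second-year PhD student whose research focuses on data quality management for multimodal large models. The corresponding author is researcher Huan Li (李环), whose research interests include data preparation for AI, efficient inference and deployment of large models, spatiotemporal big data, and model lightweighting.

03 Research Background and Motivation

01 TL;DR: COIDO in One Picture

Before diving into the technical details, we first use a comic to give an intuitive picture of the core problem that COIDO (Coupled Importance-Diversity Optimization) solves and the approach it takes:

As Zhongli puts it in the comic, when selecting from massive visual instruction data, traditional methods must traverse the entire dataset before they can filter it, which causes heavy "erosion" (high computational cost). When weighing data importance against data diversity, traditional methods also tend to attend to one at the expense of the other. COIDO, through its new "contract" of coupled optimization, handles this complexity with a simple mechanism.

02 Paper at a Glance

The capabilities of multimodal large language models (MLLMs) depend heavily on high-quality visual instruction tuning. However, as dataset sizes grow explosively (for example, LLaVA-665K), fine-tuning on the full data incurs enormous computational overhead and redundancy. Existing data selection methods aim to pick a high-quality subset, but they generally share two key pain points: ...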
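The tension between importance and diversity mentioned above can be made concrete with a toy example. The sketch below is not COIDO's algorithm, which this excerpt does not describe. It is a generic greedy selection that trades a per-sample importance score against redundancy with samples already chosen. The function names, the `trade_off` parameter, and the toy scores and feature vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_subset(importance, features, k, trade_off=0.5):
    """Greedily pick k sample indices (hypothetical helper, not from the paper).

    Each step picks the sample maximizing
        trade_off * importance - (1 - trade_off) * redundancy,
    where redundancy is the highest similarity to any already-selected sample.
    """
    selected = []
    remaining = list(range(len(importance)))
    while remaining and len(selected) < k:
        def gain(i):
            redundancy = max(
                (cosine(features[i], features[j]) for j in selected),
                default=0.0,
            )
            return trade_off * importance[i] - (1 - trade_off) * redundancy

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed): samples 0 and 1 are both important but near-duplicates.
importance = [0.9, 0.85, 0.4, 0.3]
features = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0), (0.7, 0.7)]

balanced = select_subset(importance, features, k=2)
importance_only = select_subset(importance, features, k=2, trade_off=1.0)
print(balanced, importance_only)
```

With `trade_off=1.0` the selection keeps both near-duplicate samples, which is the "importance only" failure mode. The balanced setting swaps the duplicate for a less important but dissimilar sample. Note that this toy version still assumes scores and features for every sample are already available, which is exactly the full-scan cost the article says COIDO is designed to avoid.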
How Can Embodied-Intelligence Robots Come to Live Like Humans?
36Kr · 2025-08-04 08:21
Core Insights
- The article discusses the evolution and challenges of embodied intelligence, highlighting the distinction between "problem-solving" AI and "practical" AI, with the latter focusing on real-world interactions and learning through sensory experiences [1][3]
- It emphasizes the need for embodied intelligence to overcome significant hurdles in understanding, associating, and interacting with the environment, which are essential for robots to function like humans in real-world scenarios [3][5]

Group 1: Challenges in Embodied Intelligence
- Embodied intelligence must adapt to unstructured real-world environments, requiring advanced computational capabilities to handle dynamic and unpredictable situations [5][6]
- The development of higher cognitive strategies that integrate multiple sensory inputs is crucial for robots to understand and interact with their surroundings effectively [6][7]
- Robots need to surpass traditional static data processing models to achieve a deeper understanding of dynamic changes and relationships in their environment [6][12]

Group 2: Technological Components
- The perception layer of embodied intelligence is vital for converting chaotic physical stimuli into understandable digital signals, relying on multimodal sensor fusion and dynamic environment modeling [8][10]
- The cognitive layer processes raw data from the perception layer, employing hierarchical decision-making and world model construction to enable robots to learn from experiences [12][14]
- The action layer ensures robots can execute tasks safely and effectively, utilizing bio-inspired drive technologies and human-robot collaboration safety designs [16][18]

Group 3: Current Limitations and Future Directions
- Current embodied intelligence models struggle with task completion rates in non-training scenarios, with a success rate of only 65% for tasks like object grasping [17]
- Energy consumption and high costs remain significant barriers to the widespread adoption of humanoid robots, with typical models having a battery life of less than 2 hours and costs exceeding 500,000 yuan [18][19]
- Research is focused on optimizing energy efficiency and reducing costs through new battery technologies and domestic production of core components [21][22]

Group 4: Future Trends
- The integration of multimodal large models is a key future direction, enabling robots to understand natural language commands and adapt quickly to new tasks with minimal samples [23][24]
- Lightweight hardware innovations, such as bio-inspired muscle drive technologies, are expected to enhance performance while reducing costs [23][24]
- The trend of virtual-physical collaborative evolution will allow robots to train in simulated environments, significantly improving their task execution capabilities in real-world settings [24][25]
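The perception, cognition, and action layers listed under Group 2 can be pictured as three stages of one control step. The sketch below is a deliberately tiny illustration, not a description of any real robot stack: the weighted-average fusion, the distance threshold, the speed limit, and every function and parameter name are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Percept:
    """Fused view of the environment (hypothetical structure)."""
    obstacle_distance_m: float


def perceive(camera_m, lidar_m, camera_weight=0.3):
    """Perception layer: fuse two distance estimates with an assumed weighting."""
    fused = camera_weight * camera_m + (1 - camera_weight) * lidar_m
    return Percept(obstacle_distance_m=fused)


def decide(percept, stop_distance_m=0.5):
    """Cognition layer: pick a high-level command from the fused percept."""
    return "stop" if percept.obstacle_distance_m < stop_distance_m else "advance"


def act(command, requested_speed, max_safe_speed=0.8):
    """Action layer: execute the command while enforcing an assumed speed limit."""
    if command == "stop":
        return 0.0
    return min(requested_speed, max_safe_speed)


def control_step(camera_m, lidar_m, requested_speed):
    """Run one perception -> cognition -> action pass and return the speed."""
    percept = perceive(camera_m, lidar_m)
    command = decide(percept)
    return act(command, requested_speed)


clear_path_speed = control_step(camera_m=2.0, lidar_m=2.0, requested_speed=1.5)
blocked_speed = control_step(camera_m=0.4, lidar_m=0.3, requested_speed=1.5)
print(clear_path_speed, blocked_speed)
```

Keeping the safety clamp in the action layer, instead of in the decision logic, mirrors the summary's point that safe execution is the action layer's responsibility regardless of what the cognitive layer requests.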