Free knowledge distillation
Teaching large models to "learn from their mistakes": NJUST, Baidu, and collaborators propose a new model-memory method
量子位· 2025-12-17 09:07
Core Viewpoint
- The article discusses a new method called ViLoMem, developed by Nanjing University of Science and Technology in collaboration with Baidu, which addresses the poor memory retention of large models, enabling them to learn from past mistakes by separating visual and logical errors into distinct memory streams [1][5].

Group 1: ViLoMem Framework
- ViLoMem employs a dual-stream semantic memory system that allows models to remember visual and logical errors separately, enhancing their ability to learn from experience [15][16].
- The framework consists of two main components, memory generation and memory retrieval, which work together to improve the model's performance without altering its parameters [18][5].

Group 2: Memory Generation
- When a model fails on a task, ViLoMem activates two branches: a visual analysis module to identify visual errors and a logical analysis module to pinpoint logical mistakes, generating structured guidelines for both types of errors [19][20][21].
- Newly generated memories are matched for similarity with existing memories and either merged into more abstract rules or stored in new memory slots, preventing memory overload while allowing general semantic patterns to be abstracted [22][24].

Group 3: Memory Retrieval
- The retrieval strategies for visual and logical memories differ: visual memory uses a two-stage retrieval process that combines image-level similarity search with question-semantic filtering [27][28].
- Logical memory retrieval focuses on understanding the problem first before searching for relevant rules, which is more effective than simple keyword matching [29]. (A minimal sketch of this generate-and-retrieve loop follows at the end of this summary.)

Group 4: Performance Improvement
- ViLoMem has shown significant performance improvements across six multimodal reasoning benchmarks, with notable gains on mathematical tasks, such as a +6.48 increase for GPT-4.1 on MathVision [2][31].
- Smaller models benefit even more from ViLoMem, with Qwen3-VL-8B achieving a +4.38 increase on MMMU [31].

Group 5: Cross-Model Memory Transfer
- An experiment demonstrated that smaller models could achieve better scores by reusing memories generated by larger models, a form of "free knowledge distillation" [34][36].
- This suggests that experience from stronger models can directly enhance the performance of weaker models without any fine-tuning [36].
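The sketch below illustrates the dual-stream idea described above: on a failed task, a visual guideline and a logical guideline are stored in separate memory streams, merged with similar existing entries or added as new slots, and later retrieved (two-stage image-then-question matching for the visual stream, question-level matching for the logical stream). All class and function names here (DualStreamMemory, Stream, add_failure, etc.) and the toy bag-of-words "embeddings" are illustrative assumptions, not the ViLoMem implementation, which relies on MLLM-generated guidelines and real image/text encoders.

```python
# Toy sketch of a dual-stream (visual / logical) error-memory store.
# Assumption: bag-of-words cosine similarity stands in for learned embeddings.
from collections import Counter
from dataclasses import dataclass
import math


def embed(text: str) -> Counter:
    """Toy stand-in for a text/image encoder: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class MemoryItem:
    key: Counter    # embedding used for similarity search
    guideline: str  # structured "what went wrong / what to check" rule
    hits: int = 1


class Stream:
    """One memory stream (visual or logical) with merge-or-create updates."""

    def __init__(self, merge_threshold: float = 0.6):
        self.items: list[MemoryItem] = []
        self.merge_threshold = merge_threshold

    def add(self, key: Counter, guideline: str) -> None:
        # Merge into the most similar existing slot, or open a new one,
        # so recurring errors get abstracted instead of growing the store unboundedly.
        best = max(self.items, key=lambda m: cosine(m.key, key), default=None)
        if best and cosine(best.key, key) >= self.merge_threshold:
            best.guideline = f"{best.guideline}; {guideline}"
            best.hits += 1
        else:
            self.items.append(MemoryItem(key=key, guideline=guideline))

    def retrieve(self, query: Counter, top_k: int = 2) -> list[str]:
        ranked = sorted(self.items, key=lambda m: cosine(m.key, query), reverse=True)
        return [m.guideline for m in ranked[:top_k]]


class DualStreamMemory:
    def __init__(self):
        self.visual = Stream()
        self.logical = Stream()

    def add_failure(self, image_desc: str, question: str,
                    visual_guideline: str, logical_guideline: str) -> None:
        # In the paper the two guidelines come from separate visual/logical
        # analysis branches run over the failed trace; here they are given directly.
        self.visual.add(embed(image_desc + " " + question), visual_guideline)
        self.logical.add(embed(question), logical_guideline)

    def retrieve(self, image_desc: str, question: str) -> dict:
        # Visual stream: image-level similarity first, then a rough question filter.
        img_hits = self.visual.retrieve(embed(image_desc), top_k=4)
        q = embed(question)
        visual = [g for g in img_hits if cosine(embed(g), q) > 0.0][:2]
        # Logical stream: retrieve by the question itself (problem understanding).
        logical = self.logical.retrieve(q, top_k=2)
        return {"visual": visual, "logical": logical}


if __name__ == "__main__":
    mem = DualStreamMemory()
    mem.add_failure(
        image_desc="bar chart with truncated y axis",
        question="which year had the largest increase",
        visual_guideline="check whether the y axis is truncated before comparing bar heights",
        logical_guideline="compare year-over-year differences, not absolute values",
    )
    print(mem.retrieve("bar chart of yearly sales", "which year grew the most"))
```

Because the memory store sits outside the model, this kind of loop leaves the model's parameters untouched, which is also what makes the cross-model transfer described in Group 5 possible: a stronger model's guidelines can simply be loaded into a weaker model's store.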