Code2Logic
A new approach to RL: Fudan uses games to enhance VLM general reasoning, with performance matching geometry data
36Kr · 2025-10-22 02:17
Core Insights
- Fudan University's NLP lab developed Game-RL, which utilizes games to enrich visual elements and generate multimodal verifiable reasoning data, enhancing the reasoning capabilities of visual language models (VLM) [1][28]
- The innovative Code2Logic method systematically synthesizes game task data, creating the GameQA dataset, which demonstrates the advantages of game data in complex reasoning training [1][28]

Game-RL and Code2Logic
- Game-RL constructs multimodal verifiable game tasks to reinforce VLM training, addressing the limitations of existing reinforcement learning (RL) approaches that focus primarily on geometric or chart reasoning [1][28]
- The Code2Logic method leverages game code to systematically generate reasoning data, consisting of three core steps: game code construction, task and QA template design, and data engine construction [11][8]

GameQA Dataset
- The GameQA dataset comprises 4 cognitive ability categories, 30 games, 158 reasoning tasks, and 140,000 question-answer pairs, with tasks categorized into three difficulty levels [13][15]
- GameQA's diverse game tasks provide a competitive edge in training models for general reasoning, matching the performance of traditional geometric datasets despite having fewer training samples [19][20]

Training Outcomes
- The use of GameQA in training led to improvements across four open-source VLMs on seven out-of-domain general visual language reasoning benchmarks, with Qwen2.5-VL-7B showing an average improvement of 2.33% [17][18]
- GameQA's cognitive diversity and reasoning complexity demonstrate its generalizability and transferability, making it a valuable resource for enhancing VLM capabilities [20][19]

Scaling Effects
- Increasing the GameQA dataset size to 20,000 samples resulted in consistent performance improvements on general reasoning benchmarks [21][24]
- Expanding the variety of games used in training enhances out-of-domain generalization effects, indicating the importance of diverse training data [22][24]

Conclusion
- The research introduces Game-RL and the Code2Logic method, expanding the reinforcement training domain for VLMs into gaming scenarios, and validates that Game-RL can enhance general reasoning capabilities [28][1]
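
The three Code2Logic steps named above (game code construction, task and QA template design, data engine construction) can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the grid game, `GridState`, `QUESTION_TEMPLATE`, `generate_sample`, and `verify` are all hypothetical names invented here, and a text board stands in for the rendered image a real multimodal pipeline would produce. The point it tries to show is that, because the answer is computed by executing the game code, every generated question carries a ground truth that can be checked automatically.

```python
import random
from dataclasses import dataclass

# Step 1 (game code): a hypothetical toy grid game, not one of the GameQA games.
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

@dataclass(frozen=True)
class GridState:
    size: int
    walls: frozenset   # (row, col) cells that block movement
    player: tuple      # (row, col) of the player

def step(state, move):
    """Apply one move; the player stays put if a wall or the border blocks it."""
    dr, dc = MOVES[move]
    r, c = state.player[0] + dr, state.player[1] + dc
    inside = 0 <= r < state.size and 0 <= c < state.size
    if inside and (r, c) not in state.walls:
        return GridState(state.size, state.walls, (r, c))
    return state

def render(state):
    """Text stand-in for the image a real pipeline would render."""
    def cell(r, c):
        if (r, c) == state.player:
            return "P"
        return "#" if (r, c) in state.walls else "."
    return "\n".join(
        "".join(cell(r, c) for c in range(state.size)) for r in range(state.size)
    )

# Step 2 (task and QA template): one assumed task, "predict the final position".
QUESTION_TEMPLATE = (
    "The player (P) makes the moves {moves}. Walls (#) and borders block "
    "movement. What is the final (row, col) position?"
)

# Step 3 (data engine): sample a state, execute the game code, fill the template.
def generate_sample(rng, size=4, n_walls=3, n_moves=4):
    cells = [(r, c) for r in range(size) for c in range(size)]
    chosen = rng.sample(cells, n_walls + 1)      # distinct cells: player + walls
    start = GridState(size, frozenset(chosen[1:]), chosen[0])
    moves = [rng.choice("UDLR") for _ in range(n_moves)]
    state, trace = start, []
    for m in moves:
        state = step(state, m)
        trace.append(f"{m} -> {state.player}")   # step-by-step reasoning trace
    return {
        "board": render(start),
        "question": QUESTION_TEMPLATE.format(moves="".join(moves)),
        "reasoning": trace,
        "answer": str(state.player),
    }

def verify(sample, model_answer):
    """Verifiable check: exact match against the executed ground truth."""
    return model_answer.strip() == sample["answer"]

if __name__ == "__main__":
    sample = generate_sample(random.Random(0))
    print(sample["board"])
    print(sample["question"])
    print(sample["answer"])
```

In this sketch, parameters such as `n_moves` and `n_walls` are one assumed way to vary difficulty, and a check like `verify` is the kind of signal that could serve as a reward in RL training; how GameQA actually defines its difficulty levels and rewards is not specified in this summary.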