Bilevel optimization

Topping the MMMU Multimodal Reasoning Leaderboard: UCSD's New Method Surpasses GPT-5 and Gemini
36Kr · 2025-09-19 06:58
Core Insights
- DreamPRM, developed by a research team at the University of California, San Diego (UCSD), has taken the top spot on the MMMU (Massive Multi-discipline Multimodal Understanding) leaderboard, showing significant progress in the reasoning capabilities of large language models (LLMs) [1][18]
- A Process Reward Model (PRM) supervises the intermediate steps of reasoning rather than only the final answer, which helps the model select appropriate problem-solving paths [1]
- DreamPRM-1.5 refines the weighting mechanism from the domain level to the instance level, allowing the model to exploit the potential value of each individual training sample [4][5]

Model Architecture and Training Framework
- DreamPRM-1.5 uses a bilevel optimization framework that dynamically adjusts sample weights according to reasoning performance, so the weighting adapts to how the model actually performs during training [11][19]
- Two complementary architectures, Instance Table and Instance Net, are designed for sample-level weighting:
  - Instance Table assigns an independent weight parameter to each training sample; it suits smaller datasets but is hard to scale to larger ones because its parameter count grows with the number of samples [10]
  - Instance Net uses a small MLP to predict the weights, keeping the parameter count fixed, which makes it better suited to large-scale training [10]

Performance and Results
- On the MMMU benchmark, DreamPRM-1.5 reached 84.6% accuracy with Instance Table and 83.6% with Instance Net, significantly outperforming the baseline models [15][16]
- The Instance Table variant (84.6%) surpassed other top-performing models, including GPT-5 (84.2%) and Gemini 2.5 Pro Deep-Think (84.0%), indicating the method's effectiveness on multimodal reasoning tasks [18][20]

Conclusion and Future Directions
- Introducing instance-level reweighting into multimodal reasoning training highlights the importance of data quality, and of how finely that data is used, for future model research [19][20]
- Better sample weighting and process scoring methods are expected to be key drivers of further progress in multimodal reasoning [19]
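The process reward model idea from the Core Insights (scoring intermediate steps to choose a problem-solving path) can be illustrated with a minimal best-of-N sketch. It assumes each candidate solution is already split into reasoning steps and that a hypothetical scorer, `score_step`, standing in for a trained PRM, returns a score in [0, 1] for every prefix of steps. Averaging the step scores is just one common aggregation choice, not necessarily the one DreamPRM uses.

```python
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical PRM interface (assumption): scores a prefix of reasoning steps in [0, 1].
StepScorer = Callable[[str, Sequence[str]], float]


def path_score(question: str, steps: Sequence[str], score_step: StepScorer) -> float:
    """Score every intermediate step (process supervision) and average the scores."""
    step_scores = [score_step(question, steps[: i + 1]) for i in range(len(steps))]
    return mean(step_scores)


def select_best_path(question: str, candidates: List[Sequence[str]],
                     score_step: StepScorer) -> int:
    """Best-of-N selection: return the index of the highest-scoring candidate path."""
    scores = [path_score(question, steps, score_step) for steps in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


def toy_scorer(question: str, steps: Sequence[str]) -> float:
    # Illustrative stand-in for a trained PRM: distrust any step that "guesses".
    return 0.1 if "guess" in steps[-1] else 0.9


candidates = [
    ["read the bar chart", "guess the answer"],
    ["read the bar chart", "compute the ratio 3/6", "answer 0.5"],
]
best = select_best_path("What fraction of the bars are red?", candidates, toy_scorer)
print(best)  # 1
```

An outcome-only scorer would see just the final answers; scoring each prefix lets a weak intermediate step pull down an otherwise plausible path.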
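The parameter-count trade-off between Instance Table and Instance Net can be sketched as follows. This is an illustrative stand-in, not DreamPRM's code: the per-sample feature vector fed to the net, the hidden size, the ReLU hidden layer, and the sigmoid output are all assumptions, since the summary only says that a small MLP predicts the weights.

```python
import math
import random
from typing import List


class InstanceTable:
    """One free weight per training sample: parameter count grows with dataset size N."""

    def __init__(self, num_samples: int, init: float = 1.0) -> None:
        self.weights = [init] * num_samples

    def weight(self, index: int, features: List[float]) -> float:
        return self.weights[index]  # features are ignored; the index is the lookup key

    def num_params(self) -> int:
        return len(self.weights)


class InstanceNet:
    """Tiny MLP (one ReLU hidden layer, sigmoid output) mapping per-sample features
    to a weight in (0, 1). Its parameter count depends on in_dim and hidden, not N.
    Which features describe a sample is a hypothetical assumption here."""

    def __init__(self, in_dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def weight(self, index: int, features: List[float]) -> float:
        # The index is ignored: the weight is predicted from the features alone.
        h = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * v for w, v in zip(self.w2, h)) + self.b2
        return 1.0 / (1.0 + math.exp(-z))

    def num_params(self) -> int:
        in_dim = len(self.w1[0])
        return len(self.w1) * in_dim + len(self.b1) + len(self.w2) + 1


table = InstanceTable(num_samples=100_000)
net = InstanceNet(in_dim=8, hidden=16)
print(table.num_params())  # 100000
print(net.num_params())    # 161 = 8*16 + 16 + 16 + 1
```

The table's size scales with the dataset, while the MLP's 161 parameters stay the same whether it weights a hundred samples or a million. The price is that the net must generalize across samples with similar features instead of fitting each weight freely.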
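The bilevel reweighting described under Model Architecture and Training Framework can be illustrated on a toy problem. The sketch uses a one-step lookahead approximation, a common way to make bilevel sample reweighting tractable. The lower level takes one weighted gradient step on the training loss. The upper level then adjusts each per-sample weight by the gradient of a loss on a small clean meta set. The 1-D linear model, the data, the learning rates, and clipping weights at zero are all assumptions chosen for illustration; the summary does not describe DreamPRM's actual objective or update schedule.

```python
from typing import List, Tuple

# Toy assumption: a 1-D linear model y ≈ theta * x trained with squared error.
Sample = Tuple[float, float]  # (x, y)


def grad(theta: float, sample: Sample) -> float:
    """Derivative of (theta * x - y) ** 2 with respect to theta."""
    x, y = sample
    return 2.0 * x * (theta * x - y)


def bilevel_reweight(train: List[Sample], meta: List[Sample], steps: int = 200,
                     lr: float = 0.01, meta_lr: float = 1.0) -> Tuple[float, List[float]]:
    """One-step lookahead approximation of bilevel sample reweighting."""
    theta = 0.0
    weights = [1.0] * len(train)  # one weight per sample, Instance Table style
    n, m = len(train), len(meta)
    for _ in range(steps):
        grads = [grad(theta, s) for s in train]
        # Lower level (lookahead): one weighted SGD step on the training loss.
        theta_ahead = theta - lr / n * sum(w * g for w, g in zip(weights, grads))
        # Upper level: gradient of the mean meta-set loss at the lookahead parameter.
        meta_grad = sum(grad(theta_ahead, s) for s in meta) / m
        # Chain rule: d(meta loss)/d(w_i) = meta_grad * (-lr / n * g_i).
        # Descend on the weights and clip at zero (an assumption that keeps them valid).
        weights = [max(0.0, w - meta_lr * meta_grad * (-lr / n * g))
                   for w, g in zip(weights, grads)]
        # Actual model update using the refreshed weights.
        theta -= lr / n * sum(w * g for w, g in zip(weights, grads))
    return theta, weights


clean = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # follow y = 2x
noisy = [(1.0, -2.0), (2.0, -4.0)]            # mislabeled: y = -2x
meta = [(1.5, 3.0), (2.5, 5.0)]               # small clean meta set
theta, weights = bilevel_reweight(clean + noisy, meta)
print(round(theta, 3), [round(w, 2) for w in weights])
```

On this toy data the weights of the two mislabeled samples shrink (one is clipped to zero early on) while the clean samples gain weight, so theta ends up close to 2, the slope of the clean data. Swapping the weight list for a small network would give the Instance Net variant of the same loop.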