A GAN-like training framework
Two LLMs spar with each other and reasoning takes off: Cornell team releases a GAN-style training method for large models
机器之心· 2025-12-07 02:52
Core Insights
- The article presents PasoDoble, a new GAN-like training framework that strengthens the reasoning ability of large language models (LLMs) through adversarial training, without any external supervision [3][41].

Group 1: PasoDoble Framework
- PasoDoble pairs two models: a Proposer, which generates challenging questions together with reference answers, and a Solver, which attempts to answer them [3][9].
- During training, the Proposer generates question-answer pairs conditioned on knowledge sampled from a knowledge base, while the Solver produces multiple answers for each question [9][10].
- No supervisory signal is used at any point in training, making the method fully unsupervised [3][7].

Group 2: Performance Improvements
- PasoDoble yields substantial gains on mathematical tasks: Qwen3-1.7B-Base improves by roughly 13 percentage points on average, and Qwen3-4B-Base by roughly 16 [7][28].
- The gains grow with model size across the models tested, indicating that the approach scales [28][41].

Group 3: Reward Mechanism
- The Proposer's reward is designed to encourage hard, diverse questions: it is based on the difficulty and novelty of the questions generated [12][13].
- The Solver trains solely on a correctness reward: each generated answer is scored against the reference answer supplied by the Proposer [22][23].
- Replacing these structured rewards with random rewards degrades performance markedly, underscoring the importance of the reward design [35][37].
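The adversarial loop and the two reward signals described above can be sketched in a few lines. This is a hypothetical toy illustration under stated assumptions, not the paper's implementation: the model stand-ins, function names, reward weights, and the linear reward combination are all assumptions; only the overall shape (a Proposer poses knowledge-grounded questions, a Solver samples several answers, the Proposer is rewarded for difficulty and novelty, the Solver only for correctness) follows the article.

```python
import random

# Toy stand-ins for the two LLMs in PasoDoble. Every name, signature,
# and weight below is an illustrative assumption, not the paper's API.

def toy_proposer(knowledge):
    """Turn a sampled fact (a, b) into an addition question plus its
    reference answer, mimicking knowledge-grounded question generation."""
    a, b = knowledge
    return f"What is {a} + {b}?", a + b

def toy_solver(question):
    """A deliberately weak solver that is occasionally off by one."""
    a, b = (int(t) for t in question.rstrip("?").split() if t.isdigit())
    return a + b + random.choice([0, 0, 0, 1])

def solver_reward(answer, reference):
    """Solver trains only on correctness against the Proposer's reference."""
    return 1.0 if answer == reference else 0.0

def proposer_reward(correct_flags, question, seen_questions,
                    difficulty_weight=1.0, novelty_weight=1.0):
    """Combine the two incentives the article names: difficulty (the
    Solver's failure rate on this question) and novelty (penalizing
    repeated questions). The weighted sum itself is an assumption."""
    failure_rate = 1.0 - sum(correct_flags) / len(correct_flags)
    novelty = 0.0 if question in seen_questions else 1.0
    return difficulty_weight * failure_rate + novelty_weight * novelty

def pasodoble_round(knowledge_base, seen_questions, n_answers=4):
    """One unsupervised round: no external labels enter anywhere; the
    only 'supervision' is the Proposer's own reference answer."""
    question, reference = toy_proposer(random.choice(knowledge_base))
    attempts = [toy_solver(question) for _ in range(n_answers)]
    correct = [a == reference for a in attempts]
    r_solver = [solver_reward(a, reference) for a in attempts]
    r_proposer = proposer_reward(correct, question, seen_questions)
    seen_questions.add(question)
    return r_proposer, r_solver
```

Note how the novelty term decays once a question has been posed: re-asking the same question forfeits the novelty bonus, nudging the Proposer toward diverse questions.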
Group 4: Experimental Results
- Detailed results across a range of mathematical benchmarks show that PasoDoble substantially improves model performance, most notably on competition-level math tasks [28][29].
- Models trained with PasoDoble consistently outperform the baseline models, with clear accuracy gains across benchmarks [28][34].

Group 5: Future Directions
- Future research will extend the PasoDoble framework beyond mathematics, such as to code generation and factual question answering, and investigate broader multi-model training paradigms [41].