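New Synthesis Framework Achieves SOTA: Reinforcement Learning as the Engine, Task Synthesis as the Fuel, Jointly Produced by Ant Group and the University of Hong Kong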

Core Insights
- The article discusses the launch of PromptCoT 2.0 by Ant Group and the University of Hong Kong, focusing on the direction of task synthesis in the second half of large models [1][5]
- The team emphasizes the importance of task synthesis and reinforcement learning as foundational technologies for advancing large models and intelligent agents [6][7]

Summary by Sections

Introduction to PromptCoT 2.0
- PromptCoT 2.0 represents a comprehensive upgrade of the PromptCoT framework, which was initially introduced a year ago [4][16]
- The framework aims to enhance the capabilities of large models by focusing on task synthesis, particularly in the context of complex real-world problems [5][9]

Importance of Task Synthesis
- Task synthesis is viewed as a critical area that includes problem synthesis, answer synthesis, environment synthesis, and evaluation synthesis [9]
- The team believes that without a sufficient amount of high-quality task data, reinforcement learning cannot be effectively utilized [9]

Framework and Methodology
- The team has developed a general and powerful problem synthesis framework, breaking it down into concept extraction, logic generation, and problem generation model training [10][13]
- PromptCoT 2.0 introduces an Expectation-Maximization (EM) cycle to optimize the reasoning chain iteratively, resulting in more challenging and diverse problem generation [15][23]

Performance and Data Upgrades
- PromptCoT 2.0 has shown significant improvements in performance, allowing strong reasoning models to achieve new state-of-the-art results [17]
- The framework has generated 4.77 million synthetic problems, which exhibit higher difficulty and greater differentiation compared to existing datasets [19][20]

Future Directions
- The team plans to explore agentic environment synthesis, multi-modal task synthesis, and self-rewarding mechanisms to further enhance the capabilities of large models [27][28]
- The integration of self-rewarding and game-theoretic approaches is seen as a potential avenue for improving model performance [29]
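
The summary under Framework and Methodology mentions an EM cycle over the reasoning chain but does not spell out the objective. As a loose, generic illustration of what such a cycle looks like, the toy sketch below treats the rationale as a discrete latent variable z sitting between a concept c and a generated problem x, and alternates an E-step (posterior over z) with an M-step (re-estimating p(z | c) and p(x | z)). This is not the PromptCoT 2.0 implementation, which according to the article trains language models; `run_em`, its parameters, and the tabular probability model are all hypothetical names and assumptions introduced for illustration.

```python
import math
import random
from collections import defaultdict


def normalize(weights):
    """Scale a dict of positive weights so its values sum to 1."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def run_em(pairs, num_rationales=2, iters=10, seed=0):
    """Toy EM over (concept, problem) pairs with a latent rationale index z.

    Assumed model (illustrative only): p(x | c) = sum_z p(z | c) * p(x | z).
    Returns (prior, lik, history), where history[i] is the data
    log-likelihood under the parameters used in iteration i.
    """
    rng = random.Random(seed)
    concepts = sorted({c for c, _ in pairs})
    problems = sorted({x for _, x in pairs})
    latent = range(num_rationales)

    # Random positive initialization breaks the symmetry between rationales.
    prior = {c: normalize({z: rng.random() + 0.1 for z in latent})
             for c in concepts}
    lik = {z: normalize({x: rng.random() + 0.1 for x in problems})
           for z in latent}

    history = []
    for _ in range(iters):
        prior_counts = {c: defaultdict(float) for c in concepts}
        lik_counts = {z: defaultdict(float) for z in latent}
        log_lik = 0.0

        # E-step: posterior responsibility of each rationale for each pair.
        for c, x in pairs:
            joint = {z: prior[c][z] * lik[z][x] for z in latent}
            evidence = sum(joint.values())
            log_lik += math.log(evidence)
            for z, value in joint.items():
                resp = value / evidence
                prior_counts[c][z] += resp
                lik_counts[z][x] += resp
        history.append(log_lik)

        # M-step: re-estimate both tables from the expected counts.
        prior = {c: normalize(prior_counts[c]) for c in concepts}
        lik = {z: normalize(lik_counts[z]) for z in latent}

    return prior, lik, history


if __name__ == "__main__":
    # Made-up concept and problem labels, purely for demonstration.
    demo_pairs = [
        ("number_theory", "p1"), ("number_theory", "p2"),
        ("number_theory", "p1"), ("geometry", "p3"), ("geometry", "p3"),
    ]
    _, _, demo_history = run_em(demo_pairs)
    print([round(value, 4) for value in demo_history])
```

The useful property to observe is that the recorded log-likelihood never decreases from one iteration to the next, which is the standard guarantee of EM. In a language-model setting the tables would be replaced by learned models and the updates by training steps, so that guarantee would no longer hold exactly.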