Multi-Agent Reinforcement Learning
New Year Flavors from Across the Country | A "New Year's Gift" That Broke "Kissing Number" Records: The Scientific Romance Behind PackingStar
Xin Hua Cai Jing· 2026-02-15 07:41
Core Insights
- The research team from Shanghai Institute of Science and Intelligent Technology, in collaboration with Peking University and Fudan University, has developed a multi-agent reinforcement learning system called PackingStar, which has set new records in the long-standing mathematical problem known as the "kissing number" problem, marking a significant breakthrough in the field of mathematical structures [1][2][3]
Group 1: Research and Development
- PackingStar addresses high-dimensional combinatorial optimization problems, similar to challenges in new material design and drug discovery, by finding optimal solutions in exponentially growing search spaces [3]
- The system has revealed solutions that possess clear geometric rules while breaking global symmetry, leading to new mathematical constructs that were previously incomprehensible [3]
- The collaboration between human intuition and AI in the research process has transformed the role of mathematicians from tedious calculations to becoming "mathematical observers" and "intuition designers" [3][4]
Group 2: AI and Human Collaboration
- The project signifies a shift towards a new paradigm of collaborative research where human mathematicians provide insights and intuition, while AI constructs structures and searches for proofs, creating a feedback loop that enhances both AI capabilities and human mathematical intuition [4][5]
- The development of PackingStar is compared to AlphaFold in biology, highlighting the need for deep collaboration between AI experts and mathematicians to tackle problems that lack existing training data [4][6]
Group 3: Cultural and Philosophical Context
- The team embodies a cross-disciplinary approach, merging backgrounds in physics, AI, and mathematics, which fosters a creative environment conducive to scientific breakthroughs [7][8]
- The name "PackingStar" reflects both the research focus on high-dimensional space and the diverse talents of the team members, symbolizing a new generation of scientific inquiry at the intersection of technology and humanities [7][8]
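The Most Hardcore "Kiss" of Valentine's Day! Chinese AI Cracks the 300-Year-Old Kissing Number Problem, Setting Records Across Multiple Dimensions
QbitAI (量子位) · 2026-02-14 08:13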
Core Viewpoint
- The article discusses a breakthrough on the Kissing Number Problem achieved with AI, specifically through a system called PackingStar, which has made significant advances in high-dimensional geometry [1][10][49].
Group 1: Kissing Number Problem Overview
- The Kissing Number Problem asks for the maximum number of non-overlapping, equal-sized spheres that can simultaneously touch a central sphere of the same size in n-dimensional space [2][4].
- The problem has historical significance, originating from a debate between Newton and Gregory in 1694 regarding the arrangement of spheres in three-dimensional space [5][6].
- Progress has been slow, with only seven substantial advances in nearly 50 years [9].
Group 2: Breakthrough Achievements
- The PackingStar system, developed by a collaborative team from Shanghai Institute of Science and Intelligent Technology, Peking University, and Fudan University, has set new records for dimensions 25 to 31 [10][11].
- The system has also discovered over 6,000 new configurations in various dimensions and broken long-standing records in generalized kissing numbers [10][11].
Group 3: Methodology and AI Integration
- PackingStar transforms the high-dimensional geometric problem into a multi-agent game, allowing AI to explore potential structures autonomously [18][24].
- The approach represents the positions of the spheres with a cosine matrix (the pairwise cosines between sphere directions), a form well suited to parallel computation on GPUs [18][24].
- The system employs a collaborative mechanism between two agents to fill, prune, and reconstruct geometric structures, significantly reducing the complexity of high-dimensional exploration [25][31].
Group 4: Implications for Mathematics and AI
- The discoveries made by PackingStar challenge traditional human intuitions about symmetry in geometric structures, revealing many non-symmetric configurations that yield better results [27][28].
- The project exemplifies a shift in AI's role from merely assisting in calculations to actively participating in scientific exploration, marking a new phase in AI for Science [64][65].
- The results have implications across various mathematical fields, connecting concepts from sphere packing, number theory, and group theory [34][60].
Group 5: Infrastructure and Future Directions
- The project highlights the importance of robust AI infrastructure, which is crucial for tackling complex mathematical problems that require extensive computational resources [39][40].
- The development of custom CUDA operators and an automatic checkpointing system has improved the efficiency and stability of long-duration tasks [42][46].
- The success of PackingStar indicates a promising future for AI in mathematics, suggesting that previously unsolvable problems may become accessible through innovative AI methodologies [49][60].
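To make the cosine-matrix representation mentioned in the summary above concrete, here is a minimal sketch in plain Python. It is not PackingStar's code: the function names are hypothetical, and it only illustrates the standard geometric fact that unit spheres touching a central unit sphere do not overlap exactly when every pair of their directions has cosine at most 1/2 (an angle of at least 60 degrees).

```python
import math

def cosine_matrix(points):
    """Pairwise cosines between the directions of the given non-zero vectors."""
    units = []
    for p in points:
        norm = math.sqrt(sum(x * x for x in p))
        units.append([x / norm for x in p])
    return [[sum(a * b for a, b in zip(u, v)) for v in units] for u in units]

def is_kissing_configuration(points, tol=1e-9):
    """True if spheres placed along these directions can all touch the central
    sphere without overlapping: every off-diagonal cosine must be <= 1/2."""
    g = cosine_matrix(points)
    n = len(g)
    return all(g[i][j] <= 0.5 + tol for i in range(n) for j in range(i + 1, n))

# Six directions 60 degrees apart: the optimal arrangement in 2D.
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
# Seven evenly spaced directions are too crowded in 2D.
heptagon = [(math.cos(2 * k * math.pi / 7), math.sin(2 * k * math.pi / 7)) for k in range(7)]

print(is_kissing_configuration(hexagon))
print(is_kissing_configuration(heptagon))
```

Because validity depends only on the entries of this matrix, checking a candidate configuration reduces to dense matrix arithmetic, which is consistent with the summary's remark that the representation suits GPU parallelism.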
SPIRAL: Self-Play in Zero-Sum Games Becomes a "Free Lunch" for Language Model Reasoning Training
Synced (机器之心) · 2025-07-30 05:13
Core Insights
- The research introduces SPIRAL, a framework that uses self-play in zero-sum games to enhance reasoning capabilities in language models without relying on human supervision [3][33].
- The study demonstrates that competitive self-play can lead to significant improvements in reasoning skills, as evidenced by an 8.7% increase in mathematical reasoning ability and an 18.1 percentage point improvement on the Minerva Math benchmark [7][30].
Group 1: Research Background
- The collaborative research involves institutions such as the National University of Singapore and A*STAR, focusing on scalable autonomous agents capable of intelligent decision-making in unknown environments [1].
- The success of models like OpenAI's o1 and DeepSeek-R1 highlights the potential of reinforcement learning to enhance reasoning capabilities in language models [2].
Group 2: SPIRAL Framework
- SPIRAL employs self-play in zero-sum games to autonomously discover and reinforce generalizable reasoning patterns, eliminating the need for manually designed reward functions and expert supervision [3][6].
- The framework uses a distributed online multi-agent reinforcement learning system to fine-tune large language models across various two-player zero-sum games [24].
Group 3: Game-Based Training
- The research identifies three games with distinct cognitive demands (TicTacToe, Kuhn Poker, and Simple Negotiation) as effective training environments for enhancing reasoning skills [12][11].
- Because the opponent is a copy of the model that improves alongside it, self-play adjusts difficulty adaptively and keeps the model's capabilities evolving [11].
Group 4: Transfer of Skills
- The study reveals that reasoning patterns developed in games can transfer to mathematical problem-solving, with specific skills like expected value calculation and case analysis showing significant transfer rates [18][19].
- The multi-game training approach leads to synergistic effects, enhancing performance in unfamiliar games compared to single-game specialists [21].
Group 5: Technical Innovations
- The introduction of Role-Aware Advantage Estimation (RAE) prevents "thinking collapse," ensuring stable gradient updates and consistent reasoning generation throughout training [26][28].
- The SPIRAL framework has shown effectiveness even in strong models, with notable performance improvements in established benchmarks [30].
Group 6: Practical Implications
- SPIRAL offers a novel approach for researchers and engineers aiming to enhance model reasoning capabilities without the need for extensive high-quality reasoning data [35].
- The findings suggest that pre-trained models already contain various reasoning patterns, and reinforcement learning can help identify and strengthen those that are truly generalizable [35].
Group 7: Limitations and Future Directions
- Despite its successes, SPIRAL faces limitations such as the need for carefully designed game environments and high computational resource demands [38].
- Future research may explore hybrid game types and meta-game learning to cultivate more comprehensive reasoning abilities [37].
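The summary names Role-Aware Advantage Estimation but does not give its formula. One plausible reading, sketched below under that assumption, is to keep a separate running baseline for each (game, role) pair so that a structurally advantaged role (for example, the first mover) does not bias the advantage signal. The class name, the `decay` parameter, and the choice to update the baseline before subtracting it are all hypothetical, not taken from the source.

```python
from collections import defaultdict

class RoleAwareBaseline:
    """Running-average return baseline kept separately per (game, role)."""

    def __init__(self, decay=0.95):
        self.decay = decay                    # assumed smoothing factor
        self.baselines = defaultdict(float)   # (game, role) -> baseline, starts at 0

    def advantage(self, game, role, episode_return):
        """Update this role's baseline, then return the centred advantage."""
        key = (game, role)
        baseline = self.decay * self.baselines[key] + (1 - self.decay) * episode_return
        self.baselines[key] = baseline
        return episode_return - baseline

rae = RoleAwareBaseline(decay=0.5)
# In a zero-sum game the two roles receive opposite returns.
adv_first = rae.advantage("KuhnPoker", role=0, episode_return=1.0)
adv_second = rae.advantage("KuhnPoker", role=1, episode_return=-1.0)
print(adv_first, adv_second)
```

Keeping the baselines apart means each role's advantage is measured against what that role typically earns, rather than against a single pooled average across both seats.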
Meta-Think ≠ Memorizing Routines: Multi-Agent Reinforcement Learning Unlocks Generalizable Meta-Thinking in Large Models
Synced (机器之心) · 2025-07-03 03:26
Core Viewpoint
- The article discusses a new framework called ReMA (Reinforced Meta-thinking Agents) designed to enhance the reasoning capabilities of large language models (LLMs) by introducing a multi-agent system that separates meta-thinking from reasoning tasks, thereby improving adaptability and effectiveness in complex problem-solving [3][4][6][10].
Group 1: Introduction and Background
- Recent explorations in large model reasoning have introduced various paradigms, including structured search and process reward models, but the mechanisms behind "Aha Moments" in reasoning remain unclear [3].
- The study emphasizes the importance of reasoning patterns and posits that the strength of complex reasoning in large models fundamentally relies on their meta-thinking abilities [3][4].
Group 2: ReMA Framework
- The ReMA framework consists of two hierarchical agents: the meta-thinking agent, which generates strategic supervision and planning, and the reasoning agent, which executes detailed sub-tasks based on the meta-thinking agent's guidance [10][11].
- This multi-agent system allows for a more structured and efficient exploration of the reasoning process, balancing generalization capabilities and exploration efficiency [12].
Group 3: Methodology
- The study defines a single-round multi-agent meta-thinking reasoning process (MAMRP) where the meta-thinking agent analyzes the problem and generates a solution plan, while the reasoning agent completes the task based on these instructions [13][14].
- In multi-round interactions, the meta-thinking agent can provide ongoing guidance, allowing for planning, reflection, and correction throughout the reasoning process [14][20].
Group 4: Experimental Results
- In single-round experiments, ReMA consistently outperformed baseline methods across various benchmarks, demonstrating superior generalization capabilities, particularly on out-of-distribution datasets [27][28].
- The results indicate that ReMA's meta-thinking mechanism significantly enhances performance, with improvements noted in specific benchmarks such as AMC23, where performance increased by up to 20% [28][29].
Group 5: Challenges and Future Work
- The study acknowledges challenges in multi-round training, including instability and sensitivity to hyperparameters, suggesting that the current training processes may not be suitable for stochastic or non-stationary environments [39][40].
- Further exploration is needed to address these issues and improve the robustness of the ReMA framework in diverse training scenarios [39].
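The single-round process described in the summary above (a meta-thinking agent produces a plan, then a reasoning agent solves the task under that plan) can be sketched as a two-step call chain. This is only an illustration of the control flow: the function `single_turn_mamrp` and both stub agents are hypothetical stand-ins for LLM calls, and nothing here reflects ReMA's actual training code.

```python
from typing import Callable, Tuple

MetaAgent = Callable[[str], str]            # question -> plan
ReasoningAgent = Callable[[str, str], str]  # (question, plan) -> answer

def single_turn_mamrp(question: str,
                      meta_agent: MetaAgent,
                      reasoning_agent: ReasoningAgent) -> Tuple[str, str]:
    """One round: the meta-thinking agent plans, the reasoning agent executes."""
    plan = meta_agent(question)
    answer = reasoning_agent(question, plan)
    return plan, answer

# Stub agents standing in for two language-model policies.
def stub_meta_agent(question: str) -> str:
    return "Split the sum into pairs that add up to the same value."

def stub_reasoning_agent(question: str, plan: str) -> str:
    # A real agent would condition its generation on the plan; the stub just computes.
    return str(sum(range(1, 11)))

plan, answer = single_turn_mamrp("What is 1 + 2 + ... + 10?",
                                 stub_meta_agent, stub_reasoning_agent)
print(plan)
print(answer)
```

Separating the two roles behind distinct callables mirrors the summary's point that planning and execution are handled by different agents, which is what lets each be trained or swapped independently.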
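Han Ai, Algorithm Director at JD Group, to Present a Reinforcement-Learning-Based Joint Evolution Algorithm for Heterogeneous Multi-Agent Systems at AICon Beijing
AI Frontline (AI前线) · 2025-06-20 02:47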
Core Insights
- The AICon Global Artificial Intelligence Development and Application Conference will take place in Beijing, featuring over 50 experts from leading companies like Tencent, Alibaba, Baidu, and ByteDance, focusing on AI Agent, multimodal applications, and optimization of reasoning performance [1][4].
Group 1: Conference Highlights
- The conference will cover various topics including AI Agent construction, multimodal practices, large model support for development, and AI's deep integration into business operations [4].
- A notable presentation will be given by Han Ai, the Algorithm Director of JD Group, discussing the JDAgents-R1 framework, which addresses challenges in multi-agent reinforcement learning (MARL) [2][3].
Group 2: JDAgents-R1 Framework
- JDAgents-R1 introduces a joint evolution algorithm framework for heterogeneous multi-agents, utilizing Group Relative Policy Optimization (GRPO) to enhance training efficiency and stability [2].
- The framework balances decision-making and memory capabilities, reducing redundant reasoning and accelerating training convergence, achieving performance comparable to large-scale language models with smaller open-source models [2].
Group 3: Expert Contributions
- Han Ai has extensive academic and professional credentials, including a PhD from a joint program between the Chinese Academy of Sciences and Cornell University, and has published numerous papers in top-tier journals [3].
- The presentation will include insights on multi-agent training technologies, application cases, and the evolution of decision-making and memory in multi-agent systems [3].
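The summary mentions Group Relative Policy Optimization (GRPO) without detail. Its core idea is to score each sampled response relative to the other responses drawn for the same prompt, so no learned value network is needed. The sketch below shows only that normalization step; the function name, the use of the population standard deviation, and the `eps` constant are assumptions for illustration, and how JDAgents-R1 applies GRPO across heterogeneous agents is not described in the source.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumed choice
    return [(r - mean) / (std + eps) for r in rewards]

# Rewards for four responses sampled from the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)

# If every response earns the same reward, no response is preferred.
flat = group_relative_advantages([0.5, 0.5, 0.5])
print(flat)
```

Responses that beat their group's average receive positive advantages and the rest negative ones, which is the signal the policy-gradient update then uses.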
China's AI Schools: Wang Jun and His Students
PEdaily (投资界) · 2025-03-04 07:41
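The following article is from Leiphone (雷峰网), by Lai Wenxin.
Half of China's reinforcement learning research landscape.
Author | Lai Wenxin  Editor | Chen Caixian  Source | Leiphone (ID: leiphone-sz)
Reinforcement learning, a research branch with decades of history in AI, keeps renewing itself.
From recommender systems to reinforcement learning
One afternoon during the summer break of 2006, Wang Jun boarded a train from the small Dutch city of Delft to the capital, Amsterdam, where he would catch a flight to Seattle to attend the 29th ACM SIGIR conference on information retrieval.
Information retrieval was then at its height. Search was also the core business of the three giants Microsoft, Yahoo, and Google, so each year ACM SIGIR drew the top talent from academia and industry to what amounted to the field's "annual meeting."
In the conference hall at the University of Washington, Wang Jun received the Best Doctoral Consortium Award to a round of applause, earning the highest honor for doctoral students in information retrieval a year before completing his PhD.
The high-spirited young man could not have known that 15 years later he would also receive an honorable mention for the Test of Time Award. By 2021, Wang Jun had been working on reinforcement learning (RL) for several years. As one of the founders of RL China, a Chinese reinforcement learning community, he had trained a cohort of outstanding young researchers for RL research in China and become a "grandmaster" of the field.
Wang Jun ...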
The UCL Reinforcement Learning School: Wang Jun and His Students
Leiphone (雷峰网) · 2025-02-27 10:15
Core Viewpoint
- The article discusses the evolution and significance of reinforcement learning (RL) in China, highlighting key figures and their contributions to the field, particularly focusing on Wang Jun and his influence on the development of RL research and education in China [2][46].
Group 1: Historical Context and Development
- Wang Jun's journey in AI began with information retrieval and recommendation systems, where he achieved significant academic recognition [4][8].
- His transition to reinforcement learning was influenced by his experiences in advertising, where he recognized the parallels between decision-making in advertising and RL principles [12][14].
- The establishment of the RL China community marked a pivotal moment in promoting RL research and education in China, addressing the lack of resources and formal education in the field [49][50].
Group 2: Contributions and Innovations
- Wang Jun and his students have made substantial contributions to RL, including the development of SeqGAN and IRGAN, which integrate RL with generative adversarial networks for improved performance in various applications [23][24].
- The introduction of multi-agent systems in RL research has been a significant focus, with applications in complex environments such as advertising and gaming [27][28].
- The establishment of MediaGamma allowed for practical applications of RL in real-time advertising, showcasing the commercial viability of RL algorithms [17][18].
Group 3: Educational Initiatives and Community Building
- The formation of RL China has facilitated knowledge sharing and collaboration among researchers and students, significantly enhancing the learning environment for RL in China [49][52].
- The publication of "Hands-On Reinforcement Learning" has provided accessible educational resources, bridging the gap between theory and practice for students [53].
- Wang Jun's mentorship has fostered a new generation of RL researchers, emphasizing the importance of exploration and innovation in academic pursuits [26][43].
Group 4: Future Directions and Challenges
- The integration of RL with large models and embodied intelligence represents a promising frontier for future research, aiming to address the challenges of generalization across different tasks and environments [56][62].
- The ongoing exploration of RL applications in real-world scenarios, such as robotics and automated decision-making, highlights the potential for RL to impact various industries significantly [61][62].
- Despite setbacks in some projects, the commitment to advancing RL research and its applications remains strong among Wang Jun and his students, indicating a resilient and forward-looking approach to the field [56][62].