Multi-Agent Reinforcement Learning
New Year Flavors from Across the Country | A "New Year's Gift" That Broke "Kissing Number" Records: The Scientific Romance Behind PackingStar
Xin Hua Cai Jing· 2026-02-15 07:41
Core Insights
- The research team from Shanghai Institute of Science and Intelligent Technology, in collaboration with Peking University and Fudan University, has developed a multi-agent reinforcement learning system called PackingStar, which has set new records in the long-standing mathematical problem known as the "kissing number" problem, marking a significant breakthrough in the field of mathematical structures [1][2][3]

Group 1: Research and Development
- PackingStar addresses high-dimensional combinatorial optimization problems, similar to challenges in new material design and drug discovery, by finding optimal solutions in exponentially growing search spaces [3]
- The system has revealed solutions that possess clear geometric rules while breaking global symmetry, leading to new mathematical constructs that were previously incomprehensible [3]
- The collaboration between human intuition and AI in the research process has transformed the role of mathematicians from tedious calculations to becoming "mathematical observers" and "intuition designers" [3][4]

Group 2: AI and Human Collaboration
- The project signifies a shift towards a new paradigm of collaborative research where human mathematicians provide insights and intuition, while AI constructs structures and searches for proofs, creating a feedback loop that enhances both AI capabilities and human mathematical intuition [4][5]
- The development of PackingStar is compared to AlphaFold in biology, highlighting the need for deep collaboration between AI experts and mathematicians to tackle problems that lack existing training data [4][6]

Group 3: Cultural and Philosophical Context
- The team embodies a cross-disciplinary approach, merging backgrounds in physics, AI, and mathematics, which fosters a creative environment conducive to scientific breakthroughs [7][8]
- The name "PackingStar" reflects both the research focus on high-dimensional space and the diverse talents of the team members, symbolizing a new generation of scientific inquiry at the intersection of technology and humanities [7][8]
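Valentine's Day's Most Hardcore "Kiss"! Chinese AI Makes a Breakthrough on the 300-Year-Old Kissing Number Problem, Setting Records in Multiple Dimensions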
量子位· 2026-02-14 08:13
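The kissing number, also called the Newton number, is the local form of Hilbert's eighteenth problem (sphere packing), and it shares its underlying logic with the "bit crowding" problem in communications technology.

By Wen Le, from Aofeisi. 量子位 | WeChat official account QbitAI

Valentine's Day is here… so let's mark the occasion by talking about kissing, the AI way.

You may know that mathematics has a serious problem called the Kissing Number Problem. It has stumped humanity for more than 300 years, but very recently Chinese AI gave it a hard push forward.

Put simply, it asks: in n-dimensional space, what is the largest number of spheres of the same size that can surround one sphere so that each just touches ("kisses") it without overlapping?

It began in 1694 with a dispute between two giants, Newton and Gregory: in three-dimensional space, can 12 or 13 identical spheres fit around one sphere? Newton insisted on 12, Gregory was not convinced, and neither could out-argue the other on the spot.

Not until 1953, after 258 years, did mathematicians rigorously prove that Newton was right.

Even Maryna Viazovska, who won the Fields Medal in 2022, earned it by solving the densest sphere packing problem in 8 and 24 dimensions.

Go any higher in dimension, though, and human intuition breaks down. Over the past nearly 50 years, kissing number constructions have seen only 7 substantive advances, each using a completely different method that is hard to transfer or reuse in neighboring dimensions.

Now the deadlock has been broken. ...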
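To make the definition above concrete, here is a minimal Python sketch that checks whether a set of sphere centers forms a valid kissing configuration around a central sphere at the origin. The function names are hypothetical and this is not PackingStar's method. It only verifies the two conditions in the definition: every neighbor touches the central sphere, and no two neighbors overlap. The examples use the well-known hexagon arrangement in 2 dimensions (6 neighbors) and the 24 points of the D4 root system in 4 dimensions.

```python
import itertools
import math


def is_kissing_configuration(centers, radius=1.0, tol=1e-9):
    """Check a candidate kissing configuration around a sphere at the origin.

    Every neighbor center must lie at distance 2*radius from the origin
    (tangent to the central sphere), and any two neighbor centers must be
    at least 2*radius apart (no overlap between neighbors).
    """
    target = 2.0 * radius
    origin = (0.0,) * len(centers[0])
    for c in centers:
        if abs(math.dist(c, origin) - target) > tol:
            return False  # this sphere does not touch the central one
    for a, b in itertools.combinations(centers, 2):
        if math.dist(a, b) < target - tol:
            return False  # these two neighbors overlap
    return True


def hexagon_centers(radius=1.0):
    """Six neighbor centers in the plane, spaced 60 degrees apart."""
    return [
        (2 * radius * math.cos(k * math.pi / 3), 2 * radius * math.sin(k * math.pi / 3))
        for k in range(6)
    ]


def d4_centers():
    """The 24 vectors with two entries of +/-1 and two zeros, rescaled to length 2."""
    points = []
    for i, j in itertools.combinations(range(4), 2):
        for si, sj in itertools.product((1.0, -1.0), repeat=2):
            v = [0.0, 0.0, 0.0, 0.0]
            v[i], v[j] = si, sj
            points.append(tuple(v))
    scale = 2.0 / math.sqrt(2.0)  # the raw vectors have length sqrt(2)
    return [tuple(scale * x for x in p) for p in points]


if __name__ == "__main__":
    print(is_kissing_configuration(hexagon_centers()))
    print(len(d4_centers()), is_kissing_configuration(d4_centers()))
```

A checker like this only confirms that a given arrangement is valid. The hard part of the problem is finding arrangements with more spheres, or proving that none exist.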
SPIRAL: Self-Play in Zero-Sum Games Becomes a "Free Lunch" for Language Model Reasoning Training
机器之心· 2025-07-30 05:13
Core Insights
- The research introduces SPIRAL, a framework that utilizes self-play in zero-sum games to enhance reasoning capabilities in language models without relying on human supervision [3][33].
- The study demonstrates that competitive self-play can lead to significant improvements in reasoning skills, as evidenced by an 8.7% increase in mathematical reasoning ability and an 18.1 percentage point improvement on the Minerva Math benchmark [7][30].

Group 1: Research Background
- The collaborative research involves institutions such as the National University of Singapore and A*STAR, focusing on scalable autonomous agents capable of intelligent decision-making in unknown environments [1].
- The success of models like OpenAI's o1 and DeepSeek-R1 highlights the potential of reinforcement learning to enhance reasoning capabilities in language models [2].

Group 2: SPIRAL Framework
- SPIRAL employs self-play in zero-sum games to autonomously discover and reinforce generalizable reasoning patterns, eliminating the need for manually designed reward functions and expert supervision [3][6].
- The framework utilizes a distributed online multi-agent reinforcement learning system for fine-tuning large language models across various two-player zero-sum games [24].

Group 3: Game-Based Training
- The research identifies three games with distinct cognitive demands (TicTacToe, Kuhn Poker, and Simple Negotiation) as effective training environments for enhancing reasoning skills [12][11].
- The self-play mechanism allows for adaptive difficulty adjustments, ensuring continuous evolution of the model's capabilities [11].

Group 4: Transfer of Skills
- The study reveals that reasoning patterns developed in games can transfer to mathematical problem-solving, with specific skills like expected value calculation and case analysis showing significant migration rates [18][19].
- The multi-game training approach leads to synergistic effects, enhancing performance in unfamiliar games compared to single-game specialists [21].

Group 5: Technical Innovations
- The introduction of Role-Aware Advantage Estimation (RAE) prevents "thinking collapse," ensuring stable gradient updates and consistent reasoning generation throughout training [26][28].
- The SPIRAL framework has shown effectiveness even in strong models, with notable performance improvements in established benchmarks [30].

Group 6: Practical Implications
- SPIRAL offers a novel approach for researchers and engineers aiming to enhance model reasoning capabilities without the need for extensive high-quality reasoning data [35].
- The findings suggest that pre-trained models already contain various reasoning patterns, and reinforcement learning can help identify and strengthen those that are truly generalizable [35].

Group 7: Limitations and Future Directions
- Despite its successes, SPIRAL faces limitations such as the need for carefully designed game environments and high computational resource demands [38].
- Future research may explore hybrid game types and meta-game learning to cultivate more comprehensive reasoning abilities [37].
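The summary names Role-Aware Advantage Estimation but does not give its formula. The sketch below shows one plausible reading, not the authors' implementation. It assumes a separate running baseline for each (game, role) pair, updated as an exponential moving average of that role's returns, with the advantage taken as the return minus the baseline. The class name, the `decay` parameter, and the update order are all assumptions made for illustration.

```python
from collections import defaultdict


class RoleAwareBaseline:
    """Running baseline per (game, role), an assumed form of role-aware advantages.

    In a zero-sum game the two roles can have different expected returns
    (for example, a first-mover edge), so a single shared baseline would
    bias the advantage of one role against the other.
    """

    def __init__(self, decay=0.95):
        self.decay = decay
        self.baselines = defaultdict(float)  # (game, role) -> running mean return

    def advantage(self, game, role, episode_return):
        key = (game, role)
        # Exponential moving average of the returns seen by this role in this game.
        updated = self.decay * self.baselines[key] + (1.0 - self.decay) * episode_return
        self.baselines[key] = updated
        # Advantage relative to the role-specific baseline (assumed update order).
        return episode_return - updated


if __name__ == "__main__":
    rae = RoleAwareBaseline(decay=0.5)
    print(rae.advantage("kuhn_poker", 0, 1.0))
    print(rae.advantage("kuhn_poker", 1, -1.0))
```

Keeping the baselines separate means a role that usually wins is not credited merely for holding that role. Only returns above its own running average produce a positive advantage.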
Meta-Think ≠ Memorizing Templates: Multi-Agent Reinforcement Learning Unlocks Generalizable Meta-Thinking in Large Models
机器之心· 2025-07-03 03:26
Core Viewpoint
- The article discusses a new framework called ReMA (Reinforced Meta-thinking Agents) designed to enhance the reasoning capabilities of large language models (LLMs) by introducing a multi-agent system that separates meta-thinking from reasoning tasks, thereby improving adaptability and effectiveness in complex problem-solving [3][4][6][10].

Group 1: Introduction and Background
- Recent explorations in large model reasoning have introduced various paradigms, including structured search and process reward models, but the mechanisms behind "Aha Moments" in reasoning remain unclear [3].
- The study emphasizes the importance of reasoning patterns and posits that the strength of complex reasoning in large models fundamentally relies on their meta-thinking abilities [3][4].

Group 2: ReMA Framework
- The ReMA framework consists of two hierarchical agents: the meta-thinking agent, which generates strategic supervision and planning, and the reasoning agent, which executes detailed sub-tasks based on the meta-thinking agent's guidance [10][11].
- This multi-agent system allows for a more structured and efficient exploration of the reasoning process, balancing generalization capabilities and exploration efficiency [12].

Group 3: Methodology
- The study defines a single-round multi-agent meta-thinking reasoning process (MAMRP) where the meta-thinking agent analyzes the problem and generates a solution plan, while the reasoning agent completes the task based on these instructions [13][14].
- In multi-round interactions, the meta-thinking agent can provide ongoing guidance, allowing for planning, reflection, and correction throughout the reasoning process [14][20].

Group 4: Experimental Results
- In single-round experiments, ReMA consistently outperformed baseline methods across various benchmarks, demonstrating superior generalization capabilities, particularly on out-of-distribution datasets [27][28].
- The results indicate that ReMA's meta-thinking mechanism significantly enhances performance, with improvements noted in specific benchmarks such as AMC23, where performance increased by up to 20% [28][29].

Group 5: Challenges and Future Work
- The study acknowledges challenges in multi-round training, including instability and sensitivity to hyperparameters, suggesting that the current training processes may not be suitable for stochastic or non-stationary environments [39][40].
- Further exploration is needed to address these issues and improve the robustness of the ReMA framework in diverse training scenarios [39].
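The two-agent process described under Methodology can be pictured as a simple control loop. The sketch below is illustrative only and is not ReMA's code. Both agents are plain Python callables standing in for language models, and the `[DONE]` stop marker, the function names, and the toy agents are hypothetical.

```python
from typing import Callable, List, Tuple

# (problem, history of (instruction, result)) -> next instruction
MetaAgent = Callable[[str, List[Tuple[str, str]]], str]
# (problem, instruction) -> result of carrying out the instruction
ReasoningAgent = Callable[[str, str], str]


def run_meta_reasoning(
    problem: str,
    meta_agent: MetaAgent,
    reasoning_agent: ReasoningAgent,
    max_rounds: int = 3,
    stop_marker: str = "[DONE]",
) -> List[Tuple[str, str]]:
    """Alternate between a planning agent and an executing agent.

    With max_rounds=1 this is the single-round case: one plan, one execution.
    With more rounds the meta agent sees earlier results and can reflect
    on them or correct course.
    """
    history: List[Tuple[str, str]] = []
    for _ in range(max_rounds):
        instruction = meta_agent(problem, history)
        if instruction.strip() == stop_marker:
            break  # the meta agent judges the task finished
        result = reasoning_agent(problem, instruction)
        history.append((instruction, result))
    return history


def toy_meta(problem: str, history: List[Tuple[str, str]]) -> str:
    """Toy planner: issue one instruction, then stop."""
    return "Add the two numbers." if not history else "[DONE]"


def toy_reasoner(problem: str, instruction: str) -> str:
    """Toy executor: parse 'a+b' and return the sum as text."""
    left, right = problem.split("+")
    return str(int(left) + int(right))


if __name__ == "__main__":
    print(run_meta_reasoning("2+3", toy_meta, toy_reasoner))
```

In a trained system both callables would be learned policies, and the reward would depend on the final answer. The loop shows only how responsibility is divided between the two agents.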
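JD Group Algorithm Director Han Ai to Present a Reinforcement-Learning-Based Joint Evolution Algorithm for Heterogeneous Multi-Agents at AICon Beijing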
AI前线· 2025-06-20 02:47
Core Insights
- The AICon Global Artificial Intelligence Development and Application Conference will take place in Beijing, featuring over 50 experts from leading companies like Tencent, Alibaba, Baidu, and ByteDance, focusing on AI Agent, multimodal applications, and optimization of reasoning performance [1][4].

Group 1: Conference Highlights
- The conference will cover various topics including AI Agent construction, multimodal practices, large model support for development, and AI's deep integration into business operations [4].
- A notable presentation will be given by Han Ai, the Algorithm Director of JD Group, discussing the JDAgents-R1 framework, which addresses challenges in multi-agent reinforcement learning (MARL) [2][3].

Group 2: JDAgents-R1 Framework
- JDAgents-R1 introduces a joint evolution algorithm framework for heterogeneous multi-agents, utilizing Group Relative Policy Optimization (GRPO) to enhance training efficiency and stability [2].
- The framework balances decision-making and memory capabilities, reducing redundant reasoning and accelerating training convergence, achieving performance comparable to large-scale language models with smaller open-source models [2].

Group 3: Expert Contributions
- Han Ai has extensive academic and professional credentials, including a PhD from a joint program between the Chinese Academy of Sciences and Cornell University, and has published numerous papers in top-tier journals [3].
- The presentation will include insights on multi-agent training technologies, application cases, and the evolution of decision-making and memory in multi-agent systems [3].
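The summary mentions Group Relative Policy Optimization (GRPO) without detail. As GRPO is commonly described, several responses are sampled for the same prompt and each response's reward is normalized against the group's mean and standard deviation, so no learned value network is needed. The sketch below shows only that normalization step. It is not JDAgents-R1 code, and the function name, the use of the population standard deviation, and the `eps` term are assumptions.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize the rewards of one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). The eps term
    (an assumption here) avoids division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one prompt, rewarded 1.0 if correct and 0.0 otherwise.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

When every response in a group gets the same reward, all advantages are zero and that group contributes no gradient. This is one reason reward design and sampling diversity matter for methods of this kind.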
China's AI Schools: Wang Jun and His Students
投资界· 2025-03-04 07:41
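The following article is from 雷峰网, by Lai Wenxin.

Half the landscape of reinforcement learning research in China.

Author | Lai Wenxin  Editor | Chen Caixian  Source | 雷峰网 (ID: leiphone-sz)

As a research branch that has spent decades within AI, reinforcement learning keeps renewing itself.

From recommender systems to reinforcement learning

On an afternoon during the summer break of 2006, Wang Jun boarded a train from the small Dutch city of Delft to the capital, Amsterdam, where he would change to a flight to Seattle to attend the 29th ACM SIGIR conference on information retrieval.

Information retrieval was then at its height. With search also the core business of the three giants Microsoft, Yahoo, and Google, ACM SIGIR drew the top talent from academia and industry every year for what amounted to the field's "annual gathering."

In the venue at the University of Washington, Wang Jun received the Best Doctoral Consortium Award to a round of applause, taking the highest honor for doctoral students in information retrieval a year before finishing his PhD.

The high-spirited young man did not imagine then that 15 years later he would also receive an honorable mention for the Test of Time Award. By 2021 Wang Jun had been working on reinforcement learning (RL) for several years, had co-founded RL China, a Chinese reinforcement learning community, had trained a cohort of outstanding young researchers for the field in China, and had become one of its "grandmasters."

Wang Jun ...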
The UCL Reinforcement Learning School: Wang Jun and His Students
雷峰网· 2025-02-27 10:15
Core Viewpoint
- The article discusses the evolution and significance of reinforcement learning (RL) in China, highlighting key figures and their contributions to the field, particularly focusing on Wang Jun and his influence on the development of RL research and education in China [2][46].

Group 1: Historical Context and Development
- Wang Jun's journey in AI began with information retrieval and recommendation systems, where he achieved significant academic recognition [4][8].
- His transition to reinforcement learning was influenced by his experiences in advertising, where he recognized the parallels between decision-making in advertising and RL principles [12][14].
- The establishment of the RL China community marked a pivotal moment in promoting RL research and education in China, addressing the lack of resources and formal education in the field [49][50].

Group 2: Contributions and Innovations
- Wang Jun and his students have made substantial contributions to RL, including the development of SeqGAN and IRGAN, which integrate RL with generative adversarial networks for improved performance in various applications [23][24].
- The introduction of multi-agent systems in RL research has been a significant focus, with applications in complex environments such as advertising and gaming [27][28].
- The establishment of MediaGamma allowed for practical applications of RL in real-time advertising, showcasing the commercial viability of RL algorithms [17][18].

Group 3: Educational Initiatives and Community Building
- The formation of RL China has facilitated knowledge sharing and collaboration among researchers and students, significantly enhancing the learning environment for RL in China [49][52].
- The publication of "Hands-On Reinforcement Learning" has provided accessible educational resources, bridging the gap between theory and practice for students [53].
- Wang Jun's mentorship has fostered a new generation of RL researchers, emphasizing the importance of exploration and innovation in academic pursuits [26][43].

Group 4: Future Directions and Challenges
- The integration of RL with large models and embodied intelligence represents a promising frontier for future research, aiming to address the challenges of generalization across different tasks and environments [56][62].
- The ongoing exploration of RL applications in real-world scenarios, such as robotics and automated decision-making, highlights the potential for RL to impact various industries significantly [61][62].
- Despite setbacks in some projects, the commitment to advancing RL research and its applications remains strong among Wang Jun and his students, indicating a resilient and forward-looking approach to the field [56][62].