Wu Wei: China's Tech Rise Sounds the Clarion Call for AI Equality
Huan Qiu Wang Zi Xun · 2025-09-01 22:53
Group 1
- The 2025 Global AI Influence List by Time magazine features several Chinese entrepreneurs and scholars, indicating a significant increase in representation and diversity compared to previous years [1]
- The rise of Chinese figures on the list reflects the rapid development of China's AI industry and its increasing presence on the international stage, as well as the global trend of "de-geographicalization" in technology [1]
- The open-source technology path taken by DeepSeek contributes to a more inclusive global technology landscape, enhancing the openness and participation of the AI industry [1]

Group 2
- Southeast Asia is actively seizing opportunities from the "de-geographicalization" wave in AI, with the region's digital economy projected to reach $2 trillion by 2030 and its AI market expected to exceed $580 billion [2]
- Countries like Singapore, Malaysia, and Indonesia are implementing national AI strategies and attracting significant investments from major tech companies, indicating a shift towards technological self-sufficiency [2]
- The rise of local innovation in developing countries is seen as a way to dismantle external technological monopolies and empower these nations as creators of AI technology [2]

Group 3
- Despite the concentration of top AI talent in the U.S., Chinese researchers now account for 38% of the talent at top U.S. AI research institutions, surpassing the 37% share held by U.S.-origin researchers [3]
- The increase in homegrown talent and the return of overseas scholars signal a promising future for China's talent strategy focused on local cultivation and talent repatriation [3]
- China's AI industry is characterized by a systematic innovation paradigm driven by top-level policies, autonomous innovation, and a commitment to long-termism [3]

Group 4
- The performance gap between Chinese and U.S. large models has narrowed dramatically, from 17.5% in 2023 to just 0.3% [4]
- China's unique advantages in open-source ecosystem development and vertical application innovation have contributed to this rapid advancement [4]
- The success of China's AI rise is attributed to the establishment of an open, symbiotic ecosystem that fosters talent and continuous innovation, providing a valuable model for global AI development [4]
ARPO: Agentic Reinforced Policy Optimization, Letting the Agent Explore One Step Further at Critical Moments
Jiqizhixin (机器之心) · 2025-08-09 06:02
Core Viewpoint
- The article introduces a novel method called Agentic Reinforced Policy Optimization (ARPO), designed to enhance the performance of large language models (LLMs) in multi-round interactions by addressing the challenges of uncertainty and exploration during tool usage [3][41]

Group 1: Research Motivation and Background
- The emergence of Agentic Reinforcement Learning (RL) is driven by the need for LLMs to engage in dynamic multi-round interactions with external tools, moving from static problem-solving to a more interactive agent-environment reasoning paradigm [8]
- Existing Agentic RL methods often underestimate the value of multi-round interactions due to sparse rewards and overuse of tools, leading to a lack of fine-grained exploration of tool usage [8][41]
- The study identifies a significant increase in entropy (uncertainty) after tool calls, indicating an opportunity for exploration that current methods do not fully leverage [14][16]

Group 2: ARPO Methodology
- ARPO introduces an entropy-driven adaptive rollout strategy that enhances exploration during high-entropy tool-usage phases, allowing for more diverse reasoning paths [11][20]
- The method includes four key steps: initialization of a global rollout, monitoring entropy changes, adaptive branching based on entropy, and defining termination conditions for the rollout process [24][27] (a minimal sketch follows this summary)
- ARPO incorporates advantage attribution estimation to help the model better internalize the value differences in tool usage at each step [28][30]

Group 3: Experimental Results
- ARPO outperforms existing sample-level RL methods, achieving better performance with only half the tool-call budget across 13 challenging benchmarks, demonstrating its efficiency in training multi-round reasoning agents [21][41]
- The method shows consistent improvements in performance metrics such as Pass@3 and Pass@5, particularly in dynamic, multi-round tasks [37][39]
- In comparative tests, ARPO achieves higher accuracy than GRPO and DAPO in various tasks, including deep search and knowledge-intensive reasoning [41][42]

Group 4: Future Directions
- Future research may explore the application of ARPO in multi-modal tasks, expanding its capabilities beyond text-based reasoning to include images and videos [42]
- There is potential for integrating a broader range of external tools to enhance complex task performance through optimized tool-usage strategies [42]
- The scalability and real-time deployment of ARPO in larger models and dynamic environments could further improve its practical value and cost-effectiveness [42]
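To make the four rollout steps in Group 2 concrete, here is a minimal Python sketch of an entropy-driven adaptive rollout in the spirit of ARPO. The helper interfaces `step_fn` and `tool_fn`, the threshold `entropy_delta`, and the branch budget are illustrative assumptions, not the paper's actual API or hyperparameters.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed interfaces (not the paper's API):
#   step_fn(tokens)       -> (new_tokens, tool_call_or_None, next_token_probs)
#   tool_fn(call, tokens) -> (tool_result_tokens, next_token_probs)

@dataclass
class Trajectory:
    tokens: List[str] = field(default_factory=list)
    done: bool = False

def entropy(probs: List[float]) -> float:
    """Shannon entropy of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def adaptive_rollout(
    prompt: List[str],
    step_fn: Callable,
    tool_fn: Callable,
    entropy_delta: float = 0.4,   # branch when entropy rises by more than this (assumed value)
    branch_budget: int = 8,       # cap on extra partial rollouts (assumed value)
    max_steps: int = 16,
) -> List[Trajectory]:
    # Step 1: initialize a single global rollout from the prompt.
    frontier = [Trajectory(tokens=list(prompt))]
    finished: List[Trajectory] = []
    branches_used = 0

    for _ in range(max_steps):
        next_frontier: List[Trajectory] = []
        for traj in frontier:
            segment, tool_call, probs = step_fn(traj.tokens)
            traj.tokens += segment
            if tool_call is None:            # Step 4: terminate when the model answers
                traj.done = True
                finished.append(traj)
                continue

            pre_h = entropy(probs)           # Step 2: entropy before the tool result
            result_tokens, post_probs = tool_fn(tool_call, traj.tokens)
            traj.tokens += result_tokens
            post_h = entropy(post_probs)     #         entropy after the tool result

            next_frontier.append(traj)
            # Step 3: adaptive branching, forking an extra partial rollout only
            # where uncertainty spikes after a tool call.
            if post_h - pre_h > entropy_delta and branches_used < branch_budget:
                branches_used += 1
                next_frontier.append(Trajectory(tokens=list(traj.tokens)))

        frontier = [t for t in next_frontier if not t.done]
        if not frontier:                     # Step 4: terminate when nothing is left to expand
            break

    return finished + frontier
```

Once the branched rollouts are scored, the advantage attribution step distributes credit between the shared prefix and the branch-specific segments. The sketch below shows one plausible, GRPO-style way to do this, where shared-prefix tokens receive the mean group advantage and branch tokens receive their own branch's advantage; this attribution rule is an assumption for illustration, not necessarily the paper's exact formulation.

```python
import statistics
from typing import List

def group_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style group-normalized advantages: (r - mean) / std over the group."""
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards) or 1.0   # avoid division by zero
    return [(r - mu) / sd for r in rewards]

def attribute_advantages(
    rewards: List[float],        # one scalar reward per branched trajectory
    shared_prefix_len: int,      # number of tokens shared by all branches
    branch_lengths: List[int],   # branch-specific token counts
) -> List[List[float]]:
    """Assign the mean group advantage to shared-prefix tokens and each branch's
    own advantage to its branch-specific tokens (illustrative attribution rule)."""
    adv = group_advantages(rewards)
    shared = sum(adv) / len(adv)
    return [[shared] * shared_prefix_len + [a] * n for a, n in zip(adv, branch_lengths)]
```

In training, these token-level advantages would feed a standard policy-gradient update; the sketch stops at attribution and omits the RL objective itself.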