The Era of AI "Agentic Organizations" Begins: Microsoft Proposes Asynchronous Thinking with AsyncThink
36Kr · 2025-11-05 10:52
Core Insights
- The article discusses the transition from large language models (LLMs) to agentic organizations, arguing that LLMs must not only think independently but also collaborate as organized systems to realize the vision of an "agentic organization" [1][20].

Group 1: AsyncThink Methodology
- The AsyncThink method introduces an "Organizer-Worker" thinking protocol in which an LLM acts both as an organizer that decomposes a complex problem into sub-tasks and as workers that execute those sub-tasks [2][4].
- Training the AsyncThink model involves a two-phase process: cold-start format fine-tuning followed by reinforcement learning [5][6].

Group 2: Cold-Start Format Fine-Tuning
- In the cold-start phase, an existing LLM is fine-tuned to master the syntax and action structure of the AsyncThink framework, using synthetic training data generated by GPT-4o [5][6].
- After this phase, the model can issue well-formed organizer actions but cannot yet reliably produce correct answers through asynchronous thinking [5][18].

Group 3: Reinforcement Learning
- The reinforcement learning phase uses rewards to guide the model toward efficient and accurate strategies, ensuring both that final answers are correct and that generated trajectories are executable [7][9].
- The model's output is structured as a "thinking structure" composed of an organizer and multiple workers, all optimized toward a common goal [9][10].

Group 4: Experimental Evaluation
- On the multi-solution countdown task, AsyncThink achieved a full-accuracy rate of 89.0%, significantly higher than parallel thinking (68.6%) and sequential thinking (70.5%) [11][10].
- On mathematical reasoning tasks, AsyncThink reached 38.7% accuracy on AIME-24 and 73.3% on AMC-23, while cutting reasoning latency by roughly 28% compared with traditional parallel reasoning [14][15].
- The model also generalized across tasks, achieving 89.4% accuracy on a 4x4 Sudoku task, indicating that the learned organizational thinking pattern is transferable [16][17].

Group 5: Ablation Studies
- Ablation studies showed that format fine-tuning teaches the LLM the "language" of Fork and Join, while reinforcement learning imparts the "strategy" for executing it efficiently [18][19].
- Removing key components of the AsyncThink training pipeline reduced accuracy and increased latency, underscoring the importance of each element [19].

Group 6: Future Work
- Future research will scale and diversify the number of workers in the agentic organization, examining how the accuracy-latency trade-off evolves as the agent pool grows [21][20].
- Recursive agentic organizations will be explored, allowing any worker to become a sub-organizer and increasing flexibility in problem solving [22][20].
- Integrating human agents into the organization would create a collaborative framework of mixed intelligence, in which humans can act as organizers or workers [23][20].
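The Organizer-Worker protocol described above can be sketched with standard fork/join concurrency. This is a minimal illustration, not the paper's implementation: the `worker` and `organizer` functions are hypothetical stand-ins for LLM calls, and `asyncio` plays the role of the asynchronous execution engine.

```python
import asyncio

async def worker(sub_task: str) -> str:
    # Stand-in for an LLM call that solves one sub-task;
    # here we just simulate work and echo a labeled result.
    await asyncio.sleep(0.01)
    return f"result for: {sub_task}"

async def organizer(problem: str, sub_tasks: list[str]) -> str:
    # Fork: spawn one concurrent worker per sub-task.
    forks = [asyncio.create_task(worker(t)) for t in sub_tasks]
    # The organizer could continue its own reasoning here while workers run.
    # Join: wait for all workers and merge their partial results.
    results = await asyncio.gather(*forks)
    return f"answer({problem}): " + "; ".join(results)

answer = asyncio.run(organizer("countdown", ["branch A", "branch B"]))
print(answer)
```

Because the workers run concurrently between the Fork and the Join, total wall-clock time is bounded by the slowest worker rather than the sum of all of them.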
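The reinforcement-learning objective (correct answers, executable trajectories, low latency) can be illustrated with a toy reward function. The exact reward in the paper is not given in this summary; the shaping below, including the latency penalty and its 0.1 weight, is an assumption for illustration only.

```python
def asyncthink_reward(answer_correct: bool, trajectory_executable: bool,
                      latency: float, max_latency: float = 1.0) -> float:
    # Hypothetical shaping: a trajectory that cannot be executed
    # (e.g. a Join with no matching Fork) earns no reward at all.
    if not trajectory_executable:
        return 0.0
    # A correct final answer earns the base reward...
    base = 1.0 if answer_correct else 0.0
    # ...minus a small penalty proportional to critical-path latency,
    # nudging the organizer to parallelize rather than serialize sub-tasks.
    penalty = 0.1 * min(latency / max_latency, 1.0)
    return max(base - penalty, 0.0)
```

Under this sketch, an incorrect or non-executable rollout gets zero reward, and among correct rollouts the faster thinking structure is preferred.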
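The reported latency reduction versus sequential thinking follows from critical-path arithmetic, which a short sketch makes concrete. The function names and the organizer-overhead term are illustrative assumptions, not quantities from the paper.

```python
def sequential_latency(worker_times: list[float]) -> float:
    # Sequential thinking: sub-tasks run one after another,
    # so latency is the sum of all sub-task times.
    return sum(worker_times)

def forkjoin_latency(worker_times: list[float], organizer_time: float) -> float:
    # Fork/Join thinking: workers run concurrently, so end-to-end latency
    # is the organizer's own overhead plus the slowest worker (critical path).
    return organizer_time + max(worker_times)

# Three sub-tasks of 4, 3 and 5 time units with 1 unit of organizer overhead:
# sequential thinking takes 12 units, the fork/join structure takes 6.
print(sequential_latency([4, 3, 5]), forkjoin_latency([4, 3, 5], 1))
```

The gap widens as sub-tasks become more numerous and more balanced, which is why decomposition strategy matters for the accuracy-latency trade-off.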