Symbolic World Models
Building Worlds the Way We Build Software: Agent2World Turns World Models into Executable Symbolic Environments
机器之心 (Synced) · 2026-02-02 06:14
Core Insights
- The article discusses Agent2World, a tool-augmented multi-agent framework for creating executable and verifiable symbolic world models. It moves beyond traditional script-based generation methods [4][37].
- Agent2World delivers significant performance improvements on three benchmarks: Text2World (PDDL), CWMB (MuJoCo), and ByteSized32 (text games). These results show its potential as a high-quality data synthesis engine [4][24].

Group 1: Challenges in Traditional Approaches
- Existing automated approaches to building world models face three main limitations that reduce their effectiveness: script-based workflows, closed knowledge boundaries, and coverage of only a single representation [3][8].
- Traditional "draft-repair" scripts can fix syntax errors. They struggle, however, to ensure that the generated world models are logically sound and executable [8][9].

Group 2: Methodology Breakdown
- Agent2World works in three stages: Knowledge Synthesis, World Model Generation, and Evaluation-Driven Refinement. Together these stages fold research, development, and testing into a reusable generation paradigm [4][12].
- The framework assigns three roles. A Deep Researcher retrieves knowledge, a Model Developer generates the world model, and a Testing Team validates it dynamically to ensure high reliability [16][18].

Group 3: Experimental Validation
- On Text2World, Agent2World achieved state-of-the-art performance with a 93.1% executability rate, 14.9 percentage points above the previous best [25].
- On CWMB, Agent2World Multi achieved an Overall Normalized Return of 0.4811, 0.132 above the previous best. This indicates that its world models can support downstream planning and control tasks [27].
- On ByteSized32, Agent2World improved significantly in physical reality alignment, scoring 0.4768. This highlights its ability to generate logically consistent and stable environments [29].
Group 4: Model Fine-tuning and Ablation Studies
- Fine-tuning on high-quality trajectory data produced a 30.95% average relative performance improvement on unseen tasks. This demonstrates the effectiveness of the "Agent nurturing Model" strategy [34].
- Ablation studies confirmed that the Deep Researcher and the Testing Team are both essential for building reliable world models: removing either one caused a significant performance drop [36][38].
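These results mix two kinds of improvement. The Text2World gain is absolute (14.9 percentage points), while the fine-tuning gain is relative (30.95%). The summary does not say how the 30.95% average is aggregated. The hypothetical helpers below therefore assume it is the mean of per-task relative gains, (new - base) / base. The two-task scores in the example are toy values, not data from the paper.

```python
from statistics import mean


def percentage_point_gain(new_pct: float, base_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - base_pct


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base` (0.25 means +25%)."""
    if base == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (new - base) / base


def average_relative_gain(new_scores, base_scores) -> float:
    """Assumed aggregation: the mean of per-task relative gains."""
    if not new_scores or len(new_scores) != len(base_scores):
        raise ValueError("need two non-empty score lists of equal length")
    return mean(relative_gain(n, b) for n, b in zip(new_scores, base_scores))


# Text2World: 93.1% executability is 14.9 points above the previous best,
# so the implied previous best is 93.1 - 14.9 = 78.2%.
prev_best = 93.1 - 14.9
print(round(prev_best, 1))                               # 78.2
print(round(percentage_point_gain(93.1, prev_best), 1))  # 14.9
print(round(relative_gain(93.1, prev_best), 3))          # 0.191

# Toy per-task scores (illustrative only, not from the paper).
new, base = [1.5, 5.0], [1.0, 4.0]
print(average_relative_gain(new, base))      # 0.375
print(relative_gain(mean(new), mean(base)))  # 0.3
```

The toy case shows why the aggregation choice matters. Averaging the per-task gains gives 0.375, while the relative gain of the averaged scores is only 0.3.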