If a policy model could also reason dynamically, would robots perform better in the real world?

Core Insights
- The article introduces EBT-Policy (Energy-Based Transformer Policy), a new policy architecture built on Energy-Based Models (EBMs). It improves robot performance in real-world scenarios by enabling dynamic reasoning and an understanding of uncertainty [2][6].

Group 1: EBT-Policy Overview
- EBT-Policy significantly improves training and inference efficiency and exhibits a distinctive "zero-shot retry" capability [4].
- The model learns an energy value that scores the compatibility between input variables, optimizing the energy landscape during language modeling tasks [5].
- EBT-Policy outperforms the conventional Diffusion Policy in both simulated and real-world tasks while reducing computational requirements by up to a factor of 50 [6][18].

Group 2: Key Features and Advantages
- At inference time the model minimizes energy over multiple forward passes, adjusting the amount of computation to the difficulty of the problem [8].
- EBT-Policy's emergent retry behavior lets it recover from errors by dynamically steering itself back toward lower-energy states [10].
- EBT-Policy needs only 2 inference steps, whereas Diffusion Policy typically needs around 100 [11].

Group 3: Performance Metrics
- In real-world tasks, EBT-Policy scored 86, 75, and 92 on "Fold Towel," "Collect Pan," and "Pick And Place," respectively, higher than Diffusion Policy's scores on the same tasks [17].
- Convergence during training was approximately 66% faster, and the inference process is significantly more efficient [18].

Group 4: Future Outlook
- The research team plans to keep tuning hyperparameters and model scale, and expects further performance gains as more experimental data is collected [22].
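
The inference behavior summarized above (lowering an energy value step by step, and spending more steps on harder inputs) can be illustrated with a toy sketch. This is not the EBT-Policy implementation: the quadratic `toy_energy`, the `infer_action` helper, and all parameter values are hypothetical stand-ins chosen only to show how energy minimization yields an adaptive number of steps.

```python
def toy_energy(obs, action):
    """Hypothetical energy: squared distance between the action and a
    target derived from the observation. Lower energy = better match."""
    target = [2.0 * o for o in obs]
    return sum((a - t) ** 2 for a, t in zip(action, target))


def toy_energy_grad(obs, action):
    """Analytic gradient of toy_energy with respect to the action."""
    target = [2.0 * o for o in obs]
    return [2.0 * (a - t) for a, t in zip(action, target)]


def infer_action(obs, init_action, step_size=0.25, energy_tol=1e-6, max_steps=100):
    """Refine an action by gradient descent on the energy.

    Stops as soon as the energy falls below energy_tol, so a good initial
    guess uses few steps and a poor one uses more (assumed stopping rule).
    """
    action = list(init_action)
    steps = 0
    while steps < max_steps and toy_energy(obs, action) > energy_tol:
        grad = toy_energy_grad(obs, action)
        action = [a - step_size * g for a, g in zip(action, grad)]
        steps += 1
    return action, steps, toy_energy(obs, action)


if __name__ == "__main__":
    obs = [0.5, -1.0]  # the toy target action is [1.0, -2.0]
    _, easy_steps, _ = infer_action(obs, [1.01, -2.01])  # starts near the minimum
    _, hard_steps, _ = infer_action(obs, [11.0, 8.0])    # starts far from it
    print("easy start:", easy_steps, "steps; hard start:", hard_steps, "steps")
```

In this toy setting the same loop also hints at the retry behavior: if a disturbance pushes the action back into a high-energy region, further descent steps pull it toward a low-energy state again. A learned policy would replace the hand-written energy with a network and obtain gradients by automatic differentiation.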