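Embodied Large Model LaST₀: New SOTA Across Dual-Arm, Mobile, and Dexterous-Hand Tasks, First to Introduce a Latent Spatio-Temporal Chain of Thought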

Core Insights
- The article introduces LaST₀, a novel vision-language-action (VLA) model that uses a Latent Spatio-Temporal Chain of Thought (CoT) for efficient reasoning in robotics, achieving state-of-the-art performance across a range of tasks [1][2][4].

Group 1: Model Overview
- LaST₀ integrates efficient latent-space reasoning into embodied large models, surpassing previous methods such as Pi0.5 on dual-arm and humanoid dexterous-hand tasks [2][4].
- The model employs a Mixture-of-Transformers (MoT) architecture with two experts: a slow reasoning expert for low-frequency latent-space reasoning and a fast action expert for high-frequency action generation [5][11].

Group 2: Technical Innovations
- LaST₀ introduces a compact latent space that models future visual dynamics, 3D structural information, and robot proprioceptive states, enabling a temporally coherent reasoning process [4][10].
- The architecture allows asynchronous frequency coordination between the slow reasoning expert and the fast action expert, optimizing real-time robotic operation [23].

Group 3: Performance Metrics
- In simulation, LaST₀ achieved an average success rate of 82% across 10 RLBench tasks, outperforming existing state-of-the-art methods by 8% to 21% [24].
- In real-world tasks, LaST₀ achieved a 72% average success rate on the Franka platform, well above competitors such as SpatialVLA (41%) and CoT-VLA (50%) [27].

Group 4: Implications for Robotics
- Latent-space reasoning lets the model capture intricate physical and dynamic features, which improves its performance on complex robotic tasks and indicates potential for broader application in dynamic environments [9][28].
- LaST₀'s design supports effective interaction with the physical world, which is crucial for robust robotic operation in varied settings [9][12].
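
The slow/fast split described under Group 1 and Group 2 can be illustrated with a toy multi-rate control loop. This is only a minimal sketch of the general pattern, not LaST₀'s implementation: the summary does not state the actual update frequencies, the coordination mechanism, or any interfaces, so the class names `SlowReasoner` and `FastActor`, the `reason_every` parameter, and the arithmetic standing in for the two experts are all hypothetical. The loop is also single-threaded, so it imitates the frequency ratio rather than true asynchronous execution.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SlowReasoner:
    """Hypothetical stand-in for the slow, low-frequency reasoning expert."""
    calls: int = 0

    def reason(self, observation: Sequence[float]) -> List[float]:
        # Toy "latent": the mean and the range of the observation.
        # A real model would run a transformer here (assumption).
        self.calls += 1
        mean = sum(observation) / len(observation)
        return [mean, max(observation) - min(observation)]


@dataclass
class FastActor:
    """Hypothetical stand-in for the fast, high-frequency action expert."""
    calls: int = 0

    def act(self, observation: Sequence[float], latent: Sequence[float]) -> List[float]:
        # Toy action: offset of each observed value from the cached latent mean.
        self.calls += 1
        return [latent[0] - x for x in observation]


def run_episode(
    observations: Sequence[Sequence[float]], reason_every: int = 4
) -> Tuple[List[List[float]], int, int]:
    """Refresh the latent every `reason_every` steps; act on every step."""
    slow, fast = SlowReasoner(), FastActor()
    latent: Optional[List[float]] = None
    actions: List[List[float]] = []
    for step, obs in enumerate(observations):
        if step % reason_every == 0:
            # Low-frequency path: recompute the latent plan.
            latent = slow.reason(obs)
        # High-frequency path: reuse the most recent latent.
        actions.append(fast.act(obs, latent))
    return actions, slow.calls, fast.calls


if __name__ == "__main__":
    obs_stream = [[float(i), float(i + 1)] for i in range(8)]
    actions, slow_calls, fast_calls = run_episode(obs_stream, reason_every=4)
    print(slow_calls, fast_calls)  # the slow expert runs far less often
```

With eight control steps and `reason_every=4`, the slow expert is invoked twice while the fast expert is invoked on every step, which is the cost asymmetry the two-expert design is meant to exploit.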