SOP: A New Paradigm for Online Evolution of Embodied Intelligence, Built for Large-Scale Real-World Deployment

Core Viewpoint
- The article discusses the development and implementation of SOP (Scalable Online Post-training), a system for deploying and operating general-purpose robots at scale in real-world environments, and stresses that robotic systems need to keep learning and evolving after deployment [2][3][23].

Group 1: SOP Overview
- SOP is the first system in the industry to systematically integrate online learning, a distributed architecture, and multi-task versatility for real-world robot deployment [2].
- The core goal of SOP is to enable distributed, continuous online learning for robots in real-world settings [5].

Group 2: Challenges in Real-World Deployment
- General-purpose robots must reach high proficiency on specific tasks while building on pre-trained vision-language-action (VLA) models, which typically need post-training to raise task success rates [3][4].
- Current mainstream VLA post-training methods rely on offline, single-machine, serial data collection, which hinders efficient and continuous learning in the real world [3].

Group 3: SOP Framework Design
- SOP redefines VLA post-training from "offline, single-machine, sequential" to "online, cluster, parallel," creating a low-latency closed-loop system [6].
- Multiple robots execute tasks in parallel, a centralized cloud learner updates the model online, and the updated parameters are pushed back to the robots immediately [6][9].

Group 4: Key Advantages of SOP
- Distributed multi-robot parallel exploration broadens state-space exploration and significantly improves state-action coverage [12].
- SOP mitigates distribution shift by ensuring that all robots act on the latest policy, delivered with low latency, which improves the stability and consistency of online training [13].
- SOP preserves generalization while improving performance, so the model does not degrade into a single-task specialist [14].

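The closed loop described under Group 3 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not SOP's actual implementation: the names `Robot`, `CloudLearner`, and `run_online_post_training`, the scalar "policy" parameter, the noisy corrective label, and the version-based freshness filter are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

TARGET_ACTION = 1.0  # hypothetical "ideal" action for the toy task


@dataclass
class Robot:
    """Toy actor: runs the policy it last received and reports experience."""
    robot_id: int
    param: float = 0.0   # local copy of the (scalar) policy parameter
    version: int = 0     # version of the policy this robot is running

    def __post_init__(self):
        # Per-robot random generator so the simulation is reproducible.
        self.rng = random.Random(self.robot_id)

    def sync(self, param, version):
        """Receive the latest parameters from the learner."""
        self.param, self.version = param, version

    def rollout(self):
        """Return one experience: the correction needed under the current policy."""
        label = TARGET_ACTION + self.rng.gauss(0.0, 0.05)  # noisy feedback
        return {"version": self.version, "error": label - self.param}


@dataclass
class CloudLearner:
    """Toy centralized learner holding the authoritative policy."""
    param: float = 0.0
    version: int = 0
    lr: float = 0.5

    def update(self, batch):
        # Use only experience gathered with the current policy version,
        # a simple stand-in for keeping the training data on-distribution.
        fresh = [e for e in batch if e["version"] == self.version]
        if fresh:
            self.param += self.lr * sum(e["error"] for e in fresh) / len(fresh)
            self.version += 1
        return self.param, self.version


def run_online_post_training(num_robots=4, rounds=20):
    learner = CloudLearner()
    robots = [Robot(i) for i in range(num_robots)]
    with ThreadPoolExecutor(max_workers=num_robots) as pool:
        for _ in range(rounds):
            # 1. Robots collect experience in parallel.
            batch = list(pool.map(lambda r: r.rollout(), robots))
            # 2. The centralized learner updates the policy online.
            param, version = learner.update(batch)
            # 3. New parameters are pushed back to every robot.
            for robot in robots:
                robot.sync(param, version)
    return learner, robots


if __name__ == "__main__":
    learner, robots = run_online_post_training()
    print(f"policy version {learner.version}, param {learner.param:.3f}")
```

The point of the sketch is the loop structure: collection, update, and parameter broadcast happen every round, so no robot keeps acting on a stale policy. A real system would replace the scalar parameter with VLA model weights and the toy label with whatever training signal the deployment provides.
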
Group 5: Experimental Evaluation
- SOP significantly improves performance metrics, with a 33% overall performance increase in complex scenarios and a 114% increase in throughput for specific tasks such as folding clothes [16][18].
- Using multiple robots improves learning efficiency: a four-robot setup achieves a 92.5% success rate, 12% higher than a single robot [19][20].
- SOP is consistently effective across different pre-training scales, with performance improvements correlating with the quality of VLA pre-training [21].

Group 6: Deployment and Evolution
- SOP allows robots to adapt and improve in new environments, transforming them from fixed-performance products into evolving entities capable of continuous learning [23].