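Fully heterogeneous, fully asynchronous RLinf v0.2 preview released, with support for real-robot reinforcement learning: use your robots the way you use GPUs!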

Core Insights
- The article discusses the ongoing data debate in embodied intelligence between simulation data and real-robot data, and emphasizes the need for infrastructure that supports multiple technical approaches [2]

Summary by Sections

RLinf v0.2 Features
- RLinf v0.2 targets users pursuing the real-robot route and supports real-robot reinforcement learning [2][4]
- Robots can be used as flexible resources, much like GPUs; they are integrated and configured through a YAML file, which significantly lowers the cost of use [5][6]
- The system aims to enable large-scale distributed real-robot reinforcement learning, addressing challenges in stability, usability, and flexibility [9]

Heterogeneous Hardware Support
- RLinf supports flexible configuration of heterogeneous software and hardware clusters, improving system throughput and training efficiency [11][12]
- Different hardware can be combined in one cluster: for example, high-fidelity simulators on RTX 4090 GPUs, training on large-memory GPUs such as the A800, and robot controllers on CPU machines [13][14]

Asynchronous Off-Policy Algorithms
- RLinf v0.2 introduces a fully asynchronous design that decouples inference nodes from training nodes, which significantly improves training efficiency [16]
- It incorporates typical off-policy reinforcement learning algorithms, improving data utilization by drawing on both online and offline data [16]

Experimental Results
- The initial version of RLinf focuses on real-robot reinforcement learning with small models, using the Franka robotic arm for two quick validation tasks: Charger and Peg Insertion [19][21]
- Training includes human-in-the-loop interventions to improve efficiency, and the successful results are documented in training videos [21][22]

Community Engagement
- The RLinf team thanks its community of 2,000 users, whose feedback has driven continuous improvements and feature updates since its release [22]
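
The asynchronous off-policy design summarized above, in which rollout (inference) is decoupled from training and batches mix online and offline data, can be illustrated with a small sketch. This is not RLinf's API: `ReplayBuffer`, `rollout_worker`, `learner`, and `run_demo` are hypothetical names, the "robot" is a random-number stand-in, and the batch split (up to half online, the rest offline) is an assumption chosen only for illustration.

```python
import random
import threading
from collections import deque


class ReplayBuffer:
    """Thread-safe bounded buffer shared by rollout workers and the learner."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size, rng):
        # Returns fewer items (possibly none) if the buffer is still filling.
        with self._lock:
            k = min(batch_size, len(self._data))
            return rng.sample(list(self._data), k)

    def __len__(self):
        with self._lock:
            return len(self._data)


def rollout_worker(buffer, num_steps, seed):
    """Stand-in for a robot or simulator that emits transitions."""
    rng = random.Random(seed)
    for _ in range(num_steps):
        obs = rng.random()
        action = rng.choice([0, 1])
        reward = 1.0 if action == 1 else 0.0
        buffer.add(("online", obs, action, reward))


def learner(online_buffer, offline_data, num_updates, batch_size, seed):
    """Draws up to half of each batch online and fills the rest offline."""
    rng = random.Random(seed)
    stats = {"updates": 0, "online_seen": 0, "offline_seen": 0}
    for _ in range(num_updates):
        online = online_buffer.sample(batch_size // 2, rng)
        offline = rng.sample(offline_data, batch_size - len(online))
        batch = online + offline
        # A real learner would compute an off-policy loss on `batch` here.
        stats["updates"] += 1
        stats["online_seen"] += len(online)
        stats["offline_seen"] += len(offline)
    return stats


def run_demo():
    buffer = ReplayBuffer(capacity=1000)
    # Hypothetical offline data, e.g. pre-recorded demonstrations.
    offline_data = [("offline", i / 10, 1, 1.0) for i in range(10)]
    result = {}

    def run_learner():
        result.update(learner(buffer, offline_data, 20, 8, seed=0))

    threads = [
        threading.Thread(target=rollout_worker, args=(buffer, 50, s))
        for s in range(2)
    ]
    threads.append(threading.Thread(target=run_learner))
    # Workers and learner run concurrently; neither waits for the other.
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    result["buffer_size"] = len(buffer)
    return result


if __name__ == "__main__":
    print(run_demo())
```

The learner never blocks on the rollout workers: if little online data has arrived yet, it simply fills more of the batch from the offline set. In a distributed system the threads would be separate nodes and the buffer a networked service, but the decoupling idea is the same.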