Diffusion Reinforcement Learning

具身智能之心· 2025-10-23 04:00

Core Insights
The article discusses the limitations of traditional human demonstration data for training Vision-Language-Action (VLA) models. It introduces a diffusion-based reinforcement learning (RL) approach for generating high-quality training data [2][5].

Group 1: VLA Models and Data Generation
- VLA models integrate visual, language, and action information, but their performance is often constrained by the quality and scale of manually collected data [5].
- The proposed diffusion RL algorithm offers a semi-automated way to collect high-quality data suitable for VLA training, which improves model performance [5].

Group 2: Methodology and Results
- The study presents an improved diffusion policy optimization algorithm that generates high-quality, low-variance trajectories for VLA training [2].
- On the LIBERO benchmark, which comprises 130 long-horizon tasks, the generated trajectories are smoother and more consistent than human demonstration data, and they outperform trajectories generated by RL with standard Gaussian policies [2].
- VLA models trained solely on data generated by diffusion RL reach an average success rate of 81.9%. This is 5.3 percentage points higher than training on human data and 12.6 percentage points higher than training on Gaussian-RL data [2].

Group 3: Key Highlights
- The article emphasizes the potential of RL-driven robot trajectory generation and notes that the general RL framework can be adapted to any VLA architecture [6].
- It highlights that models trained on the generated data exceed the performance obtained from human demonstrations [6].
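The summary reports the headline result as absolute percentage-point gains but does not state the baseline success rates. Assuming both gains are measured on the same average LIBERO success metric, the baselines can be backed out by subtraction. The short sketch below does that and also shows how the same gains look when expressed relative to each baseline; the derived baselines are arithmetic on the reported numbers, not figures quoted from the paper.

```python
# Figures reported in the summary (average success rate on LIBERO, in percent).
DIFFUSION_RL_SUCCESS = 81.9   # VLA trained solely on diffusion-RL data
GAIN_OVER_HUMAN_PP = 5.3      # absolute gain in percentage points
GAIN_OVER_GAUSSIAN_PP = 12.6  # absolute gain in percentage points


def implied_baseline(success: float, gain_pp: float) -> float:
    """Return the baseline success rate implied by an absolute percentage-point gain."""
    return round(success - gain_pp, 1)


def relative_gain(success: float, baseline: float) -> float:
    """Return the same gain expressed relative to the baseline, in percent."""
    return 100.0 * (success - baseline) / baseline


human = implied_baseline(DIFFUSION_RL_SUCCESS, GAIN_OVER_HUMAN_PP)
gaussian = implied_baseline(DIFFUSION_RL_SUCCESS, GAIN_OVER_GAUSSIAN_PP)

print(f"implied human-data baseline:  {human}%")     # 76.6%
print(f"implied Gaussian-RL baseline: {gaussian}%")  # 69.3%
print(f"relative gain over human data:  {relative_gain(DIFFUSION_RL_SUCCESS, human):.1f}%")     # 6.9%
print(f"relative gain over Gaussian RL: {relative_gain(DIFFUSION_RL_SUCCESS, gaussian):.1f}%")  # 18.2%
```

Keeping percentage points and relative percent apart matters here: a 5.3-point gain on a roughly 77% baseline is only about a 7% relative improvement.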
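The summary says the generated trajectories are "smoother and more consistent" than human demonstrations, but it does not say how this was measured. One common way to quantify such properties, assumed here and not taken from the paper, is mean squared jerk (third finite difference) for smoothness and the average per-timestep spread across repeated rollouts for consistency. The minimal 1-D sketch below applies both to hypothetical synthetic trajectories; the function names and data are illustrative only.

```python
import math
import random
import statistics


def mean_squared_jerk(traj: list[float], dt: float = 1.0) -> float:
    """Mean squared third finite difference of a 1-D trajectory (a discrete jerk proxy)."""
    jerks = [
        (traj[i + 3] - 3 * traj[i + 2] + 3 * traj[i + 1] - traj[i]) / dt**3
        for i in range(len(traj) - 3)
    ]
    return statistics.fmean(j * j for j in jerks)


def cross_rollout_spread(rollouts: list[list[float]]) -> float:
    """Average over timesteps of the standard deviation across equal-length rollouts."""
    per_step = [statistics.pstdev(values) for values in zip(*rollouts)]
    return statistics.fmean(per_step)


# Hypothetical synthetic stand-ins, not data from the paper:
# a smooth reference motion with small constant offsets, and noisy variants
# that mimic jittery, inconsistent demonstrations.
rng = random.Random(0)
reference = [math.sin(0.1 * t) for t in range(50)]
smooth_rollouts = [[x + 0.01 * k for x in reference] for k in range(5)]
noisy_rollouts = [[x + rng.gauss(0.0, 0.05) for x in reference] for _ in range(5)]

print("jerk, smooth:  ", mean_squared_jerk(smooth_rollouts[0]))
print("jerk, noisy:   ", mean_squared_jerk(noisy_rollouts[0]))
print("spread, smooth:", cross_rollout_spread(smooth_rollouts))
print("spread, noisy: ", cross_rollout_spread(noisy_rollouts))
```

Jerk is very sensitive to high-frequency noise: for additive white noise with standard deviation σ, the third difference has variance (1 + 9 + 9 + 1)σ² = 20σ², so even small jitter dominates the smoothness score of an otherwise smooth motion.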