Breaking RL's limits with action chunking: Berkeley introduces imitation learning and surpasses offline/online SOTA
机器之心 · 2025-07-14 04:08

Core Insights
- Reinforcement Learning (RL) has achieved strong results across many domains, but its performance on tasks with long horizons and sparse rewards remains unsatisfactory [1][2]
- Traditional RL methods struggle with exploration efficiency in such tasks: rewards arrive only after long sequences of actions have been executed, making it hard to discover effective policies within a reasonable amount of time [3][10]

Method Overview
- Introducing ideas from Imitation Learning (IL) into RL can improve performance, particularly in settings with large state and action spaces where designing reward functions is difficult [4]
- The proposed Q-chunking method brings action chunking into Temporal Difference (TD) based RL and addresses two core issues: improving exploration efficiency through temporally coherent action sequences, and achieving faster value propagation without the bias introduced by traditional n-step returns [5][12]

Implementation Details
- Q-chunking extends standard Q-learning to a temporally extended action space, so the policy predicts a sequence of actions spanning multiple steps rather than a single-step action [15]
- The method adds a behavior constraint that keeps the learned policy close to the offline data distribution, which is crucial for exploring effectively and making full use of the offline data [18][19] (see the sketch after this summary)

Experimental Results
- The researchers evaluated Q-chunking on six sparse-reward robotic manipulation tasks, showing competitive performance in the offline phase and high sample efficiency in the online phase, especially on the most challenging tasks [23][25]
- Ablation studies showed that Q-chunking outperforms its variants and traditional n-step-return baselines, underscoring the importance of learning in a temporally extended action space [27]
- Analysis indicates that action chunking produces more temporally coherent actions, which translates into better state coverage and exploration efficiency [28][32]
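
To make the chunk-level TD update described above concrete, here is a minimal, self-contained sketch. It is an illustration under stated assumptions, not the authors' implementation: the chunk length H, network sizes, the simple behavior-cloning penalty used as the behavior constraint, and names such as chunk_critic, chunk_actor, and update are all hypothetical choices for the example.

```python
# Minimal sketch of a Q-chunking-style update (illustrative, not the paper's code).
# Assumptions: a deterministic actor over flattened H-step action chunks, an MSE
# behavior-cloning penalty as the behavior constraint, and chunk-level transitions
# (obs, executed chunk, discounted in-chunk reward sum, next obs) already batched.
import torch
import torch.nn as nn

H         = 4      # chunk length (hypothetical)
OBS_DIM   = 17     # observation dimension (hypothetical)
ACT_DIM   = 6      # per-step action dimension (hypothetical)
GAMMA     = 0.99
BC_WEIGHT = 0.1    # strength of the behavior constraint (hypothetical)

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

# Critic scores a state together with an entire H-step action chunk.
chunk_critic  = mlp(OBS_DIM + H * ACT_DIM, 1)
target_critic = mlp(OBS_DIM + H * ACT_DIM, 1)
target_critic.load_state_dict(chunk_critic.state_dict())

# Actor outputs the whole chunk at once (temporally extended action space).
chunk_actor = mlp(OBS_DIM, H * ACT_DIM)

critic_opt = torch.optim.Adam(chunk_critic.parameters(), lr=3e-4)
actor_opt  = torch.optim.Adam(chunk_actor.parameters(), lr=3e-4)

def update(batch):
    """One TD + actor update on chunk-level transitions.

    batch: dict with
      obs       [B, OBS_DIM]     state at the start of the chunk
      chunk     [B, H*ACT_DIM]   the H actions actually taken (offline/replay data)
      reward    [B, 1]           discounted sum of the H in-chunk rewards
      next_obs  [B, OBS_DIM]     state after executing the chunk
      done      [B, 1]           1 if the episode ended within the chunk
    """
    # Critic: bootstrap once per chunk (every H steps), so value estimates
    # propagate faster than 1-step TD; conditioning on the executed chunk
    # avoids the off-policy correction issues of plain n-step returns.
    with torch.no_grad():
        next_chunk = chunk_actor(batch["next_obs"])
        target_q = batch["reward"] + (GAMMA ** H) * (1 - batch["done"]) * \
                   target_critic(torch.cat([batch["next_obs"], next_chunk], dim=-1))
    q = chunk_critic(torch.cat([batch["obs"], batch["chunk"]], dim=-1))
    critic_loss = ((q - target_q) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: maximize the chunk-level Q while staying close to the data
    # distribution (behavior constraint, here a simple BC penalty).
    pred_chunk = chunk_actor(batch["obs"])
    actor_loss = -chunk_critic(torch.cat([batch["obs"], pred_chunk], dim=-1)).mean() \
                 + BC_WEIGHT * ((pred_chunk - batch["chunk"]) ** 2).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    return critic_loss.item(), actor_loss.item()

# Toy usage with random tensors, just to show the expected shapes.
B = 32
batch = {"obs": torch.randn(B, OBS_DIM), "chunk": torch.randn(B, H * ACT_DIM),
         "reward": torch.randn(B, 1), "next_obs": torch.randn(B, OBS_DIM),
         "done": torch.zeros(B, 1)}
print(update(batch))
```

Because the critic is conditioned on the whole executed chunk, the TD target only bootstraps every H steps, which is how the summary's claim of faster value propagation without n-step-return bias can be read; the BC penalty here stands in for whatever behavior constraint the paper actually uses.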