Core Viewpoint
- The article introduces ROUSER, a novel robust action value representation learning method that uses the Information Bottleneck framework to address the lack of long-term decision information in visual reinforcement learning (VRL) [2][9].

Group 1: ROUSER Methodology
- ROUSER maximizes the mutual information between the representation and the action value to retain long-term information, while minimizing the mutual information between the representation and the state-action pair to filter out task-irrelevant features [4][10] (a rough formalization appears after this summary).
- Because ground-truth action values are unknown during training, ROUSER decomposes the robust representation of a state-action pair into a representation of the single-step reward plus the robust representation of the next state-action pair, enabling effective learning without known action values [5][10] (see the sketch after this summary).

Group 2: Experimental Results
- Across 12 tasks with background and color distractions, ROUSER outperformed several state-of-the-art methods on 11 tasks, demonstrating its effectiveness in improving generalization [6][18].
- ROUSER is compatible with both continuous and discrete control tasks; experiments in the Procgen environment show improved generalization when it is combined with value-based VRL methods [21][22].

Group 3: Theoretical Foundations
- A theoretical analysis shows that ROUSER can accurately estimate action values from the learned vectorized representations, which underpins its ability to improve the robustness of various visual reinforcement learning algorithms [3][17].
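The Information Bottleneck objective described in Group 1 can be written roughly as follows; this is a hedged formalization rather than the paper's exact objective, and the symbols Z, φ, and β are assumptions introduced here, where Z = φ(S, A) is the learned representation and Q(S, A) the action value:

$$\max_{\phi}\; I\big(Z;\, Q(S,A)\big) \;-\; \beta\, I\big(Z;\, (S,A)\big), \qquad Z = \phi(S,A),\ \beta > 0.$$

The first term keeps value-relevant (long-term) information in Z, while the second term discourages encoding task-irrelevant visual details such as backgrounds and colors.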
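The decomposition into a reward representation plus the next state-action representation can be illustrated with a minimal PyTorch-style sketch. All names, shapes, and the TD-style regression loss below are illustrative assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RouserSketch(nn.Module):
    """Hypothetical sketch of the decomposition described above.

    The representation z(s, a) is regressed toward an embedding of the
    single-step reward plus the discounted representation of the next
    state-action pair, so no ground-truth action values are required.
    """

    def __init__(self, obs_dim: int, act_dim: int, rep_dim: int = 64, gamma: float = 0.99):
        super().__init__()
        self.gamma = gamma
        # Encoder mapping a state-action pair to a vectorized representation z(s, a).
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, rep_dim),
        )
        # Embeds the scalar single-step reward into the same representation space.
        self.reward_embed = nn.Linear(1, rep_dim)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.encoder(torch.cat([obs, act], dim=-1))

    def decomposition_loss(self, obs, act, reward, next_obs, next_act):
        z = self(obs, act)
        # Bootstrapped target: reward embedding + discounted next representation,
        # detached in the spirit of TD-style learning.
        target = self.reward_embed(reward.unsqueeze(-1)) + self.gamma * self(next_obs, next_act).detach()
        return F.mse_loss(z, target)
```

In this sketch the representation is learned recursively, mirroring a Bellman-style backup, which is how the decomposition sidesteps the need for known action values.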
USTC proposes a new action value representation learning method, the first to fill the gap in long-term decision information
量子位·2025-03-31 04:35