Feng, Xiaoyun (2022) Consistent Experience Replay in High-Dimensional Continuous Control with Decayed Hindsights. Machines, 10 (10). p. 856. ISSN 2075-1702
Abstract
The manipulation of complex robots, which in general involves high-dimensional continuous control without an accurate dynamic model, calls for studies and applications of reinforcement learning (RL) algorithms. Typically, RL learns with the objective of maximizing the accumulated rewards from interactions with the environment. In practice, however, external rewards are not trivial to specify: they depend on either expert knowledge or domain priors. Recent advances in hindsight experience replay (HER) instead enable a robot to learn from automatically generated sparse and binary rewards that indicate whether it reaches the desired goals or pseudo goals. However, HER inevitably introduces hindsight bias that skews the optimal control, since replays against the achieved pseudo goals may often differ from the exploration of the desired goals. To tackle this problem, we analyze the skewed objective and derive decayed hindsight (DH), which enables consistent multi-goal experience replay by countering the bias between exploration and hindsight replay. We implement DH for goal-conditioned RL in both online and offline settings. Experiments on online robotic control tasks demonstrate that DH achieves the best average performance and is competitive with state-of-the-art replay strategies. Experiments on offline robotic control tasks show that DH substantially improves the ability to extract near-optimal policies from offline datasets.
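To make the abstract's mechanism concrete, the sketch below shows HER-style "future" goal relabeling with a decayed hindsight reward. It is a minimal illustration only: the sparse 1/0 reward convention, the exponential `decay ** (future - t)` schedule, and all function and field names are assumptions introduced here, not the paper's actual decayed-hindsight formulation.

```python
import random
import numpy as np

def sparse_reward(achieved_goal, goal, tol=0.05):
    """Binary sparse reward: 1.0 if the achieved goal lies within `tol`
    of the (pseudo) goal, else 0.0. The 1/0 convention is an assumption
    chosen so that decaying the reward is visible in the example."""
    return 1.0 if np.linalg.norm(np.asarray(achieved_goal) - np.asarray(goal)) <= tol else 0.0

def relabel_with_decay(episode, k=4, decay=0.98):
    """HER-style relabeling with a decayed hindsight signal (illustrative).

    `episode` is a list of dicts with keys: obs, action, achieved_goal, goal.
    For each transition, k future achieved goals are sampled as pseudo goals;
    their hindsight rewards are scaled by decay**(delta t), a placeholder
    schedule standing in for the paper's bias-countering weight.
    """
    transitions = []
    T = len(episode)
    for t, step in enumerate(episode):
        # Original transition against the desired goal (no decay applied).
        r = sparse_reward(step["achieved_goal"], step["goal"])
        transitions.append({**step, "reward": r})
        # Hindsight transitions against future achieved pseudo goals.
        for _ in range(k):
            future = random.randint(t, T - 1)
            pseudo_goal = episode[future]["achieved_goal"]
            r = sparse_reward(step["achieved_goal"], pseudo_goal)
            r *= decay ** (future - t)  # decay the hindsight reward with temporal distance
            transitions.append({**step, "goal": pseudo_goal, "reward": r})
    return transitions
```

Under this reading, pseudo goals drawn from far in the future contribute a weaker learning signal than those close to the current step, which is one plausible way to temper the mismatch between hindsight replay and exploration of the desired goals that the abstract calls hindsight bias.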
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | robotic control; goal-conditioned reinforcement learning; offline reinforcement learning; sparse rewards; experience replay; hindsight bias |
| Subjects: | STM Repository > Engineering |
| Depositing User: | Managing Editor |
| Date Deposited: | 06 Jan 2023 09:36 |
| Last Modified: | 18 Sep 2023 11:24 |
| URI: | http://classical.goforpromo.com/id/eprint/266 |