Improving multi-goal and target-driven reinforcement learning with supervised auxiliary task

Horita, Luiz R. T.; Nakamura, Angelica T. M.; Wolf, Denis F.; Grassi Junior, Valdir; IEEE

Author(s):	Horita, Luiz R. T. ; Nakamura, Angelica T. M. ; Wolf, Denis F. ; Grassi Junior, Valdir ; IEEE Total Authors: 5
Document type:	Journal article
Source:	2021 20TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS (ICAR); v. N/A, p. 6-pg., 2021-01-01.
Abstract
Recent works on Deep Reinforcement Learning (DRL) for episodic multi-goal learning have used the Universal Value Function Approximation (UVFA) idea to learn a universal policy that takes information from both the current and goal states as input. Although they present good results, one aspect may need more attention: the State Representation Learning (SRL) concept. SRL is not a new subject in machine learning, and there are many works on it, but it remains underexplored for multi-goal learning. Instead, end-to-end DRL is commonly adopted to learn the state representation implicitly. In simple problems this approach may be enough, but in others it can make learning an optimal policy harder or lead to overfitting. In light of this, we hypothesize that an auxiliary task closely related to the target policy learning can lead to better results by conditioning the SRL, which is essential for encoding the latent state space. Also, motivated by the multi-task learning idea, we propose a framework for simultaneous supervised and reinforcement learning to avoid catastrophic forgetting. Taking visual-based navigation in a topological urban environment as an instance of the multi-goal learning problem, we use semantic segmentation as the auxiliary task. Based on experimental results, we show that our method accelerates DRL convergence and reaches better policies with higher generalization levels. (AU)
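The idea of a shared state representation trained jointly by a goal-conditioned (UVFA-style) RL head and a supervised segmentation head can be illustrated with a toy loss computation. The abstract does not state which DRL algorithm, network architecture, or loss weighting the authors use. This is therefore only a sketch under stated assumptions: a Q-learning-style TD loss, tiny linear layers in NumPy, and a hypothetical weight `aux_weight` on the auxiliary cross-entropy term. All dimensions and function names (`encode`, `q_values`, `seg_loss`, `joint_loss`) are illustrative, not taken from the paper. Gradient updates are omitted; in practice an autodiff framework would minimize the joint loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
OBS_DIM = 16      # flattened toy "image" (16 pixels)
LATENT_DIM = 8    # size of the shared latent state
N_ACTIONS = 3     # discrete actions, e.g. left / straight / right
N_CLASSES = 4     # semantic classes for the auxiliary task

# Shared encoder: both heads read phi(s), so the supervised loss also
# shapes the representation used by the policy.
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
# UVFA-style head: Q(s, g, .) computed from the concatenation [phi(s), phi(g)].
W_q = rng.normal(scale=0.1, size=(N_ACTIONS, 2 * LATENT_DIM))
# Auxiliary head: per-pixel class logits decoded from phi(s).
W_seg = rng.normal(scale=0.1, size=(OBS_DIM * N_CLASSES, LATENT_DIM))


def encode(x):
    """Shared latent state phi(x)."""
    return np.tanh(W_enc @ x)


def q_values(state, goal):
    """Goal-conditioned action values Q(s, g, a) for every action a."""
    return W_q @ np.concatenate([encode(state), encode(goal)])


def seg_logits(state):
    """Per-pixel class logits, shape (OBS_DIM, N_CLASSES)."""
    return (W_seg @ encode(state)).reshape(OBS_DIM, N_CLASSES)


def td_loss(s, g, a, r, s_next, done, gamma=0.99):
    """Squared one-step TD error for a single goal-conditioned transition."""
    target = r + (0.0 if done else gamma * np.max(q_values(s_next, g)))
    return 0.5 * (q_values(s, g)[a] - target) ** 2


def seg_loss(s, labels):
    """Mean per-pixel softmax cross-entropy against integer class labels."""
    logits = seg_logits(s)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(OBS_DIM), labels].mean()


def joint_loss(s, g, a, r, s_next, done, labels, aux_weight=0.5):
    """RL loss plus weighted supervised auxiliary loss (weight is an assumption)."""
    return td_loss(s, g, a, r, s_next, done) + aux_weight * seg_loss(s, labels)


# Usage on one random transition with random segmentation labels.
s, g, s_next = rng.random(OBS_DIM), rng.random(OBS_DIM), rng.random(OBS_DIM)
labels = rng.integers(0, N_CLASSES, size=OBS_DIM)
loss = joint_loss(s, g, a=1, r=0.0, s_next=s_next, done=False, labels=labels)
print(f"joint loss: {loss:.4f}")
```

Optimizing both terms on the same encoder at every step, rather than pretraining the encoder on segmentation and then switching to RL, is one way to read the abstract's "simultaneous supervised and reinforcement learning to avoid catastrophic forgetting"; the paper itself should be consulted for the actual schedule and weighting.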

FAPESP's process:	19/03366-2 - Spatial instance segmentation in monocular images through convolutional neural networks
Grantee:	Angelica Tiemi Mizuno Nakamura
Support Opportunities:	Scholarships in Brazil - Doctorate


FAPESP's process:	14/50851-0 - INCT 2014: National Institute of Science and Technology for Cooperative Autonomous Systems Applied in Security and Environment
Grantee:	Marco Henrique Terra
Support Opportunities:	Research Projects - Thematic Grants
