Improving multi-goal and target-driven reinforcement learning with supervised auxiliary task

Horita, Luiz R. T.; Nakamura, Angelica T. M.; Wolf, Denis F.; Grassi Junior, Valdir; IEEE

Author(s):	Horita, Luiz R. T. ; Nakamura, Angelica T. M. ; Wolf, Denis F. ; Grassi Junior, Valdir ; IEEE Total number of authors: 5
Document type:	Scientific article
Source:	2021 20TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS (ICAR); v. N/A, p. 6-pg., 2021-01-01.
Abstract
Recent work on Deep Reinforcement Learning (DRL) for episodic multi-goal learning has used the Universal Value Function Approximation (UVFA) idea to learn a universal policy that takes information from the current and goal states as input. Even though these approaches present good results, one aspect might need more attention: State Representation Learning (SRL). SRL is not a new subject in machine learning, and there are many works on it, but it remains little explored for multi-goal learning. Instead, end-to-end DRL is commonly adopted to learn the state representation implicitly. In simple problems this approach might be enough, but in others it can make it harder to learn an optimal policy or can lead to overfitting. In light of this, we hypothesize that an auxiliary task closely related to the target policy learning can lead to better results by conditioning the SRL, which is essential to encode the latent state space. Also, motivated by the multi-task learning idea, we propose a framework for simultaneous supervised and reinforcement learning to avoid catastrophic forgetting. Taking visual-based navigation in a topological urban environment as an instance of the multi-goal learning problem, we use semantic segmentation as the auxiliary task. Based on experimental results, we show that our method accelerates DRL convergence and reaches better policies with higher generalization levels. (AU)
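
The idea in the abstract of training a goal-conditioned value function together with a supervised auxiliary task can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the two objectives are combined, so the weighted sum below is an assumption. The layer shapes, the `aux_weight` parameter, and all function names are hypothetical, and a single class label stands in for per-pixel semantic segmentation.

```python
import math

def linear(weights, bias, x):
    # Dense layer: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def encode(params, state, goal):
    # UVFA-style input: current-state and goal features are concatenated,
    # then mapped to a shared latent representation (linear layer + ReLU).
    hidden = linear(params["enc_w"], params["enc_b"], state + goal)
    return [max(0.0, h) for h in hidden]

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for one example.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def joint_loss(params, state, goal, action, td_target, seg_label, aux_weight=0.5):
    # Both heads read the same latent vector, so the supervised loss
    # also shapes the representation used by the RL head.
    z = encode(params, state, goal)
    q = linear(params["q_w"], params["q_b"], z)          # RL head: value per action
    seg = linear(params["seg_w"], params["seg_b"], z)    # auxiliary head: class scores
    rl_loss = (q[action] - td_target) ** 2               # squared TD error
    aux_loss = cross_entropy(seg, seg_label)
    # Assumed combination: weighted sum of the two losses.
    return rl_loss + aux_weight * aux_loss, rl_loss, aux_loss

# Toy parameters (hypothetical): 2 state features, 2 goal features,
# 2 latent units, 2 actions, 2 segmentation classes.
params = {
    "enc_w": [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]], "enc_b": [0.0, 0.0],
    "q_w": [[1.0, 0.0], [0.0, 1.0]], "q_b": [0.0, 0.0],
    "seg_w": [[0.0, 0.0], [0.0, 0.0]], "seg_b": [0.0, 0.0],
}
total, rl, aux = joint_loss(params, state=[1.0, 2.0], goal=[3.0, 4.0],
                            action=1, td_target=2.0, seg_label=0)
print(total, rl, aux)
```

In a real training loop the gradient of the combined loss would update the shared encoder and both heads in the same step, which is one way to read the "simultaneous supervised and reinforcement learning" described above.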

FAPESP grant:	19/03366-2 - Spatial instance segmentation from a monocular camera using convolutional neural networks
Grantee:	Angelica Tiemi Mizuno Nakamura
Support type:	Scholarships in Brazil - Doctorate


FAPESP grant:	14/50851-0 - INCT 2014: National Institute of Science and Technology for Cooperative Autonomous Systems Applied to Security and Environment
Grantee:	Marco Henrique Terra
Support type:	Research Grants - Thematic Grants
