Effective Deep Reinforcement Learning Setups for Multiple Goals on Visual Navigation

Takeshi Horita, Luiz Ricardo; Wolf, Denis Fernando; Grassi Junior, Valdir; IEEE

Author(s):	Takeshi Horita, Luiz Ricardo ; Wolf, Denis Fernando ; Grassi Junior, Valdir ; IEEE Total Authors: 4
Document type:	Journal article
Source:	2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN); v. N/A, p. 8-pg., 2020-01-01.
Abstract
Deep Reinforcement Learning (DRL) represents an interesting class of algorithms, since its objective is to learn a behavioral policy through interaction with the environment, leveraging the function approximation properties of neural networks. Nonetheless, for episodic problems, it is usually modeled to handle a single goal. Some works have shown that it is possible to learn multiple goals by using a Universal Value Function Approximator (UVFA), i.e., a method that learns a universal policy from information about both the current state of the agent and the goal. Their results are promising but show that there is still room for new contributions regarding the integration of the goal information into the model. For this reason, we propose using the Hadamard product or the Gated-Attention module in the UVFA architecture for visual-based problems. We also propose a hybrid exploration strategy based on ε-greedy and the categorical probability distribution, namely ε-categorical. By systematically comparing different UVFA architectures under different exploration strategies, with and without Trust Region Policy Optimization (TRPO), we demonstrate through experiments that, for visual topological navigation, combining visual information of the current and goal states through the Hadamard product or the Gated-Attention module allows the network to learn near-optimal navigation policies. We also show empirically that the ε-categorical policy helps to avoid local minima during training, which facilitates convergence to better results. (AU)
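
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so all function names, input shapes, and the reading of ε-categorical used here (with probability ε, sample an action from the policy's categorical distribution; otherwise act greedily) are assumptions made for illustration. The gated-attention variant assumes the goal embedding has already been projected to one value per feature map.

```python
import math
import random


def hadamard_fusion(state_feat, goal_feat):
    """Fuse state and goal feature vectors by element-wise (Hadamard) product."""
    if len(state_feat) != len(goal_feat):
        raise ValueError("state and goal features must have the same length")
    return [s * g for s, g in zip(state_feat, goal_feat)]


def gated_attention_fusion(state_maps, goal_embedding):
    """Scale each state feature map by a sigmoid gate derived from the goal.

    state_maps: list of C feature maps, each a list of rows (H x W).
    goal_embedding: list of C raw gate values (assumed already projected).
    """
    if len(state_maps) != len(goal_embedding):
        raise ValueError("need exactly one gate value per feature map")
    gates = [1.0 / (1.0 + math.exp(-g)) for g in goal_embedding]
    return [
        [[gate * v for v in row] for row in fmap]
        for fmap, gate in zip(state_maps, gates)
    ]


def epsilon_categorical_action(probs, epsilon, rng):
    """Assumed reading of the hybrid strategy, not taken from the paper:
    with probability epsilon, sample from the categorical distribution
    given by the policy; otherwise pick the most probable action."""
    if rng.random() < epsilon:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    fused = hadamard_fusion([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    action = epsilon_categorical_action([0.1, 0.7, 0.2], epsilon=0.3, rng=rng)
    print(fused, action)
```

Under this reading, the exploratory step in ε-greedy would draw uniformly over actions, whereas here it draws from the policy's own distribution, so exploration stays biased toward actions the network already considers plausible.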

FAPESP's process:	14/50851-0 - INCT 2014: National Institute of Science and Technology for Cooperative Autonomous Systems Applied in Security and Environment
Grantee:	Marco Henrique Terra
Support Opportunities:	Research Projects - Thematic Grants
