Continuous Deep Maximum Entropy Inverse Reinforcement Learning using online POMDP

Author(s):
Silva, Junior A. R.; Grassi Jr., Valdir; Wolf, Denis Fernando; IEEE
Total Authors: 4
Document type: Conference paper
Source: 2019 19th International Conference on Advanced Robotics (ICAR), 6 pp., 2019.
Abstract

A vehicle navigating in an urban environment must obey traffic rules by properly setting its speed, such as staying below the road speed limit and avoiding collisions with other vehicles. This is presumably the scenario that autonomous vehicles will face: they will share the road with other vehicles (autonomous or not), cooperatively interacting with them. In other words, autonomous vehicles should not only follow traffic rules, but should also behave in a way that resembles other vehicles' behavior. However, manually specifying such behavior is a time-consuming and error-prone task, since driving on urban roads is a complex activity involving many factors. This paper presents a multitask decision-making framework that learns an expert driver's behavior in an urban scenario containing traffic lights and other vehicles. For this purpose, Inverse Reinforcement Learning (IRL) is used to learn a reward function that explains the expert driver's behavior. Most IRL approaches require solving a Markov Decision Process (MDP) in each iteration of the algorithm to compute the optimal policy given the current rewards. However, the computational cost of solving an MDP is high for large state spaces. To overcome this issue, the optimal policy is estimated by sampling trajectories in regions of the space with higher rewards. To do so, the problem is modeled as a continuous Partially Observable Markov Decision Process (POMDP), in which the intentions of other vehicles are only partially observed. An online solver is employed to sample trajectories given the current rewards. The efficiency of the proposed framework is demonstrated through simulations, showing that the controlled vehicle is able to mimic an expert driver's behavior. (AU)
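To make the learning loop concrete, below is a minimal sketch of a deep maximum entropy IRL update in which the learner's expected reward is approximated by trajectories sampled from an online POMDP solver, as the abstract describes. All names here (RewardNet, sample_trajs_fn, the network sizes) are illustrative assumptions rather than the paper's implementation; the online solver is abstracted behind a callback.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps a continuous state-action feature vector to a scalar reward.
    Architecture is a placeholder; the paper does not specify one here."""
    def __init__(self, feature_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)

def maxent_irl(expert_trajs, sample_trajs_fn, feature_dim,
               iters: int = 100, lr: float = 1e-3) -> RewardNet:
    """expert_trajs: list of tensors, each of shape (T, feature_dim).
    sample_trajs_fn(reward_net): hypothetical callback standing in for the
    online POMDP solver, returning trajectories drawn from high-reward
    regions under the current reward estimate."""
    reward_net = RewardNet(feature_dim)
    opt = torch.optim.Adam(reward_net.parameters(), lr=lr)
    for _ in range(iters):
        # Sample trajectories instead of solving the full (PO)MDP each step.
        sampled = sample_trajs_fn(reward_net)
        expert_r = torch.stack([reward_net(t).sum() for t in expert_trajs]).mean()
        sampled_r = torch.stack([reward_net(t).sum() for t in sampled]).mean()
        # MaxEnt-style gradient: push reward up on expert trajectories and
        # down on the learner's sampled trajectories.
        loss = sampled_r - expert_r
        opt.zero_grad()
        loss.backward()
        opt.step()
    return reward_net
```

The key design point mirrors the abstract: rather than computing an optimal policy exactly at every iteration, the online solver only needs to return trajectories concentrated in high-reward regions, which keeps each gradient step tractable in a continuous, partially observable state space.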

FAPESP's process: 18/19732-5 - Decision making and trajectory planning for intelligent vehicles using partially observable Markov decision processes and inverse reinforcement learning
Grantee: Júnior Anderson Rodrigues da Silva
Support Opportunities: Scholarships in Brazil - Doctorate
FAPESP's process: 14/50851-0 - INCT 2014: National Institute of Science and Technology for Cooperative Autonomous Systems Applied in Security and Environment
Grantee: Marco Henrique Terra
Support Opportunities: Research Projects - Thematic Grants