Robust topological policy iteration for infinite horizon bounded Markov Decision Processes

Author(s):	Silva Reis, Willy Arthur [1]; de Barros, Leliane Nunes [1]; Delgado, Karina Valdivia [2]. Total Authors: 3
Affiliation:	[1] Univ Sao Paulo, Inst Math & Stat, R Matao 1010, Sao Paulo - Brazil; [2] Univ Sao Paulo, Sch Arts Sci & Humanities, Av Arlindo Bettio 1000, Sao Paulo - Brazil. Total Affiliations: 2
Document type:	Journal article
Source:	INTERNATIONAL JOURNAL OF APPROXIMATE REASONING; v. 105, p. 287-304, FEB 2019.
Web of Science Citations:	0
Abstract
Markov Decision Processes (MDPs) are commonly used to solve sequential decision problems. A less restrictive model is the Bounded-parameter MDP (BMDP), which allows (i) the transition function to be expressed in terms of probability intervals and (ii) reasoning about a robust solution, i.e., the best solution under the worst model. In this paper, we propose the Robust Topological Policy Iteration (RTPI) algorithm, a new policy iteration algorithm for infinite horizon BMDPs based on a partition of the state space. The empirical results show that the more structured the domain, the better RTPI performs. (C) 2018 Elsevier Inc. All rights reserved. (AU)
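To make the "best solution under the worst model" criterion concrete, here is a minimal Python sketch of plain robust policy iteration over interval transition probabilities. It is not RTPI: it ignores the state-space partition that RTPI is built on, and the encoding (`intervals`, `rewards`) and every function name are hypothetical. The adversary's worst model is chosen with the greedy rule commonly used for interval models: each successor starts at its lower bound, and the remaining probability is pushed toward the lowest-valued successors, up to their upper bounds.

```python
from typing import Dict, Tuple

# Hypothetical encoding (not from the paper):
#   intervals[s][a] = {next_state: (p_low, p_high)}
#   rewards[s][a]   = immediate reward for taking action a in state s
Interval = Tuple[float, float]
Intervals = Dict[int, Dict[str, Dict[int, Interval]]]
Rewards = Dict[int, Dict[str, float]]


def worst_case_distribution(bounds: Dict[int, Interval],
                            values: Dict[int, float]) -> Dict[int, float]:
    """Distribution inside the intervals that minimizes the expected value.

    Assumes sum of lower bounds <= 1 <= sum of upper bounds.
    """
    probs = {t: lo for t, (lo, _) in bounds.items()}
    remaining = 1.0 - sum(probs.values())
    # Give leftover mass to successors in increasing order of value.
    for t in sorted(bounds, key=lambda t: values[t]):
        lo, hi = bounds[t]
        extra = min(hi - lo, remaining)
        probs[t] += extra
        remaining -= extra
        if remaining <= 0.0:
            break
    return probs


def robust_q(intervals: Intervals, rewards: Rewards, gamma: float,
             values: Dict[int, float], s: int, a: str) -> float:
    """Q-value of (s, a) when the adversary picks the worst model."""
    dist = worst_case_distribution(intervals[s][a], values)
    return rewards[s][a] + gamma * sum(p * values[t] for t, p in dist.items())


def robust_evaluation(intervals: Intervals, rewards: Rewards, gamma: float,
                      policy: Dict[int, str], tol: float = 1e-10) -> Dict[int, float]:
    """Iterate the pessimistic backup for a fixed policy until it converges."""
    values = {s: 0.0 for s in intervals}
    while True:
        new = {s: robust_q(intervals, rewards, gamma, values, s, policy[s])
               for s in intervals}
        delta = max(abs(new[s] - values[s]) for s in intervals)
        values = new
        if delta < tol:
            return values


def robust_policy_iteration(intervals: Intervals, rewards: Rewards,
                            gamma: float = 0.9):
    """Alternate robust evaluation and greedy improvement until stable."""
    policy = {s: next(iter(intervals[s])) for s in intervals}
    while True:
        values = robust_evaluation(intervals, rewards, gamma, policy)
        stable = True
        for s in intervals:
            q = {a: robust_q(intervals, rewards, gamma, values, s, a)
                 for a in intervals[s]}
            best = max(q, key=q.get)
            # Switch only on a clear gain so evaluation error cannot cause cycling.
            if q[best] > q[policy[s]] + 1e-6:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


# Toy BMDP (hypothetical): state 1 is an absorbing zero-reward state.
intervals = {
    0: {"risky": {0: (0.2, 0.6), 1: (0.4, 0.8)},
        "safe": {0: (1.0, 1.0)}},
    1: {"stay": {1: (1.0, 1.0)}},
}
rewards = {0: {"risky": 5.0, "safe": 1.0}, 1: {"stay": 0.0}}
policy, values = robust_policy_iteration(intervals, rewards, gamma=0.9)
print(policy)  # {0: 'safe', 1: 'stay'}
```

In this toy example, always taking `risky` would be worth about 10.9 under the most favorable distribution in its intervals, versus 10 for `safe`; under the worst model it is worth only about 6.1, so the robust policy keeps `safe`.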

FAPESP's process:	15/01587-0 - Storage, modeling and analysis of dynamical systems for e-Science applications
Grantee:	João Eduardo Ferreira
Support Opportunities:	Research Grants - eScience and Data Science Program - Thematic Grants