Robust topological policy iteration for infinite horizon bounded Markov Decision Processes

Author(s):	Silva Reis, Willy Arthur ^[1] ; de Barros, Leliane Nunes ^[1] ; Delgado, Karina Valdivia ^[2] Total number of authors: 3
Author affiliation(s):	^[1] Univ Sao Paulo, Inst Math & Stat, R Matao 1010, Sao Paulo - Brazil ^[2] Univ Sao Paulo, Sch Arts Sci & Humanities, Av Arlindo Bettio 1000, Sao Paulo - Brazil Total number of affiliations: 2
Document type:	Scientific article
Source:	INTERNATIONAL JOURNAL OF APPROXIMATE REASONING; v. 105, p. 287-304, FEB 2019.
Web of Science citations:	0
Abstract
Markov Decision Processes (MDPs) are commonly used to solve sequential decision problems. A less restrictive model is the Bounded-parameter MDP (BMDP), which allows: (i) the transition function to be expressed in terms of probability intervals and (ii) reasoning about a robust solution, i.e., the best solution under the worst model. In this paper, we propose the Robust Topological Policy Iteration (RTPI) algorithm, which is a new policy iteration algorithm for infinite horizon BMDPs based on a partition of the state space. The empirical results show that the more structured the domain, the better the performance of RTPI. (C) 2018 Elsevier Inc. All rights reserved. (AU)
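
As an illustration of what "the best solution under the worst model" can mean when transitions are given as probability intervals, the following is a minimal sketch of one pessimistic Bellman backup for a single state-action pair. It is not the RTPI algorithm from the paper. The function names, the list-based representation of the intervals, and the greedy way of choosing the worst-case distribution are assumptions made for this example, and the sketch further assumes that the lower bounds sum to at most 1 and the upper bounds to at least 1.

```python
def worst_case_distribution(lower, upper, values):
    """Return the distribution inside [lower, upper] with minimal expected value.

    Assumes sum(lower) <= 1 <= sum(upper), so a valid distribution exists.
    """
    probs = list(lower)          # every successor gets at least its lower bound
    slack = 1.0 - sum(lower)     # probability mass still to be assigned
    # Hand the remaining mass to the lowest-valued successors first.
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        extra = min(upper[i] - lower[i], slack)
        probs[i] += extra
        slack -= extra
    return probs


def robust_backup(reward, gamma, lower, upper, values):
    """One pessimistic Bellman backup: reward plus discounted worst-case value."""
    probs = worst_case_distribution(lower, upper, values)
    return reward + gamma * sum(p * v for p, v in zip(probs, values))
```

A robust policy would then pick, in each state, the action whose pessimistic backup is largest. How RTPI organizes such updates over a partition of the state space is described in the paper itself and is not reproduced here.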

FAPESP grant:	15/01587-0 - Storage, modeling and analysis of dynamical systems for e-Science applications
Grantee:	João Eduardo Ferreira
Support type:	Research Grants - eScience and Data Science Program - Thematic Grants