GUBS criterion: Arbitrary trade-offs between cost and probability-to-goal in stochastic planning based on Expected Utility Theory

Crispino, Gabriel Nunes; Freire, Valdinei; Delgado, Karina Valdivia

Full text
Author(s):	Crispino, Gabriel Nunes ; Freire, Valdinei ; Delgado, Karina Valdivia Total Authors: 3
Document type:	Journal article
Source:	ARTIFICIAL INTELLIGENCE; v. 316, p. 45-pg., 2023-03-01.
Abstract
Stochastic Shortest Path MDPs (SSP-MDPs) are used to model probabilistic sequential decision problems where the objective is to minimize the expected accumulated cost to goal. However, in the presence of dead-ends, the conventional criterion for SSP-MDPs, which minimizes the expected accumulated cost, can become ill-defined. Lexicographic criteria can solve this by preferring policies that reach the goal with the highest possible probability. Other criteria can instead make a trade-off between some cost measure and probability-to-goal. However, both of these approaches can lead to policies that might not represent the choice of a real decision-maker. In this work, we propose the GUBS criterion to address these problems. GUBS combines goal prioritization over histories with Expected Utility Theory and is the only criterion between all criteria analyzed that not only allows for a trade-off between a large accumulated cost and a small loss in probability-to-goal, but also guarantees arbitrary trade-offs that can be tuned from its parameters without previous knowledge of the problem being solved. We also propose eGUBS, which is a particular case of GUBS when the exponential utility function is used, and two algorithms for optimally solving these problems: eGUBS-VI, a VI-based algorithm; and eGUBS-AO, a heuristic search algorithm. Results indicate that, when there is a good heuristic function available or when the state space is too large, eGUBS-AO can perform better than eGUBS-VI by doing an efficient search. In other cases, eGUBS-VI's simpler approach might have better results. (c) 2022 Elsevier B.V. All rights reserved. (AU)

FAPESP's process:	18/11236-9 - Markov decision process and risk
Grantee:	Karina Valdivia Delgado
Support Opportunities:	Regular Research Grants


FAPESP's process:	19/07665-4 - Center for Artificial Intelligence
Grantee:	Fabio Gagliardi Cozman
Support Opportunities:	Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program

Short URL