A Unified Framework for Average Reward Criterion and Risk

Silva Reis, Willy Arthur; Delgado, Karina Valdivia; Freire, Valdinei

Author(s):	Silva Reis, Willy Arthur; Delgado, Karina Valdivia; Freire, Valdinei
Total Authors:	3
Document type:	Journal article
Source:	INTELLIGENT SYSTEMS, BRACIS 2024, PT I; v. 15412, p. 15-pg., 2025-01-01.
Abstract
The average reward criterion is used to solve infinite-horizon Markov decision processes (MDPs). This risk-neutral criterion depends on the limiting behavior of the stochastic process and can be based on (i) the accumulated reward at infinity, which considers sequences of states of size h = infinity, or (ii) the steady-state distribution of the MDP (i.e., the probability that the system is in each state in the long term), which considers sequences of states of size h = 1. In many situations, it is desirable to account for risk at each stage of the process, which can be achieved by combining the average reward criterion with a utility function or a risk measure such as Value at Risk (VaR) or Conditional Value at Risk (CVaR). The objective of this work is to propose a mathematical framework that allows a unified treatment of the existing literature on average reward and risk, including works that use exponential utility functions and CVaR, and that also covers interpretations with 1 <= h <= infinity not present in the literature. These new interpretations make it possible to differentiate policies that existing criteria may not distinguish. A numerical example shows the behavior of the criteria under this new framework. (AU)
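
The h = 1 interpretation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: it assumes a fixed policy that induces a two-state Markov chain, and the transition matrix, the rewards, and the function names (`steady_state`, `average_reward`, `cvar_lower_tail`) are all hypothetical. It computes the steady-state distribution by power iteration, then the average reward and a lower-tail CVaR of the one-step reward under that distribution.

```python
def steady_state(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration (assumes the chain is ergodic)."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, d)) < tol:
            return new
        d = new
    return d


def average_reward(d, r):
    """Risk-neutral criterion: expected one-step reward under distribution d."""
    return sum(p * x for p, x in zip(d, r))


def cvar_lower_tail(d, r, alpha):
    """Mean of the worst alpha-fraction of rewards for a discrete distribution.
    Convention assumed here: lower reward is worse, 0 < alpha <= 1."""
    remaining = alpha
    total = 0.0
    for x, p in sorted(zip(r, d)):  # worst rewards first
        take = min(p, remaining)
        total += take * x
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


# Hypothetical chain induced by some fixed policy (illustrative numbers only).
P = [[0.9, 0.1],
     [0.3, 0.7]]
d = steady_state(P)

risky_rewards = [2.0, -2.0]  # reward depends on the state
safe_rewards = [1.0, 1.0]    # same reward in every state

for name, r in (("risky", risky_rewards), ("safe", safe_rewards)):
    print(name, average_reward(d, r), cvar_lower_tail(d, r, alpha=0.25))
```

In this toy setup both reward vectors give approximately the same average reward, while the CVaR at alpha = 0.25 differs, which is the kind of distinction a risk measure adds to the risk-neutral criterion.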

FAPESP's process:	18/11236-9 - Markov decision process and risk
Grantee:	Karina Valdivia Delgado
Support Opportunities:	Regular Research Grants


FAPESP's process:	19/07665-4 - Center for Artificial Intelligence
Grantee:	Fabio Gagliardi Cozman
Support Opportunities:	Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program
