A Unified Framework for Average Reward Criterion and Risk

Author(s):
Silva Reis, Willy Arthur ; Delgado, Karina Valdivia ; Freire, Valdinei
Total Authors: 3
Document type: Journal article
Source: INTELLIGENT SYSTEMS, BRACIS 2024, PT I; v. 15412, 15 pp., 2025-01-01.
Abstract

The average reward criterion is used to solve infinite-horizon Markov decision processes (MDPs). This risk-neutral criterion depends on the limiting behavior of the stochastic process and can be computed from (i) the accumulated reward at infinity, which considers state sequences of length h = infinity, or (ii) the steady-state distribution of the MDP (i.e., the long-run probability that the system is in each state), which considers state sequences of length h = 1. In many situations it is desirable to account for risk at each stage of the process, which can be achieved by combining the average reward criterion with a utility function or a risk measure such as VaR or CVaR. This work proposes a mathematical framework that gives a unified treatment of the existing literature on average reward and risk, including works that use exponential utility functions and CVaR, and that adds interpretations with 1 <= h <= infinity not present in the literature. These new interpretations make it possible to differentiate policies that existing criteria cannot distinguish. A numerical example illustrates how the criteria behave under this new framework. (AU)
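
As a rough illustration of the h = 1 interpretation mentioned in the abstract, the sketch below computes the steady-state distribution of a Markov chain induced by a fixed policy, the resulting average reward, and a lower-tail CVaR of the one-step reward under that distribution. It is not the authors' framework: the two chains, the per-state rewards, the level alpha = 0.25, and the function names `stationary_distribution` and `cvar_lower_tail` are all assumptions made for illustration.

```python
# Hypothetical example: two fixed policies, each inducing a 2-state Markov chain.
# All numbers and function names are illustrative assumptions.

def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the steady-state distribution by power iteration.

    Assumes the chain is ergodic (irreducible and aperiodic).
    """
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    return mu

def cvar_lower_tail(values, probs, alpha):
    """Mean of the worst alpha-fraction of outcomes of a discrete reward."""
    remaining = alpha
    total = 0.0
    for v, p in sorted(zip(values, probs)):  # worst (lowest) rewards first
        take = min(p, remaining)
        total += take * v
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha

# Policy A: constant reward in every state.
P_a = [[0.9, 0.1], [0.2, 0.8]]
r_a = [1.0, 1.0]

# Policy B: same long-run mean, but the reward varies across states.
P_b = [[0.5, 0.5], [0.5, 0.5]]
r_b = [0.0, 2.0]

alpha = 0.25
mu_a = stationary_distribution(P_a)
mu_b = stationary_distribution(P_b)

# Risk-neutral average reward: expectation under the steady-state distribution.
avg_a = sum(m * r for m, r in zip(mu_a, r_a))
avg_b = sum(m * r for m, r in zip(mu_b, r_b))

# Risk-sensitive summary of the one-step (h = 1) reward.
cvar_a = cvar_lower_tail(r_a, mu_a, alpha)
cvar_b = cvar_lower_tail(r_b, mu_b, alpha)

print(f"A: average={avg_a:.3f}, CVaR={cvar_a:.3f}")
print(f"B: average={avg_b:.3f}, CVaR={cvar_b:.3f}")
```

In this toy setting both policies have the same average reward, so the risk-neutral criterion cannot separate them, while the CVaR of the steady-state reward does. The interpretations with 1 < h < infinity would require distributions over longer state sequences, which this sketch does not attempt.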

FAPESP's process: 18/11236-9 - Markov decision process and risk
Grantee: Karina Valdivia Delgado
Support Opportunities: Regular Research Grants
FAPESP's process: 19/07665-4 - Center for Artificial Intelligence
Grantee: Fabio Gagliardi Cozman
Support Opportunities: Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program