A Unified Framework for Average Reward Criterion and Risk

Silva Reis, Willy Arthur; Delgado, Karina Valdivia; Freire, Valdinei

Total number of authors: 3
Document type:	Scientific article
Source:	INTELLIGENT SYSTEMS, BRACIS 2024, PT I; v. 15412, p. 15-pg., 2025-01-01.
Abstract
The average reward criterion is used to solve infinite-horizon MDPs. This risk-neutral criterion depends on the limiting behavior of the stochastic process and can be based on either (i) the accumulated reward at infinity, which considers state sequences of length h = ∞, or (ii) the steady-state distribution of the MDP (i.e., the long-run probability that the system is in each state), which considers state sequences of length h = 1. In many situations it is desirable to account for risk at each stage of the process, which can be done within the average reward criterion by using a utility function or a risk measure such as VaR or CVaR. The objective of this work is to propose a mathematical framework that gives a unified treatment of the existing literature on average reward and risk, including works that use exponential utility functions and CVaR, and that also admits interpretations with 1 ≤ h ≤ ∞ not present in the literature. These new interpretations can differentiate policies that existing criteria may fail to distinguish. A numerical example illustrates the behavior of the criteria under this new framework.
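The abstract contrasts the risk-neutral average reward computed from the steady-state distribution (the h = 1 view) with risk-aware variants such as CVaR. As a minimal sketch, assuming a fixed policy that induces an ergodic finite Markov chain with a reward r(s) in each state, the Python below computes the stationary distribution by power iteration, the gain g = Σ_s π(s) r(s), and a lower-tail CVaR of the per-step reward under π. The function names and the two-policy example are hypothetical illustrations of standard definitions, not the framework proposed in the paper.

```python
"""Hypothetical illustration (not the paper's framework): risk-neutral average
reward from the steady-state distribution versus a CVaR of the per-step reward
under that same distribution."""


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration for the stationary distribution of an ergodic chain.

    P[i][j] is the probability of moving from state i to state j under a fixed
    policy. Assumes the chain is irreducible and aperiodic.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # One step of pi <- pi P.
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def average_reward(P, r):
    """Risk-neutral gain g = sum_s pi(s) * r(s)."""
    pi = stationary_distribution(P)
    return sum(p * x for p, x in zip(pi, r))


def lower_cvar(P, r, alpha):
    """CVaR_alpha of the per-step reward under the stationary distribution:
    the expected reward over the worst alpha-fraction of probability mass."""
    pi = stationary_distribution(P)
    remaining = alpha
    total = 0.0
    for x, p in sorted(zip(r, pi)):  # visit the worst rewards first
        take = min(p, remaining)
        total += take * x
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


if __name__ == "__main__":
    # Two hypothetical policies inducing the same uniform two-state chain.
    P = [[0.5, 0.5], [0.5, 0.5]]
    safe, risky = [1.0, 1.0], [0.0, 2.0]
    print(average_reward(P, safe), average_reward(P, risky))  # 1.0 1.0
    print(lower_cvar(P, safe, 0.1), lower_cvar(P, risky, 0.1))  # 1.0 0.0
```

Both hypothetical policies have the same gain, so the risk-neutral criterion cannot separate them, while CVaR at α = 0.1 prefers the constant-reward policy. Tail and sign conventions for CVaR vary across the literature; this sketch uses the lower tail of rewards.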

FAPESP grant:	18/11236-9 - Markov decision processes and risk
Grantee:	Karina Valdivia Delgado
Support type:	Research Grants - Regular Program


FAPESP grant:	19/07665-4 - Center for Artificial Intelligence
Grantee:	Fabio Gagliardi Cozman
Support type:	Research Grants - eScience and Data Science Program - Engineering Research Centers
