Abstract
Markov Decision Processes (MDPs) are widely used to solve sequential decision-making problems. The most commonly used performance criterion to find a solution in this type of problem is to minimize the expected total cost. However, this approach does not take into account the cost variability (i.e., fluctuations around the mean), which can significantly affect the policy performance. MDP…