A Method for the Online Construction of the Set of States of a Markov Decision Process Using Answer Set Programming

Author(s):	Ferreira, Leonardo Anjoletto ; Bianchi, Reinaldo A. C. ; Santos, Paulo E. ; Lopez de Mantaras, Ramon ; Mouhoub, M ; Sadaoui, S ; Mohamed, OA ; Ali, M
Total number of authors:	8
Document type:	Scientific article
Source:	RECENT TRENDS AND FUTURE TECHNOLOGY IN APPLIED INTELLIGENCE, IEA/AIE 2018; v. 10868, 13 pp., 2018-01-01.
Abstract
Non-stationary domains, which change in unpredictable ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
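
The idea summarized in the abstract, growing the state set during interaction while constraints prune the actions considered, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper encodes constraints as ASP programs, whereas the sketch below swaps in a plain Python set of forbidden (state, action) pairs, and every name in it (`OnlineStateQLearner`, `forbid`, `observe_state`, and so on) is hypothetical. The learning rule is standard tabular Q-learning, chosen here only as one example of an RL method.

```python
import random


class OnlineStateQLearner:
    """Tabular Q-learning over a state set that is built as states are observed.

    Assumption: constraints are kept as a set of forbidden (state, action)
    pairs, a stand-in for the ASP rules described in the abstract.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.states = set()      # grows online, starts empty
        self.forbidden = set()   # (state, action) pairs ruled out by observation
        self.q = {}              # (state, action) -> estimated value

    def observe_state(self, state):
        """Add a newly seen state and initialise its action values to zero."""
        if state not in self.states:
            self.states.add(state)
            for a in self.actions:
                self.q[(state, a)] = 0.0

    def forbid(self, state, action):
        """Record an observed domain change; stored Q-values are left untouched."""
        self.observe_state(state)
        self.forbidden.add((state, action))

    def allowed_actions(self, state):
        """Actions not ruled out in this state (all actions if none remain)."""
        self.observe_state(state)
        allowed = [a for a in self.actions if (state, a) not in self.forbidden]
        return allowed or self.actions

    def choose(self, state):
        """Epsilon-greedy choice restricted to the allowed actions."""
        allowed = self.allowed_actions(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(allowed)
        return max(allowed, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        """One Q-learning backup; both states join the state set if new."""
        self.observe_state(s)
        self.observe_state(s_next)
        best_next = max(self.q[(s_next, b)] for b in self.allowed_actions(s_next))
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])
```

In this sketch a constraint only narrows which actions are considered; the stored action values are never overwritten by it, which loosely mirrors the abstract's claim that the state construction does not interfere with action-value approximation.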

FAPESP grant:	16/18792-9 - Description, representation and solution of spatial games
Grantee:	Paulo Eduardo Santos
Support type:	Research Grant - Partnership for Technological Innovation - PITE