(Reference automatically obtained from Web of Science through the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Stochastic Abstract Policies: Generalizing Knowledge to Improve Reinforcement Learning

Author(s):
Koga, Marcelo L. [1] ; Freire, Valdinei [2] ; Costa, Anna H. R. [1]
Total Authors: 3
Author affiliation(s):
[1] Univ Sao Paulo, Escola Politecn, BR-05508970 Sao Paulo - Brazil
[2] Univ Sao Paulo, Escola Artes Ciencias & Humanidades, BR-05508970 Sao Paulo - Brazil
Total Affiliations: 2
Document type: Journal article
Source: IEEE TRANSACTIONS ON CYBERNETICS; v. 45, n. 1, p. 77-88, JAN 2015.
Web of Science citations: 15
Abstract

Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement, and that this generalization outperforms the current practice of using a library of policies. We achieve this by contributing a new algorithm, AbsProb-PI-multiple, and a framework for transferring knowledge represented as a stochastic abstract policy to new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment, analyze the agent's behavior throughout the learning process, and assess the transfer ratio for different numbers of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task. (AU)
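The abstract does not reproduce the algorithm itself, so the following is only a minimal Python sketch of the general idea it describes: a stochastic abstract policy maps abstract states to probability distributions over abstract actions, built by generalizing over several solved source tasks. All names here are hypothetical, abstract states and actions are modeled as strings, and generalization is approximated by simple probability averaging; AbsProb-PI-multiple's actual update rule is not given in the abstract.

```python
import random
from collections import defaultdict

class StochasticAbstractPolicy:
    """Hypothetical sketch: maps abstract states to distributions
    over abstract actions, generalized from several source tasks."""

    def __init__(self):
        # abstract_state -> {abstract_action: probability}
        self.dist = {}

    def fit_from_sources(self, source_policies):
        """Average the action distributions of previously solved source
        tasks (a crude stand-in for AbsProb-PI-multiple, whose details
        are not specified in the abstract)."""
        counts = defaultdict(lambda: defaultdict(float))
        for policy in source_policies:
            for s_abs, actions in policy.items():
                for a_abs, p in actions.items():
                    counts[s_abs][a_abs] += p
        for s_abs, actions in counts.items():
            total = sum(actions.values())
            self.dist[s_abs] = {a: p / total for a, p in actions.items()}

    def sample(self, abstract_state):
        """Sample an abstract action for the given abstract state;
        return None if the state was never seen in any source task."""
        actions = self.dist.get(abstract_state)
        if not actions:
            return None
        acts, probs = zip(*actions.items())
        return random.choices(list(acts), weights=probs, k=1)[0]

# Illustrative use with two toy source policies (made-up data):
source_a = {"near_door": {"go_to_door": 1.0}}
source_b = {"near_door": {"go_to_door": 0.6, "explore": 0.4}}
pi_abs = StochasticAbstractPolicy()
pi_abs.fit_from_sources([source_a, source_b])
print(pi_abs.sample("near_door"))  # "go_to_door" with probability 0.8
```

Under this reading, at the start of a target task the learner would sample from such a generalized policy (mapped back to ground actions) rather than acting uniformly at random, which is one way to obtain the "proper behavior from the beginning" that the abstract refers to.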

FAPESP Grant: 12/02190-9 - Knowledge transfer between tasks in reinforcement learning
Grantee: Marcelo Li Koga
Support type: Scholarships in Brazil - Master's
FAPESP Grant: 11/19280-8 - CogBot: integrating perceptual information and semantic knowledge in cognitive robotics
Grantee: Anna Helena Reali Costa
Support type: Regular Research Grants