Transferring knowledge as heuristics in reinforcement learning: A case-based approach

Bianchi, Reinaldo A. C.; Celiberto, Jr., Luiz A.; Santos, Paulo E.; Matsuura, Jackson P.; Lopez de Mantaras, Ramon

Author(s):	Bianchi, Reinaldo A. C. ^[1] ; Celiberto, Jr., Luiz A. ^[2] ; Santos, Paulo E. ^[1] ; Matsuura, Jackson P. ^[3] ; Lopez de Mantaras, Ramon ^[4] Total number of authors: 5
Author affiliation(s):	^[1] Ctr Univ FEI, BR-09850901 Sao Paulo - Brazil ^[2] Univ Fed ABC UFABC, Ctr Engn Modelagem & Ciencias Sociais Aplicadas C, BR-09210580 Sao Paulo - Brazil ^[3] Technol Inst Aeronaut ITA, BR-12228900 Sao Paulo - Brazil ^[4] CSIC, IIIA Artificial Intelligence Res Inst, Spanish Natl Res Council, Bellaterra 08193, Catalonia - Spain Total number of affiliations: 4
Document type:	Scientific Article
Source:	ARTIFICIAL INTELLIGENCE; v. 226, p. 102-121, SEP 2015.
Web of Science citations:	24
Abstract
The goal of this paper is to propose and analyse a transfer learning meta-algorithm that allows the implementation of distinct methods that accelerate a Reinforcement Learning procedure in one domain (the target) using heuristics obtained from another, simpler domain (the source). This meta-algorithm works in three stages: first, it uses a Reinforcement Learning step to learn a task on the source domain, storing the knowledge thus obtained in a case base; second, it does an unsupervised mapping of the source-domain actions to the target-domain actions; and, third, the case base obtained in the first stage is used as heuristics to speed up the learning process in the target domain. A set of empirical evaluations was conducted in two target domains: the 3D mountain car (using a learned case base from a 2D simulation) and stability learning for a humanoid robot in the RoboCup 3D Soccer Simulator (that uses knowledge learned from the Acrobot domain). The results attest that our transfer learning algorithm outperforms recent heuristically-accelerated reinforcement learning and transfer learning algorithms. (C) 2015 Elsevier B.V. All rights reserved.
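
The third stage of the abstract (using a case base as heuristics during target-domain learning) can be illustrated with a minimal Python sketch. This record does not give the paper's equations, so everything below is an assumption made for illustration: the nearest-case retrieval by Euclidean distance, the `action_map` dictionary standing in for the learned action mapping, the bonus form `max Q - Q(a) + eta`, and the names `retrieve_action`, `heuristic_greedy_action`, `eta`, and `xi`. It also assumes stored case states are already expressed in the same features as the target state.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

State = Tuple[float, ...]
Case = Tuple[State, int]  # (stored state, source-domain action); hypothetical layout


def retrieve_action(case_base: List[Case], state: State,
                    threshold: float) -> Optional[int]:
    """Return the action of the nearest stored case within `threshold`, else None."""
    best_action, best_dist = None, threshold
    for case_state, case_action in case_base:
        dist = math.dist(case_state, state)  # Euclidean distance (Python 3.8+)
        if dist <= best_dist:
            best_action, best_dist = case_action, dist
    return best_action


def heuristic_greedy_action(q_values: Sequence[float], suggested: Optional[int],
                            eta: float = 1.0, xi: float = 1.0) -> int:
    """Greedy choice over Q(s, a) + xi * H(s, a).

    Assumed heuristic: H is non-zero only for the suggested action, and is just
    large enough (by margin eta) to lift it above the current best Q-value.
    """
    q_max = max(q_values)
    scores = []
    for action, q in enumerate(q_values):
        bonus = (q_max - q + eta) if action == suggested else 0.0
        scores.append(q + xi * bonus)
    return max(range(len(scores)), key=scores.__getitem__)


def select_target_action(case_base: List[Case], action_map: Dict[int, int],
                         state: State, q_values: Sequence[float],
                         threshold: float) -> int:
    """Retrieve a case, map its action to the target domain, then pick greedily."""
    source_action = retrieve_action(case_base, state, threshold)
    suggested = action_map.get(source_action) if source_action is not None else None
    return heuristic_greedy_action(q_values, suggested)
```

In this sketch the heuristic only biases action selection; the Q-values themselves would still be updated by the ordinary Reinforcement Learning rule, so a poor suggestion from the case base can be unlearned as the target-domain estimates improve. When no case is close enough, the choice falls back to the plain greedy action.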

FAPESP grant:	11/19280-8 - CogBot: integrating perceptual information and semantic knowledge in cognitive robotics
Grantee:	Anna Helena Reali Costa
Support type:	Research Grants - Regular


FAPESP grant:	12/04089-3 - Collaborative spatial reasoning for multiple robots
Grantee:	Paulo Eduardo Santos
Support type:	Research Grants - Regular


FAPESP grant:	12/14010-5 - Transfer Learning for Heterogeneous Robots
Grantee:	Luiz Antonio Celiberto Junior
Support type:	Scholarships in Brazil - Post-Doctoral
