Heuristically-Accelerated Multiagent Reinforcement Learning

Bianchi, Reinaldo A. C.; Martins, Murilo F.; Ribeiro, Carlos H. C.; Costa, Anna H. R.

Author(s):	Bianchi, Reinaldo A. C. ^[1]; Martins, Murilo F. ^[1]; Ribeiro, Carlos H. C. ^[2]; Costa, Anna H. R. ^[3]
Total Authors:	4
Affiliation:	^[1] Ctr Univ FEI, Dept Elect Engn, BR-09850901 Sao Bernardo Do Campo - Brazil; ^[2] Technol Inst Aeronaut, Div Comp Sci, Sao Jose Dos Campos - Brazil; ^[3] Univ Sao Paulo, Escola Politecn, Sao Paulo - Brazil
Total Affiliations:	3
Document type:	Journal article
Source:	IEEE TRANSACTIONS ON CYBERNETICS; v. 44, n. 2, p. 252-265, FEB 2014.
Web of Science Citations:	21
Abstract
This paper presents a novel class of algorithms, called Heuristically-Accelerated Multiagent Reinforcement Learning (HAMRL), which allows the use of heuristics to speed up well-known multiagent reinforcement learning (RL) algorithms such as Minimax-Q. HAMRL algorithms are characterized by a heuristic function, which suggests the selection of particular actions over others. This function represents an initial action selection policy, which can be handcrafted, extracted from previous experience in distinct domains, or learnt from observation. To validate the proposal, a thorough theoretical analysis proving the convergence of four algorithms from the HAMRL class (HAMMQ, HAMQ(lambda), HAMQS, and HAMS) is presented. In addition, a comprehensive, systematic evaluation was conducted in two distinct adversarial domains. The results show that even the most straightforward heuristics can produce virtually optimal action selection policies in far fewer episodes, significantly improving the performance of HAMRL algorithms over vanilla RL algorithms. (AU)
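
The abstract does not state the action selection rule, so the following is only a minimal sketch of how a heuristic function might bias action choice. It assumes the heuristic value is added to the learned value estimate before the greedy choice, with optional epsilon-greedy exploration. The function name `heuristic_greedy_action`, the weight `xi`, and the toy action names are hypothetical, and the minimax step over opponent actions used by Minimax-Q style algorithms is left out for brevity.

```python
import random


def heuristic_greedy_action(q_values, heuristic, xi=1.0, epsilon=0.0, rng=random):
    """Choose an action for one state from value estimates biased by a heuristic.

    q_values:  dict mapping action -> learned value estimate (assumed layout).
    heuristic: dict mapping action -> heuristic bonus; missing actions count as 0.
    xi:        hypothetical weight controlling how strongly the heuristic biases the choice.
    epsilon:   probability of taking a uniformly random exploratory action.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Greedy choice over the heuristic-biased score (assumed additive form).
    return max(actions, key=lambda a: q_values[a] + xi * heuristic.get(a, 0.0))


# Before any learning all estimates are equal, so the heuristic alone decides.
q = {"left": 0.0, "right": 0.0, "shoot": 0.0}
h = {"shoot": 1.0}
choice = heuristic_greedy_action(q, h)
```

Under these assumptions the heuristic acts as an initial policy: it decides the action while the value estimates are still uninformative, and its influence shrinks as learned value differences grow larger than `xi` times the heuristic bonus.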

FAPESP's process:	11/19280-8 - CogBot: integrating perceptual information and semantic knowledge in cognitive robotics
Grantee:	Anna Helena Reali Costa
Support Opportunities:	Regular Research Grants


FAPESP's process:	12/04089-3 - Collaborative spatial reasoning for a multi-robot system
Grantee:	Paulo Eduardo Santos
Support Opportunities:	Regular Research Grants


FAPESP's process:	11/17610-0 - Monitoring and control of dynamic systems subject to faults
Grantee:	Roberto Kawakami Harrop Galvão
Support Opportunities:	Research Projects - Thematic Grants


FAPESP's process:	12/12640-1 - Learning by Demonstration in Cooperative Human-robot Interaction Scenarios
Grantee:	Murilo Fernandes Martins
Support Opportunities:	Scholarships in Brazil - Post-Doctoral
