Hatch: Self-distributing systems for data centers

Author(s):
Rodrigues-Filho, Roberto ; Porter, Barry
Total Authors: 2
Document type: Journal article
Source: FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE; v. 132, 13 pp., 2022-02-26.
Abstract

Designing and maintaining distributed systems remains highly challenging: there is a high-dimensional design space of potential ways to distribute a system's sub-components over a large-scale infrastructure; and the deployment environment for a system tends to change in unforeseen ways over time. For engineers, this is a complex prediction problem to gauge which distributed design may best suit a given environment. We present the concept of self-distributing systems, in which any local system built using our framework can learn, at runtime, the most appropriate distributed design given its perceived operating conditions. Our concept abstracts distribution of a system's sub-components to a list of simple actions in a reward matrix of distributed design alternatives to be used by reinforcement learning algorithms. By doing this, we enable software to experiment, in a live production environment, with different ways in which to distribute its software modules by placing them in different hosts throughout the system's infrastructure. We implement this concept in a framework we call Hatch, which has three major elements: (i) a transparent and generalized RPC layer that supports seamless relocation of any local component to a remote host during execution; (ii) a set of primitives, including relocation, replication and sharding, from which to create an action/reward matrix of possible distributed designs of a system; and (iii) a decentralized reinforcement learning approach to converge towards more optimal designs in real time. Using an example of a self-distributing web-serving infrastructure, Hatch is able to autonomously select the most suitable distributed design from among approximately 700,000 alternatives in about 5 min. (c) 2022 Elsevier B.V. All rights reserved. (AU)
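
The idea of treating distributed designs as actions scored by observed reward can be illustrated with a minimal sketch. This is not Hatch's implementation or API: it is a single-agent epsilon-greedy bandit rather than the decentralized approach the abstract describes, and every name in it (`ACTIONS`, `simulated_response_time`, `learn_design`), the host labels, and the latency figures are hypothetical, invented only for illustration.

```python
import random

# Hypothetical design alternatives: (primitive, component, target hosts).
# The primitive names follow the abstract; everything else is made up.
ACTIONS = [
    ("local", "cache", None),
    ("relocate", "cache", "host-b"),
    ("replicate", "cache", "host-b,host-c"),
    ("shard", "cache", "host-b,host-c"),
]


def simulated_response_time(action, rng):
    """Stand-in for measuring the live system; the numbers are invented."""
    base_ms = {"local": 40.0, "relocate": 35.0, "replicate": 22.0, "shard": 28.0}
    return base_ms[action[0]] + rng.uniform(-2.0, 2.0)


def learn_design(actions, observe, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy selection over designs; reward is negative response time."""
    rng = random.Random(seed)
    counts = [0] * len(actions)
    mean_reward = [0.0] * len(actions)
    for _ in range(steps):
        if 0 in counts:
            i = counts.index(0)  # try every design once before exploiting
        elif rng.random() < epsilon:
            i = rng.randrange(len(actions))  # explore a random design
        else:
            i = max(range(len(actions)), key=mean_reward.__getitem__)  # exploit
        reward = -observe(actions[i], rng)
        counts[i] += 1
        # Incremental update of the running mean reward for this design.
        mean_reward[i] += (reward - mean_reward[i]) / counts[i]
    best = max(range(len(actions)), key=mean_reward.__getitem__)
    return actions[best], mean_reward


best, estimates = learn_design(ACTIONS, simulated_response_time)
print(best)
```

In this toy setting the learner settles on the design with the lowest simulated response time. A real system would replace `simulated_response_time` with measurements taken after actually applying each design, and the design space would be far larger than four entries.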

FAPESP's process: 14/50937-1 - INCT 2014: on the Internet of the Future
Grantee: Fabio Kon
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 20/07193-2 - Autonomic composition of software for smart cities
Grantee: Roberto Vito Rodrigues Filho
Support Opportunities: Scholarships in Brazil - Post-Doctoral