Survivability in Lambda Grids by means of Ant Colony Optimization

Author(s):
Pavani, Gustavo Sousa ; Frederic, Andre Ricardo ; Ahmed, T ; Festor, O ; Ghamri-Doudane, Y ; Kang, JM ; Schaeffer-Filho, AE ; Lahmadi, A ; Madeira, E
Total Authors: 9
Document type: Journal article
Source: 2021 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED NETWORK MANAGEMENT (IM 2021); v. N/A, p. 7-pg., 2021-01-01.
Abstract

Meta-scheduling in lambda grids is often a complex task because it typically comprises the discovery, monitoring, co-allocation, and orchestration of networking and computing resources. Support for advance reservations typically improves the performance of the lambda grid, but it also makes the meta-scheduling process much more complicated. All of these mechanisms must cope with failures that may occur in the optical network or the computing infrastructure. Therefore, in this work, we propose a survivable, distributed grid meta-scheduler based on an Ant Colony Optimization (ACO) algorithm. By using restoration as the recovery mechanism, resilience against link, network node, and server node failures can be achieved. We evaluated the restorability for different combinations of meta- and local scheduling policies and resource co-allocation algorithms under single link or single node failures. In addition, we assessed some of the parameters that may influence the restorability against server node failures, where the affected jobs are rescheduled to the remaining nodes of the grid. The results demonstrated that the ACO algorithm is capable of recovering nearly 100% of the jobs affected by link or server node failures for many of the combinations of meta- and local scheduling policies presented for the Server First-Relaxed (SF-R) and Network First (NF) co-allocation algorithms. (AU)

FAPESP's process: 15/24341-7 - New strategies to confront with the threat of capacity exhaustion
Grantee: Helio Waldman
Support Opportunities: Research Projects - Thematic Grants