Towards an Optimized Heterogeneous Distributed Task Scheduler in OpenMP Cluster

Author(s):
Neveu, Remy ; Ceccato, Rodrigo ; Leite, Gustavo ; Araujo, Guido ; Diaz, Jose M. Monsalve ; Yviquel, Herve
Total number of authors: 6
Document type: Scientific article
Source: PROCEEDINGS OF SC24-W: WORKSHOPS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS; v. N/A, p. 10-pg., 2024-01-01.
Abstract

This paper addresses the challenges of optimizing task scheduling for a distributed, task-based execution model in OpenMP for cluster computing environments. Traditional OpenMP implementations are primarily designed for shared-memory parallelism and offer limited control over task scheduling. However, improved scheduling mechanisms are critical to achieving performance and portability in distributed and heterogeneous environments. OpenMP Cluster (OMPC) was introduced to overcome these limitations, extending OpenMP with the Heterogeneous Earliest Finish Time (HEFT) task scheduling algorithm tailored for large-scale systems. To improve scheduling and enable better system utilization, the runtime system must resolve challenges such as changes in the application balance, amount of parallelism, and varying communication latencies. This work presents three key contributions: first, the refactoring of the OMPC runtime to unify task scheduling across devices and hosts; second, the optimization of the HEFT-based scheduling algorithm to ensure efficient task execution in distributed environments; and third, an extensive evaluation of Work Stealing and HEFT scheduling mechanisms in real-world clusters. While the HEFT implementation in OMPC is not fully optimized, this work provides a significant step toward improving distributed task scheduling in cluster computing, offering insights and incremental advancements that support the development of scalable and high-performance applications. Results show improvements of up to 24% in scheduling time, while opening the way to further extensions of the scheduling methods. (AU)
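
For readers unfamiliar with HEFT, the list-scheduling idea named in the abstract can be sketched as follows. This is a simplified textbook-style illustration and not the OMPC implementation: the function name `heft_schedule`, the input layout, and the example task graph are all hypothetical, task costs are assumed to be strictly positive, and the insertion-based slot search of the original algorithm is left out for brevity.

```python
from functools import lru_cache


def heft_schedule(cost, succ, comm, n_procs):
    """Simplified HEFT sketch (hypothetical names, no insertion policy).

    cost[t][p]   : run time of task t on processor p (assumed > 0)
    succ[t]      : list of tasks that depend on t
    comm[(t, s)] : transfer time for edge t -> s across processors
    Returns {task: (processor, start, finish)}.
    """
    tasks = list(cost)
    pred = {t: [] for t in tasks}
    for t in tasks:
        for s in succ.get(t, []):
            pred[s].append(t)

    @lru_cache(maxsize=None)
    def rank_u(t):
        # Upward rank: average cost plus the longest path to an exit task.
        avg = sum(cost[t]) / n_procs
        tail = max(
            (comm.get((t, s), 0) + rank_u(s) for s in succ.get(t, [])),
            default=0,
        )
        return avg + tail

    # Decreasing upward rank is a valid topological order for positive costs.
    order = sorted(tasks, key=rank_u, reverse=True)

    proc_free = [0.0] * n_procs
    placement = {}
    for t in order:
        best = None
        for p in range(n_procs):
            # Data is ready once every predecessor has finished and, if it
            # ran on another processor, its output has been transferred.
            ready = max(
                (
                    placement[q][2]
                    + (0 if placement[q][0] == p else comm.get((q, t), 0))
                    for q in pred[t]
                ),
                default=0.0,
            )
            start = max(ready, proc_free[p])
            finish = start + cost[t][p]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placement[t] = best
        proc_free[best[0]] = best[2]
    return placement


# Hypothetical diamond-shaped task graph on two heterogeneous processors.
cost = {"A": [2, 3], "B": [3, 2], "C": [4, 2], "D": [2, 2]}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
schedule = heft_schedule(cost, succ, comm, n_procs=2)
makespan = max(finish for _, _, finish in schedule.values())
```

In this toy graph the sketch places task C on the second processor even though it must pay a transfer cost, because C still finishes earlier there. That trade-off between computation and communication cost is the kind of decision a HEFT-style scheduler makes for each task.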

FAPESP grant: 19/17874-0 - EMU (multi-user equipment) granted under grant 2013/08293-7, KAHUNA upgrade - HPE Apollo Gen10 Supercomputer
Grantee: Munir Salomao Skaf
Support type: Research Grants - Multi-user Equipment Program
FAPESP grant: 13/08293-7 - CECC - Centro de Engenharia e Ciências Computacionais (Center for Computational Engineering and Sciences)
Grantee: Munir Salomao Skaf
Support type: Research Grants - Research, Innovation and Dissemination Centers (CEPIDs)