Providing fault tolerance for OpenMP target-based applications

Grant number: 21/09355-2
Support Opportunities: Scholarships in Brazil - Doctorate
Start date: October 01, 2021
End date: August 08, 2025
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Guido Costa Souza de Araújo
Grantee: Pedro Henrique di Francia Rosso
Host Institution: Instituto de Computação (IC). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil
Associated research grant: 13/08293-7 - CCES - Center for Computational Engineering and Sciences, AP.CEPID

Abstract

High Performance Computing (HPC) environments are widely used in scientific research. OmpCluster aims to facilitate the development of scientific applications in such environments. Because a large amount of computational power is involved, failures are expected to occur more frequently. For this reason, Fault Tolerance (FT) is a constant concern within OmpCluster. Part of the system is already fault tolerant, namely the part that relies on the MPI (Message Passing Interface) standard used within OmpCluster. This project aims to extend FT to the context of OpenMP, the tool on which OmpCluster is built. The main goal is to provide fault tolerance at the points where it is missing or where OmpCluster currently has limitations. By the end of the project, the entire OmpCluster system is expected to handle failures automatically, without intervention from the application developer beyond choosing the desired settings. (AU)

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
ROSSO, PEDRO HENRIQUE; PETRICA, LUCIAN; LISA, NUSRAT JAHAN; PEREIRA, MARCIO; RIGO, SANDRO; YVIQUEL, HERVE; BONATO, VANDERLEI; FRANCESQUINI, EMILIO; ARAUJO, GUIDO. Integrating Multi-FPGA Acceleration to OpenMP Distributed Computing. ADVANCING OPENMP FOR FUTURE ACCELERATORS, IWOMP 2024, v. 15195, 15 p. (21/09355-2)