Abstract
High Performance Computing (HPC) environments are widely used in scientific research. OmpCluster aims to facilitate the development of scientific applications in such environments. Because a large amount of computational power is involved, failures are expected to occur more frequently. For this reason, Fault Tolerance (FT) is a constant concern within OmpCluster. With part of the system…