Machine learning to predict performance and running time of heterogeneous applications with uncertain data input

Grant number: 15/19399-6
Support Opportunities: Scholarships abroad - Research Internship - Doctorate
Start date: November 01, 2015
End date: October 31, 2016
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Alfredo Goldman vel Lejbman
Grantee: Marcos Tulio Amaris González
Supervisor: Denis Trystram
Host Institution: Instituto de Matemática e Estatística (IME). Universidade de São Paulo (USP). São Paulo, SP, Brazil
Institution abroad: Université de Grenoble, France
Associated to the scholarship: 12/23300-7 - Bulk Synchronous Parallel Model on Graphic Processing Units, BP.DR

Abstract

Today, most computing platforms for HPC have heterogeneous hardware resources (CPUs, GPUs, storage, etc.). The most powerful supercomputers today have millions of those resources. To use all the computational power available, applications must be composed of multiple tasks that use the available resources as efficiently as possible.

The Job Management System (JMS) is the middleware responsible for distributing computing power to applications. A JMS must allocate resources to tasks so as to optimize the use of the available resources while guaranteeing good performance for all applications running in parallel. A promising way to achieve this is performance prediction. However, in a scenario with millions of processors and a large number of tasks, it is very difficult to predict the performance of applications.

This research project proposes the study of machine learning algorithms to support the scheduling of tasks on a large number of computational resources. We will apply learning techniques to devise novel scheduling algorithms capable of predicting performance from actual execution traces and developers' (uncertain) estimates.

Using our previous results on GPU performance characterization and data collected from several JMSs, we will analyze the applicability of machine learning algorithms to predicting the running time of heterogeneous applications.

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
AMARIS, MARCOS; CAMARGO, RAPHAEL; CORDEIRO, DANIEL; GOLDMAN, ALFREDO; TRYSTRAM, DENIS. Evaluating execution time predictions on GPU kernels using an analytical model and machine learning techniques. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, v. 171, 13 p. (19/26702-8, 15/19399-6, 21/06867-2, 12/23300-7)