Performance prediction of applications executed on GPUs using a simple analytical model and machine learning techniques

Marcos Tulio Amarís González

Author(s):	Marcos Tulio Amarís González
Document type:	Doctoral Thesis
Press:	São Paulo.
Institution:	Universidade de São Paulo (USP). Instituto de Matemática e Estatística (IME/SBI)
Defense date:	2018-06-25
Examining board members:	Alfredo Goldman Vel Lejbman; Arnaud Legrand; Philippe Olivier Alexandre Navaux; Liria Matsumoto Sato; Hermes Senger
Advisor:	Alfredo Goldman Vel Lejbman; Raphael Yokoingawa de Camargo
Abstract
The parallel and distributed High Performance Computing platforms available today have become increasingly heterogeneous (CPUs, GPUs, FPGAs, etc.). Graphics Processing Units (GPUs) are specialized co-processors that accelerate parallel vector operations. GPUs have a high degree of parallelism and can execute thousands or millions of threads concurrently and hide the latency of the scheduler. They also have a deep memory hierarchy, with different memory types and different configurations of these memories. Predicting the performance of applications executed on these devices is a great challenge and is essential for the efficient use of resources in machines with these co-processors. There are different approaches to these predictions, such as analytical modeling and machine learning techniques. In this thesis, we present an analysis and characterization of the performance of applications executed on GPUs. We propose a simple and intuitive BSP-based model for predicting the execution times of CUDA applications on different GPUs. The model is based on the number of computations and memory accesses of the GPU, with additional information on cache usage obtained from profiling. We also compare three Machine Learning (ML) approaches (Linear Regression, Support Vector Machines and Random Forests) with the BSP-based analytical model. This comparison is made in two contexts: first, the input data (features) for the ML techniques were the same as those used by the analytical model; second, the features were selected through a feature extraction process based on correlation analysis and hierarchical clustering. We show that the execution times of GPU applications that scale regularly can be predicted with simple analytical models and an adjustment parameter. This parameter can be used to predict the performance of these applications on other GPUs.
We also demonstrate that ML approaches provide reasonable predictions for different cases, and that ML techniques require no detailed knowledge of application code, hardware characteristics or explicit modeling. Consequently, whenever a large data set with information about similar applications is available or can be created, ML techniques can be useful for deploying automated on-line performance prediction for scheduling applications on heterogeneous architectures with GPUs. (AU)

FAPESP's process:	12/23300-7 - Bulk Synchronous Parallel Model on Graphic Processing Units
Grantee:	Marcos Tulio Amarís González
Support Opportunities:	Scholarships in Brazil - Doctorate
