Evaluating execution time predictions on GPU kernels using an analytical model and machine learning techniques

Full text
Author(s):
Amaris, Marcos; Camargo, Raphael; Cordeiro, Daniel; Goldman, Alfredo; Trystram, Denis
Total number of authors: 5
Document type: Scientific article
Source: JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING; v. 171, 13 pp., Jan. 2023.
Abstract

Predicting the performance of applications executed on GPUs is a great challenge and is essential for efficient job schedulers. There are different approaches to this problem, notably analytical modeling and machine learning (ML) techniques. Machine learning requires large training sets and reliable features, but it can capture the interactions between architecture and software without manual intervention. In this paper, we compared a BSP-based analytical model with three ML techniques for predicting the execution time of kernels running on GPUs. The analytical model is based on the number of computations and memory accesses of the GPU, with additional information on cache usage obtained from profiling. The ML techniques Linear Regression, Support Vector Machine, and Random Forest were evaluated in two scenarios: first, using the same input features as the analytical model, and second, using a feature-extraction process based on correlation analysis and hierarchical clustering. Our experiments were conducted with 20 CUDA kernels, 11 of which belong to 6 real-world applications from the Rodinia benchmark suite, while the others are classical matrix-vector applications commonly used for benchmarking. We collected data on 9 NVIDIA GPUs in different machines. We show that the analytical model performs better when applications scale regularly; for the analytical model, a single parameter lambda is capable of adjusting the predictions, minimizing the need for complex analysis of the applications. We also show that the ML techniques achieve high accuracy when feature extraction is applied. Sets of 5 and 10 features were tested in two different ways: for unknown GPUs and for unknown kernels. For the ML experiments with feature extraction, we obtained errors of around 1.54% for unknown GPUs and 2.71% for unknown kernels. (c) 2022 Elsevier Inc. 
All rights reserved. (AU)
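The abstract mentions that a single parameter lambda adjusts the analytical model's predictions to measured execution times. A minimal sketch of one way such a scale factor could be fitted is shown below, using a least-squares fit; the function names and all numeric values are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: fit a single scale factor "lambda" so that
# measured ~= lambda * predicted, as a least-squares problem.
# All data below is made up for illustration.

def fit_lambda(predicted, measured):
    """Least-squares scale factor minimizing sum((m - lam*p)^2)."""
    num = sum(p * m for p, m in zip(predicted, measured))
    den = sum(p * p for p in predicted)
    return num / den

def mean_abs_pct_error(predicted, measured):
    """Mean absolute percentage error between two time series."""
    return 100.0 * sum(abs(p - m) / m
                       for p, m in zip(predicted, measured)) / len(measured)

# Illustrative analytical-model estimates (ms) vs. measured times (ms)
model_times = [1.0, 2.0, 4.0, 8.0]
real_times = [1.9, 4.1, 7.8, 16.2]

lam = fit_lambda(model_times, real_times)
adjusted = [lam * t for t in model_times]
print(f"lambda = {lam:.3f}, "
      f"MAPE after adjustment = {mean_abs_pct_error(adjusted, real_times):.2f}%")
```

The appeal of such a one-parameter correction, as the abstract notes, is that it avoids a detailed per-application analysis: the hardware-dependent constants of the model are absorbed into a single fitted coefficient.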

FAPESP Process: 19/26702-8 - Trends in high-performance computing, from resource management to new computer architectures
Grantee: Alfredo Goldman vel Lejbman
Support type: Research Grants - Thematic Project
FAPESP Process: 15/19399-6 - Machine learning to predict the performance and runtime of heterogeneous applications with uncertain input data
Grantee: Marcos Tulio Amaris González
Support type: Scholarships abroad - Research Internship - Doctorate
FAPESP Process: 21/06867-2 - Applications of scheduling theory to optimize the use of green energy in cloud computing platforms
Grantee: Daniel de Angelis Cordeiro
Support type: Research Grants - Regular
FAPESP Process: 12/23300-7 - The BSP model on graphics cards
Grantee: Marcos Tulio Amaris González
Support type: Scholarships in Brazil - Doctorate