Evaluating execution time predictions on GPU kernels using an analytical model and machine learning techniques

Author(s):
Amaris, Marcos; Camargo, Raphael; Cordeiro, Daniel; Goldman, Alfredo; Trystram, Denis
Total Authors: 5
Document type: Journal article
Source: JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING; v. 171, 13 pp., 2023-01-01.
Abstract

Predicting the performance of applications executed on GPUs is a great challenge and is essential for efficient job schedulers. There are two main approaches: analytical modeling and machine learning (ML) techniques. Machine learning requires large training sets and reliable features; nevertheless, it can capture the interactions between architecture and software without manual intervention. In this paper, we compare a BSP-based analytical model for predicting the execution time of kernels run on GPUs against three different ML techniques. The analytical model is based on the number of computations and memory accesses of the GPU, with additional information on cache usage obtained from profiling. The ML techniques Linear Regression, Support Vector Machines, and Random Forests were evaluated in two scenarios: first, using the same input features as the analytical model; and second, using a feature-extraction process based on correlation analysis and hierarchical clustering. Our experiments used 20 CUDA kernels, 11 of which belong to 6 real-world applications from the Rodinia benchmark suite, while the others are classical matrix and vector applications commonly used for benchmarking. We collected data on 9 NVIDIA GPUs in different machines. We show that the analytical model predicts better when applications scale regularly; for this model, a single parameter, lambda, is enough to adjust the predictions, minimizing the complexity of analyzing the applications. We also show that the ML techniques achieve high accuracy when feature extraction is applied. Sets of 5 and 10 features were tested in two settings: unknown GPUs and unknown kernels. With feature extraction, the ML experiments yielded errors of around 1.54% for unknown GPUs and 2.71% for unknown kernels. (c) 2022 Elsevier Inc. 
All rights reserved. (AU)
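The abstract mentions a feature-extraction process based on correlation analysis and hierarchical clustering. As a minimal sketch of the correlation-analysis step only (not the authors' actual pipeline, and with hypothetical feature names and values), one can drop each feature that is strongly correlated with a feature already kept:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def prune_correlated(features, threshold=0.95):
    """Greedy correlation filter: keep a feature only if its absolute
    correlation with every already-kept feature is below the threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy profiling features (hypothetical):
feats = {
    "num_computations": [1.0, 2.0, 3.0, 4.0],
    "mem_accesses":     [2.1, 4.0, 6.2, 7.9],  # nearly proportional to computations
    "l2_hit_rate":      [0.9, 0.4, 0.7, 0.2],
}
print(prune_correlated(feats))  # → ['num_computations', 'l2_hit_rate']
```

The surviving features would then feed the regression models (Linear Regression, Support Vector Machines, Random Forests); the clustering step in the paper groups redundant features in a more principled way than this greedy filter.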

FAPESP's process: 19/26702-8 - Trends on high performance computing, from resource management to new computer architectures
Grantee: Alfredo Goldman vel Lejbman
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 15/19399-6 - Machine learning to predict performance and running time of heterogeneous applications with uncertain data input
Grantee: Marcos Tulio Amaris González
Support Opportunities: Scholarships abroad - Research Internship - Doctorate
FAPESP's process: 21/06867-2 - Applications of scheduling theory to optimize green energy usage in cloud computing platforms
Grantee: Daniel de Angelis Cordeiro
Support Opportunities: Regular Research Grants
FAPESP's process: 12/23300-7 - Bulk Synchronous Parallel Model on Graphic Processing Units
Grantee: Marcos Tulio Amaris González
Support Opportunities: Scholarships in Brazil - Doctorate