A Profile-Based AI-Assisted Dynamic Scheduling Approach for Heterogeneous Architectures

Geng, Tongsheng; Amaris, Marcos; Zuckerman, Stephane; Goldman, Alfredo; Gao, Guang R.; Gaudiot, Jean-Luc

Author(s):	Geng, Tongsheng ^[1] ; Amaris, Marcos ^[2] ; Zuckerman, Stephane ^[3] ; Goldman, Alfredo ^[4] ; Gao, Guang R. ^[5] ; Gaudiot, Jean-Luc ^[1] Total Authors: 6
Affiliation:	^[1] Univ Calif Irvine, Irvine, CA 92697 - USA ^[2] Fed Univ Para, Tucurui, Para - Brazil ^[3] CY Cergy Paris Univ, Lab ETIS, ENSEA, CNRS, UMR 8051, F-95000 Cergy - France ^[4] Univ Sao Paulo, Sao Paulo - Brazil ^[5] Univ Delaware, Delaware, OH - USA Total Affiliations: 5
Document type:	Journal article
Source:	INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING; v. 50, n. 1 AUG 2021.
Web of Science Citations:	0
Abstract
While heterogeneous architectures are increasingly popular in High Performance Computing systems, their effectiveness depends on how efficiently the scheduler allocates workloads onto appropriate computing devices and on how well communication and computation can be overlapped. With different types of resources integrated into one system, the complexity of the scheduler correspondingly increases. Moreover, for applications with varying problem sizes on different heterogeneous resources, the optimal scheduling approach may vary accordingly. Thus, we introduce a Profile-based AI-assisted Dynamic Scheduling approach to dynamically and adaptively adjust workloads and efficiently utilize heterogeneous resources. It combines online scheduling, application profile information, hardware mathematical modeling, and offline machine learning estimation modeling to implement automatic application-device-specific scheduling for heterogeneous architectures. A hardware mathematical model provides coarse-grain computing resource selection, the profile information and an offline machine learning model estimate the performance of a fine-grain workload, and an online scheduling approach dynamically and adaptively distributes the workload. Our scheduling approach is tested on control-regular applications, 2D and 3D Stencil kernels (based on a Jacobi Algorithm), and a data-irregular application, Sparse Matrix-Vector Multiplication, in an event-driven runtime system. Experimental results show that our approach, PDAWL, either performs on par with or far outperforms whichever of the CPU or GPU yields the best results. (AU)
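
The three stages named in the abstract (coarse-grain device selection, per-chunk performance estimation, online distribution) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Device` class, the linear cost model standing in for the offline machine learning estimator, the `max_slowdown` cutoff, and the greedy earliest-finish rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device record; the linear cost model is an assumed
    stand-in for the profile-based machine learning estimate."""
    name: str
    per_item_cost: float      # assumed estimated milliseconds per work item
    launch_overhead: float    # assumed fixed milliseconds per chunk (launch/transfer)
    busy_until: float = 0.0   # estimated time at which the device becomes free
    assigned: list = field(default_factory=list)

    def estimate(self, n_items: int) -> float:
        # Estimated milliseconds to process n_items as a single chunk.
        return self.launch_overhead + self.per_item_cost * n_items


def coarse_select(devices, total_items, max_slowdown=10.0):
    """Coarse-grain step: keep only devices whose whole-problem estimate is
    within max_slowdown times the best estimate (assumed heuristic)."""
    best = min(d.estimate(total_items) for d in devices)
    return [d for d in devices if d.estimate(total_items) <= max_slowdown * best]


def schedule(devices, total_items, chunk_size):
    """Fine-grain step: hand each chunk to the device with the earliest
    estimated finish time. Returns (items per device, estimated makespan)."""
    candidates = coarse_select(devices, total_items)
    remaining = total_items
    while remaining > 0:
        size = min(chunk_size, remaining)
        dev = min(candidates, key=lambda d: d.busy_until + d.estimate(size))
        dev.busy_until += dev.estimate(size)
        dev.assigned.append(size)
        remaining -= size
    shares = {d.name: sum(d.assigned) for d in candidates}
    makespan = max(d.busy_until for d in candidates)
    return shares, makespan


if __name__ == "__main__":
    cpu = Device("cpu", per_item_cost=2.0, launch_overhead=0.0)
    gpu = Device("gpu", per_item_cost=1.0, launch_overhead=10.0)
    print(schedule([cpu, gpu], total_items=300, chunk_size=100))
```

In this toy setting the faster device receives more chunks while the slower one still contributes, which is the kind of co-running the abstract describes. A real runtime would replace the estimates with measured feedback and account for communication overlap.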

FAPESP's process:	12/23300-7 - Bulk Synchronous Parallel Model on Graphic Processing Units
Grantee:	Marcos Tulio Amaris González
Support Opportunities:	Scholarships in Brazil - Doctorate
