Memory allocation and balancing techniques for NUMA machines
Author(s): |
Martin Ichilevici de Oliveira
Total Authors: 1
|
Document type: | Master's Dissertation |
Press: | Campinas, SP. |
Institution: | Universidade Estadual de Campinas (UNICAMP). Instituto de Computação |
Defense date: | 2016-08-18 |
Examining board members: |
Guido Costa Souza de Araújo;
Lucas Francisco Wanner;
Fernando Magno Quintão Pereira
|
Advisor: | Alexandro José Baldassin; Guido Costa Souza de Araújo; José Nelson Amaral |
Abstract | |
In a NUMA machine, a program's execution time can be significantly impacted by how data and tasks are distributed between nodes. Thus, correctly assigning threads and memory pages is paramount. The correct assignment should match the demand for remote data transfers with the available communication bandwidth and memory controller capacity. Such an assignment typically requires dealing with four simultaneous goals: (a) keep threads close to the memory pages they access; (b) evenly distribute the workload among nodes; (c) maintain memory demand below the memory controllers' bandwidth; and (d) reassign threads and pages to follow changes in the memory access pattern of the program. However, most solutions to this problem address only a subset of these goals, mainly because they seek to avoid complex solutions or expensive implementation overheads. This work proposes PTB, a heuristic-based algorithm that simultaneously allocates (P)ages, (T)hreads and (B)andwidth to each node of a NUMA architecture. In contrast to alternative approaches, PTB's integrated solution seeks both to uniformly distribute the workload and to limit memory demand to the controllers' bandwidth, while also addressing asymmetry issues found in the communication paths of modern NUMA architectures. Experimental results using the Parsec, NAS and Metis benchmarks reveal that PTB achieves a geometric mean speedup of 1.16x when compared to Linux's default scheduler. In particular, for a number of programs, PTB speedups ranged from 1.6x to 2x, while Linux's automatic NUMA balancing either stayed below 1.2x or resulted in slowdowns. (AU) | |
FAPESP's process: | 14/15523-1 - Memory allocation and balancing techniques on NUMA machines |
Grantee: | Martin Ichilevici de Oliveira |
Support Opportunities: | Scholarships in Brazil - Master |