PTB: an integrated page, thread and bandwidth allocation approach for NUMA architectures = PTB: uma abordagem integrada de alocação de páginas, threads e banda para arquiteturas NUMA

Martin Ichilevici de Oliveira

Author(s):	Martin Ichilevici de Oliveira
Total Authors:	1
Document type:	Master's Dissertation
Press:	Campinas, SP.
Institution:	Universidade Estadual de Campinas (UNICAMP). Instituto de Computação
Defense date:	2016-08-18
Examining board members:	Guido Costa Souza de Araújo; Lucas Francisco Wanner; Fernando Magno Quintão Pereira
Advisor:	Alexandro José Baldassin; Guido Costa Souza de Araújo; José Nelson Amaral
Abstract
In a NUMA machine, a program's execution time can be significantly impacted by how data and tasks are distributed between nodes. Thus, correctly assigning threads and memory pages is paramount. The correct assignment should match the demand for remote data transfers with the available communication bandwidth and memory controller capacity. Such an assignment typically requires dealing with four simultaneous goals: (a) keep threads close to the memory pages they access; (b) evenly distribute the workload among nodes; (c) keep memory demand below the memory controllers' bandwidth; and (d) reassign threads and pages to follow changes in the memory access pattern of the program. However, most solutions to this problem address only a subset of these goals, mainly because they seek to avoid complex solutions or expensive implementation overheads. This work proposes PTB, a heuristic-based algorithm that simultaneously allocates (P)ages, (T)hreads and (B)andwidth to each node of a NUMA architecture. In contrast to alternative approaches, PTB's integrated solution seeks both to uniformly distribute the workload and to limit memory demand to the controllers' bandwidth, while also addressing asymmetry issues found in the communication paths of modern NUMA architectures. Experimental results using the Parsec, NAS and Metis benchmarks reveal that PTB produces a geometric mean speedup of 1.16x when compared to Linux's default scheduler. In particular, for a number of programs, PTB speedups ranged from 1.6x to 2x, while Linux's automatic NUMA balancing either stayed below 1.2x or resulted in slowdowns. (AU)

FAPESP's process:	14/15523-1 - Memory allocation and balancing techniques on NUMA machines
Grantee:	Martin Ichilevici de Oliveira
Support Opportunities:	Scholarships in Brazil - Master
