Custom heterogeneous hardware acceleration for high-performance computing applicat...
Compilation techniques to optimize the memory subsystem access
Grant number: 15/12187-3
Support Opportunities: Scholarships abroad - Research Internship - Master's degree
Start date: September 01, 2015
End date: February 29, 2016
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Guido Costa Souza de Araújo
Grantee: Martin Ichilevici de Oliveira
Supervisor: José Nelson Amaral
Host Institution: Instituto de Computação (IC), Universidade Estadual de Campinas (UNICAMP), Campinas, SP, Brazil
Institution abroad: University of Alberta, Canada
Associated to the scholarship: 14/15523-1 - Memory allocation and balancing techniques on NUMA machines, BP.MS
Abstract

The NUMA (Non-Uniform Memory Access) model has enabled a considerable increase in the scalability of parallel architectures. However, many of today's computer systems do not account for the different latencies of local and remote memory accesses, which can lead to large performance losses. They are also usually agnostic to the congestion caused by overloading some memory controllers. Explicit programmer control, although possible, is costly and error-prone, so an automatic memory-distribution management system is critical.

This project aims to implement a model that controls, independently of the programmer, the distribution of a program's data between local and remote memories. The model will be responsible for determining the program's memory access patterns, measuring variations in those patterns, and distributing memory pages among the nodes so as to reduce the program's average access latency.

At Unicamp, during the main project's research, a heuristic for data placement was developed with good initial results, performing at least as well as other popular page-balancing techniques, if not better. However, it is currently a static approach that depends on programmer intervention. To address this issue, compiler techniques will be used. These will be responsible for understanding the program's memory access pattern, identifying which pages are shared between threads, and determining a good page distribution among nodes. They could also support page migration.

With this model's implementation, we intend to improve the performance of memory-intensive programs by distributing pages across nodes, thus reducing congestion, and by placing memory pages close to the nodes that operate on them, thus reducing the average latency. The model is expected to be flexible and easy to use, since it will not require user intervention. (AU)
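The placement idea described in the abstract, favoring the node whose threads access a page most while also spreading pages so no single memory controller is overloaded, can be illustrated with a small greedy heuristic. The function name, the per-page access counts, and the per-node capacity cap below are illustrative assumptions for the sketch, not the project's actual algorithm:

```python
# Hypothetical sketch of a greedy page-placement heuristic (illustrative
# only; not the heuristic developed in the project): each page goes to
# the node whose threads access it most, subject to a per-node page cap
# that spreads load across memory controllers to limit congestion.

def place_pages(access_counts, num_nodes, capacity):
    """access_counts: {page: {node: access_count}} -> {page: node}."""
    placement = {}
    load = {n: 0 for n in range(num_nodes)}
    # Handle the "hottest" pages first so they get their preferred node.
    pages = sorted(access_counts,
                   key=lambda p: -sum(access_counts[p].values()))
    for page in pages:
        # Try nodes in decreasing order of local accesses to this page.
        prefs = sorted(range(num_nodes),
                       key=lambda n: -access_counts[page].get(n, 0))
        for node in prefs:
            if load[node] < capacity:  # congestion cap on this node
                placement[page] = node
                load[node] += 1
                break
    return placement

# Two NUMA nodes; pages 0-1 are touched mostly from node 0,
# page 2 from node 1, page 3 equally from both.
accesses = {
    0: {0: 90, 1: 10},
    1: {0: 80, 1: 20},
    2: {0: 5,  1: 95},
    3: {0: 50, 1: 50},
}
print(place_pages(accesses, num_nodes=2, capacity=2))
# -> {0: 0, 1: 0, 2: 1, 3: 1}
```

Here page 3, with no clear affinity, lands on node 1 because node 0 has already reached its cap: locality decides first, and the capacity limit models the congestion-avoidance goal.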