Advanced search
Start date
Betweenand


Automatic loop parallelization for multicore architectures

Full text
Author(s):
Cristianno Martins Vieira
Total Authors: 1
Document type: Master's Dissertation
Press: Campinas, SP.
Institution: Universidade Estadual de Campinas (UNICAMP). Instituto de Computação
Defense date:
Examining board members:
Sandro Rigo; Ricardo Ribeiro dos Santos; Guido Costa Souza de Araújo
Advisor: Sandro Rigo
Abstract

Although many programs present a regular form of parallelism, which can be expressed as parallel loops, many important examples do not. Loop skewing is a transformation that reorganizes the iteration space of loops to make it possible to expose the implicit parallelism through parallel loops. In general, as a consequence of the complexity in modifying the iteration space of loops, and possible problems caused by such changes - such as the possibility of increasing the miss rate in caches -, they are not widely used. In this work, the loop skewing transformation was implemented on GCC's C compiler (GNU Compiler Collection), allowing programmer's assistance. Graphite provides us a basis for implementation of the optimization, just representing it as an a_ne transformation on a multidimensional mathematical object called polytope. We show, through a detailed study about the mathematical model called polytope model, that for a very restricted loop structure - perfectly nested, with limits and memory accesses described by a_ne functions - could be represented as polytopes, and transformations applied to these would be carried by the code generated from these polytope. Thus, any transformation that could be structured as an a_ne transformation on a polytope, could be added. We also show, by means of performance analysis, that this type of transformation is feasible and, despite some limitations imposed by the still under development GCC's infrastructure for auto-parallelization, fairly increases the performance of some applications compiled with it - we achived a maximum of about 115% using four threads with one of the applications. We also veriéd the impact of using manually parallelized programs on this platform, and achieved a maximum gain of 11% in these cases, showing that even parallel applications may have implicit parallelism (AU)