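Mechanism for speculative execution of applications parallelized by DOPIPE techniques using stage replication [original title: Mecanismo para execução especulativa de aplicações paralelizadas por técnicas DOPIPE usando replicação de estágios]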

André Oliveira Loureiro do Baixo

Author(s):	André Oliveira Loureiro do Baixo
Document type:	Master's Dissertation
Press:	Campinas, SP.
Institution:	Universidade Estadual de Campinas (UNICAMP). Instituto de Computação
Defense date:	2012-07-24
Examining board members:	Guido Costa Souza de Araújo; Mauricio Breternitz Junior; Rodolfo Jardim de Azevedo
Advisor:	Guido Costa Souza de Araújo
Abstract
Maximal utilization of cores in multicore architectures is key to realizing the potential performance available from modern microprocessors. To achieve scalable performance, parallelization techniques rely on carefully tuning speculative architecture support, the runtime environment, and software-based transformations. Hardware and software mechanisms have already been proposed to address this problem. They either require deep (and risky) changes to existing hardware and cache coherence protocols, or exhibit poor performance scalability for a range of applications. Recent work on DOPIPE-based parallelization techniques (e.g., DSWP) has suggested that combining page-based data versioning with software speculation can yield good speedups. Although a software-only solution seems very attractive from an industry point of view, it does not exploit the full potential of the microarchitecture for detecting and exploiting parallelism. The addition of cache tags as an enabler for data versioning, as recently announced in the industry, could allow better exploitation of parallelism at the microarchitecture level. In this work we present an execution model that supports both DOPIPE-based speculation and traditional speculative parallelization techniques. It is based on a simple cache tagging approach for data versioning, which integrates smoothly with typical cache coherence protocols and does not require any changes to them. Experimental results, using SPEC and PARSEC benchmarks, reveal a geometric mean speedup of 21.6x for nine sequential programs on a 24-core simulated CMP, while demonstrating improved scalability compared to a software-only approach.

FAPESP's process:	10/02913-5 - Redundant Computation Elimination in Traces of Multicores using Value Prediction
Grantee:	André Oliveira Loureiro do Baixo
Support Opportunities:	Scholarships in Brazil - Master
