(Reference obtained automatically from the Web of Science, based on the FAPESP funding information and corresponding grant number included in the publication by the authors.)

AVPP: Address-first Value-next Predictor with Value Prefetching for Improving the Efficiency of Load Value Prediction

Author(s):
Orosa, Lois [1, 2, 3]; Azevedo, Rodolfo [1, 2]; Mutlu, Onur [3]
Total number of authors: 3
Author affiliation(s):
[1] Univ Estadual Campinas, UNICAMP, Campinas, SP - Brazil
[2] Univ Estadual Campinas, Inst Comp, Av Albert Einstein 1251, BR-13083852 Campinas, SP - Brazil
[3] Swiss Fed Inst Technol, Dept Comp Sci, Univ Str 6, CH-8092 Zurich - Switzerland
Total number of affiliations: 3
Document type: Journal article
Source: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION; v. 15, n. 4, JAN 2019.
Web of Science citations: 0
Abstract

Value prediction improves instruction-level parallelism in superscalar processors by breaking true data dependencies. Although this technique can significantly improve overall performance, most state-of-the-art value prediction approaches incur high hardware cost, which is the main obstacle to their wide adoption in current processors. To tackle this issue, we revisit load value prediction as an efficient alternative to the classical approaches that predict all instructions. By speculating only on loads, the pressure on shared resources (e.g., the Physical Register File) and the predictor size can be substantially reduced (e.g., more than 90% reduction compared to recent works). We observe that existing value predictors cannot achieve very high performance when speculating only on load instructions. To solve this problem, we propose a new, accurate, and low-cost mechanism for predicting the values of load instructions: the Address-first Value-next Predictor with Value Prefetching (AVPP). The key idea of our predictor is to predict the load address first (which, we find, is much more predictable than the value) and to use a small non-speculative Value Table (VT), indexed by the predicted address, to predict the value next. To increase the coverage of AVPP, we aim to increase the hit rate of the VT by also predicting the load address of a future instance of the same load instruction and prefetching its value into the VT. We show that AVPP is relatively easy to implement, requiring only 2.5% of the area of a 32KB L1 data cache. We compare our mechanism with five state-of-the-art value prediction techniques, evaluated within the context of load value prediction, in a relatively narrow out-of-order processor. On average, our AVPP predictor achieves 11.2% speedup and 3.7% energy savings over the baseline processor, outperforming all the state-of-the-art predictors in 16 of the 23 benchmarks we evaluate.
We evaluate AVPP implemented together with different prefetching techniques, showing additive performance gains (20% average speedup). In addition, we propose a new taxonomy to classify different value predictor policies regarding predictor update, predictor availability, and in-flight pending updates. We evaluate these policies in detail. (AU)
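To make the address-first, value-next idea concrete, the sketch below models the mechanism the abstract describes: a per-PC address predictor, a small Value Table (VT) indexed by address, and value prefetching of the next predicted instance into the VT. The class name, table size, simple stride-based address predictor, and FIFO eviction are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch of the AVPP concept: predict the load address first,
# then look up the value in a small non-speculative Value Table (VT).
# Stride prediction, table size, and eviction policy are assumptions.

class AVPP:
    def __init__(self, vt_size=64):
        self.addr_pred = {}   # PC -> (last_addr, stride): stride address predictor
        self.vt = {}          # Value Table: address -> last known value
        self.vt_size = vt_size

    def predict(self, pc):
        """Address first, value next: predict the load's address,
        then return the VT entry for it (None on a VT miss)."""
        if pc not in self.addr_pred:
            return None
        last_addr, stride = self.addr_pred[pc]
        pred_addr = last_addr + stride      # address prediction
        return self.vt.get(pred_addr)       # value-next: VT lookup

    def update(self, pc, addr, value, memory):
        """On load commit: train the address predictor, fill the VT,
        and prefetch the value of the next predicted instance."""
        if pc in self.addr_pred:
            last_addr, _ = self.addr_pred[pc]
            self.addr_pred[pc] = (addr, addr - last_addr)
        else:
            self.addr_pred[pc] = (addr, 0)
        self._vt_insert(addr, value)
        # Value prefetching: predict the address of a *future* instance
        # of this load and pull its value into the VT ahead of time.
        _, stride = self.addr_pred[pc]
        next_addr = addr + stride
        if next_addr in memory:             # memory stands in for the data cache
            self._vt_insert(next_addr, memory[next_addr])

    def _vt_insert(self, addr, value):
        if len(self.vt) >= self.vt_size and addr not in self.vt:
            self.vt.pop(next(iter(self.vt)))  # crude FIFO eviction
        self.vt[addr] = value
```

A quick usage example: after two committed instances of a strided load (addresses 0 and 4), the predictor learns stride 4, prefetches the value at address 8 into the VT, and can then predict the next instance's value without speculating on the value itself.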

FAPESP Grant: 14/03840-2 - Architectural support for speculative execution of programs
Grantee: Lois Orosa Nogueira
Support type: Scholarships in Brazil - Post-Doctoral
FAPESP Grant: 13/08293-7 - CECC - Center for Computational Engineering and Sciences
Grantee: Munir Salomao Skaf
Support type: Research Grants - Research, Innovation and Dissemination Centers (RIDC)
FAPESP Grant: 16/18929-4 - Speculative techniques for reducing the memory system bottleneck
Grantee: Lois Orosa Nogueira
Support type: Scholarships Abroad - Research Internship - Post-Doctoral