AVPP: Address-first Value-next Predictor with Value Prefetching for Improving the Efficiency of Load Value Prediction

Orosa, Lois; Azevedo, Rodolfo; Mutlu, Onur

Author(s):	Orosa, Lois [1, 2, 3]; Azevedo, Rodolfo [1, 2]; Mutlu, Onur [3]	Total Authors: 3
Affiliation:	[1] Univ Estadual Campinas, UNICAMP, Campinas, SP - Brazil; [2] Univ Estadual Campinas, Inst Comp, Av Albert Einstein 1251, BR-13083852 Campinas, SP - Brazil; [3] Swiss Fed Inst Technol, Dept Comp Sci, Univ Str 6, CH-8092 Zurich - Switzerland	Total Affiliations: 3
Document type:	Journal article
Source:	ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION; v. 15, n. 4 JAN 2019.
Web of Science Citations:	1
Abstract
Value prediction improves instruction-level parallelism in superscalar processors by breaking true data dependencies. Although this technique can significantly improve overall performance, most state-of-the-art value prediction approaches have a high hardware cost, which is the main obstacle to their wide adoption in current processors. To tackle this issue, we revisit load value prediction as an efficient alternative to the classical approaches that predict all instructions. By speculating only on loads, the pressure on shared resources (e.g., the Physical Register File) and the predictor size can be substantially reduced (e.g., by more than 90% compared to recent works). We observe that existing value predictors cannot achieve very high performance when speculating only on load instructions. To solve this problem, we propose a new, accurate, and low-cost mechanism for predicting the values of load instructions: the Address-first Value-next Predictor with Value Prefetching (AVPP). The key idea of our predictor is to predict the load address first (which, we find, is much more predictable than the value) and then to predict the value next, using a small non-speculative Value Table (VT) indexed by the predicted address. To increase the coverage of AVPP, we aim to increase the hit rate of the VT by also predicting the load address of a future instance of the same load instruction and prefetching its value into the VT. We show that AVPP is relatively easy to implement, requiring only 2.5% of the area of a 32KB L1 data cache. We compare our mechanism with five state-of-the-art value prediction techniques, evaluated within the context of load value prediction, in a relatively narrow out-of-order processor. On average, our AVPP predictor achieves an 11.2% speedup and 3.7% energy savings over the baseline processor, outperforming all the state-of-the-art predictors in 16 of the 23 benchmarks we evaluate.
We evaluate AVPP combined with different prefetching techniques, showing additive performance gains (20% average speedup). In addition, we propose a new taxonomy to classify different value predictor policies regarding predictor update, predictor availability, and in-flight pending updates. We evaluate these policies in detail. (AU)
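To make the address-first, value-next idea concrete, the following is a minimal Python sketch, not the authors' design. The abstract does not specify the address predictor, the VT organization, or how far ahead values are prefetched, so the per-PC stride address predictor with a saturating confidence counter, the direct-mapped VT, the parameters `vt_entries`, `conf_threshold`, and `prefetch_distance`, and the names `AVPPSketch`, `simulate_stream`, and `store` are all hypothetical. The sketch also processes loads strictly in program order (predict, then train at commit), ignoring the in-flight instances a real pipeline must handle.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    """Hypothetical per-PC address predictor state (simple stride predictor)."""
    last_addr: int = 0
    stride: int = 0
    confidence: int = 0  # 2-bit saturating counter (0..3)


class AVPPSketch:
    """Hypothetical, simplified model of an address-first, value-next predictor.

    Assumptions: stride-based address prediction, a direct-mapped Value Table
    (VT) filled only with committed or memory-read values, and value prefetching
    for the address of a future instance of the same load.
    """

    def __init__(self, memory, vt_entries=64, conf_threshold=2, prefetch_distance=1):
        self.memory = memory              # dict address -> value (stand-in for the memory hierarchy)
        self.addr_table = {}              # PC -> StrideEntry
        self.vt = [None] * vt_entries     # each slot: (address, value) or None
        self.vt_entries = vt_entries
        self.conf_threshold = conf_threshold
        self.prefetch_distance = prefetch_distance

    def _vt_index(self, addr):
        return (addr >> 3) % self.vt_entries  # assumes 8-byte-aligned words

    def _vt_lookup(self, addr):
        slot = self.vt[self._vt_index(addr)]
        return slot[1] if slot is not None and slot[0] == addr else None

    def _vt_fill(self, addr, value):
        self.vt[self._vt_index(addr)] = (addr, value)

    def predict(self, pc):
        """Predict the address first, then look up the value in the VT."""
        entry = self.addr_table.get(pc)
        if entry is None or entry.confidence < self.conf_threshold:
            return None                                   # no confident address prediction
        predicted_addr = entry.last_addr + entry.stride  # address first
        return self._vt_lookup(predicted_addr)           # value next (None on VT miss)

    def train(self, pc, addr, value):
        """Update at commit with the actual (non-speculative) address and value."""
        self._vt_fill(addr, value)
        entry = self.addr_table.get(pc)
        if entry is None:
            self.addr_table[pc] = StrideEntry(last_addr=addr)
            return
        new_stride = addr - entry.last_addr
        if new_stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, 3)
        else:
            entry.stride, entry.confidence = new_stride, 0
        entry.last_addr = addr
        if entry.confidence >= self.conf_threshold:
            # Value prefetching: predict a future instance's address, install its value.
            future_addr = addr + entry.stride * self.prefetch_distance
            if future_addr in self.memory:
                self._vt_fill(future_addr, self.memory[future_addr])

    def store(self, addr, value):
        """Keep memory and any matching VT entry coherent with a committed store."""
        self.memory[addr] = value
        idx = self._vt_index(addr)
        slot = self.vt[idx]
        if slot is not None and slot[0] == addr:
            self.vt[idx] = (addr, value)


def simulate_stream(n_loads, base=0x1000, pc=0x40):
    """Run one strided load over an array; return (correct, wrong, no_prediction)."""
    memory = {base + 8 * i: i * i for i in range(n_loads)}
    avpp = AVPPSketch(memory)
    correct = wrong = no_pred = 0
    for i in range(n_loads):
        addr = base + 8 * i
        actual = memory[addr]
        pred = avpp.predict(pc)
        if pred is None:
            no_pred += 1
        elif pred == actual:
            correct += 1
        else:
            wrong += 1
        avpp.train(pc, addr, actual)
    return correct, wrong, no_pred


if __name__ == "__main__":
    print(simulate_stream(32))  # (28, 0, 4)
```

On a strided stream of 32 loads, the sketch needs four instances to build confidence in the stride and then predicts every remaining value from the entry that the previous instance prefetched into the VT, even though none of those addresses had been loaded before. The `store` hook only keeps the sketch's VT from returning stale values after a write; in a real pipeline, a stale prediction would instead be caught by the usual value verification.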

FAPESP's process:	13/08293-7 - CCES - Center for Computational Engineering and Sciences
Grantee:	Munir Salomao Skaf
Support Opportunities:	Research Grants - Research, Innovation and Dissemination Centers - RIDC


FAPESP's process:	14/03840-2 - Architectural support for programs speculative execution
Grantee:	Lois Orosa Nogueira
Support Opportunities:	Scholarships in Brazil - Post-Doctoral


FAPESP's process:	16/18929-4 - Speculative techniques for reducing the memory bottleneck problem
Grantee:	Lois Orosa Nogueira
Support Opportunities:	Scholarships abroad - Research Internship - Post-doctor
