Scaling Up Modulo Scheduling for High-Level Synthesis

Rosa, Leandro de Souza; Bouganis, Christos-Savvas; Bonato, Vanderlei

Author(s):	Rosa, Leandro de Souza ^[1] ; Bouganis, Christos-Savvas ^[2] ; Bonato, Vanderlei ^[1] Total Authors: 3
Affiliation:	^[1] Univ Sao Paulo, Inst Math & Comp Sci, BR-05508900 Sao Carlos, SP - Brazil ^[2] Imperial Coll London, Dept Elect & Elect Engn, London SW7 2AZ - England Total Affiliations: 2
Document type:	Journal article
Source:	IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS; v. 38, n. 5, p. 912-925, MAY 2019.
Web of Science Citations:	0
Abstract
High-level synthesis (HLS) tools have been increasingly used within the hardware design community to bridge the gap between productivity and the need to design large and complex systems. When targeting heterogeneous systems, where the CPU and the field-programmable gate array (FPGA) fabric are both available to perform computations, a design space exploration (DSE) is usually carried out to decide which parts of the initial code should be mapped to the FPGA fabric so that the overall system's performance is enhanced by accelerating its computation via dedicated processors. As the targeted systems become larger and more complex, leading to a large DSE, a fast estimate of the possible acceleration that can be obtained by mapping certain functionality onto the FPGA fabric is of paramount importance. Loop pipelining, which is responsible for the majority of HLS compilation time, is a key optimization toward achieving high-performance acceleration kernels. A new modulo scheduling algorithm is proposed, which reformulates the classical modulo scheduling problem and leads to a reduced number of integer linear problems solved, resulting in large computational savings. Moreover, the proposed approach has a controlled tradeoff between solution quality and computation time. Results show that the scalability is improved from quadratic, for the state-of-the-art method, to linear, for the proposed approach, while the optimized loop suffers a 1% (geomean) increase in the total number of cycles. (AU)
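
The abstract does not spell out the formulation, so the following is background only and not the authors' method. Classical modulo scheduling usually starts from a lower bound on the initiation interval (the number of cycles between the starts of consecutive loop iterations) and then searches upward from it, attempting one scheduling problem per candidate value. A minimal Python sketch of that textbook lower bound is shown here; the function names and the example loop are illustrative assumptions.

```python
from math import ceil


def res_mii(op_counts, unit_counts):
    """Resource bound: for each resource type, operations per iteration
    divided by available units, rounded up; the worst type dominates."""
    return max(ceil(op_counts[r] / unit_counts[r]) for r in op_counts)


def rec_mii(cycles):
    """Recurrence bound: each dependence cycle is given as
    (total latency, total iteration distance); returns 1 if there are none."""
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)


def min_ii(op_counts, unit_counts, cycles):
    """Lower bound on the initiation interval: the larger of the two bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))


# Hypothetical loop body (assumed for illustration): 4 memory operations on
# 2 ports, 3 multiplications on 1 multiplier, and one recurrence whose
# operations take 7 cycles in total and span 2 iterations.
ops = {"mem": 4, "mul": 3}
units = {"mem": 2, "mul": 1}
recurrences = [(7, 2)]
lower_bound = min_ii(ops, units, recurrences)
```

A scheduler that tries each candidate interval from this bound upward solves one problem per candidate, which is the kind of repeated work that a reduced number of solved problems, as described in the abstract, would save.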

FAPESP's process:	16/13327-6 - Design space exploration on heterogeneous systems for high performance applications
Grantee:	Leandro de Souza Rosa
Support Opportunities:	Scholarships abroad - Research Internship - Doctorate (Direct)
