Adicionando suporte nativo a paralelismo de tarefas a um sistema RISC-V multicore com suporte a Linux

Lucas Henrique Morais

Full text
Author(s):	Lucas Henrique Morais Total Authors: 1
Document type:	Master's Dissertation
Press:	São Paulo.
Institution:	Universidade de São Paulo (USP). Instituto de Matemática e Estatística (IME/SBI)
Defense date:	2019-08-22
Examining board members:	Alfredo Goldman Vel Lejbman; Rodolfo Jardim de Azevedo; Carlos Alvarez Martinez
Advisor:	Alfredo Goldman Vel Lejbman; Guido Costa Souza de Araújo
Abstract
The Task Scheduling Paradigm is a general technique for leveraging fine and coarse grain parallelism from applications of several domains with minimum impact on code readability, relying on the automatic inference of data dependencies among tasks. The performance of Task Parallel applications is correlated with the speed at which the underlying Task Scheduling System is able to detect such dependencies, something that is critical for fine-granularity workloads, which cannot amortize scheduling overheads with long periods of useful computation. That being the case, several groups have recently been developing FPGA-accelerated Task Scheduling Systems architectures where a software Task Scheduling Runtime is able to offload its bookkeeping computations to an FPGA-based accelerator with the goal of efficiently scheduling fine-grained tasks to CPU cores. Even though these FPGA-accelerated systems offer substantial gains over the software-only baseline, it is also true that FPGA-CPU communication bottlenecks prevent such designs from handling scenarios with either large number of cores or very fine-grained tasks. With that in mind, we proposed the implementation of a Native Task Scheduling System that is, a processor with native support for task scheduling embedded into its architecture with the goal of substantially reducing these overheads. More specifically, this project aimed at embedding the HW logic of Picos, a mature Task Scheduling Accelerator developed by the Barcelona Supercomputing Center (BSC), into Rocket Chip, an open-source, silicon-proven, multi-core implementation of RISC-V. The ISA of the resulting system provides special instructions for Task Applications to interact with this Task Scheduling Logic, ruling out all FPGA-CPU communication latencies. To evaluate the prototype performance, we both (1) adapted Nanos, a mature Task Scheduling runtime, to benefit from the new task-scheduling-accelerating instructions; and (2) developed Phentos, a new HW-accelerated light weight Task Scheduling runtime. Our experiments show that task parallel programs using Nanos-RV the Nanos version ported to our system are on average 2.13 times faster than those being serviced by baseline Nanos, while programs running on Phentos are 13.19 times faster, considering geometric means. Using eight cores, Nanos-RV is able to deliver speedups with respect to serial execution of up to 5.62 times, while Phentos produces speedups of up to 5.72 times. (AU)

FAPESP's process:	17/02682-2 - Designing a task-centric manycore architecture
Grantee:	Lucas Henrique Morais
Support Opportunities:	Scholarships in Brazil - Master

Short URL