Inference Time Optimization Using BranchyNet Partitioning

Author(s):
Pacheco, Roberto G.; Couto, Rodrigo S.; IEEE
Total number of authors: 3
Document type: Scientific article
Source: 2020 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC); v. N/A, 7 pp., 2020-01-01.
Abstract

Deep Neural Network (DNN) inference requires high computational power, which is generally provided by a cloud infrastructure. However, sending raw data to the cloud can increase the inference time due to the communication delay. To reduce this delay, the first DNN layers can be executed at an edge infrastructure and the remaining ones at the cloud. Depending on which layers are processed at the edge, the amount of data sent to the cloud can be greatly reduced. However, executing layers at the edge increases the processing delay. The partitioning problem addresses this trade-off by choosing the set of layers to execute at the edge so as to minimize the inference time. In this work, we address the problem of partitioning a BranchyNet, a type of DNN in which inference can terminate early at intermediate layers. We show that this partitioning can be formulated as a shortest-path problem and thus solved in polynomial time.
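To make the shortest-path formulation concrete, the following minimal Python sketch solves the partitioning problem for a plain feedforward DNN. It is an illustration under stated assumptions, not the paper's implementation: the authors' graph additionally models BranchyNet's early-exit branches, and all timings, output sizes, and the bandwidth below are hypothetical placeholders. Node ("E", i) means the output of layer i is at the edge, ("C", i) that it is at the cloud; edge weights are processing delays, plus an upload delay when crossing from edge to cloud, so the shortest ("E", 0) -> ("C", n) path yields the minimum-time partition.

import heapq

# Hypothetical per-layer costs (seconds) and output sizes (bits); these
# numbers are illustrative placeholders, not measurements from the paper.
edge_proc = [0.030, 0.050, 0.040, 0.800]   # processing time at the edge
cloud_proc = [0.003, 0.005, 0.004, 0.002]  # processing time at the cloud
out_size = [4e6, 2e6, 5e5, 1e4]            # layer output size in bits
input_size = 8e6                           # raw input size in bits
bandwidth = 1e6                            # edge-to-cloud link, bits/s

n = len(edge_proc)
graph = {}  # node -> list of (neighbor, edge weight in seconds)

def add_edge(u, v, w):
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, [])

for i in range(n):
    # Stay at the edge: pay the edge processing time of layer i+1.
    add_edge(("E", i), ("E", i + 1), edge_proc[i])
    # Stay at the cloud: pay the cloud processing time of layer i+1.
    add_edge(("C", i), ("C", i + 1), cloud_proc[i])
    # Cross edge -> cloud: upload layer i's output (the raw input if i == 0).
    size = input_size if i == 0 else out_size[i - 1]
    add_edge(("E", i), ("C", i), size / bandwidth)
# Sending the final output to the cloud (all layers executed at the edge).
add_edge(("E", n), ("C", n), out_size[n - 1] / bandwidth)

def dijkstra(src, dst):
    # Standard Dijkstra with a binary heap: O((V + E) log V).
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return dist[dst], path[::-1]

total, path = dijkstra(("E", 0), ("C", n))
cut = next(i for kind, i in path if kind == "C")  # first node at the cloud
print(f"min inference time: {total:.3f} s; layers 1..{cut} run at the edge")

With these placeholder numbers the optimal cut keeps the first three layers at the edge, since layer 3's small output makes the upload cheap. Because the graph is a DAG with non-negative weights, a topological-order relaxation would work equally well; either way the problem is solved in polynomial time, as the abstract states.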

FAPESP Grant: 15/24494-8 - Communication and processing of big data in cloud and fog computing
Grantee: Nelson Luis Saldanha da Fonseca
Support type: Research Grant - Thematic Project