Detecting Soccer Balls with Reduced Neural Networks: A Comparison of Multiple Architectures Under Constrained Hardware Scenarios

Meneghetti, Douglas De Rizzo; Donadon Homem, Thiago Pedro; Renolfi de Oliveira, Jonas Henrique; da Silva, Isaac Jesus; Perico, Danilo Hernani; da Costa Bianchi, Reinaldo Augusto

Author(s):	Meneghetti, Douglas De Rizzo ^[1]; Donadon Homem, Thiago Pedro ^[2]; Renolfi de Oliveira, Jonas Henrique ^[1]; da Silva, Isaac Jesus ^[1]; Perico, Danilo Hernani ^[1]; da Costa Bianchi, Reinaldo Augusto ^[1]
Total Authors:	6
Affiliation:	^[1] FEI Univ Ctr, 3972-B Humberto de Alencar Castelo Branco Ave, BR-09850901 Sao Bernardo Do Campo, SP - Brazil; ^[2] Fed Inst Educ Sci & Technol Sao Paulo, 951 Mutinga Ave, Jardim Santo Elias, BR-05110000 Sao Paulo, SP - Brazil
Total Affiliations:	2
Document type:	Journal article
Source:	JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS; v. 101, n. 3 FEB 27 2021.
Web of Science Citations:	0
Abstract
Object detection techniques that achieve state-of-the-art detection accuracy employ convolutional neural networks, implemented to run with lower latency on graphics processing units. Some hardware systems, such as mobile robots, operate under constrained hardware conditions but still benefit from object detection capabilities. Multiple network models have been proposed that achieve comparable accuracy with reduced architectures and leaner operations. Motivated by the need to create a near real-time object detection system for a soccer team of mobile robots operating with x86 CPU-only embedded computers, this work analyses the average precision and inference time of multiple object detection systems in a constrained hardware setting. We train open implementations of MobileNetV2 and MobileNetV3 models with different underlying architectures, obtained by changing their input and width multipliers, as well as YOLOv3, TinyYOLOv3, YOLOv4 and TinyYOLOv4, on an annotated image dataset captured using a mobile robot. We emphasize the speed/accuracy trade-off in the models by reporting their average precision on a test dataset and their inference time on videos at different resolutions, under constrained and unconstrained hardware configurations. Results show that MobileNetV3 models have a good trade-off between average precision and inference time in constrained scenarios only, while MobileNetV2 models with high width multipliers are appropriate for server-side inference. YOLO models in their official implementations are not suitable for inference on CPUs.

FAPESP's process:	19/07665-4 - Center for Artificial Intelligence
Grantee:	Fabio Gagliardi Cozman
Support Opportunities:	Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program
