Data annotation guided by feature projection

Bárbara Caroline Benato

Author(s):	Bárbara Caroline Benato
Total Authors:	1
Document type:	Master's Dissertation
Press:	Campinas, SP.
Institution:	Universidade Estadual de Campinas (UNICAMP). Instituto de Computação
Defense date:	2019-09-10
Examining board members:	Alexandre Xavier Falcão; Pedro Jussieu de Rezende; Moacir Antonelli Ponti
Advisor:	Alexandre Xavier Falcão
Abstract
Data annotation by visual inspection (supervision) of each training sample can be a laborious process, especially when the number of samples is large, which is a well-known problem in deep learning. Annotation by the user is even more laborious in areas that require an expert with specialized knowledge, such as Medicine and Biology. Traditionally, studies have addressed this issue with semi-supervised learning, propagating labels from a few supervised samples to unsupervised samples by exploring the distribution of those samples in the feature space. However, such works do not consider the user's cognitive ability to understand feature space projections as a means of increasing the number of labeled samples for machine learning. In this work, we present data annotation methods in which a visual analytics tool assists the user in propagating labels to a large number of unsupervised samples. The user is guided both by the knowledge of a few labeled samples and by the visual information of the sample distribution in a feature space projection. We also investigate a semi-automatic data annotation approach, which combines manual and automatic label propagation, using an appropriate feature space projection and semi-supervised label estimation based on a certainty measure, to reduce the user's annotation effort. We validate the methods in two contexts: on a well-known image database, MNIST, and on images of human intestinal parasites with and without fecal impurities (an adverse class that makes the problem even more challenging). We evaluate two automatic semi-supervised learning approaches in latent and projected spaces. In addition, we evaluate two supervised classifiers trained with the labeled sets. Finally, the experiments aim to identify the solution that best reduces the user's annotation effort while also increasing classification accuracy on test sets.
The results suggest that visual analytics tools can make machine learning more effective when they combine the complementary skills of humans and machines. (AU)
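The abstract describes a semi-automatic scheme in which labels are propagated automatically only when a certainty measure is high enough, and the remaining samples are left for the user. The sketch below illustrates that general idea under stated assumptions. It is not the dissertation's method. It substitutes a simple k-nearest-neighbor vote in a 2D projected space, uses the vote share as the certainty measure, and introduces a hypothetical function `knn_propagate` with hypothetical parameters `k` and `threshold`.

```python
from collections import Counter
import math


def knn_propagate(points, labels, k=3, threshold=0.8):
    """Hypothetical sketch: confidence-thresholded kNN label propagation.

    points    -- list of (x, y) coordinates, assumed to come from a 2D
                 feature space projection (the projection method is not modeled).
    labels    -- class label for supervised samples, None for unsupervised ones.
    k         -- number of nearest supervised neighbors that vote (assumed).
    threshold -- minimum vote share (the certainty measure assumed here)
                 required to accept an automatic label.

    Returns (new_labels, pending): new_labels has automatic labels filled in
    where certainty >= threshold; pending lists the indices left for the user.
    Single pass: newly propagated labels do not vote for other samples.
    """
    labeled = [i for i, y in enumerate(labels) if y is not None]
    new_labels = list(labels)
    pending = []
    for i, y in enumerate(labels):
        if y is not None:
            continue  # already supervised by the user
        if not labeled:
            pending.append(i)  # nothing to propagate from yet
            continue
        # k nearest *supervised* samples in the projected space
        nearest = sorted(labeled, key=lambda j: math.dist(points[i], points[j]))[:k]
        votes = Counter(labels[j] for j in nearest)
        label, count = votes.most_common(1)[0]
        certainty = count / len(nearest)
        if certainty >= threshold:
            new_labels[i] = label  # confident: propagate automatically
        else:
            pending.append(i)  # uncertain: defer to manual annotation
    return new_labels, pending


# Usage: two well-separated groups on a line plus one ambiguous point.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0),
          (1.5, 0), (10.5, 0), (6.5, 0)]
labels = ["a", "a", "a", "b", "b", "b", None, None, None]
new_labels, pending = knn_propagate(points, labels, k=3, threshold=0.8)
print(new_labels)  # ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'b', None]
print(pending)     # [8]
```

Here the point at x = 6.5 gets two "b" votes and one "a" vote (certainty 2/3), so it is handed to the user instead of being labeled automatically. Lowering `threshold` shifts more samples to automatic propagation at the risk of more label errors. Raising it sends more samples to manual annotation.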

FAPESP's process:	16/25776-0 - Autoencoders neural networks optimization by visual analytics data
Grantee:	Bárbara Caroline Benato
Support Opportunities:	Scholarships in Brazil - Master
