Multi-output learning for data streams classification

Grant number: 18/19829-9
Support type: Scholarships abroad - Research
Effective date (Start): March 05, 2019
Effective date (End): March 04, 2020
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Ricardo Cerri
Grantee: Ricardo Cerri
Host: Joao Manuel Portela da Gama
Home Institution: Centro de Ciências Exatas e de Tecnologia (CCET). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil
Research place: Universidade do Porto (UP), Portugal

Abstract

Data streams (DS) are unbounded, continuously generated, non-stationary, and in many cases high-speed data. Many real-world applications generate large amounts of data in a continuous stream, and as Information Technology evolves, ever more data will be generated and collected. This highlights the need to develop algorithms capable of extracting relevant knowledge from these data. Classification, which consists of assigning labels to the instances of a DS, is among the most investigated DS tasks. It may also incorporate novelty detection, since instances in a data stream may differ significantly from previously known concepts. However, most algorithms for DS processing do not consider that an instance can belong to more than one class simultaneously, or that its classes can be organized in a taxonomy of superclasses and subclasses. Investigating classification methods capable of dealing with such multi-label and hierarchical scenarios is therefore essential. In this context, the main objective of this research project is to propose new strategies for hierarchical and multi-label classification in DS. In particular, we intend to build on the successful ideas of the MINAS model, proposed by Faria et al. (2016), which uses clustering techniques for DS classification, and to extend this model by incorporating multi-label and hierarchical classification strategies into its two operation phases, offline and online. The main challenge of this research is to adapt the clustering algorithms of the online phase so that predictions for new instances take into account the relationships and dependencies between classes, which are intrinsic characteristics of hierarchical and multi-label classification problems.
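
To make the offline/online split more concrete, the following is a minimal illustrative sketch of clustering-based multi-label classification with an "unknown" outcome for novelty detection. It is not the MINAS algorithm or the method proposed in this project: the function names (`offline_phase`, `online_phase`), the use of a single spherical cluster per label, and the radius rule are all assumptions chosen only to illustrate the general idea.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def offline_phase(labeled):
    """Build one spherical cluster per label from labeled training data.

    `labeled` is a list of (point, set_of_labels) pairs. Using a single
    cluster per label is a simplifying assumption of this sketch.
    """
    by_label = defaultdict(list)
    for point, labels in labeled:
        for label in labels:
            by_label[label].append(point)

    model = []
    for label, points in by_label.items():
        c = centroid(points)
        # Radius: distance from the centroid to its farthest training point.
        radius = max(math.dist(c, p) for p in points)
        model.append((label, c, radius))
    return model


def online_phase(model, point):
    """Return every label whose cluster covers `point`.

    An empty set means no known cluster explains the instance, so it would
    be treated as "unknown" and kept as a candidate for novelty detection.
    """
    return {label for label, c, r in model if math.dist(c, point) <= r}


# Hypothetical toy stream: the middle instance carries two labels.
training = [
    ((0.0, 0.0), {"A"}),
    ((2.0, 0.0), {"A", "B"}),
    ((4.0, 0.0), {"B"}),
]
model = offline_phase(training)
single = online_phase(model, (0.5, 0.0))    # covered by the "A" cluster only
both = online_phase(model, (2.0, 0.0))      # covered by "A" and "B"
unknown = online_phase(model, (10.0, 10.0)) # covered by no cluster
```

In this sketch each label is predicted independently of the others, so it ignores exactly the class relationships and dependencies that the project identifies as its main challenge; a hierarchical variant would additionally need to keep predictions consistent with the superclass/subclass taxonomy.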