Data streams (DS) are unbounded, continuously generated, non-stationary, and in many cases high-speed sequences of data. Many real-world applications generate large amounts of data as a continuous stream, and as Information Technology evolves, ever more data will be generated and collected. This highlights the need for algorithms capable of extracting relevant knowledge from such data. Classification, which consists of assigning labels to the instances of a DS, is among the most investigated DS tasks. This task may also incorporate novelty detection, since instances in a data stream may differ significantly from previously known concepts. However, most algorithms for DS processing do not consider that an instance can belong to more than one class simultaneously, or that classes can be organized in a taxonomy of superclasses and subclasses. Investigating classification methods capable of dealing with such multi-label and hierarchical scenarios is therefore essential. In this context, the main objective of this research project is to propose new strategies for hierarchical and multi-label classification in DS. In particular, we intend to build on the successful ideas of the MINAS model, proposed by Faria et al. (2016), which uses clustering techniques for DS classification, and to extend this model by incorporating multi-label and hierarchical classification strategies into its two operation phases, offline and online. The main challenge of this research is to adapt the clustering algorithms of the online phase so that new instances are classified taking into account the relationships and dependencies between classes, which are intrinsic characteristics of hierarchical and multi-label classification problems.