Research Grants 22/02981-8 - Machine learning, Multi-label classification - BV FAPESP

Novelty detection in multi-label data streams classification

Abstract

Data streams (DS) are continuously generated, non-stationary sequences of data of unbounded size, often arriving at high speed. Because the stream is potentially infinite, it cannot be stored in memory: each example must be processed only once and then discarded. Many real-world applications generate large amounts of data as a continuous flow, and as Information Technology evolves, ever more data is generated and collected. Examples include collecting sensor data, producing measurements during network monitoring, and analyzing posts on social networks. This highlights the need for algorithms capable of extracting relevant knowledge from such data. Among the tasks involving DS, classification is one of the most important: its goal is to label previously unseen examples that arrive continuously along the stream. In this scenario, a major challenge is novelty detection, where novelties take the form of concept drift and concept evolution. In concept drift, the distribution that generates the data changes over time, so the distributions that represent the classes change. In concept evolution, new distributions emerge over time, which corresponds to the appearance of new classes in the stream. Although there are several methods for DS classification, most of them ignore the fact that an example in the stream may belong to more than one class simultaneously, and they assume that the true classes of the examples always arrive together with the examples, which is often unrealistic. Investigating classification methods capable of handling these challenging multi-label scenarios is therefore essential. In this context, the main objective of this research project is to propose new strategies for multi-label classification in DS.
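To make the distinction between concept drift and concept evolution concrete in a multi-label setting, the following is a minimal sketch, loosely in the spirit of cluster-based novelty detectors from the data stream literature. It is not the method proposed by this project. The names `LabelModel`, `NoveltyDetector`, `min_cluster`, and `novelty_radius` are hypothetical, and each label is assumed to be summarized by a single centroid and radius. An example receives every label whose hypersphere contains it (multi-label). Centroids move incrementally, which lets the model track gradual drift. Unexplained examples that group tightly together are promoted to a new label, which models concept evolution.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

Vector = tuple[float, ...]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class LabelModel:
    """Hypothetical per-label summary: a centroid, a radius, and a count."""
    centroid: list[float]
    radius: float
    n: int = 1

    def update(self, x: Vector) -> None:
        # Incremental mean: the centroid follows gradual concept drift
        # without storing any past examples.
        self.n += 1
        for i, xi in enumerate(x):
            self.centroid[i] += (xi - self.centroid[i]) / self.n


class NoveltyDetector:
    def __init__(self, min_cluster: int = 3, novelty_radius: float = 1.0):
        self.models: dict[str, LabelModel] = {}
        self.unknown: list[Vector] = []  # unexplained examples (unbounded here)
        self.min_cluster = min_cluster
        self.novelty_radius = novelty_radius
        self._next_id = 1

    def add_label(self, name: str, centroid: Vector, radius: float) -> None:
        self.models[name] = LabelModel(list(centroid), radius)

    def predict(self, x: Vector) -> set[str]:
        """Process one example a single time and return its label set."""
        # Multi-label: every label whose hypersphere contains x is assigned.
        labels = {name for name, m in self.models.items()
                  if dist(x, m.centroid) <= m.radius}
        for name in labels:
            self.models[name].update(x)
        if labels:
            return labels
        self.unknown.append(x)
        new_label = self._try_novelty(x)
        return {new_label} if new_label else set()

    def _try_novelty(self, x: Vector) -> str | None:
        # Concept evolution: if enough unexplained examples lie close to the
        # newest one, promote that group to a brand-new label.
        group = [v for v in self.unknown if dist(v, x) <= self.novelty_radius]
        if len(group) < self.min_cluster:
            return None
        center = [sum(v[i] for v in group) / len(group) for i in range(len(x))]
        name = f"novelty_{self._next_id}"
        self._next_id += 1
        self.models[name] = LabelModel(center, self.novelty_radius, len(group))
        self.unknown = [v for v in self.unknown if dist(v, x) > self.novelty_radius]
        return name
```

For example, with labels `"A"` at `(0, 0)` and `"B"` at `(2, 0)`, both with radius 1.5, the point `(1, 0)` receives both labels. Three nearby points far from both labels produce a new label `"novelty_1"`. For brevity, the unknown buffer in this sketch grows without bound. A real detector would also need to bound memory and discard noise.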
In addition to detecting concept evolution and concept drift, the development of new strategies must account for other constraints and characteristics that make the task difficult. These include the need for real-time responses, limited memory, single-pass processing, detection of recurring concepts, detection of noise and outliers, infinitely delayed labels (true labels that may never become available), and detection of multiple simultaneous concept drifts and evolutions. The proposed methods will be evaluated on synthetic and real-world datasets and compared with other methods from the literature. The results will be published in journals and conferences, and the resulting code and data will be made publicly available. The research results are expected to bring significant advances to the areas of data stream classification and multi-label learning. (AU)
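Two of the constraints above, limited memory and the handling of noise and outliers, can be illustrated with a bounded buffer for unexplained examples. The sketch below is a hypothetical illustration, not the project's design. The class name `ShortTermMemory` and the parameters `capacity` and `ttl` (time-to-live, in time steps) are assumptions. Memory is capped by `collections.deque(maxlen=...)`. Examples that remain unexplained for more than `ttl` steps are discarded as likely noise rather than treated as an emerging class.

```python
from __future__ import annotations

from collections import deque


class ShortTermMemory:
    """Hypothetical bounded buffer for unexplained stream examples."""

    def __init__(self, capacity: int, ttl: int):
        # deque(maxlen=...) drops the oldest entry when full, so memory use
        # never exceeds `capacity` entries regardless of stream length.
        self.buffer: deque[tuple[int, tuple[float, ...]]] = deque(maxlen=capacity)
        self.ttl = ttl
        self.discarded = 0  # entries evicted for space or expired as noise

    def add(self, t: int, x: tuple[float, ...]) -> None:
        """Store example x that arrived at time step t (t must not decrease)."""
        if len(self.buffer) == self.buffer.maxlen:
            self.discarded += 1  # the append below evicts the oldest entry
        self.buffer.append((t, x))

    def expire(self, now: int) -> None:
        # Entries unexplained for more than `ttl` steps are treated as noise
        # or outliers. The buffer is time-ordered, so only the front is checked.
        while self.buffer and now - self.buffer[0][0] > self.ttl:
            self.buffer.popleft()
            self.discarded += 1

    def __len__(self) -> int:
        return len(self.buffer)
```

The fixed capacity trades recall of slowly emerging classes for a hard memory bound. The time-to-live gives each unexplained example a limited window in which to join a cohesive group before being dropped.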


Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
DEL VALLE, ALINE MARQUES; MANTOVANI, RAFAEL GOMES; CERRI, RICARDO. A systematic literature review on AutoML for multi-target learning tasks. ARTIFICIAL INTELLIGENCE REVIEW, 40 p. (22/02981-8)
ILIDIO, PEDRO; ALVES, ANDRE; CERRI, RICARDO. Fast Bipartite Forests for Semi-supervised Interaction Prediction. 39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 8 p. (22/02981-8)