
Novelty detection in multi-label data streams classification

Grant number: 22/02981-8
Support Opportunities: Research Grants - Initial Project Research Grant
Duration: February 01, 2023 - January 31, 2028
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Ricardo Cerri
Grantee: Ricardo Cerri
Host Institution: Centro de Ciências Exatas e de Tecnologia (CCET). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil
Associated researchers: Diego Furtado Silva; Elaine Ribeiro de Faria Paiva; João Manuel Portela da Gama
Associated scholarship(s): 23/08406-8 - Ensemble of classifiers for novelty detection in multi-label data streams, BP.DD


Data Streams (DS) are sequences of data of unbounded size that are generated continuously, are non-stationary, and in many cases arrive at high speed. Because the flow is potentially infinite, the data cannot all be stored in memory, so each example must be processed only once and then discarded. Several real-world applications generate large amounts of data in a continuous flow, and as Information Technology evolves, ever more data is generated and collected. Examples include collecting data from sensors, generating measurements during network monitoring, and analyzing posts on social networks. This highlights the need to develop algorithms capable of extracting relevant knowledge from such data. Among the tasks involving DS, classification is one of the most important: it aims to label previously unseen examples that constantly arrive along the flow. Within this scenario, a major challenge is novelty detection, where novelties take the form of concept drifts and concept evolutions. In concept drift, the distribution that generates the data changes over time, which means that the distributions representing the classes change. In concept evolution, new distributions emerge over time, which means that new classes appear in the data stream. Although there are several methods for classifying DS, most of them ignore the fact that examples in the stream can belong to more than one class simultaneously. Most also assume that the true classes of the examples are always available together with the examples in the stream, which is often unrealistic. Thus, it is essential to investigate classification methods capable of dealing with such challenging multi-label scenarios. In this context, the main objective of this research project is to propose new strategies for multi-label classification in DS.
In addition to detecting concept evolutions and concept drifts, the development of new strategies must take into account other constraints and characteristics that make the task challenging. Among them are real-time responses, limited memory, single-pass processing of the data, detection of recurring concepts, detection of noise and outliers, infinitely delayed labels, and detection of multiple simultaneous concept drifts and evolutions. The proposed methods will be evaluated on synthetic and real-world datasets and compared with other methods in the literature. The results will be published in journals and at scientific events, and the generated code and data will be made publicly available. The research results are expected to bring significant advances to the areas of data stream classification and multi-label learning. (AU)
