Research Grants 22/02981-8 - Machine learning, Multi-label classification

Abstract

Data streams (DS) are potentially unbounded sequences of data that are generated continuously, often at high speed, by non-stationary processes. Because the flow is potentially infinite, the data cannot all be kept in memory, so each example must be processed once and then discarded. Many real-world applications generate large volumes of data as a continuous flow, and as information technology evolves, ever more data is generated and collected. Examples include collecting sensor data, generating measurements during network monitoring, and analyzing posts on social networks. This underscores the need for algorithms capable of extracting relevant knowledge from such data. Among the tasks involving DS, classification is one of the most important: it aims to label previously unseen examples that arrive continuously along the stream. Within this scenario, a major challenge is novelty detection, where novelties take the form of concept drifts and concept evolutions. In concept drift, the distribution that generates the data changes over time, which means that the distributions representing the classes change. In concept evolution, new distributions emerge over time, which means that new classes appear in the data stream. Although several methods exist for classifying DS, most of them ignore the fact that an example in the stream can belong to more than one class simultaneously, and they also assume that the true class labels always arrive together with the examples, which is often unrealistic. Investigating classification methods capable of dealing with such challenging multi-label scenarios is therefore essential. In this context, the main objective of this research project is to propose new strategies for multi-label classification in DS.
In addition to detecting concept evolutions and concept drifts, new strategies must take into account other constraints and characteristics that make the task challenging. These include the need for real-time responses, limited memory, single-pass processing of the data, detection of recurring concepts, detection of noise and outliers, infinitely delayed labels, and detection of multiple simultaneous concept drifts and evolutions. The proposed methods will be evaluated on synthetic and real-world datasets and compared with other methods from the literature. The results will be published in journals and conferences, and the code and data produced will be made publicly available. The research results are expected to bring significant advances to the areas of data stream classification and multi-label learning. (AU)
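
The constraints listed in the abstract (single pass, bounded memory, overlapping labels, drift, and evolution) can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the method proposed in this project: each label is summarized by a running centroid, an example receives every label whose centroid lies within a fixed `radius`, and examples that fit no known concept are buffered until a cohesive group can be promoted to a new label. The names `Centroid`, `StreamSketch`, `radius`, and `min_novel` are hypothetical, and the sketch assumes that true labels never arrive after the initial centroids are set.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Centroid:
    """Running mean of the examples assigned to one label (hypothetical summary)."""
    center: list
    count: int = 1

    def update(self, x):
        # Incremental mean: the centre follows the data, so gradual drift is tracked.
        self.count += 1
        self.center = [c + (xi - c) / self.count for c, xi in zip(self.center, x)]


@dataclass
class StreamSketch:
    radius: float = 1.0      # assumed distance threshold for "known" examples
    min_novel: int = 3       # assumed buffer size that triggers a novelty check
    centroids: dict = field(default_factory=dict)
    unknown: list = field(default_factory=list)
    next_id: int = 0

    def process(self, x):
        """See one example once, return its predicted label set, keep only summaries."""
        # Multi-label: every centroid within the radius contributes its label.
        labels = {name for name, c in self.centroids.items()
                  if math.dist(c.center, x) <= self.radius}
        if labels:
            for name in labels:
                self.centroids[name].update(x)
            return labels
        # The example fits no known concept: buffer it as "unknown".
        self.unknown.append(list(x))
        if len(self.unknown) >= self.min_novel:
            self._promote_novelty()
        return set()

    def _promote_novelty(self):
        # Concept evolution: a cohesive group of unknowns becomes a new label.
        dim = len(self.unknown[0])
        mean = [sum(u[i] for u in self.unknown) / len(self.unknown)
                for i in range(dim)]
        if all(math.dist(mean, u) <= self.radius for u in self.unknown):
            name = f"novel_{self.next_id}"
            self.next_id += 1
            self.centroids[name] = Centroid(mean, len(self.unknown))
        # Bounded memory: the buffer is emptied either way, so scattered
        # unknowns are treated as noise or outliers in this sketch.
        self.unknown.clear()
```

With centroids for two labels placed at `[0.0, 0.0]` and `[1.0, 0.0]`, calling `process([0.5, 0.0])` returns both labels and nudges both centroids toward the example. Three nearby examples far from every centroid are each returned as unlabeled, after which the group is promoted to a new label such as `novel_0`. A realistic method would also need forgetting, recurring-concept handling, and a principled choice of thresholds, all of which this sketch omits.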

Scientific publications (5)

(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)

DEL VALLE, ALINE MARQUES; MANTOVANI, RAFAEL GOMES; CERRI, RICARDO. A systematic literature review on AutoML for multi-target learning tasks. ARTIFICIAL INTELLIGENCE REVIEW, v. N/A, p. 40-pg., 2023-08-10. (22/02981-8)

ILIDIO, PEDRO; ALVES, ANDRE; CERRI, RICARDO. Fast Bipartite Forests for Semi-supervised Interaction Prediction. 39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, v. N/A, p. 8-pg., 2024-01-01. (22/02981-8)

CASAROTTO, PEDRO HENRIQUE; CERRI, RICARDO. Growing Self-Organizing Maps for Multi-label Classification. INTELLIGENT SYSTEMS, BRACIS 2024, PT II, v. 15413, p. 16-pg., 2025-01-01. (22/02981-8)

ALCANTARA, LEONARDO U.; TRIGUERO, ISAAC; CERRI, RICARDO. Semi-supervised Predictive Clustering Trees for Multi-label Protein Subcellular Localization. INTELLIGENT SYSTEMS, BRACIS 2024, PT II, v. 15413, p. 16-pg., 2025-01-01. (16/25220-1, 22/02981-8, 17/24807-1)

ALVES, JULIANA; COSTA, EDUARDO; XAVIER, ALENCAR; BRITO, LUIZ; CERRI, RICARDO. Comparative Analysis of Machine Learning Algorithms for Identifying Genetic Markers Linked to Alzheimer's Disease. INTELLIGENT SYSTEMS, BRACIS 2024, PT III, v. 15414, p. 15-pg., 2025-01-01. (22/02981-8, 21/12618-5, 20/08634-2)

Grant number:	22/02981-8
Support Opportunities:	Research Grants - Initial Project
Start date:	February 01, 2023
End date:	January 31, 2028
Field of knowledge:	Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques

Principal Investigator:	Ricardo Cerri
Grantee:	Ricardo Cerri

Host Institution:	Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Associated researchers:	Diego Furtado Silva; Elaine Ribeiro de Faria Paiva; João Manuel Portela da Gama; Márcio Porto Basgalupp

Associated research grant(s):	24/19234-6 - Automatic Machine Learning for Multi-Label Classification, AP.R SPRINT
Associated scholarship(s):	24/15875-7 - Investigation of Evaluation Methodologies for Multi-Label Classification Problems in Continuous Data Streams, BP.MS; 23/08406-8 - Ensemble of classifiers for novelty detection in multi-label data streams, BP.DD
