Classification in data streams: dealing with anomalies, novelties and scarcity of labeled data

Grant number: 17/00219-3
Support type: Regular Research Grants
Duration: June 01, 2017 - November 30, 2019
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: João Roberto Bertini Junior
Grantee: João Roberto Bertini Junior
Home Institution: Faculdade de Tecnologia (FT). Universidade Estadual de Campinas (UNICAMP). Limeira, SP, Brazil

Abstract

Traditional machine learning techniques for data classification are essentially based on the premise that the data distribution is stationary over time. However, recent advances in hardware and software have made it possible to gather huge volumes of data continuously. This has given rise to a new class of applications that must process an uninterrupted data stream at high speed. In this scenario, the stationarity assumption is rarely satisfied: the data distribution often changes over time, and processing the continuous stream is costly in both time and memory. Automatic classification of data with a nonstationary distribution is only feasible if the classifier is constantly updated, which requires labeled data from the new distribution. In the context of data streams, however, labeling becomes a recurrent task and adds high costs to the application. In addition to constant updating, the autonomy of the classifier depends on mechanisms that distinguish novelties (new knowledge that should be incorporated into the model) from anomalies (irrelevant data that should be discarded). Nonetheless, only recently, and still to a limited extent, have classification methods for data streams addressed the problems of scarcity of labeled data and novelty detection. This project therefore aims to contribute to research on classification algorithms for data streams, focusing on the problems of scarcity of labeled data and novelty detection. Specifically, the project considers real applications and is expected to yield methods that maximize the autonomy of the classifier system and minimize the need for labeled data. (AU)
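
The claim that a classifier must be constantly updated under a nonstationary distribution can be illustrated with a small sketch. It is not the project's method: the synthetic stream, the nearest-class-mean classifier, the sliding window, and every name below (`make_stream`, `WindowedMeanClassifier`, `prequential_accuracy`) are assumptions chosen only for illustration. The labeling rule flips halfway through the stream (an abrupt drift), and a classifier frozen after an initial training phase is compared with one that keeps learning from recent labeled examples, both evaluated in test-then-train order.

```python
from collections import deque


def make_stream(n=400, drift_at=200):
    """Yield (x, y) pairs; the labeling rule flips at drift_at (abrupt drift)."""
    for t in range(n):
        x = (t * 37 % 100) / 100.0  # deterministic, evenly spread feature in [0, 1)
        y = int(x >= 0.5) if t < drift_at else int(x < 0.5)
        yield x, y


class WindowedMeanClassifier:
    """Nearest-class-mean classifier over stored labeled examples.

    window=None keeps every example; an integer keeps only the most recent ones.
    """

    def __init__(self, window=None):
        self.buffer = deque(maxlen=window)

    def learn(self, x, y):
        self.buffer.append((x, y))

    def predict(self, x):
        means = {}
        for label in (0, 1):
            xs = [xi for xi, yi in self.buffer if yi == label]
            if xs:
                means[label] = sum(xs) / len(xs)
        if not means:
            return 0  # arbitrary default before any label has been seen
        return min(means, key=lambda label: abs(x - means[label]))


def prequential_accuracy(stream, model, train_until=None):
    """Test-then-train: predict each example first, then (optionally) learn from it.

    train_until=None means the model is updated for the whole stream;
    an integer freezes the model after that many examples.
    """
    correct = total = 0
    for t, (x, y) in enumerate(stream):
        correct += int(model.predict(x) == y)
        total += 1
        if train_until is None or t < train_until:
            model.learn(x, y)
    return correct / total


# Static model: trained on the first 50 examples only, then frozen.
static_acc = prequential_accuracy(make_stream(), WindowedMeanClassifier(), train_until=50)
# Adaptive model: keeps learning from a sliding window of the 50 latest examples.
adaptive_acc = prequential_accuracy(make_stream(), WindowedMeanClassifier(window=50))
print(f"static: {static_acc:.2f}  adaptive: {adaptive_acc:.2f}")
```

Note that the adaptive model in this sketch receives the true label of every example, which is exactly the labeling cost the abstract describes as prohibitive in real streams; reducing that dependence is the problem the project targets.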

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
BERTINI JUNIOR, JOAO ROBERTO; NICOLETTI, MARIA DO CARMO. An iterative boosting-based ensemble for streaming data classification. Information Fusion, v. 45, p. 66-78, JAN 2019. Web of Science Citations: 4.