
Classification in data streams: dealing with anomalies, novelties and scarcity of labeled data


Traditional machine learning techniques for data classification rest on the premise that the data distribution is stationary over time. However, recent advances in hardware and software have made it possible to gather huge volumes of data continuously. This has given rise to a new class of applications that must process an uninterrupted data stream at high speed. In this scenario, the stationarity assumption rarely holds: the data distribution often changes over time, and the continuous stream imposes high costs in processing time and memory. Automatic classification in domains with nonstationary distributions is feasible only if the classifier is constantly updated, which requires labeled data from the new distribution. In the context of data streams, however, labeling becomes a recurring task and adds high costs to the application. Beyond constant updating, the autonomy of the classifier depends on mechanisms to distinguish novelties (new knowledge that should be incorporated into the model) from anomalies (irrelevant data that should be discarded). Nonetheless, only recently, and still to a limited extent, have classification methods for data streams begun to address the problems of scarcity of labeled data and novelty detection. This project therefore aims to contribute to research on classification algorithms for data streams, focusing on the problems of scarcity of labeled data and novelty detection. Specifically, the project considers real applications and is expected to yield methods that maximize the autonomy of the classifier system and minimize the need for labeled data. (AU)


Scientific publications (4)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
BERTINI JUNIOR, JOAO ROBERTO; NICOLETTI, MARIA DO CARMO. An iterative boosting-based ensemble for streaming data classification. Information Fusion, v. 45, p. 66-78. (17/00219-3)
BUENO, ANDRES; COELHO, GUILHERME PALERMO; BERTINI JUNIOR, JOAO ROBERTO. Dynamic ensemble mechanisms to improve particulate matter forecasting. Applied Soft Computing, v. 91. (17/00219-3)
BERTINI JUNIOR, JOAO ROBERTO. Graph embedded rules for explainable predictions in data streams. Neural Networks, v. 129, p. 174-192. (17/00219-3)
BERTINI JUNIOR, JOAO ROBERTO; FUNCIA, MEI ABE; SANTOS, ANTONIO ALBERTO S.; SCHIOZER, DENIS J. A comparison of machine learning algorithms as surrogate model for net present value prediction from wells arrangement data. 2019 International Joint Conference on Neural Networks (IJCNN), 8 p. (17/00219-3)
