
Classification and quantification in non-stationary data streams with concept drift

Grant number: 21/12278-0
Support type: Scholarships abroad - Research
Effective date (Start): May 01, 2022
Effective date (End): April 30, 2023
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal researcher: Adriane Beatriz de Souza Serapião
Grantee: Adriane Beatriz de Souza Serapião
Host: Gustavo Enrique de Almeida Prado Alves Batista
Home Institution: Instituto de Geociências e Ciências Exatas (IGCE). Universidade Estadual Paulista (UNESP). Campus de Rio Claro. Rio Claro, SP, Brazil
Research place: University of New South Wales (UNSW), Australia

Abstract

Data streams continuously generate large amounts of data, with varying time intervals between instances. They are ordered sequences of instances that arrive over time and can be of unbounded size. They have received increasing attention in recent years because of the numerous real-world applications in dynamic environments that produce non-stationary data, for which traditional Data Mining and Machine Learning methods are unsuccessful. The patterns in a data stream can change over time, and such a change can make predictive models outdated and inaccurate. In this scenario, data stream classification and quantification stand out as important tasks, since they require constant model updates so that accuracy remains stable as the class distribution changes over time. In real applications, class labels are rarely available for training a prediction model. The present research intends to develop a data stream classification method, exploring situations of concept drift. Many real-world problems demand effective methods that do not require predictions for individual instances and focus instead on obtaining accurate estimates at an aggregate level, such as the class distribution in a classification problem. Quantification is the task of estimating the relative frequency (prevalence) of the classes of interest in an unlabeled set, given a training set of items labeled according to the same classes. It has natural application in contexts where training data may not exhibit the same class prevalence pattern as test data. Thus, quantification models will also be proposed in this research to deal with changes in class distribution. An application to be investigated is the classification and quantification of insect vectors of relevant infectious diseases, such as dengue fever and Zika. (AU)
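
To make the quantification task concrete, the sketch below shows two standard baselines from the quantification literature, Classify and Count (CC) and Adjusted Classify and Count (ACC), for a binary problem. It is only an illustration and is not the method proposed in this project; the function names, the error rates, and the example numbers are hypothetical, and the classifier's true and false positive rates are assumed to have been estimated beforehand on labeled data.

```python
def classify_and_count(predictions, positive=1):
    """CC: estimate prevalence as the fraction of items predicted positive."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(1 for p in predictions if p == positive)
    return hits / len(predictions)


def adjusted_classify_and_count(cc_estimate, tpr, fpr):
    """ACC: correct a CC estimate using the classifier's error rates.

    tpr and fpr are the true and false positive rates, assumed to be
    estimated on labeled data (for example by cross-validation).
    """
    if tpr == fpr:
        # The classifier carries no information; fall back to the raw count.
        return cc_estimate
    adjusted = (cc_estimate - fpr) / (tpr - fpr)
    # Clip to the valid range for a proportion.
    return min(1.0, max(0.0, adjusted))


# Hypothetical example: 1000 unlabeled items, 450 predicted positive,
# by a classifier with tpr = 0.8 and fpr = 0.1.
predictions = [1] * 450 + [0] * 550
cc = classify_and_count(predictions)
acc = adjusted_classify_and_count(cc, tpr=0.8, fpr=0.1)
print(f"CC estimate:  {cc:.3f}")
print(f"ACC estimate: {acc:.3f}")
```

In this example the raw count underestimates the positive class because the classifier misses some positives; the adjustment compensates for that using the assumed error rates. When the class distribution drifts, as in the streams discussed above, the prevalence in the unlabeled data can differ from the training prevalence, which is the situation a quantifier is meant to handle.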

