
Classification of non-stationary data streams with concept drift and extreme label latency

Grant number: 19/23232-0
Support type: Scholarships abroad - Research
Effective date (Start): August 01, 2020
Effective date (End): July 31, 2021
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Adriane Beatriz de Souza Serapião
Grantee: Adriane Beatriz de Souza Serapião
Host: Gustavo Enrique de Almeida Prado Alves Batista
Home Institution: Instituto de Geociências e Ciências Exatas (IGCE). Universidade Estadual Paulista (UNESP). Campus de Rio Claro. Rio Claro, SP, Brazil
Research location: University of New South Wales (UNSW), Australia

Abstract

Data streams continuously generate large amounts of data, with varying time intervals between samples. A data stream is an ordered sequence of instances that arrive over time and may be of unbounded size. Data streams have attracted increasing attention in recent years because of the numerous real-world applications in dynamic environments that produce non-stationary data, for which traditional Data Mining and Machine Learning methods are unsuccessful. The patterns in a data stream can change over time, and such a change can make predictive models outdated and inaccurate. In this scenario, data stream classification has become an important task, since the classification model must be updated constantly to keep its accuracy stable as the class distribution changes over time. Moreover, in real applications, class labels are rarely available for training a prediction model. This research will develop a data stream classification method, addressing concept drift together with limitations such as extreme label latency and imbalanced classes. One application to be investigated is the classification of insect vectors of relevant infectious diseases, such as dengue and Zika fevers.
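The interplay between concept drift and label latency described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions and is not the method developed in this project. The classifier (`ForgettingCentroidClassifier`, a nearest-centroid model with exponential forgetting), the synthetic one-dimensional stream (`make_stream`, whose class means swap abruptly at `drift_at`), the forgetting rate `alpha`, and the fixed label `delay` are all hypothetical choices made only for illustration. Evaluation follows the common "test-then-train" (prequential) protocol: each instance is classified on arrival, but the model can learn from it only once its true label arrives `delay` steps later.

```python
from collections import deque
import random


class ForgettingCentroidClassifier:
    """Hypothetical nearest-centroid classifier whose centroids drift toward recent data."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha      # forgetting rate: larger values adapt faster to drift
        self.centroids = {}     # class label -> centroid (list of floats)

    def predict(self, x):
        if not self.centroids:
            return None         # no labelled data seen yet
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))

    def learn(self, x, y):
        if y not in self.centroids:
            self.centroids[y] = list(x)
        else:
            # Exponential moving average: older instances are gradually forgotten.
            self.centroids[y] = [(1 - self.alpha) * c + self.alpha * xi
                                 for c, xi in zip(self.centroids[y], x)]


def make_stream(n, drift_at, seed=0):
    """Hypothetical two-class 1-D stream; the class means swap at drift_at (abrupt drift)."""
    rng = random.Random(seed)
    for t in range(n):
        y = rng.randint(0, 1)
        if t < drift_at:
            mean = 0.0 if y == 0 else 4.0
        else:
            mean = 4.0 if y == 0 else 0.0
        yield t, [rng.gauss(mean, 1.0)], y


def prequential(stream, model, delay):
    """Predict each instance on arrival; train on it only when its label arrives delay steps later."""
    pending = deque()           # (label_arrival_time, x, y) not yet usable for training
    correct = []
    for t, x, y in stream:
        correct.append(model.predict(x) == y)
        pending.append((t + delay, x, y))
        while pending and pending[0][0] <= t:
            _, px, py = pending.popleft()
            model.learn(px, py)
    return correct


if __name__ == "__main__":
    model = ForgettingCentroidClassifier(alpha=0.1)
    correct = prequential(make_stream(2000, drift_at=1000), model, delay=50)

    def acc(lo, hi):
        return sum(correct[lo:hi]) / (hi - lo)

    print("before drift:", acc(500, 1000))
    print("first 50 steps after drift:", acc(1000, 1050))
    print("late after drift:", acc(1500, 2000))
```

In this toy setting, accuracy collapses right after the drift because, for `delay` steps, the only labels the model receives still describe the old concept; it recovers once post-drift labels start arriving. With a fixed, finite delay this is a simplification: under extreme label latency, labels may arrive far too late to be useful or not at all, so recovery cannot rely on a supervised update like `learn` above.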