Data streams are characterized by the continuous generation of large amounts of data, with varying time intervals between instances. They are ordered sequences of instances that arrive over time and may be unbounded in size. They have attracted increasing attention in recent years due to numerous real-world applications in dynamic environments that produce non-stationary data, for which traditional Data Mining and Machine Learning methods are unsuccessful. The patterns in a data stream can change over time, and such a change can make predictive models outdated and inaccurate. In this scenario, data stream classification and quantification have stood out as important tasks, since they require constant model updates to keep accuracy stable as the class distribution changes over time. In real applications, class labels are rarely available for training a prediction model. The present research intends to develop a data stream classification method that addresses situations of concept drift. Many real-world problems demand effective methods that do not require predictions for individual instances but instead focus on obtaining accurate estimates at an aggregate level, such as the class distribution in a classification problem. Quantification is the task of estimating the relative frequency (prevalence) of the classes of interest in an unlabeled set, given a training set of items labeled according to the same classes. It applies naturally in contexts where the training data may not exhibit the same class prevalence as the test data. Thus, this research will also propose quantification models to deal with changes in class distribution. One application to be investigated is the classification and quantification of insect vectors of relevant infectious diseases, such as dengue and Zika fevers.
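To make the notion of quantification concrete, the sketch below shows two well-known baseline quantifiers from the literature, Classify and Count (CC) and binary Adjusted Classify and Count (ACC). These are illustrative baselines only, not the quantification models this research proposes. The function names, the class labels `"aedes"` and `"other"`, and the true/false positive rates are hypothetical values chosen for the example; in practice the rates would be estimated on labeled data, for example by cross-validation.

```python
from collections import Counter


def classify_and_count(predictions, classes):
    """Classify and Count (CC): the estimated prevalence of each class is
    simply the fraction of unlabeled instances predicted as that class."""
    counts = Counter(predictions)
    n = len(predictions)
    return {c: counts.get(c, 0) / n for c in classes}


def adjusted_count(predictions, positive, tpr, fpr):
    """Binary Adjusted Classify and Count (ACC).

    If p is the true positive prevalence, the expected fraction of positive
    predictions is cc = tpr * p + fpr * (1 - p). Solving for p gives
        p = (cc - fpr) / (tpr - fpr),
    which is clipped to [0, 1] because estimated rates are noisy.
    tpr/fpr are assumed to come from labeled data (hypothetical values here).
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to be defined")
    cc = sum(1 for p in predictions if p == positive) / len(predictions)
    p = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))


# Hypothetical batch of 100 unlabeled insect recordings classified by some model.
preds = ["aedes"] * 40 + ["other"] * 60
print(classify_and_count(preds, ["aedes", "other"]))
print(adjusted_count(preds, "aedes", tpr=0.8, fpr=0.1))
```

Here CC reports a 40% prevalence of the positive class, while ACC corrects for the classifier's known error rates and estimates about 43% (exactly 3/7). The gap between the two illustrates why aggregate-level estimation is treated as a task in its own right: an imperfect classifier's raw counts are biased, and that bias changes as the class distribution drifts.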