Data stream classification in the presence of concept drifts using non-supervised methods

Grant number: 13/23037-7
Support type: Scholarships abroad - Research Internship - Doctorate
Effective date (Start): March 01, 2014
Effective date (End): August 31, 2014
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Gustavo Enrique de Almeida Prado Alves Batista
Grantee: Vinícius Mourão Alves de Souza
Supervisor abroad: João Manuel Portela da Gama
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Research place: Instituto de Engenharia de Sistemas e Computadores - Tecnologia e Ciência (INESC TEC), Portugal
Associated to the scholarship: 11/17698-5 - Classification of non-stationary data stream with application in sensors for insect identification, BP.DR

Abstract

Applications such as sensors generate large quantities of data. Such data may form a continuous, ordered flow of information that arrives over time. Classifying these data in real time with machine learning algorithms is an important task for many real applications. However, a frequent assumption is that the data used in the classifier's training phase remain sufficiently representative throughout the classifier's useful life. For a large share of real applications, though, the data are generated by non-stationary processes and therefore change over time. In the machine learning context, this phenomenon is called concept drift. Static classifiers are inadequate under these conditions, which call for adaptive data stream classifiers capable of dealing with concept drifts. Most methods for classifying non-stationary data streams assume that the actual labels of the processed instances become available after a certain period of time, so that the most recent labeled data can be used to update the classification model. This assumption is unrealistic for many real applications, or it entails a high labeling cost. Therefore, this project aims to investigate methods for classifying data streams in the presence of concept drifts that do not require knowledge of the actual labels of the recently processed instances. (AU)