Classifying and Counting with Recurrent Contexts

Author(s):
Reis, Denis ; Maletzke, Andre ; Silva, Diego F. ; Batista, Gustavo E. A. P. A. ; ACM
Total Authors: 5
Document type: Journal article
Source: KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING; v. N/A, p. 10-pg., 2018-01-01.
Abstract

Many real-world applications in the batch and data stream settings with data shift restrict access to class labels after a classification or quantification model is deployed. However, a significant portion of the data stream literature assumes that actual labels become available immediately after the corresponding classifications are issued. In this paper, we explore a different set of assumptions that does not rely on the availability of class labels. We assume that, although the distribution of the data may change over time, it switches among a handful of well-known distributions. Still, we allow the proportions of the classes to vary. Under these conditions, we propose the first method that can accurately identify the correct context of data samples and simultaneously estimate the proportion of the positive class. This estimate can then be used to adjust a classification decision threshold and improve classification accuracy. Finally, the method is very efficient in time and memory, which suits data stream applications. (AU)
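
The abstract's last step, using an estimated positive-class proportion to adjust a decision threshold, can be illustrated with a minimal sketch. This is not the authors' method: it is the standard prior-shift correction for a classifier that outputs calibrated probabilities, and it assumes the class-conditional distributions stay the same while only the class proportions change. The names `adjusted_threshold`, `classify`, `train_prev`, and `est_prev` are hypothetical and chosen only for illustration.

```python
def adjusted_threshold(train_prev: float, est_prev: float) -> float:
    """Score threshold that is equivalent to 0.5 after a prior-shift correction.

    train_prev: proportion of positives in the training data (assumed known).
    est_prev:   proportion of positives estimated on unlabeled deployment data,
                for example by a quantification method.
    """
    if not (0.0 < train_prev < 1.0 and 0.0 < est_prev < 1.0):
        raise ValueError("prevalences must lie strictly between 0 and 1")
    # Ratio of prior odds: training odds divided by deployment odds.
    odds_ratio = (train_prev * (1.0 - est_prev)) / (est_prev * (1.0 - train_prev))
    return odds_ratio / (1.0 + odds_ratio)


def classify(scores, train_prev: float, est_prev: float):
    """Label each calibrated positive-class probability with the adjusted threshold."""
    threshold = adjusted_threshold(train_prev, est_prev)
    return [score >= threshold for score in scores]
```

When the estimated proportion of positives is lower than the training proportion, the threshold rises above 0.5, so the classifier demands stronger evidence before predicting the positive class. When the two proportions match, the threshold stays at 0.5.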

FAPESP's process: 16/04986-6 - Intelligent traps and sensors: an innovative approach to control insect pests and disease vectors
Grantee: Gustavo Enrique de Almeida Prado Alves Batista
Support Opportunities: Research Grants - eScience and Data Science Program - Regular Program Grants