MiDaS: Extract Golden Results from Knowledge Discovery Even over Incomplete Databases

Author(s):
Rodrigues, Lucas S. ; Vespa, Thiago G. ; Eleuterio, Igor A. R. ; Oliveira, Willian D. ; Traina, Agma J. M. ; Traina Jr, Caetano ; Groen, D ; DeMulatier, C ; Paszynski, M ; Krzhizhanovskaya, VV ; Dongarra, JJ ; Sloot, PMA
Total Authors: 12
Document type: Journal article
Source: COMPUTATIONAL SCIENCE, ICCS 2022, PT IV; v. N/A, p. 15-pg., 2022-01-01.
Abstract

The continuous growth in data collection requires effective and efficient capabilities to support Knowledge Discovery in Databases (KDD) over large amounts of complex data. However, activities such as data acquisition, cleaning, preparation, and recording may lead to incompleteness, impairing KDD processes, especially because most analysis methods do not adequately handle missing data. To analyze complex data, such as when performing similarity search or classification tasks, KDD processes require similarity assessment. However, incompleteness can disrupt this assessment, making the system unable to compare incomplete tuples. Therefore, incompleteness can render databases useless for knowledge extraction or, at best, dramatically reduce their usefulness. In this paper, we propose MiDaS, a framework based on an RDBMS that offers tools to deal with missing data using several strategies, making it possible to assess similarity over complex data in KDD scenarios even in the presence of missing data. We show experimental results of analyses using MiDaS for similarity retrieval, classification, and clustering tasks over publicly available complex datasets, evaluating the quality and performance of several missing data treatments. The results highlight that MiDaS is well-suited for dealing with incompleteness, enhancing data analysis in several KDD scenarios. (AU)

FAPESP's process: 16/17078-0 - Mining, indexing and visualizing Big Data in clinical decision support systems (MIVisBD)
Grantee: Agma Juci Machado Traina
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 20/10902-5 - Handling similarity queries over incomplete data in a Relational DBMS
Grantee: Lucas Santiago Rodrigues
Support Opportunities: Scholarships in Brazil - Technical Training Program - Technical Training
FAPESP's process: 20/07200-9 - Analyzing complex data from COVID-19 to support decision making and prognosis
Grantee: Agma Juci Machado Traina
Support Opportunities: Regular Research Grants