MiDaS: Extract Golden Results from Knowledge Discovery Even over Incomplete Databases

Author(s):
Rodrigues, Lucas S. ; Vespa, Thiago G. ; Eleuterio, Igor A. R. ; Oliveira, Willian D. ; Traina, Agma J. M. ; Traina Jr, Caetano ; Groen, D ; DeMulatier, C ; Paszynski, M ; Krzhizhanovskaya, VV ; Dongarra, JJ ; Sloot, PMA
Total number of authors: 12
Document type: Scientific article
Source: COMPUTATIONAL SCIENCE, ICCS 2022, PT IV; v. N/A, 15 pp., 2022-01-01.
Abstract

The continuous growth in data collection requires effective and efficient capabilities to support Knowledge Discovery in Databases (KDD) over large amounts of complex data. However, activities such as data acquisition, cleaning, preparation, and recording may lead to incompleteness, which impairs KDD processes, especially because most analysis methods do not adequately handle missing data. To analyze complex data, for example in similarity search or classification tasks, KDD processes require similarity assessment. Incompleteness can disrupt this assessment, leaving the system unable to compare incomplete tuples. Incompleteness can therefore render databases useless for knowledge extraction or, at best, dramatically reduce their usefulness. In this paper, we propose MiDaS, a framework based on an RDBMS that offers tools to deal with missing data using several strategies, making it possible to assess similarity over complex data even in the presence of missing data in KDD scenarios. We show experimental results of analyses using MiDaS for similarity retrieval, classification, and clustering tasks over publicly available complex datasets, evaluating the quality and performance of several missing data treatments. The results highlight that MiDaS is well-suited for dealing with incompleteness, enhancing data analysis in several KDD scenarios. (AU)

FAPESP grant: 16/17078-0 - Mining, indexing and visualizing Big Data in clinical decision support systems (MIVisBD)
Grantee: Agma Juci Machado Traina
Support type: Research Grant - Thematic
FAPESP grant: 20/10902-5 - Handling similarity queries over incomplete data in a relational DBMS
Grantee: Lucas Santiago Rodrigues
Support type: Scholarships in Brazil - Capacity Building Program - Technical Training
FAPESP grant: 20/07200-9 - Analyzing complex data linked to COVID-19 to support decision making and prognosis
Grantee: Agma Juci Machado Traina
Support type: Research Grant - Regular