A New Natural Language Processing-Inspired Methodology (Detection, Initial Characterization, and Semantic Characterization) to Investigate Temporal Shifts (Drifts) in Health Care Data: Quantitative Study

Paiva, Bruno; Goncalves, Marcos Andre; Dutra da Rocha, Leonardo Chaves; Marcolino, Milena Soriano; Barbosa Lana, Fernanda Cristina; Souza-Silva, Maira Viana Rego; Almeida, Jussara M.; Pereira, Polianna Delfino; Valiense de Andrade, Claudio Moises; dos Reis Gomes, Angelica Gomides; Pires Ferreira, Maria Angelica; Bartolazzi, Frederico; Sacioto, Manuela Furtado; Boscato, Ana Paula; Guimaraes-Junior, Milton Henriques; dos Reis, Priscilla Pereira; Costa, Felicio Roberto; Jorge, Alzira de Oliveira; Coelho, Laryssa Reis; Carneiro, Marcelo; Souza Sales, Thais Lorenna; Araujo, Silvia Ferreira; Silveira, Daniel Vitorio; Ruschel, Karen Brasil; Veloso Santos, Fernanda Caldeira; de Almeida Cenci, Evelin Paola; Monteiro Menezes, Luanna Silva; Anschau, Fernando; Camargos Bicalho, Maria Aparecida; Fernandes Manenti, Euler Roberto; Finger, Renan Goulart; Ponce, Daniela; de Aguiar, Filipe Carrilho; Marques, Luiza Margoto; de Castro, Luis Cesar; Vietta, Giovanna Grunewald; de Godoy, Mariana Frizzo; Vilaca, Mariana do Nascimento; Morais, Vivian Costa

Total number of authors: 39
Document type:	Scientific article
Source:	JMIR MEDICAL INFORMATICS; v. 12, p. 24-pg., 2024-01-01.
Abstract
Background: Proper analysis and interpretation of health care data can significantly improve patient outcomes by enhancing services and revealing the impacts of new technologies and treatments. Understanding the substantial impact of temporal shifts in these data is crucial. For example, COVID-19 vaccination initially lowered the mean age of at-risk patients and later changed the characteristics of those who died. This highlights the importance of understanding these shifts for assessing factors that affect patient outcomes.
Objective: This study proposes detection, initial characterization, and semantic characterization (DIS), a new methodology for analyzing changes in health outcomes and variables over time while discovering contextual changes for outcomes in large volumes of data.
Methods: The DIS methodology involves 3 steps: detection, initial characterization, and semantic characterization. Detection uses metrics such as Jensen-Shannon divergence to identify significant data drifts. Initial characterization offers a global analysis of changes in data distribution and predictive feature significance over time. Semantic characterization uses natural language processing-inspired techniques to understand the local context of these changes, helping identify factors driving changes in patient outcomes. By integrating the outcomes of these 3 steps, the methodology can identify specific factors (eg, interventions and modifications in health care practices) that drive changes in patient outcomes. DIS was applied to the Brazilian COVID-19 Registry and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) data sets.
Results: Our approach allowed us to (1) identify drifts effectively, especially using metrics such as the Jensen-Shannon divergence, and (2) uncover reasons for the decline in overall mortality in both the COVID-19 and MIMIC-IV data sets, as well as changes in the co-occurrence between different diseases and this particular outcome. Factors such as vaccination during the COVID-19 pandemic and reduced iatrogenic events and cancer-related deaths in MIMIC-IV were highlighted. The methodology also pinpointed shifts in patient demographics and disease patterns, providing insights into the evolving health care landscape during the study period.
Conclusions: We developed a novel methodology combining machine learning and natural language processing techniques to detect, characterize, and understand temporal shifts in health care data. This understanding can improve patient outcomes, optimize health care resource allocation, and ultimately make machine learning predictive algorithms applied to health care data more effective. Our methodology can be applied to a variety of scenarios beyond those discussed in this paper. (AU)
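
The detection step named in the Methods relies on metrics such as the Jensen-Shannon divergence. As a minimal sketch of that idea, and not the authors' implementation, the Python below compares the distribution of one categorical variable across two time windows. The age bands, sample counts, helper names, and the 0.05 threshold are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Relative frequency of each category in `values` (zeros allowed)."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    With base-2 logarithms the result lies in [0, 1]: 0 for identical
    distributions and 1 for distributions with disjoint support.
    """
    # Mixture distribution M = (P + Q) / 2
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute 0
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical age-band samples from an earlier and a later time window
categories = ["<40", "40-59", "60+"]
early = ["60+"] * 70 + ["40-59"] * 25 + ["<40"] * 5
late = ["60+"] * 40 + ["40-59"] * 40 + ["<40"] * 20

p = distribution(early, categories)
q = distribution(late, categories)
drift = js_divergence(p, q)

THRESHOLD = 0.05  # hypothetical cutoff, not taken from the paper
print(f"JS divergence: {drift:.4f}; drift flagged: {drift > THRESHOLD}")
```

In practice the same comparison would be repeated for each variable and each pair of consecutive windows, and the cutoff would need to be calibrated for the data set at hand.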

FAPESP grant:	20/09866-4 - Centro de Inovação em Inteligência Artificial para a Saúde (CIIA-Saúde)
Grantee:	Virgilio Augusto Fernandes Almeida
Support type:	Research Grants - Research Centers in Engineering Program
