Accuracy and consistency of ChatGPT-3.5 and -4 in providing differential diagnoses in oral and maxillofacial diseases: a comparative diagnostic performance analysis

Author(s):
Tomo, Saygo ; Lechien, Jerome R. ; Bueno, Hugo Sobrinho ; Cantieri-Debortoli, Daniela Filie ; Simonato, Luciana Estevam
Total number of authors: 5
Document type: Scientific article
Source: CLINICAL ORAL INVESTIGATIONS; v. 28, n. 10, 11 pp., 2024-09-24.
Abstract

Objective: To investigate the performance of ChatGPT in the differential diagnosis of oral and maxillofacial diseases.
Methods: Thirty-seven oral and maxillofacial lesion findings were presented to ChatGPT-3.5 and -4, 18 dental surgeons trained in oral medicine/pathology (OMP), 23 general dental surgeons (DDS), and 16 dental students (DS) for differential diagnosis. Additionally, a group of 15 general dentists was asked to describe 11 cases to both ChatGPT versions. The primary and alternative diagnoses from ChatGPT-3.5, ChatGPT-4, and the human groups were rated by 2 independent investigators on a 4-point Likert scale. The consistency of ChatGPT-3.5 and -4 was evaluated with regenerated inputs.
Results: Moderate consistency of outputs was observed for ChatGPT-3.5 and -4 in providing primary (kappa = 0.532 and kappa = 0.533, respectively) and alternative (kappa = 0.337 and kappa = 0.367, respectively) hypotheses. The mean rate of correct diagnoses was 64.86% for ChatGPT-3.5, 80.18% for ChatGPT-4, 86.64% for OMP, 24.32% for DDS, and 16.67% for DS. The mean correct primary hypothesis rates were 45.95% for ChatGPT-3.5, 61.80% for ChatGPT-4, 82.28% for OMP, 22.72% for DDS, and 15.77% for DS. The mean correct diagnosis rate for ChatGPT-3.5 was 64.86% with standard descriptions, compared to 45.95% with participants' descriptions; for ChatGPT-4, it was 80.18% with standard descriptions and 61.80% with participants' descriptions.
Conclusion: ChatGPT-4 demonstrates an accuracy comparable to that of specialists in providing differential diagnoses for oral and maxillofacial diseases. The consistency of ChatGPT in providing diagnostic hypotheses for oral disease cases is moderate, representing a weakness for clinical application. The quality of case documentation and descriptions significantly impacts the performance of ChatGPT.
Clinical relevance: General dentists, dental students, and specialists in oral medicine and pathology may benefit from ChatGPT-4 as an auxiliary method to define differential diagnoses for oral and maxillofacial lesions, but its accuracy depends on precise case descriptions. (AU)
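
For readers less familiar with the consistency metric cited above, the short sketch below illustrates how Cohen's kappa can be computed for ratings of an original and a regenerated ChatGPT output. The ratings are invented 4-point Likert scores for 10 hypothetical cases, and the use of scikit-learn is an assumption for illustration only, not the authors' actual analysis pipeline.

    # Minimal sketch (not the authors' pipeline): Cohen's kappa for agreement
    # between ratings of an original and a regenerated output on the same cases.
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical 4-point Likert ratings (1 = incorrect ... 4 = fully correct)
    original_run    = [4, 3, 4, 2, 1, 4, 3, 2, 4, 3]
    regenerated_run = [4, 3, 3, 2, 2, 4, 3, 1, 4, 4]

    # Kappa corrects raw agreement for agreement expected by chance; values of
    # roughly 0.41-0.60 are conventionally read as "moderate" agreement, the
    # range reported for the primary hypotheses above.
    kappa = cohen_kappa_score(original_run, regenerated_run)
    print(f"Cohen's kappa: {kappa:.3f}")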

FAPESP Process: 23/11402-4 - Mechanisms of regulated necrosis induced by photodynamic and sonodynamic therapies in dysplastic oral epithelial cells: an in vitro and in vivo study
Grantee: Saygo Tomo
Support type: Scholarships in Brazil - Post-Doctoral