Artificial intelligence in healthcare applications targeting cancer diagnosis-part II: interpreting the model outputs and spotlighting the performance metrics

Araujo, Anna Luiza Damaceno; Sperandio, Marcelo; Calabrese, Giovanna; Faria, Sarah S.; Cardenas, Diego Armando Cardona; Martins, Manoela Domingues; Vargas, Pablo Agustin; Lopes, Marcio Ajudarte; Santos-Silva, Alan Roger; Kowalski, Luiz Paulo; Moraes, Matheus Cardoso

Author(s):	Araujo, Anna Luiza Damaceno; Sperandio, Marcelo; Calabrese, Giovanna; Faria, Sarah S.; Cardenas, Diego Armando Cardona; Martins, Manoela Domingues; Vargas, Pablo Agustin; Lopes, Marcio Ajudarte; Santos-Silva, Alan Roger; Kowalski, Luiz Paulo; Moraes, Matheus Cardoso
Total Authors:	11
Document type:	Journal article
Source:	ORAL SURGERY ORAL MEDICINE ORAL PATHOLOGY ORAL RADIOLOGY; v. 140, n. 1, p. 89-99 (11 pages), 2025-07-01.
Abstract
Background. The lack of standardized performance assessment metrics and the inconsistent reporting of results can lead to the presentation of overly optimistic outcomes that fail to accurately represent key aspects of the Machine Learning framework and may not align with real-world clinical needs. Methods. This conceptual review of the literature compiled the theoretical basis for performance analysis of binary and multiclass models. Results. Accuracy and error rates are straightforward but not ideal if the dataset is imbalanced. Sensitivity (recall) and specificity are essential (cancer patients correctly identified as having cancer and benign patients accurately classified as such), as well as precision (identification of true cancer cases among those predicted to have it without falsely labeling healthy individuals as diseased). F1-Score balances precision and recall, while AUC combines sensitivity and specificity, assessing performance across different distributions. Kaplan-Meier curves and log-rank tests offer further insights into model performance over time, especially in survival contexts. Conclusion. Each evaluation metric highlights specific aspects of Convolutional Neural Network training, making it unfeasible to choose just a few (generally the most "convenient" ones) to report in research. (Oral Surg Oral Med Oral Pathol Oral Radiol 2025;140:89-99) (AU)
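
The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the article: the names `BinaryCounts`, `binary_metrics`, and `auc_from_scores`, as well as the example counts and scores, are hypothetical and chosen only to show why accuracy alone can mislead on an imbalanced dataset. The formulas are the standard textbook definitions, and the AUC is computed in its rank-based form (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one), which equals the area under the ROC curve.

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """Confusion-matrix counts for a binary classifier (hypothetical helper)."""
    tp: int  # cancer cases predicted as cancer
    fp: int  # benign cases predicted as cancer
    tn: int  # benign cases predicted as benign
    fn: int  # cancer cases predicted as benign


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(c: BinaryCounts) -> dict:
    """Compute the standard binary metrics from confusion-matrix counts."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = _ratio(c.tp + c.tn, total)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auc_from_scores(pos_scores: list, neg_scores: list) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half. Quadratic in the number of cases, which is
    acceptable for a small illustration.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Made-up imbalanced example: 50 cancer cases among 1000 patients.
    counts = BinaryCounts(tp=10, fp=10, tn=940, fn=40)
    for name, value in binary_metrics(counts).items():
        print(f"{name}: {value:.3f}")

    # Made-up model scores for 3 cancer and 4 benign cases.
    print(f"auc: {auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]):.3f}")
```

With these made-up counts the accuracy is 0.95 while the sensitivity is only 0.20, so the model misses 40 of the 50 cancer cases despite a seemingly strong headline figure. This is the kind of gap that reporting a single "convenient" metric would hide.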

FAPESP's process:	21/14585-7 - Artificial intelligence applied to the clinical and histopathological diagnosis of Head and Neck Cancer
Grantee:	Anna Luiza Damaceno Araujo
Support Opportunities:	Scholarships in Brazil - Post-Doctoral
