
Evaluation of the use of artificial intelligence in the title exam of the Brazilian Society of Shoulder and Elbow Surgery

Grant number: 25/01187-4
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Start date: June 01, 2025
End date: May 31, 2026
Field of knowledge: Health Sciences - Medicine - Surgery
Principal Investigator: Marcel Jun Sugawara Tamaoki
Grantee: Karolina Stephany Pereira Ferreira
Host Institution: Escola Paulista de Medicina (EPM). Universidade Federal de São Paulo (UNIFESP). Campus São Paulo. São Paulo, SP, Brazil

Abstract

Artificial intelligence (AI) has established itself as a promising tool in medical education and clinical practice. This scientific initiation project aims to evaluate and compare the performance of five advanced AI models - ChatGPT-4, ChatGPT-4 trained with the core literature of the Brazilian Society of Shoulder and Elbow Surgery (SBCOC), ChatGPT-o1-pro, Gemini (Google), and Meta LLaMA 3.1 - on the official SBCOC exams administered in the years 2021, 2022, and 2023. Each exam consists of 50 multiple-choice questions, containing exclusively textual content or content associated with clinical images. A minimum score of 50% is required to pass.

The methodology involves the standardized application of the 150 questions to the five models, using a uniform prompt and, when applicable, providing the corresponding images. The analysis will be conducted across four main areas: (1) comparison of the models' accuracy rates with the average performance of human candidates; (2) assessment of the reliability of the sources used in the responses; (3) evaluation of the models' ability to interpret image-based questions; and (4) comparison between the different AI models, including an analysis of the impact on performance of the same model (ChatGPT-4) under two training conditions: one with access only to SBCOC literature and the other with unrestricted internet access. Statistical analyses will include the chi-square test and analysis of variance (ANOVA), with a significance level of p < 0.05.

The study seeks to identify the strengths and limitations of each model when faced with a high-level technical exam, contributing to the understanding of AI's role in specialized assessments, with potential applications in medical education, clinical practice, and the development of decision-support tools. (AU)
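As an illustration only, the pass criterion and the chi-square comparison of accuracy across models could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's actual analysis code: the function names are hypothetical, the counts in the usage example are made-up placeholders rather than study results, and the p-value uses a closed form that holds only for an even number of degrees of freedom (which is the case for a 5-model by correct/incorrect table, with 4 degrees of freedom).

```python
from math import exp, factorial

QUESTIONS_PER_EXAM = 50   # from the abstract: 50 questions per exam
PASS_FRACTION = 0.50      # from the abstract: minimum score of 50% to pass
ALPHA = 0.05              # from the abstract: significance level


def passes(correct, total=QUESTIONS_PER_EXAM):
    """Return True if the score meets the 50% pass threshold."""
    return correct / total >= PASS_FRACTION


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows, e.g. one row per model: [correct, incorrect].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_p_value(stat, dof):
    """Upper-tail probability of the chi-square distribution.

    Assumption: uses the closed-form series that is valid only for even dof.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this sketch only supports positive, even dof")
    half = stat / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(dof // 2))


if __name__ == "__main__":
    # Placeholder counts over 150 questions: NOT results from the study.
    hypothetical_counts = {
        "model_a": [90, 60],
        "model_b": [105, 45],
        "model_c": [120, 30],
        "model_d": [84, 66],
        "model_e": [75, 75],
    }
    stat, dof = chi_square_statistic(list(hypothetical_counts.values()))
    p_value = chi_square_p_value(stat, dof)
    print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.4g}")
    print("significant at p < 0.05" if p_value < ALPHA else "not significant")
```

In practice, an established statistics package would normally be preferred over a hand-written test; the sketch only makes the arithmetic behind the stated criteria explicit.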
