The impact of scan number and its preprocessing in micro-FTIR imaging when applying machine learning for breast cancer subtypes classification

del-Valle, Matheus; Santos, Moises Oliveira dos; Santos, Sofia Nascimento dos; Castro, Pedro Arthur Augusto de; Bernardes, Emerson Soares; Zezell, Denise Maria

Author(s):	del-Valle, Matheus [1]; Santos, Moises Oliveira dos [1, 2]; Santos, Sofia Nascimento dos [3]; Castro, Pedro Arthur Augusto de [1]; Bernardes, Emerson Soares [3]; Zezell, Denise Maria [1] Total number of authors: 6
Author affiliation(s):	[1] CNEN, IPEN, Inst Pesquisas Energet & Nucl, Ctr Lasers & Aplicacoes, BR-05508000 Sao Paulo - Brazil [2] Univ Estado Amazonas, Escola Super Tecnol, BR-69050030 Manaus, Amazonas - Brazil [3] CNEN, IPEN, Inst Pesquisas Energet & Nucl, Ctr Radiofarm, BR-05508000 Sao Paulo - Brazil Total number of affiliations: 3
Document type:	Scientific article
Source:	VIBRATIONAL SPECTROSCOPY; v. 117, NOV 2021.
Web of Science citations:	0
Abstract
The breast cancer molecular subtype is an important classification for outlining the prognosis. Gold-standard assessment by immunohistochemistry introduces subjectivity due to interlaboratory and interobserver variation. To increase diagnostic confidence, other techniques need to be examined; FTIR spectroscopic imaging combined with machine learning may provide additional, quantitative information on molecular composition. However, the impact of the co-added scans acquisition parameter on machine learning classification still needs better evaluation. In this study, FTIR images of the Luminal B and HER2 subtypes were acquired while varying the scan number and the preprocessing techniques. Spectral quality improved as the scan number increased, with a lower standard deviation and fewer outliers. Six machine learning models were used to classify the subtypes: Linear Discriminant Analysis, Partial Least Squares Discriminant Analysis, K-Nearest Neighbors, Support Vector Machine, Random Forest and Extreme Gradient Boosting. The best mean accuracy, 0.995, was achieved by the Extreme Gradient Boosting model. All models achieved similarly high accuracies with groups b256_064 (256 background and 064 scans), b256_128 and b128_128. Besides assessing the performance of the different models, the study established b256_064 as the optimal group because it has the shortest acquisition time. This work therefore indicates b256_064 for breast cancer subtype classification, and as a basis for other studies using machine learning for cancer evaluation. (AU)
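
The link between scan number and standard deviation reported in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a flat zero signal with independent Gaussian noise on every scan, and the function name `coadded_noise_std` and all parameter values are hypothetical. Under that assumption, averaging N co-added scans should shrink the residual noise by roughly a factor of 1/sqrt(N).

```python
import math
import random
import statistics


def coadded_noise_std(n_scans, n_points=400, sigma=1.0, seed=0):
    """Residual noise std after averaging n_scans synthetic noisy scans.

    Assumption: the true signal is zero everywhere and each scan adds
    independent Gaussian noise with standard deviation sigma.
    """
    rng = random.Random(seed)
    # Average n_scans noisy readings at each of n_points spectral positions.
    averaged = [
        sum(rng.gauss(0.0, sigma) for _ in range(n_scans)) / n_scans
        for _ in range(n_points)
    ]
    # Whatever spread remains across the spectrum is the residual noise.
    return statistics.pstdev(averaged)


if __name__ == "__main__":
    # 1 scan is a baseline; 64, 128 and 256 echo the scan counts in the group names.
    for n in (1, 64, 128, 256):
        measured = coadded_noise_std(n)
        expected = 1.0 / math.sqrt(n)
        print(f"{n:>3} scans: measured std {measured:.4f}, 1/sqrt(N) = {expected:.4f}")
```

Because the noise falls only with the square root of the scan count while acquisition time grows linearly with it, doubling the scans from 64 to 128 buys a comparatively small improvement in this idealized model, which is consistent with the abstract's preference for the group with the shortest acquisition time.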

FAPESP grant:	17/50332-0 - Scientific, technological and infrastructure qualification in radiopharmaceuticals, radiation and entrepreneurship for health purposes (PDIp)
Grantee:	Marcelo Linardi
Support type:	Research Grants - State Research Institutes Modernization Program
