Evaluating and mitigating biases in skin lesion analysis

Alceu Emanuel Bissoto

Author(s):	Alceu Emanuel Bissoto
Total Authors:	1
Document type:	Doctoral Thesis
Press:	Campinas, SP.
Institution:	Universidade Estadual de Campinas (UNICAMP). Instituto de Computação
Defense date:	2024-05-20
Examining board members:	Sandra Eliza Fontes de Avila; Flávia Vasques Bittencourt; André Georghton Cardoso Pacheco; Esther Luna Colombini; Leticia Rittner
Advisor:	Sandra Eliza Fontes de Avila; Eduardo Valle
Abstract
Deep learning models are increasingly used in real-world applications, including automated diagnosis. However, these models can inherit biases from their training data. In medical imaging, limited data contributions from various centers create distribution shifts, which can lead to catastrophic outcomes if biases are not addressed. This thesis investigates shifts in skin lesion datasets and models, focusing on skin cancer detection, a major concern in Brazil. Early detection is crucial, and automated diagnosis presents a promising solution, especially for patients facing barriers to accessing care. Since dermatologists' skin cancer diagnosis relies on pattern recognition, machine learning techniques suit this task well. Despite advancements, the field struggles with generalization and with models' reliance on biases. This thesis contributes to data annotation, bias evaluation, and debiasing techniques to address these challenges. First, we annotated the ISIC 2018 and Derm7pt datasets for the presence of common artifacts. Furthermore, we annotated over 10,000 samples from ISIC 2019 with the location of such artifacts. These annotations provided a foundation for assessing model robustness and facilitating debiasing efforts in our work. For bias evaluation, we introduced a novel approach to dividing training and testing data that allows for controllable bias levels. This method, called "Trap Sets", is designed to reveal a model's dependency on irrelevant features by adjusting bias levels during training and presenting contrary correlations during testing. Trap Sets enabled a precise examination of shortcut learning in skin lesion analysis, a capability typically limited to simpler toy datasets. Our experiments on skin lesion images and their annotated artifacts demonstrated that models often rely on these irrelevant features, a tendency that Trap Sets effectively penalized. Models that commonly surpassed 90% AUC on random train/test splits reached only 58% AUC in our biased scenarios.
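The splitting idea described above can be illustrated with a minimal sketch. It is not the thesis's actual procedure: it assumes a binary label and a single binary artifact flag per sample, and the function name `trap_split` and the parameters `bias` and `train_frac` are hypothetical. At `bias=0` the split is random; at `bias=1` every sample whose artifact agrees with its label goes to training and every disagreeing sample goes to testing, so the test set presents the opposite correlation.

```python
import numpy as np

def trap_split(labels, artifact, bias, train_frac=0.8, seed=0):
    """Split sample indices so that the artifact/label correlation seen in
    training is reversed at test time (illustrative simplification).

    labels, artifact: binary arrays of equal length (assumed inputs).
    bias: 0.0 gives a random split, 1.0 gives a fully biased split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    artifact = np.asarray(artifact)

    # A sample is "aligned" when the artifact flag agrees with the label.
    aligned = labels == artifact

    # Interpolate between a random split and a fully biased one:
    # aligned samples drift toward train, conflicting ones toward test.
    p_train = (1.0 - bias) * train_frac + bias * aligned.astype(float)

    in_train = rng.random(len(labels)) < p_train
    return np.flatnonzero(in_train), np.flatnonzero(~in_train)
```

A model that scores well on the training side by reading the artifact instead of the lesion would then be penalized on the test side, where that shortcut points to the wrong class.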
To address debiasing, we explored both training-time and test-time strategies. During training, we leveraged our artifact annotations to create environments that allow robust optimization techniques, such as "GroupDRO", to guide models toward more relevant features. This alone improved robustness by 10 percentage points over the baseline (68% AUC). At test time, we focused on identifying and utilizing clinically relevant features for inference. "NoiseCrop" modifies the testing samples by erasing their background. Meanwhile, "Test-time Selection" (TTS) puts a human in the loop to identify positive and negative interest points and erases the features that disagree with the human annotation. Both techniques substantially increased our models' robustness to artifacts, reaching 72% and 75%, respectively. These strategies for evaluation and debiasing, which proved effective in our experiments, can pave the way for more equitable and accurate skin lesion diagnostics. (AU)
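As a rough illustration of how a group-robust objective such as GroupDRO uses environments, the sketch below shows one reweighting step in NumPy: groups with higher average loss receive exponentially larger weights, and the training loss becomes the weighted sum of group losses. This is a generic sketch of the published GroupDRO update, not the thesis's implementation; the function name `group_dro_step`, the `step_size` value, and the idea of deriving `group_ids` from artifact annotations are assumptions made for the example.

```python
import numpy as np

def group_dro_step(per_sample_loss, group_ids, q, step_size=0.01):
    """One GroupDRO-style reweighting step (illustrative).

    per_sample_loss: loss of each sample in the batch.
    group_ids: integer group (environment) of each sample, in [0, len(q)).
    q: current group weights, non-negative and summing to 1.
    Returns the robust loss and the updated group weights.
    """
    per_sample_loss = np.asarray(per_sample_loss, dtype=float)
    group_ids = np.asarray(group_ids)
    q = np.asarray(q, dtype=float)

    # Average loss per group; groups absent from the batch keep loss 0.
    group_loss = np.zeros(len(q))
    for g in range(len(q)):
        mask = group_ids == g
        if mask.any():
            group_loss[g] = per_sample_loss[mask].mean()

    # Exponentiated-gradient update: upweight the worst-performing groups.
    q = q * np.exp(step_size * group_loss)
    q = q / q.sum()

    # The model would be trained to minimize this weighted loss.
    robust_loss = float(np.dot(q, group_loss))
    return robust_loss, q
```

In a setting like the one described, each group could correspond to a combination of diagnosis and artifact presence, so that samples where the artifact contradicts the usual correlation are not drowned out by the majority.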

FAPESP's process:	19/19619-7 - Generating unlimited skin lesion images with generative adversarial networks
Grantee:	Alceu Emanuel Bissoto
Support Opportunities:	Scholarships in Brazil - Doctorate
