

A systematic evaluation of normalization methods and probe replicability using Infinium EPIC methylation data

Author(s):
Welsh, H. ; Batalha, C. M. P. F. ; Li, W. ; Mpye, K. L. ; Souza-Pinto, N. C. ; Naslavsky, M. S. ; Parra, E. J.
Total number of authors: 7
Document type: Scientific article
Source: CLINICAL EPIGENETICS; v. 15, n. 1, p. 12-pg., 2023-03-11.
Abstract

Background: The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias.

Methods: This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson's correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.

Results: The method we define as SeSAMe 2, which consists of the application of the regular SeSAMe pipeline with an additional round of QC, pOOBAH masking, was found to be the best performing normalization method, while quantile-based methods were found to be the worst performing methods. Whole-array Pearson's correlations were found to be high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor performing probes have beta values close to either 0 or 1, and relatively low standard deviations. These results suggest that probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2). (AU)
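The two replicate-based quantities named in the abstract, the absolute beta-value difference and the ICC, can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not describe in code and which would typically be run in R. The function names are hypothetical, the ICC form (one-way random effects, ICC(1,1), with two measurements per sample) is an assumption because the abstract does not state which form was used, and the simulated values are illustrative only.

```python
import random
from statistics import mean


def mean_abs_beta_diff(rep_a, rep_b):
    """Mean absolute beta-value difference across probes for one replicate pair.

    rep_a[j] and rep_b[j] are the beta values of probe j in the two replicates.
    """
    return mean(abs(a - b) for a, b in zip(rep_a, rep_b))


def icc_oneway(rep_a, rep_b):
    """One-way random-effects ICC(1,1) for a single probe (assumed ICC form).

    rep_a[i] and rep_b[i] are the two replicate beta values of sample i.
    """
    n, k = len(rep_a), 2
    pair_means = [(a + b) / 2 for a, b in zip(rep_a, rep_b)]
    grand_mean = mean(pair_means)
    # Between-sample mean square: spread of the per-sample means.
    msb = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    # Within-sample mean square: disagreement between the two replicates.
    msw = sum(
        (a - m) ** 2 + (b - m) ** 2
        for a, b, m in zip(rep_a, rep_b, pair_means)
    ) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    return (msb - msw) / denom if denom > 0 else float("nan")


def clamp(x):
    """Keep a simulated beta value inside [0, 1]."""
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    rng = random.Random(0)
    n_samples, noise_sd = 16, 0.01  # 16 replicate pairs, as in the abstract

    # Hypothetical probe with wide biological variation between samples.
    variable = [rng.uniform(0.2, 0.8) for _ in range(n_samples)]
    # Hypothetical probe that is almost unmethylated in every sample.
    invariant = [0.02] * n_samples

    for label, truth in (("variable", variable), ("invariant", invariant)):
        a = [clamp(t + rng.gauss(0, noise_sd)) for t in truth]
        b = [clamp(t + rng.gauss(0, noise_sd)) for t in truth]
        print(label, "ICC:", round(icc_oneway(a, b), 3))
```

Both simulated probes receive the same technical noise, so any gap between their ICC values comes from the difference in between-sample variation. This mirrors the abstract's observation that poorly performing probes tend to have beta values near 0 or 1 and low standard deviations.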

FAPESP grant: 14/50649-6 - SABE Study: longitudinal multi-cohort study on the living and health conditions of older adults in the municipality of São Paulo - 2015 cohort
Grantee: Yeda Aparecida de Oliveira Duarte
Support type: Research Grants - Thematic Grants
FAPESP grant: 14/50931-3 - INCT 2014 - Aging and Genetic Diseases: Genomics and Metagenomics
Grantee: Mayana Zatz
Support type: Research Grants - Thematic Grants