(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx

Mounir, Mohamed [1] ; Lucchetta, Marta [1] ; Silva, Tiago C. [2] ; Olsen, Catharina [3, 4] ; Bontempi, Gianluca [3, 4] ; Chen, Xi [5, 6] ; Noushmehr, Houtan [2, 7] ; Colaprico, Antonio [6, 3, 4] ; Papaleo, Elena [1, 8]
Total number of authors: 9
Author affiliation(s):
[1] Danish Canc Soc Res Ctr, Computat Biol Lab, Copenhagen - Denmark
[2] Univ Sao Paulo, Ribeirao Preto Med Sch, Dept Genet, Ribeirao Preto - Brazil
[3] Interuniv Inst Bioinformat Brussels IB 2, Brussels - Belgium
[4] ULB, MLG, Dept Informat, Brussels - Belgium
[5] Sylvester Comprehens Canc Ctr, Miami, FL - USA
[6] Univ Miami, Miller Sch Med, Dept Publ Hlth Sci, Div Biostat, Miami, FL 33136 - USA
[7] Henry Ford Hosp, Dept Neurosurg, Detroit, MI 48202 - USA
[8] Univ Copenhagen, Novo Nordisk Fdn, Ctr Prot Res, Translat Dis Syst Biol, Fac Hlth & Med Sci, Copenhagen - Denmark
Total number of affiliations: 8
Document type: Scientific article
Source: PLOS COMPUTATIONAL BIOLOGY; v. 15, n. 3, MAR 2019.
Web of Science citations: 7

The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. The amount of genomic data is now massive, and substantial efforts and new tools are required to unveil the information hidden in the data. The Genomic Data Commons (GDC) Data Portal is a platform that hosts different genomic studies, including those from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, accounting for more than 40 tumor types originating from nearly 30,000 patients. Such platforms, although very attractive, must make sure the stored data are easily accessible and adequately harmonized. Moreover, their primary focus is storing data in a single place, and they do not provide a comprehensive toolkit for analysis and interpretation of the data. To fulfill this urgent need, comprehensive but easily accessible computational methods for integrative analyses of genomic data are required, without giving up a robust statistical and theoretical framework. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms, and iv) support for other genomics datasets, exemplified here by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, as differences in purity act as a confounder in differential expression analysis. With this in mind, we implemented these filtering procedures in TCGAbiolinks. Moreover, a limitation of some of the TCGA datasets is the unavailability or paucity of corresponding normal samples. We thus integrated into TCGAbiolinks the possibility to use normal samples from the Genotype-Tissue Expression (GTEx) project, another large-scale repository cataloging gene expression from healthy individuals. The new functionalities are available in TCGAbiolinks version 2.8 and higher, released in Bioconductor version 3.7.

Author summary

The advent of Next-Generation Sequencing (NGS) technologies has generated a massive amount of data, which requires continuous effort in developing and maintaining computational tools for data analysis. The Genomic Data Commons (GDC) Data Portal is a platform that contains different cancer genomic studies. Such platforms often focus primarily on data storage and do not provide a comprehensive toolkit for analyses. To fulfil this urgent need, comprehensive but accessible computational protocols are required, without giving up a robust statistical framework. In this context, we present the new functions of the R/Bioconductor package TCGAbiolinks, which improve the discovery of differentially expressed genes in cancer and tumor (sub)types, include estimates of tumor purity and tumor infiltration, use normal samples from other platforms, and support other genomics datasets more broadly. (AU)

FAPESP grant: 15/07925-5 - Open-source software containing statistical tools for the analysis and integration of large-scale epigenomic datasets, in order to decipher and understand cancer regulatory networks
Grantee: Houtan Noushmehr
Funding line: Research Grants - Young Investigators Grants
FAPESP grant: 16/01389-7 - Bioinformatics tool to integrate and understand the aberrant epigenomic and genomic changes associated with cancer: methods, development and analysis
Grantee: Tiago Chedraoui Silva
Funding line: Scholarships in Brazil - Doctorate (Direct)