
Open source software statistical tools to aid in analyzing and integrating large cancer epigenomic datasets in order to decipher and understand regulatory networks

Grant number: 15/07925-5
Support type: Research Grants - Young Investigators Grants
Duration: June 01, 2015 - May 31, 2019
Field of knowledge: Health Sciences - Medicine
Principal Investigator: Houtan Noushmehr
Grantee: Houtan Noushmehr
Home Institution: Faculdade de Medicina de Ribeirão Preto (FMRP). Universidade de São Paulo (USP). Ribeirão Preto, SP, Brazil
Assoc. researchers: Ana Valeria Castro; Camila Ferreira de Souza; Carlos Gilberto Carlotti Jr; Daniela Pretti da Cunha Tirapelli; Eduardo Magalhães Rego; Luciano Neder Serafini; Miriam Galvonas Jasiulionis; Tathiane Maistro Malta Pereira
Associated scholarship(s): 16/11039-3 - Charting epigenomic signatures in CpG island Methylator Phenotype (CIMP) tumors, BP.DR
16/01389-7 - Bioinformatic tool to integrate and understand aberrant epigenomic and genomic changes associated with cancer: methods, development and analysis, BP.DD
16/06488-3 - Integrative epigenomic analysis of high and low Glioma-CpG island Methylator Phenotype (G-CIMP): characterization and methods development, BP.DD
14/08321-3 - Identification and characterization of functional genomic elements associated with progression of low to high grade glioma: integrative study of genome and epigenome, BP.PD
14/02245-3 - Identification of epigenomic signatures that define open chromatin regulatory networks associated with mesenchymal differentiation from human pluripotent stem cells, BP.PD


Genomic and epigenomic features in coding and non-coding DNA have recently been uncovered through advances in DNA sequencing technologies. Large multi-national consortia (The Cancer Genome Atlas (TCGA), NIH Roadmap and ENCODE) have spent millions of US dollars in the hope of advancing our understanding of the human genome across commonly used research cell lines (e.g. MCF-7, HMEC), primary normal tissues (e.g. human stem cells) and diseased tissues (e.g. brain cancer). The multi-dimensional genomic data consist of more than 10,000 experiments (>100 terabases of data from thousands of assays, ranging from whole-genome sequencing, RNA-seq and ChIP-seq to Methyl-seq) profiled across more than 10,000 cell lines/tissues. All of these data have been deposited in the public domain, providing an invaluable resource for research laboratories because they allow researchers to compare and contrast these genomic and epigenomic features with their own sequencing experiments. Despite their wide availability, the data are deposited in different repositories and formats, making it a challenge to locate and identify relevant features. Many computational researchers, from novice to advanced and including our own team, have successfully harnessed some of these freely available data and, through advanced integration and scientific insight, enabled the identification of biologically relevant epigenomic changes (Berman et al. Nature Genetics 2012, Coetzee et al. Nucleic Acids Research 2012 and Noushmehr et al. Springer 2013). However, among the many issues facing most researchers is the lack of proper bioinformatic tools or skills to effectively integrate their own sequencing data with these invaluable public datasets. In partnership with our national collaborators (Life Science/Health co-PI), we will generate more than 200 methylomic and transcriptomic datasets.
With our international collaborators, we will develop automated tools for unifying the various gene regulatory databases, and develop powerful yet user-friendly methylation pipelines using the open-source R/Bioconductor framework and the web-based RStudio Shiny system. Standard workflows will use the methods we have developed for the TCGA, Roadmap and ENCODE projects to import and analyze large numbers of raw methylation data files from either the Illumina Infinium or Bisulfite-seq platforms. We will also allow import of arbitrary sample metadata so that users can perform two-way or multi-way comparisons between cancer subtypes or clinical covariates. Our workflows will be driven by the most current understanding of the chromatin landscape, which includes using histone modifications and DNase hypersensitivity data to define focal chromatin state. Recent work by our lab and others suggests that methylation changes at cis-regulatory elements such as enhancers and insulators are driven primarily by the binding of individual transcription factors, and thus reflect direct targeting of genes by specific transcriptional networks. We will use combined ChIP-seq and DNA binding motif analyses available from ENCODE to analyze user methylation data at the level of the individual protein/DNA interaction site. Finally, because the success of this effort will be measured by the degree of adoption within the cancer genomics community, we will engage several large-scale cancer genomics groups to act as beta testers and help us improve our workflows. (AU)
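
As a rough illustration of the two-way comparison described above, the following minimal sketch groups samples by a metadata covariate, computes the difference in mean methylation (beta value) per probe between two groups, and annotates the probes that change with any overlapping transcription factor binding site. It is not the project's pipeline, which is planned in R/Bioconductor; Python is used here only for a compact, dependency-free example. All probe names, sample names, coordinates, the factor name TF_X and the 0.2 threshold are hypothetical, and a plain mean difference stands in for a proper statistical test.

```python
from statistics import mean

# Toy, made-up inputs: every name and value below is hypothetical.
# Beta values (fraction methylated, 0..1) per probe and sample.
beta_values = {
    "probe_a": {"s1": 0.82, "s2": 0.78, "s3": 0.21, "s4": 0.25},
    "probe_b": {"s1": 0.50, "s2": 0.52, "s3": 0.49, "s4": 0.51},
}
# Arbitrary sample metadata; any key can serve as the comparison covariate.
sample_metadata = {
    "s1": {"subtype": "A"},
    "s2": {"subtype": "A"},
    "s3": {"subtype": "B"},
    "s4": {"subtype": "B"},
}
# Genomic position of each probe as (chromosome, coordinate).
probe_positions = {"probe_a": ("chr1", 1050), "probe_b": ("chr1", 5000)}
# Binding sites as (chromosome, start, end, factor), half-open intervals.
binding_sites = [("chr1", 1000, 1100, "TF_X")]


def group_samples(metadata, covariate):
    """Map each value of the covariate to the list of samples carrying it."""
    groups = {}
    for sample, fields in metadata.items():
        groups.setdefault(fields[covariate], []).append(sample)
    return groups


def delta_beta(betas, first, second):
    """Per-probe difference in mean beta value: first group minus second."""
    return {
        probe: mean(values[s] for s in first) - mean(values[s] for s in second)
        for probe, values in betas.items()
    }


def factors_at(position, sites):
    """Names of the factors whose binding site contains the given position."""
    chrom, coord = position
    return [
        name
        for site_chrom, start, end, name in sites
        if site_chrom == chrom and start <= coord < end
    ]


def differential_probes(betas, metadata, covariate, first, second,
                        positions, sites, threshold=0.2):
    """Probes whose absolute mean difference reaches the threshold,
    each annotated with the overlapping binding sites."""
    groups = group_samples(metadata, covariate)
    deltas = delta_beta(betas, groups[first], groups[second])
    return [
        (probe, delta, factors_at(positions[probe], sites))
        for probe, delta in deltas.items()
        if abs(delta) >= threshold
    ]


hits = differential_probes(beta_values, sample_metadata, "subtype", "A", "B",
                           probe_positions, binding_sites)
for probe, delta, factors in hits:
    print(probe, round(delta, 2), factors)
```

Keeping the covariate as a parameter means the same comparison can be run on any metadata column; a multi-way comparison would repeat it over each pair of groups.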

Articles published in Agência FAPESP about the research grant
Biomarker panel can guide treatment of brain cancer 
Index measures similarity between cancer cells and pluripotent stem cells 
Study describes new glioma subtypes 