
Harmonization of whole genome sequencing data for gene-environment interaction analysis

Grant number: 24/06730-5
Support Opportunities: Scholarships in Brazil - Doctorate (Direct)
Start date: June 01, 2024
End date: May 31, 2028
Field of knowledge: Biological Sciences - Genetics - Human and Medical Genetics
Principal Investigator: Marcos Leite Santoro
Grantee: Pedro Henrique Destro
Host Institution: Escola Paulista de Medicina (EPM). Universidade Federal de São Paulo (UNIFESP). Campus São Paulo. São Paulo, SP, Brazil
Associated research grant: 23/05560-6 - Genetic and epigenetic approaches as predictive models in mental disorders, AP.JP

Abstract

The development of mental disorders (MDs) is strongly influenced by both genetic and environmental factors. The genetic variants involved in these disorders range in size from single nucleotide variants (SNVs) to large structural variants, such as copy number variations (CNVs). New genomic technologies, such as whole genome sequencing (WGS), allow for a better understanding of the genetic basis of mental disorders. WGS can identify rare variants in the non-coding genome, which makes up approximately 99% of the genome, and it can also detect most structural changes, such as translocations and inversions. Additionally, WGS can improve the detection of common variants in existing large-scale genome-wide association studies (GWAS) through imputation, a statistical method for inferring ungenotyped SNPs. Similarly, WGS data enable the identification of structural variants, including CNVs, that genotyping methods may not identify precisely. However, handling this type of data remains a challenge: WGS data span ~3 billion genomic positions, compared with ~20 million for GWAS data and ~30 million for whole exome sequencing data. In this project, over 16,000 individuals from the Brazilian High-Risk Cohort for Mental Disorders (BHRCS) and PUMASBrasil have already been sequenced. The objective of this Direct Doctorate (DD) scholarship is to harmonize the sequencing data from these two cohorts, which were sequenced at different sites, and to integrate them with data from the UK Biobank (N ~500,000 with WGS). Processing these data relies on bioinformatics steps including quality control of raw reads, data pre-processing, sample alignment, variant calling, genome assembly, genome annotation, and further analyses according to research interest. The next step required for processing the cohort sequencing data is sample alignment.
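To illustrate, at the most basic level, what harmonizing call sets from two cohorts involves, the Python sketch below matches variant sites between two cohorts by chromosome, position, and alleles, and flags sites where REF and ALT appear reversed. It is a minimal sketch under stated assumptions, not the project's pipeline: the names `normalize_chrom` and `harmonize_sites` are hypothetical, both call sets are assumed to be on the same reference genome build with already-normalized biallelic variant representations, and real harmonization across sequencing sites (batch effects, multiallelic sites, joint genotyping, quality filtering) is done with dedicated tools operating on VCF files.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a biallelic site: (chromosome, position, REF, ALT).
Variant = Tuple[str, int, str, str]


def normalize_chrom(chrom: str) -> str:
    """Strip a leading 'chr' so that 'chr1' and '1' name the same contig."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def harmonize_sites(
    cohort_a: List[Variant], cohort_b: List[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Return (shared, swapped) sites between two call sets (hypothetical helper).

    shared:  same contig, position, REF and ALT in both cohorts.
    swapped: same contig and position, but REF/ALT reversed in cohort_b;
             these are flagged for review rather than merged blindly.
    Sites present in only one cohort are dropped.
    """
    # Index cohort_b by (contig, position); one position may carry several allele pairs.
    index_b: Dict[Tuple[str, int], List[Tuple[str, str]]] = {}
    for chrom, pos, ref, alt in cohort_b:
        index_b.setdefault((normalize_chrom(chrom), pos), []).append((ref, alt))

    shared: List[Variant] = []
    swapped: List[Variant] = []
    for chrom, pos, ref, alt in cohort_a:
        contig = normalize_chrom(chrom)
        for ref_b, alt_b in index_b.get((contig, pos), []):
            if (ref_b, alt_b) == (ref, alt):
                shared.append((contig, pos, ref, alt))
            elif (ref_b, alt_b) == (alt, ref):
                swapped.append((contig, pos, ref, alt))
    return shared, swapped


# Toy call sets that use different contig naming conventions ("chr1" vs "1").
calls_a = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")]
calls_b = [("1", 1000, "A", "G"), ("1", 2000, "T", "C"), ("2", 600, "G", "A")]
shared, swapped = harmonize_sites(calls_a, calls_b)
print(shared)   # [('1', 1000, 'A', 'G')]
print(swapped)  # [('1', 2000, 'C', 'T')]
```

Indexing one cohort by (contig, position) keeps the comparison linear in the number of sites, which matters at the scale of millions of variants per cohort.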
In the first two years of the project, the DD scholar will carry out these initial data processing and cohort integration steps. During this period, the processed data will also be used by other ongoing scholarships in the genomic module (2022/16317-2, 2022/15880-5, 2024/02457-3, 2024/04224-5). From the third year onward, the study will focus on integrating the identified genetic variants with the environmental exposure data available in the three cohorts, such as substance abuse. This project will therefore contribute directly to both the genomic module and the integrative module (Objective 4.3.1 of the Young Researcher project; see Figure 1.C, where environmental factors are shown in green), helping to explain more precisely the portion of the observed heritability of MDs that remains unaccounted for. (AU)
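Gene-environment interaction analyses are commonly framed as a regression with a product term, y = b0 + bG·G + bE·E + bGE·(G × E), where a nonzero bGE means the effect of the genetic variant differs by exposure. The sketch below fits that model by ordinary least squares on simulated data. It is an illustration under assumptions, not the method planned in this project: `fit_gxe` is a hypothetical helper, all data are simulated, the outcome is continuous (a binary diagnosis would usually call for logistic regression), and covariates such as ancestry principal components are omitted.

```python
import numpy as np


def fit_gxe(genotype, exposure, outcome):
    """OLS fit of y = b0 + bG*G + bE*E + bGE*(G*E); returns the four coefficients."""
    g = np.asarray(genotype, dtype=float)
    e = np.asarray(exposure, dtype=float)
    y = np.asarray(outcome, dtype=float)
    # Design matrix: intercept, main effects, and the G x E product term.
    X = np.column_stack([np.ones_like(g), g, e, g * e])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "G", "E", "GxE"], coef))


# Simulated cohort (all values are illustrative assumptions).
rng = np.random.default_rng(seed=1)
n = 5000
G = rng.binomial(2, 0.3, size=n)  # allele dosage 0/1/2 with allele frequency 0.3
E = rng.binomial(1, 0.2, size=n)  # binary exposure, present in ~20% of individuals
y = 0.1 * G + 0.3 * E + 0.5 * G * E + rng.normal(0.0, 1.0, size=n)

print(fit_gxe(G, E, y))  # the "GxE" estimate should lie near the simulated 0.5
```

With a binary exposure, bGE is simply the difference in the per-allele effect between exposed and unexposed individuals, which is why such analyses need many exposed carriers and benefit from pooling harmonized cohorts.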
