
Investigation of alphoid satellite DNA loss in centromeric regions in general population and in pediatric tumors


The DNA sequence is now the main source of biological information at the individual and population levels. Hundreds of thousands of human genomes have been sequenced, but only a few have been analyzed cytogenetically. Chromosomal alterations occur in a small percentage of live newborns and usually cause disease. However, with the advent of molecular cytogenetics and, more recently, next-generation sequencing (NGS), enormous structural complexity has been revealed in the genomes of healthy individuals, in the form of structural and/or quantitative rearrangements collectively referred to as structural variants (SVs), which include deletions, duplications, inversions, insertions and translocations. Structural variants make up a large part of human genomes, with a great impact on the organization of sequences as well as repercussions for human health. Thousands of alternative genomic rearrangements are therefore present in the genomes analyzed so far and contribute to the genetic diversity of our species. Identifying such structural variants in genomes from individuals with genetic diseases or known congenital anomalies is the more direct and widely used approach in medical genomics, but thousands of variants certainly remain to be detected. Some of the unidentified rearrangements may be apparently balanced reciprocal translocations (no detectable gain or loss of euchromatic genomic material) in asymptomatic individuals; another possibility is the existence of carriers of marker chromosomes composed only of repetitive sequences, who may also be asymptomatic. Some structural rearrangements may lead to the emergence of neo-centromeres, which are assembled on sequences other than centromeric alphoid DNA; constitutional human neo-centromeres have already been described, generally on marker chromosomes identified in children with developmental delay or congenital anomalies.
In addition, neo-centromeres have also been identified in at least two human cancers. This proposal is based on quantifying the number of reads mapping to centromeric alphoid DNA relative to the number of reads mapping to a set of chromosome-specific, centromere-proximal sequences. The pipeline will be built using genomes of individuals with no history of genetic disease, available in public databases such as the Genome Aggregation Database (gnomAD) and the 1000 Genomes Project. Once we have determined the population mean and variance for each chromosomal alphoid sequence in somatic cells, we will analyze the many thousands of complete pediatric tumor genomes to identify tumors that may contain neo-centromeres. These pediatric tumor genomes will be obtained from public databases such as St. Jude Cloud and Kids First, as well as from tumors of pediatric patients deposited in the laboratory of Dr. Ana Krepischi. (AU)
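
The quantification described above can be illustrated with a minimal sketch. It assumes that read counts for the alphoid array and for the centromere-proximal reference set of one chromosome have already been extracted from each alignment. All function names, the z-score comparison and the cutoff of -3 are hypothetical choices made for illustration; the abstract does not specify how tumors will be compared with the population parameters.

```python
from math import sqrt
from statistics import mean, variance


def alphoid_ratio(alphoid_reads, proximal_reads):
    """Reads on the alphoid array relative to reads on proximal sequences."""
    if proximal_reads <= 0:
        raise ValueError("proximal read count must be positive")
    return alphoid_reads / proximal_reads


def reference_parameters(ratios):
    """Population mean and sample variance of the ratio for one chromosome."""
    return mean(ratios), variance(ratios)


def z_score(ratio, ref_mean, ref_var):
    """Distance of one sample from the population mean, in standard deviations."""
    return (ratio - ref_mean) / sqrt(ref_var)


def flag_alphoid_loss(tumor_ratios, ref_mean, ref_var, z_cutoff=-3.0):
    """Return sample IDs whose ratio falls at or below the assumed cutoff.

    tumor_ratios maps a sample ID to its alphoid ratio for one chromosome.
    The cutoff is an illustrative assumption, not a value from the proposal.
    """
    return [
        sample_id
        for sample_id, ratio in tumor_ratios.items()
        if z_score(ratio, ref_mean, ref_var) <= z_cutoff
    ]


# Toy numbers only; these are not real measurements.
population = [1.0, 1.2, 0.8, 1.1, 0.9]
ref_mean, ref_var = reference_parameters(population)
tumors = {"T1": alphoid_ratio(200, 1000), "T2": alphoid_ratio(950, 1000)}
candidates = flag_alphoid_loss(tumors, ref_mean, ref_var)
```

In this sketch the same calculation would be repeated for each chromosome, since the population parameters are to be determined separately for each chromosomal alphoid sequence. A low ratio would only mark a tumor as a candidate for further inspection, not as evidence of a neo-centromere.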