
Methodological advances in museomics and dynamic homology: Integrating assembly of historical DNA reads and tree-alignment

Grant number: 24/03494-9
Support Opportunities: Scholarships abroad - Research Internship - Doctorate (Direct)
Effective date (Start): September 01, 2024
Effective date (End): August 31, 2025
Field of knowledge: Biological Sciences - Zoology
Principal Investigator: Taran Grant
Grantee: Daniel Yudi Miyahara Nakamura
Supervisor: Ward Wheeler
Host Institution: Instituto de Biociências (IB). Universidade de São Paulo (USP). São Paulo, SP, Brazil
Research place: American Museum of Natural History, United States
Associated to the scholarship: 22/02789-0 - Lost biodiversity in the genomic age: contributions from historical DNA to the systematics of rare and extinct frogs, BP.DD

Abstract

Museomics has allowed the recovery of historical DNA (hDNA) from rare and extinct species, enabling otherwise intractable evolutionary problems to be resolved. However, challenges must be overcome in order to incorporate hDNA sequences into phylogenetic analyses. First, hDNA tends to be highly degraded, resulting in poor or ambiguous mapping to reference sequences. Consequently, different reference sequences and assembly parameters can result in different consensus assemblies, and no clear criteria exist to choose among them. Second, DNA degradation makes it difficult to attain adequate coverage to assemble contiguous strings of DNA, which results in assembled sequences that are both more fragmentary and ambiguous (i.e. contain more nucleotides of unknown identity, IUPAC "N") than modern DNA. These characteristics hinder phylogenetic analysis by obscuring the homology relationships among nucleotides of hDNA and non-hDNA terminals. Tree-alignment, which evaluates nucleotide homology dynamically by integrating alignment and tree-searching into a single analysis in POY/PhyG, is a powerful approach that results in more optimal solutions than similarity-alignment. However, the high degree of fragmentation and spurious strings of Ns are especially problematic for tree-alignment because all nucleotide positions (even Ns) and variations in sequence length are necessarily interpreted as real features of the sequences that must be accounted for as evolutionary events. It is therefore necessary to eliminate sequencing and assembly artifacts prior to tree-alignment. The problem of sequence fragmentation can be resolved by breaking contiguous sequences into blocks of homologous nucleotides and/or inserting Ns to extend incomplete sequences, and/or deleting orphan nucleotides within incomplete blocks. 
Similarly, sequence length variation due to the erroneous insertion of Ns during assembly can be solved manually by trimming spurious Ns on the basis of preliminary alignments with close relatives. However, both the arbitrariness and the impact of sequence formatting increase with the degree of fragmentation and ambiguity. To date, no objective function has been proposed to obtain optimally formatted sequences for phylogenetic analyses. As such, the purpose of this internship is to develop and test a pipeline that calls the consensus sequence of hDNA reads to correct errors and incorporates the assembled hDNA sequences into files formatted automatically for POY/PhyG, optimizing the insertion of breaks and Ns and the deletion of putatively spurious Ns and orphan nucleotides within blocks.
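
The formatting steps described above (trimming spurious Ns, breaking a sequence into blocks, and discarding orphan nucleotides) can be illustrated with a minimal Python sketch. This is not the pipeline proposed in the project: the function name `split_into_blocks` and the thresholds `min_gap` and `min_block` are hypothetical, and the fixed cutoffs stand in for the objective function that the project aims to develop.

```python
import re

def split_into_blocks(seq, min_gap=10, min_block=5):
    """Illustrative only: split an assembled sequence into blocks.

    min_gap and min_block are arbitrary, hypothetical thresholds,
    not values taken from the project.
    """
    # Trim leading and trailing Ns, treated here as assembly artifacts.
    seq = seq.upper().strip("N")
    # Break the sequence wherever a run of at least min_gap Ns occurs.
    fragments = re.split(r"N{%d,}" % min_gap, seq)
    # Drop short "orphan" fragments left between breaks.
    return [f for f in fragments if len(f) >= min_block]

example = "NNNACGTACGT" + "N" * 12 + "AC" + "N" * 10 + "GGGTTTAAA" + "NN"
blocks = split_into_blocks(example)
print(blocks)
```

In this toy input, the terminal Ns are trimmed, the two long runs of Ns become breaks, and the two-nucleotide fragment between them is discarded as an orphan. Runs of Ns shorter than `min_gap` are kept inside a block. Choosing such cutoffs by hand is exactly the arbitrariness the abstract identifies.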
