(Reference retrieved automatically from Web of Science through information on FAPESP grant and its corresponding number as mentioned in the publication by the authors.)

Evidence of absence treated as absence of evidence: The effects of variation in the number and distribution of gaps treated as missing data on the results of standard maximum likelihood analysis

Author(s):
Machado, Denis Jacob [1, 2] ; Castroviejo-Fisher, Santiago [3] ; Grant, Taran [4]
Total Authors: 3
Affiliation:
[1] Univ North Carolina Charlotte, Coll Comp & Informat, Dept Bioinformatics & Genom, 9201 Univ City Blvd, Charlotte, NC 28223 - USA
[2] Univ Sao Paulo, Programa Interunidades Posgrad Bioinformat, Rua Matao 1010, BR-05508090 Sao Paulo, SP - Brazil
[3] Pontificia Univ Catolica Rio Grande do Sul, Lab Sistemat Vertebrados, Ave Ipiranga 6681, Predio 12, BR-90619900 Porto Alegre, RS - Brazil
[4] Univ Sao Paulo, Lab Anfibios, Inst Biociencias, Dept Zool, Rua Matao, Tv 14, 101 Cidade Univ, BR-05508090 Sao Paulo, SP - Brazil
Total Affiliations: 4
Document type: Journal article
Source: Molecular Phylogenetics and Evolution; v. 154, JAN 2021.
Web of Science Citations: 1
Abstract

Although numerous studies have demonstrated the theoretical and empirical importance of treating gaps as insertion/deletion (indel) events in phylogenetic analyses, the standard approach to maximum likelihood (ML) analysis employed in the vast majority of empirical studies codes gaps as nucleotides of unknown identity ("missing data"). Therefore, it is imperative to understand the empirical consequences of different numbers and distributions of gaps treated as missing data. We evaluated the effects of variation in the number and distribution of gaps (i.e., no base, coded as IUPAC "." or "-") treated as missing data (i.e., any base, coded as "?" or IUPAC "N") in standard ML analysis. We obtained alignments with variable numbers and arrangements of gaps by aligning seven diverse empirical datasets under different gap opening costs using MAFFT. We selected the optimal substitution model for each alignment using the corrected Akaike Information Criterion in jModelTest2 and searched for optimal trees using GARLI. We also employed a Monte Carlo approach to randomly replace nucleotides with gaps (treated as missing data) in an empirical dataset to understand more precisely the effects of varying their number and distribution. To compare alignments, we developed four new indices and used several existing measures to quantify the number and distribution of gaps in all alignments. Our most important finding is that ML scores correlate negatively with gap opening costs and the amount of missing data. However, this negative relationship is not due to the increase in missing data per se (which increases ML scores) but instead to the effect of gaps on nucleotide homology. These variables also cause significant but largely unpredictable effects on tree topology. (AU)
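
As a rough illustration of the Monte Carlo step described in the abstract, the Python sketch below randomly replaces a chosen fraction of nucleotides in a toy alignment with gap characters and reports the resulting proportion of gaps. The function names, the toy sequences, and the uniform sampling scheme are assumptions made for illustration; they are not the authors' procedure, and the gap proportion shown is a generic measure rather than one of the four indices developed in the study.

```python
import random

GAP = "-"  # gap symbol; standard ML software would treat it as missing data


def mask_random_sites(alignment, fraction, seed=0):
    """Return a copy of the alignment with a fraction of nucleotides set to gaps.

    alignment: list of equal-length strings (hypothetical toy input).
    fraction:  share of nucleotide cells to replace, between 0 and 1.
    """
    rng = random.Random(seed)
    # Collect the (row, column) coordinates of every cell holding a nucleotide.
    cells = [(r, c)
             for r, seq in enumerate(alignment)
             for c, ch in enumerate(seq)
             if ch != GAP]
    n_mask = int(fraction * len(cells))
    # Sample cells uniformly without replacement (an assumed sampling scheme).
    chosen = set(rng.sample(cells, n_mask))
    return ["".join(GAP if (r, c) in chosen else ch
                    for c, ch in enumerate(seq))
            for r, seq in enumerate(alignment)]


def gap_fraction(alignment):
    """Proportion of all alignment cells that are gaps."""
    total = sum(len(seq) for seq in alignment)
    gaps = sum(seq.count(GAP) for seq in alignment)
    return gaps / total


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "ACGGACGT"]
    masked = mask_random_sites(toy, fraction=0.25, seed=1)
    print(masked)
    print(gap_fraction(masked))
```

Repeating the call with different seeds and fractions gives replicate alignments that differ only in the number and placement of gaps, which is the kind of controlled variation the abstract describes.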

FAPESP's process: 12/10000-5 - A multi-disciplinary approach to the study of amphibian diversification
Grantee: Taran Grant
Support type: Research Grants - Young Investigators Grants
FAPESP's process: 15/18654-2 - Whole-genome sequence of the eastern spadefoot toad, Scaphiopus holbrookii (Amphibia: Anura: Scaphiopodidae) and of the Maldonado redbelly toad, Melanophryniscus moreirae (Amphibia: Anura: Bufonidae)
Grantee: Denis Jacob Machado
Support type: Scholarships abroad - Research Internship - Doctorate
FAPESP's process: 18/15425-0 - A multi-disciplinary approach to the study of amphibian diversification: phase 2
Grantee: Taran Grant
Support type: Research Grants - Young Investigators Grants - Phase 2