Genotyping Polyploids from Messy Sequencing Data

Gerard, David; Ferrao, Luis Felipe Ventorim; Franco Garcia, Antonio Augusto; Stephens, Matthew

Authors: Gerard, David [1]; Ferrao, Luis Felipe Ventorim [2]; Franco Garcia, Antonio Augusto [3]; Stephens, Matthew [4, 5]. Total number of authors: 4
Author affiliations:
[1] Amer Univ, Dept Math & Stat, 3501 Nebraska Ave NW, Don Myers Bldg, Washington, DC 20016 - USA
[2] Univ Florida, Hort Sci Dept, Gainesville, FL 32611 - USA
[3] Univ Sao Paulo, Luiz de Queiroz Coll Agr, Dept Genet, BR-13418900 Piracicaba - Brazil
[4] Univ Chicago, Dept Human Genet, Chicago, IL 60637 - USA
[5] Univ Chicago, Dept Stat, Chicago, IL 60637 - USA
Total number of affiliations: 5
Document type: Scientific article
Source: Genetics, v. 210, n. 3, pp. 789-807, Nov. 2018.
Web of Science citations: 12
Abstract
Detecting and quantifying the differences in individual genomes (i.e., genotyping) plays a fundamental role in most modern bioinformatics pipelines. Many scientists now use reduced-representation next-generation sequencing (NGS) approaches for genotyping. Genotyping diploid individuals using NGS is a well-studied field, while similar methods for polyploid individuals are only now emerging. However, many aspects of NGS data, particularly in polyploids, remain unexplored by most methods. Our contributions in this paper are fourfold: (i) We draw attention to, and then model, common aspects of NGS data: sequencing error, allelic bias, overdispersion, and outlying observations. (ii) Many datasets feature related individuals, so we use the structure of Mendelian segregation to build an empirical Bayes approach for genotyping polyploid individuals. (iii) We develop novel models to account for preferential pairing of chromosomes and harness these for genotyping. (iv) We derive oracle genotyping error rates that may be used to suggest read depths. We assess the accuracy of our method in simulations and apply it to a dataset of hexaploid sweet potato (Ipomoea batatas). An R package implementing our method is available at https://cran.r-project.org/package=updog.
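To make the abstract's ingredients concrete, here is a minimal Python sketch of how they can fit together: a read-count likelihood with sequencing error, allelic bias and overdispersion; a Mendelian-segregation prior for offspring of two known parents; the resulting genotype posterior; and an "oracle" error rate (all parameters known) as a function of read depth. The specific parameterization is an assumption for illustration and is not necessarily the model used in the paper or in updog. Specifically, the sketch assumes a beta-binomial likelihood, a particular bias formula `f / (h(1 - f) + f)`, and random bivalent pairing with no double reduction and no preferential pairing. All function names and parameter values (`eps`, `h`, `rho`) are hypothetical and are not the updog API.

```python
from math import comb, exp, lgamma, log

# Illustrative sketch only: the parameterization below is an assumption,
# not necessarily the exact model used in the paper or in updog.


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binom_pmf(x, n, mu, rho):
    """Beta-binomial pmf with mean mu and overdispersion rho in (0, 1)."""
    # Mean/overdispersion -> shape parameters, so that rho = 1 / (a + b + 1).
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    return exp(log(comb(n, x)) + log_beta(x + a, n - x + b) - log_beta(a, b))


def read_mean(k, ploidy, eps, h):
    """Expected proportion of reference reads for a dosage-k individual."""
    p = k / ploidy                       # true reference-allele fraction
    f = p * (1 - eps) + (1 - p) * eps    # sequencing error swaps alleles
    return f / (h * (1 - f) + f)         # assumed bias form; h = 1 means no bias


def gamete_probs(k, ploidy):
    """Gamete dosage probabilities assuming random bivalent pairing and
    no double reduction (hypergeometric draw of ploidy/2 chromosomes)."""
    half = ploidy // 2
    total = comb(ploidy, half)
    return [comb(k, j) * comb(ploidy - k, half - j) / total
            for j in range(half + 1)]


def f1_prior(k1, k2, ploidy):
    """Offspring dosage distribution for parents with dosages k1 and k2."""
    prior = [0.0] * (ploidy + 1)
    for i, a in enumerate(gamete_probs(k1, ploidy)):
        for j, b in enumerate(gamete_probs(k2, ploidy)):
            prior[i + j] += a * b        # offspring dosage = sum of gametes
    return prior


def genotype_posterior(x, n, prior, ploidy, eps, h, rho):
    """Posterior over dosages 0..ploidy given x reference reads out of n."""
    joint = [prior[k] * beta_binom_pmf(x, n, read_mean(k, ploidy, eps, h), rho)
             for k in range(ploidy + 1)]
    total = sum(joint)
    return [w / total for w in joint]


def oracle_error_rate(n, prior, ploidy, eps, h, rho):
    """P(MAP genotype is wrong) at depth n when all parameters are known."""
    err = 0.0
    for x in range(n + 1):
        joint = [prior[k] * beta_binom_pmf(x, n, read_mean(k, ploidy, eps, h), rho)
                 for k in range(ploidy + 1)]
        err += sum(joint) - max(joint)   # probability mass not on the MAP genotype
    return err


if __name__ == "__main__":
    ploidy, eps, h, rho = 6, 0.005, 1.0, 0.01   # hypothetical hexaploid values
    prior = f1_prior(2, 3, ploidy)
    post = genotype_posterior(12, 40, prior, ploidy, eps, h, rho)
    print("posterior:", [round(p, 3) for p in post])
    for depth in (10, 50, 200):
        print(depth, round(oracle_error_rate(depth, prior, ploidy, eps, h, rho), 4))
```

In this sketch, the segregation prior is what makes the approach "empirical Bayes" when the parental dosages and nuisance parameters are estimated from the whole sample. Here they are simply fixed by hand. The oracle error rate sums, over every possible read count, the probability mass not assigned to the most probable genotype. Evaluating it across depths is one way to reason about read-depth requirements.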

FAPESP grant: 14/20389-2 - Development of genetic-statistical models for genomic selection in Coffea canephora and other plant species
Grantee: Luís Felipe Ventorim Ferrão
Support type: Scholarships in Brazil - Doctorate
