(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Casboundary: automated definition of integral Cas cassettes

Author(s):
Padilha, Victor A. [1] ; Alkhnbashi, Omer S. [2] ; Tran, Van Dinh [2] ; Shah, Shiraz A. [3] ; Carvalho, Andre C. P. L. F. [1] ; Backofen, Rolf [2, 4, 5]
Total number of authors: 6
Author affiliation(s):
[1] Univ Sao Paulo, Inst Math & Comp Sci, BR-13566590 Sao Carlos, SP - Brazil
[2] Univ Freiburg, Dept Comp Sci, Bioinformat Grp, D-79110 Freiburg - Germany
[3] Copenhagen Univ Hosp Herlev & Gentofte, COPSAC, DK-2820 Gentofte - Denmark
[4] Univ Freiburg, Signalling Res Ctr BIOSS, D-79104 Freiburg - Germany
[5] Univ Freiburg, CIBSS, D-79104 Freiburg - Germany
Total number of affiliations: 5
Document type: Scientific article
Source: Bioinformatics; v. 37, n. 10, p. 1352-1359, MAY 15 2021.
Web of Science citations: 1
Abstract

Motivation: CRISPR-Cas are important systems found in most archaeal and many bacterial genomes, providing adaptive immunity against mobile genetic elements in prokaryotes. The CRISPR-Cas systems are encoded by a set of consecutive cas genes, here termed a cassette. The identification of cassette boundaries is key for finding cassettes in the CRISPR research field. This is often carried out by using Hidden Markov Models and manual annotation. In this article, we propose the first method able to automatically define the cassette boundaries. In addition, we present a Cas-type predictive model used by the method to assign each gene located in the region defined by a cassette's boundaries a Cas label from a set of pre-defined Cas types. Furthermore, the proposed method can detect potentially new cas genes and decompose a cassette into its modules.

Results: We evaluate the predictive performance of our proposed method on data collected from the two most recent CRISPR classification studies. In our experiments, we obtain an average similarity of 0.86 between the predicted and expected cassettes. In addition, we achieve F-scores above 0.9 for the classification of cas genes of known types and 0.73 for the unknown ones. Finally, we conduct two additional case studies, where we investigate the occurrence of potentially new cas genes and the occurrence of module exchange between different genomes. (AU)

FAPESP grant: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry
Grantee: Francisco Louzada Neto
Support type: Research Grants - Research, Innovation and Dissemination Centers (CEPIDs)
FAPESP grant: 19/21300-9 - Machine learning tools for bioinformatics problems
Grantee: Victor Alexandre Padilha
Support type: Scholarships in Brazil - Doctorate