Biological Sequence Analysis Using Complex Networks and Entropy Maximization: A Case Study in SARS-CoV-2

Author(s):
Pimenta-Zanon, Matheus H. ; De Souza, Vinicius Augusto ; Hashimoto, Ronaldo Fumio ; Lopes, Fabricio Martins ; Swarnkar, T ; Patnaik, S ; Mitra, P ; Misra, S ; Mishra, M
Total Authors: 9
Document type: Journal article
Source: AMBIENT INTELLIGENCE IN HEALTH CARE, ICAIHC 2022; v. 317, 10 pp., 2023-01-01.
Abstract

During the COVID-19 pandemic, several genetic mutations occurred in the SARS-CoV-2 virus, making it more infectious or transmissible. The World Health Organization (WHO) tracks variants and classifies them as variants of concern (VOCs) or variants of interest (VOIs), depending on the transmissibility of the variant and its dominance in the affected regions. Variants are usually identified and classified through sequence alignment techniques, which are computationally expensive and therefore impractical for classifying thousands of sequences simultaneously. In this work, the alignment-free method BASiNETEntropy is applied to the classification of SARS-CoV-2 variants of concern. The method first maps each biological sequence to a complex network. The most informative edges are then selected through the entropy maximization principle, yielding a filtered network that contains only those edges. Topological measurements of this filtered network are then extracted and used as feature vectors in the classification process. Sequences of SARS-CoV-2 variants of concern obtained from NCBI were used to assess the method. Experimental results show that the extracted features classify the variants of concern with high accuracy using only a few features, which reduces the feature space. In addition to classifying the variants of concern, unique patterns (motifs) were extracted for each variant, relative to the SARS-CoV-2 reference sequence. The proposed method is implemented as open-source software in the R language and is freely available at https://cran.r-project.org/web/packages/BASiNETEntropy/. (AU)
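
The three stages named in the abstract (sequence to network, entropy-based edge selection, topological features) can be illustrated with a minimal Python sketch. It is not the BASiNETEntropy implementation, which is an R package, and it makes several assumptions the abstract does not state: nodes are overlapping words of length 3, an edge links consecutive words, and edges are filtered with a Kapur-style maximum-entropy threshold on edge weights as a stand-in for the paper's selection criterion. All function names and parameters are hypothetical.

```python
from collections import Counter
from math import log2


def build_network(seq, word=3, step=1):
    """Map a sequence to a weighted undirected network (assumed mapping):
    nodes are words of length `word`; an edge links consecutive words.
    Self-loops are skipped to keep the density formula simple."""
    words = [seq[i:i + word] for i in range(0, len(seq) - word + 1, step)]
    edges = Counter()
    for a, b in zip(words, words[1:]):
        if a != b:
            edges[tuple(sorted((a, b)))] += 1
    return edges


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def max_entropy_threshold(edges):
    """Kapur-style threshold (an assumption, not the paper's criterion):
    pick the weight t that maximizes H(weights < t) + H(weights >= t),
    where each H is taken over the histogram of edge weights."""
    if not edges:
        raise ValueError("network has no edges")
    hist = Counter(edges.values())  # weight value -> number of edges
    values = sorted(hist)
    best_t, best_h = values[0], -1.0
    for t in values[1:]:
        low = [hist[v] for v in values if v < t]
        high = [hist[v] for v in values if v >= t]
        h = entropy(low) + entropy(high)
        if h > best_h:
            best_t, best_h = t, h
    return best_t


def filter_edges(edges, threshold):
    """Keep only edges whose weight reaches the threshold."""
    return {e: w for e, w in edges.items() if w >= threshold}


def topological_features(edges):
    """A few simple topological measurements of the filtered network."""
    nodes = {n for e in edges for n in e}
    n, m = len(nodes), len(edges)
    return {
        "nodes": n,
        "edges": m,
        "avg_degree": 2 * m / n if n else 0.0,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
    }


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # toy sequence, not real SARS-CoV-2 data
    net = build_network(toy)
    kept = filter_edges(net, max_entropy_threshold(net))
    print(topological_features(kept))
```

In a classification setting, the dictionary returned by `topological_features` would be computed per sequence and passed to any standard classifier as one feature vector; the choice of measurements here is illustrative only.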

FAPESP's process: 15/22308-2 - Intermediate representations in Computational Science for knowledge discovery
Grantee: Roberto Marcondes Cesar Junior
Support Opportunities: Research Projects - Thematic Grants