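Análise de redes biológicas: estudo comparativo de medidas de dependência e uma ferramenta computacional para discriminar grafos [Analysis of biological networks: a comparative study of dependence measures and a computational tool for discriminating graphs]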

Suzana de Siqueira Santos

Author(s):	Suzana de Siqueira Santos
Document type:	Master's Dissertation
Press:	São Paulo.
Institution:	Universidade de São Paulo (USP). Instituto de Matemática e Estatística (IME/SBI)
Defense date:	2015-04-28
Advisor:	André Fujita
Abstract
Complex networks of molecular interactions describe the cellular phenotype. Therefore, identifying network properties that differ between healthy and diseased cellular states may elucidate the mechanisms involved in a disease. Studies of this kind of network usually analyze data from only part of the population. Thus, statistical inference methods are fundamental to the study of biological networks. In this work, we focus on the analysis of co-expression graphs, in which the vertices correspond to genes and the edges indicate statistical associations between gene expression levels. In the first part of this work, we present a comparative study of statistical dependence measures used to construct co-expression graphs. We performed simulation experiments and applied the methods to microarray data from tumor tissues to evaluate the strengths and limitations of the studied measures (Pearson's correlation coefficient, Spearman's correlation coefficient, Kendall's correlation coefficient, the distance correlation, the Heller-Heller-Gorfine measure, Hoeffding's D measure, the mutual information, and the maximal information coefficient). In the second part of the work, we developed statistical tests to compare structural properties of co-expression graphs. To characterize a graph, we used complex network measures, such as the degree centrality, the betweenness centrality, the closeness centrality, the eigenvector centrality, and the clustering coefficient, and two recently proposed measures based on the graph spectrum (the set of eigenvalues of the graph adjacency matrix). A motivation for using the spectrum is that it describes several structural properties of a graph and is considered a more complete graph characterization than the usual complex network measures. The spectrum-based measures used in this work are the spectral entropy (a measure of graph randomness) and the Jensen-Shannon divergence between the distributions of the graph spectra. To make the proposed methods available, we developed an R package called CoGA (Co-expression Graph Analyzer). We illustrate an application of the CoGA package to microarray data from two types of brain tumor. We show by simulation experiments that the proposed tests control the false positive rate and that their power is proportional to the number of changes in the network. Our results suggest that the CoGA package may be useful for the identification of gene sets associated with a disease. (AU)
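
The two spectrum-based measures named in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy rather than R, and it is not the CoGA implementation: the function names, the Gaussian kernel, the fixed bandwidth, and the evaluation grid are all assumptions made for illustration. The sketch estimates the density of the adjacency-matrix eigenvalues, then computes the entropy of that density and the Jensen-Shannon divergence between the densities of two graphs.

```python
import numpy as np

def random_graph(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdos-Renyi graph (toy data only)."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T

def spectral_density(adj, grid, bandwidth=0.3):
    """Gaussian-kernel estimate of the eigenvalue density on a uniform grid.

    The kernel and bandwidth are illustrative assumptions, not values
    taken from the dissertation.
    """
    eig = np.linalg.eigvalsh(adj)  # real eigenvalues of a symmetric matrix
    z = (grid[:, None] - eig[None, :]) / bandwidth
    dens = np.exp(-0.5 * z ** 2).sum(axis=1)
    dx = grid[1] - grid[0]
    return dens / (dens.sum() * dx)  # normalize so the Riemann sum equals 1

def spectral_entropy(dens, dx):
    """Entropy -sum(rho * log(rho)) * dx of a density sampled on the grid."""
    mask = dens > 0
    return -np.sum(dens[mask] * np.log(dens[mask])) * dx

def kullback_leibler(p, q, dx):
    """KL divergence KL(p || q); q must be positive wherever p is."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx

def jensen_shannon(p, q, dx):
    """Jensen-Shannon divergence between two densities on the same grid."""
    m = 0.5 * (p + q)
    return 0.5 * kullback_leibler(p, m, dx) + 0.5 * kullback_leibler(q, m, dx)

rng = np.random.default_rng(0)
sparse_graph = random_graph(50, 0.1, rng)
dense_graph = random_graph(50, 0.5, rng)

grid = np.linspace(-20.0, 40.0, 2000)  # wide enough to cover both spectra
dx = grid[1] - grid[0]
rho_sparse = spectral_density(sparse_graph, grid)
rho_dense = spectral_density(dense_graph, grid)

print("spectral entropy (sparse):", spectral_entropy(rho_sparse, dx))
print("spectral entropy (dense): ", spectral_entropy(rho_dense, dx))
print("Jensen-Shannon divergence:", jensen_shannon(rho_sparse, rho_dense, dx))
```

With natural logarithms the divergence lies between 0 (identical spectral densities) and log 2. A statistical test like those described in the abstract would additionally need a null distribution for such a statistic, which this sketch does not attempt.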

FAPESP's process:	12/25417-9 - Development of statistical and computational methods for the analysis of graphs with applications in biological networks
Grantee:	Suzana de Siqueira Santos
Support Opportunities:	Scholarships in Brazil - Master
