Methods for parameter estimation and model selection in compositional data regression


Compositional data consist of vectors whose components are proportions or percentages of a whole, and they arise frequently in many fields. Their sample space is the simplex, whose properties differ markedly from those of Euclidean space. Regression methods designed for unconstrained data may therefore yield inadequate inferences when applied to compositional data. One approach to regression on compositional data is Dirichlet regression, which assumes that the response vector follows a Dirichlet distribution D(a_1, a_2, ..., a_D). Given a covariate vector x = (x_1, x_2, ..., x_C), a regression model is readily obtained by treating each parameter a_j as a positive function of x, which yields a Dirichlet distribution conditional on x. The simplest case in this family uses the uniform link function, where each parameter a_j is a linear combination of the covariates, a_j(x) = b_1*x_1 + b_2*x_2 + ... + b_C*x_C. One current method for maximum likelihood estimation of the coefficients uses a probabilistic preliminary phase (based on resampling) to search for an initial point in the feasible region. However, that method is numerically unstable and does not guarantee a solution for every instance. In an article accepted for publication, resulting from a master's thesis supervised by the proponent of this project, we introduced a new approach to coefficient estimation. Our optimization method adopts a regularization approach, introducing artificial variables in the search for initial solutions. In numerical experiments, the proposed method significantly outperforms the previous one in both robustness and computational performance. In the same article, we proposed an approach for testing the nullity of parameters, based on the Full Bayesian Significance Test (FBST). Although it performed reasonably well in some numerical experiments, we see room to improve convergence in the numerical integration step.
This task demands further research, implementation, and analysis of Monte Carlo methods. Motivated by the good results already obtained, in this project we propose to continue our research on compositional data analysis via Dirichlet regression. Our topics of interest are the study of more complex link functions, numerically robust methods for parameter estimation based on regularization, and model selection based on FBST. (AU)
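
To make the role of the feasible region concrete, the following is a minimal sketch of the Dirichlet log-likelihood under the linear form a_j(x) = b_1*x_1 + ... + b_C*x_C described above. It is an illustration only, not the estimation method of the article. The function names, the coefficient layout B[j][c] (one row of coefficients per component j, which is an assumption about how the coefficients are indexed), and the convention of returning minus infinity outside the feasible region are all hypothetical choices made for this sketch.

```python
import math

def dirichlet_params(B, x):
    """Return a_j(x) = sum_c B[j][c] * x[c] for each component j.

    B is a hypothetical D-by-C coefficient layout (one row per component).
    """
    return [sum(b * xc for b, xc in zip(row, x)) for row in B]

def dirichlet_loglik(B, X, Y):
    """Log-likelihood of compositions Y given covariate vectors X.

    Returns -inf when any a_j(x) is non-positive, i.e. when B lies
    outside the feasible region for the observed covariates.
    """
    total = 0.0
    for x, y in zip(X, Y):
        a = dirichlet_params(B, x)
        if min(a) <= 0.0:
            return float("-inf")  # infeasible coefficients
        # log of the Dirichlet normalizing constant
        total += math.lgamma(sum(a)) - sum(math.lgamma(aj) for aj in a)
        # log of the kernel prod_j y_j^(a_j - 1)
        total += sum((aj - 1.0) * math.log(yj) for aj, yj in zip(a, y))
    return total

# One observation, one covariate fixed at 1.0 (intercept only).
X = [[1.0]]
Y = [[0.2, 0.3, 0.5]]
feasible = [[1.0], [1.0], [1.0]]     # gives a = (1, 1, 1)
infeasible = [[1.0], [-1.0], [1.0]]  # gives a_2 < 0

ll_feasible = dirichlet_loglik(feasible, X, Y)
ll_infeasible = dirichlet_loglik(infeasible, X, Y)
```

Because a linear combination of covariates is not positive by construction, an arbitrary starting point can give a non-positive a_j for some observation, where the likelihood is undefined. This is why the search for an initial point in the feasible region matters before any maximization can begin.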