Categorical data analysis with missingness in explanatory and response variables

Author(s):
Frederico Zanqueta Poleto
Total Authors: 1
Document type: Doctoral Thesis
Press: São Paulo.
Institution: Universidade de São Paulo (USP). Instituto de Matemática e Estatística (IME/SBI)
Defense date:
Examining board members:
Julio da Motta Singer; Enrico Antônio Colosimo; Rosângela Helena Loschi; Fernando Antônio da Silva Moura; Carlos Daniel Mimoso Paulino
Advisor: Julio da Motta Singer
Abstract

We present methodological developments for conducting analyses with missing data, along with studies designed to understand the results of such analyses. We examine Bayesian and classical sensitivity analyses for data with missing categorical responses and show that the subjective components of each approach can influence results in non-trivial ways, irrespective of the sample size, and conclude that they need to be carefully evaluated. Specifically, we show that prior distributions commonly regarded as slightly informative or non-informative may actually be too informative for non-identifiable parameters, and that the choice of over-parameterized models may drastically affect the results. When there is missingness in explanatory variables, we also need to consider a marginal model for the covariates, even if the interest lies only in the conditional model. An incorrect specification of either the model for the covariates or the model for the missingness mechanism leads to biased inferences for the parameters of interest. Previously published works commonly fall into two streams: either they use flexible semi-/non-parametric distributions for the covariates and identify the model via a non-informative missingness mechanism, or they employ parametric distributions for the covariates and allow a more general informative missingness mechanism. We consider the analysis of binary responses, combining an informative missingness model with a non-parametric model for the continuous covariates via a Dirichlet process mixture. When the interest lies only in moments of the response distribution, we consider a new classical sensitivity analysis for incomplete responses that avoids distributional assumptions and employs easily interpreted sensitivity parameters. The procedure is particularly useful for analyses of missing continuous data, an area where analyses traditionally assume normality and/or rely on hard-to-interpret sensitivity parameters. We illustrate all analyses with real data sets. (AU)