Title: | Missing Data and Measurement Error Modelling in INLA |
---|---|
Description: | Facilitates fitting measurement error and missing data imputation models using integrated nested Laplace approximations, according to the method described in Skarstein, Martino and Muff (2023) <doi:10.1002/bimj.202300078>. See Skarstein and Muff (2024) <doi:10.48550/arXiv.2406.08172> for details on using the package. |
Authors: | Emma Skarstein [cre, aut, cph] |
Maintainer: | Emma Skarstein <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.0 |
Built: | 2025-03-06 05:34:49 UTC |
Source: | https://github.com/emmaskarstein/inlamemi |
Extract model coefficients
## S3 method for class 'inlamemi'
coef(object, ...)
object |
object of class 'inlamemi' |
... |
other arguments |
A wrapper function around "INLA::inla()", providing the necessary structure to fit the hierarchical measurement error model that adjusts coefficient estimates to account for biases due to measurement error and missing data.
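The bias that this kind of model adjusts for can be illustrated with a small base-R simulation. The sketch below is not part of the package: the variable names, sample size, and parameter values are all hypothetical. It only shows how a naive regression on a covariate with classical measurement error tends to underestimate the true slope.

```r
# Hypothetical simulation, not package code: attenuation under classical error.
set.seed(1)
n <- 5000
beta_x <- 2                              # true slope (assumed value)

x <- rnorm(n, mean = 0, sd = 1)          # true covariate, unobserved in practice
w <- x + rnorm(n, mean = 0, sd = 1)      # observed covariate: w = x + u (classical error)
y <- 1 + beta_x * x + rnorm(n, sd = 1)   # response depends on the true covariate

naive_slope  <- unname(coef(lm(y ~ w))["w"])   # regression on the error-prone w
oracle_slope <- unname(coef(lm(y ~ x))["x"])   # regression on the true x

# With var(x) = var(u) = 1, the naive slope is expected to shrink by a
# factor of roughly var(x) / (var(x) + var(u)) = 0.5.
c(naive = naive_slope, oracle = oracle_slope)
```

The naive estimate should land near half of the true slope, while the regression on the true covariate recovers it.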
fit_inlamemi(
  formula_moi,
  formula_imp = NULL,
  formula_mis = NULL,
  family_moi,
  data,
  error_type = "classical",
  error_variable = NULL,
  repeated_observations = FALSE,
  classical_error_scaling = NULL,
  prior.prec.moi = NULL,
  prior.prec.berkson = NULL,
  prior.prec.classical = NULL,
  prior.prec.imp = NULL,
  prior.beta.error = NULL,
  prior.gamma.error = NULL,
  initial.prec.moi = NULL,
  initial.prec.berkson = NULL,
  initial.prec.classical = NULL,
  initial.prec.imp = NULL,
  control.family.moi = NULL,
  control.family.berkson = NULL,
  control.family.classical = NULL,
  control.family.imp = NULL,
  control.family = NULL,
  control.predictor = NULL,
  ...
)
formula_moi |
an object of class "formula", describing the main model to be fitted. |
formula_imp |
an object of class "formula", describing the imputation model for the mismeasured and/or missing observations. |
formula_mis |
an object of class "formula", describing the missingness model. Does not need to have a response variable, since this will always be a binary missingness indicator. |
family_moi |
a string indicating the likelihood family for the model of interest (the main model). |
data |
an object of class data.frame or list containing the variables in the model. |
error_type |
type of error (one or more of "classical", "berkson", "missing") |
error_variable |
character vector with the name(s) of the variable(s) with error. |
repeated_observations |
Does the variable with measurement error and/or missingness have repeated observations? If so, set this to TRUE. In that case, use the name of the variable without any numbers when specifying the formula, and make sure that the columns holding the repeated measurements in the data end in a number, e.g. "sbp1" and "sbp2". |
classical_error_scaling |
can be specified if the classical measurement error varies across observations. Must be a vector of the same length as the data. |
prior.prec.moi |
a vector containing the parameters (e.g. c(10, 9)) for the prior for the precision of the residual term for the model of interest. |
prior.prec.berkson |
a vector containing the parameters for the prior for the precision of the error term for the Berkson error model. |
prior.prec.classical |
a vector containing the parameters for the prior for the precision of the error term for the classical error model. |
prior.prec.imp |
a vector containing the parameters for the prior for the precision of the latent variable x, which is the variable being described in the imputation model. |
prior.beta.error |
parameters for the Gaussian prior for the coefficient of the error prone variable. |
prior.gamma.error |
parameters for the Gaussian prior for the coefficient of the variable with missingness in the missingness model. |
initial.prec.moi |
the initial value for the precision of the residual term for the model of interest. |
initial.prec.berkson |
the initial value for the precision of the Berkson error term. |
initial.prec.classical |
the initial value for the precision of the classical error term. |
initial.prec.imp |
the initial value for the precision of the latent variable x (the variable described in the imputation model). |
control.family.moi |
control.family component for the model of interest. Can be specified here using the inla syntax instead of passing the "prior.prec..." and "initial.prec..." arguments, or when other hyperparameters are needed for the model of interest, as is the case for survival models, for instance. |
control.family.berkson |
control.family component for the Berkson model. Can be specified here using the inla syntax instead of passing the "prior.prec..." and "initial.prec..." arguments. Useful when more flexibility is needed, for instance to specify a prior distribution other than Gamma. |
control.family.classical |
control.family component for the classical model. Can be specified here using the inla syntax instead of passing the "prior.prec..." and "initial.prec..." arguments. Useful when more flexibility is needed, for instance to specify a prior distribution other than Gamma. |
control.family.imp |
control.family component for the imputation model. Can be specified here using the inla syntax instead of passing the "prior.prec..." and "initial.prec..." arguments. Useful when more flexibility is needed, for instance to specify a prior distribution other than Gamma. |
control.family |
control.family for use in inla. Can be provided directly instead of passing the "prior.prec..." and "initial.prec..." arguments. If this is specified, any other "control.family..." or "prior.prec..." arguments provided will be ignored. |
control.predictor |
control.predictor for use in inla. |
... |
other arguments to pass to 'inla'. |
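The argument descriptions above do not spell out how the prior parameters are parametrized. Assuming the usual INLA conventions (a Gamma prior on a precision given as shape and rate, and a Gaussian prior given as mean and precision), the following base-R sketch shows what values such as `c(10, 9)` and `c(0, 1/1000)` would imply. This is an interpretation aid under that assumption, not package code.

```r
# Assumption: Gamma prior on the precision, parametrized as c(shape, rate).
prior_prec <- c(shape = 10, rate = 9)
prior_mean <- prior_prec[["shape"]] / prior_prec[["rate"]]   # prior mean of the precision

# Central 95% prior interval for the precision
prec_interval <- qgamma(c(0.025, 0.975),
                        shape = prior_prec[["shape"]],
                        rate = prior_prec[["rate"]])

# Implied interval for the standard deviation, using sd = 1 / sqrt(precision)
sd_interval <- rev(1 / sqrt(prec_interval))

# Assumption: Gaussian prior on a coefficient, parametrized as c(mean, precision).
prior_beta <- c(mean = 0, precision = 1 / 1000)
prior_beta_sd <- 1 / sqrt(prior_beta[["precision"]])         # implied prior sd

list(prior_mean = prior_mean,
     prec_interval = prec_interval,
     sd_interval = sd_interval,
     prior_beta_sd = prior_beta_sd)
```

Under these assumptions, a precision of 1/1000 corresponds to a wide prior for the coefficient, while the Gamma prior concentrates the precision around a value slightly above 1.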
An object of class inlamemi.
# Fit the model
simple_model <- fit_inlamemi(data = simple_data,
                             formula_moi = y ~ x + z,
                             formula_imp = x ~ z,
                             family_moi = "gaussian",
                             error_type = c("berkson", "classical"),
                             error_variable = "x",
                             prior.prec.moi = c(10, 9),
                             prior.prec.berkson = c(10, 9),
                             prior.prec.classical = c(10, 9),
                             prior.prec.imp = c(10, 9),
                             prior.beta.error = c(0, 1/1000),
                             initial.prec.moi = 1,
                             initial.prec.berkson = 1,
                             initial.prec.classical = 1,
                             initial.prec.imp = 1)
A data set with observations of heart disease status, systolic blood pressure (SBP) and smoking status.
framingham
## 'framingham' A data frame with 641 rows and 4 columns:
A binary response, 1 if heart disease, 0 otherwise
log(SBP - 50) at examination 1 (centered)
log(SBP - 50) at examination 2 (centered)
Smoking status, 1 if smoking, 0 otherwise.
MacMahon et al. (1990) <https://doi.org/10.1016/0140-6736(90)90878-9>
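Because this data set holds two SBP measurements per person, it matches the layout that the repeated_observations argument of fit_inlamemi expects: numbered columns in the data, and the unnumbered name in the formulas. The sketch below builds a small toy data frame to show that convention. The column names `disease`, `sbp1`, `sbp2` and `smoking` are assumptions made for illustration, since the column names are not listed here, and the values are random.

```r
# Toy data with hypothetical column names; values are random, not real data.
set.seed(2)
n <- 6
toy <- data.frame(
  disease = rbinom(n, 1, 0.3),   # binary response
  sbp1    = rnorm(n),            # first measurement of the error-prone variable
  sbp2    = rnorm(n),            # second measurement of the same variable
  smoking = rbinom(n, 1, 0.5)    # error-free covariate
)

# The formulas refer to the variable by its name without the trailing number.
error_variable <- "sbp"
formula_moi <- disease ~ sbp + smoking
formula_imp <- sbp ~ smoking

# Check that the data contain numbered columns for the error-prone variable.
repeated_cols <- grep(paste0("^", error_variable, "[0-9]+$"),
                      names(toy), value = TRUE)
repeated_cols
```

With data laid out like this, the formulas would be passed to fit_inlamemi together with `repeated_observations = TRUE`.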
Extract coefficients for the imputation model (IMP)
get_coef_imp(inlamemi_model)
inlamemi_model |
object of class 'inlamemi' |
A data frame with a summary of the posterior marginals for the coefficients in the imputation model.
Extract coefficients for the missingness model (MIS)
get_coef_mis(inlamemi_model)
inlamemi_model |
object of class 'inlamemi' |
A data frame with a summary of the posterior marginals for the coefficients in the missingness model.
Extract coefficients for the model of interest (MOI)
get_coef_moi(inlamemi_model)
inlamemi_model |
object of class 'inlamemi' |
A data frame with a summary of the posterior marginals for the coefficients in the model of interest.
Extract imputed values
get_imputed(inlamemi_model, error_variable)
inlamemi_model |
object of class 'inlamemi' |
error_variable |
character string indicating the name of the error variable for which to extract the imputed values. |
A list of two objects: the posterior marginal distributions for each element of the imputed covariate, and a data frame giving a summary of these marginals.
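The extractor functions documented above can be used together after fitting. The sketch below reuses the fit_inlamemi call from the package example and the signatures given above. It is guarded so that it only runs when both INLA and inlamemi are installed, and the structure of the returned objects is inspected rather than assumed.

```r
# Run only if both packages are available (INLA is not distributed on CRAN).
have_pkgs <- requireNamespace("INLA", quietly = TRUE) &&
  requireNamespace("inlamemi", quietly = TRUE)

if (have_pkgs) {
  library(inlamemi)

  # Same model as in the fit_inlamemi example
  simple_model <- fit_inlamemi(data = simple_data,
                               formula_moi = y ~ x + z,
                               formula_imp = x ~ z,
                               family_moi = "gaussian",
                               error_type = c("berkson", "classical"),
                               error_variable = "x",
                               prior.prec.moi = c(10, 9),
                               prior.prec.berkson = c(10, 9),
                               prior.prec.classical = c(10, 9),
                               prior.prec.imp = c(10, 9),
                               prior.beta.error = c(0, 1/1000),
                               initial.prec.moi = 1,
                               initial.prec.berkson = 1,
                               initial.prec.classical = 1,
                               initial.prec.imp = 1)

  # Posterior summaries for the model of interest and the imputation model
  print(get_coef_moi(simple_model))
  print(get_coef_imp(simple_model))

  # Imputed values for the error-prone variable; inspect the top-level structure
  imputed <- get_imputed(simple_model, error_variable = "x")
  str(imputed, max.level = 1)
}
```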
A simulated dataset to demonstrate how to set up a model in the case where a covariate has observations that are missing at random (MAR).
mar_data
## 'mar_data' A data frame with 1000 rows and 5 columns:
Response variable
Observed value of covariate, with almost 20 percent missing
Correct version of x, without missingness
Covariate correlated with x
Covariate correlated with the missingness of x
The dataset is simulated.
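A data set with this structure could be generated along the following lines. This is a hypothetical base-R sketch, not the code used to create mar_data: the variable names and all parameter values are assumptions. It also shows the binary missingness indicator that serves as the response in a missingness model.

```r
# Hypothetical simulation of a covariate with missingness driven by another covariate.
set.seed(3)
n <- 2000
z1 <- rnorm(n)                          # covariate correlated with x
z2 <- rnorm(n)                          # covariate that drives the missingness of x
x_true <- 1 + 0.5 * z1 + rnorm(n)       # complete version of the covariate

# The probability that x is missing depends only on the fully observed z2.
p_miss <- plogis(-1.5 + 1 * z2)
m <- rbinom(n, 1, p_miss)               # missingness indicator: 1 = missing
x <- ifelse(m == 1, NA, x_true)         # observed version, with missing values

# A simple missingness model: logistic regression of the indicator on z2.
mis_fit <- glm(m ~ z2, family = binomial)

prop_missing <- mean(is.na(x))
coef(mis_fit)
```

The fitted coefficient for `z2` should be clearly positive, reflecting that larger values of `z2` make `x` more likely to be missing.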
A dataset containing a repeated blood pressure measurement along with some other variables for participants in the Third National Health and Nutrition Examination Survey (NHANES III), merged with data from the US National Death Index by Ruth H. Keogh and Jonathan Bartlett. For illustration purposes in this package, observations where smoking status is missing have been left out.
nhanes_survival
## 'nhanes_survival' A data frame with 3433 rows and 8 columns:
systolic blood pressure (standardized), first measurement
systolic blood pressure (standardized), second measurement
sex (0 = female, 1 = male)
age (standardized)
smoking status (0 = no, 1 = yes)
diabetes status (0 = no, 1 = yes)
censoring status (0 = censored, 1 = observed death due to cardiovascular disease)
time until death due to cardiovascular disease occurs
https://github.com/ruthkeogh/meas_error_handbook
Plot model summary
## S3 method for class 'inlamemi'
plot(
  x,
  plot_moi = TRUE,
  plot_imp = TRUE,
  plot_mis = TRUE,
  plot_intercepts = TRUE,
  error_variable_highlight = FALSE,
  greek = FALSE,
  palette = NULL,
  ...
)
x |
the model returned from the fit_inlamemi function. |
plot_moi |
should the posterior mean for the coefficients of the model of interest be plotted? Defaults to TRUE. |
plot_imp |
should the posterior mean for the coefficients of the imputation model be plotted? Defaults to TRUE. |
plot_mis |
should the posterior mean for the coefficients of the missingness model be plotted? Defaults to TRUE. |
plot_intercepts |
should the posterior mean for the intercept(s) be plotted? Defaults to TRUE. |
error_variable_highlight |
should the coefficient(s) of the variable(s) with error be highlighted (circled in black)? Defaults to FALSE. |
greek |
should the coefficient names be shown as Greek letters with the covariate name as subscript? Defaults to FALSE. |
palette |
either a number (between 1 and 5), indicating the number of the color palette to be used, or a vector of the colors to be used. |
... |
other arguments |
An object of class "ggplot" that plots the posterior mean and 95% credible interval for each coefficient in the model. The coefficients are colored to indicate which sub-model they belong to (model of interest, imputation model or missingness model), and the variable with error can optionally be highlighted (see error_variable_highlight).
# Fit the model
simple_model <- fit_inlamemi(data = simple_data,
                             formula_moi = y ~ x + z,
                             formula_imp = x ~ z,
                             family_moi = "gaussian",
                             error_type = c("berkson", "classical"),
                             prior.prec.moi = c(10, 9),
                             prior.prec.berkson = c(10, 9),
                             prior.prec.classical = c(10, 9),
                             prior.prec.imp = c(10, 9),
                             prior.beta.error = c(0, 1/1000),
                             initial.prec.moi = 1,
                             initial.prec.berkson = 1,
                             initial.prec.classical = 1,
                             initial.prec.imp = 1)

plot(simple_model)
Print method for inlamemi
## S3 method for class 'inlamemi'
print(x, ...)
x |
object of class 'inlamemi'. |
... |
other arguments. |
Visualize the model data structure as matrices in LaTeX
show_data_structure(stack)
stack |
an object of class inla.stack returned from the function make_inlamemi_stacks, which describes the structure of the data for the measurement error and imputation model. |
A list containing data frames with the left-hand side (response_df) and right-hand side (effects_df), along with the LaTeX code needed to visualize the matrices (matrix_string).
stack <- make_inlamemi_stacks(data = simple_data,
                              formula_moi = y ~ x + z,
                              formula_imp = x ~ z,
                              error_type = "classical")
show_data_structure(stack)
A simulated dataset to demonstrate how to model different types of measurement error and missing data using the 'inlamemi' package.
simple_data
## 'simple_data' A data frame with 1000 rows and 4 columns:
Response variable
Covariate measured with both Berkson and classical error, and with missing observations
Correct version of the covariate with error
Error free covariate, correlated with x
The dataset is simulated.
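To give a feel for how Berkson error differs from classical error, the following base-R sketch simulates Berkson error on its own. It is a hypothetical illustration with assumed names and parameter values, and it is not the code used to generate simple_data. In a linear model, Berkson error typically leaves the slope close to its true value but inflates the residual variation.

```r
# Hypothetical simulation, not package code: Berkson error in a linear model.
set.seed(4)
n <- 5000
beta_x <- 2                             # true slope (assumed value)

w <- rnorm(n)                           # observed (assigned) value
x_true <- w + rnorm(n, sd = 1)          # Berkson error: truth scatters around w
y <- 1 + beta_x * x_true + rnorm(n, sd = 1)

berkson_fit <- lm(y ~ w)                # regression on the observed value
berkson_slope <- unname(coef(berkson_fit)["w"])
berkson_resid_sd <- sigma(berkson_fit)  # residual sd absorbs beta_x * error

# The slope stays near beta_x, while the residual sd grows from 1 to
# roughly sqrt(1 + beta_x^2 * 1) = sqrt(5).
c(slope = berkson_slope, resid_sd = berkson_resid_sd)
```

Compared with classical error, where the naive slope is pulled towards zero, the main cost here is lost precision rather than a biased slope.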
Takes a fitted 'inlamemi' object produced by 'fit_inlamemi' and produces a summary from it.
## S3 method for class 'inlamemi'
summary(object, ...)

## S3 method for class 'summary.inlamemi'
print(x, ...)
object |
model of class 'inlamemi'. |
... |
other arguments |
x |
object of class summary.inlamemi. |
'summary.inlamemi' returns an object of class 'summary.inlamemi', a list of components to print.
# Fit the model
simple_model <- fit_inlamemi(data = simple_data,
                             formula_moi = y ~ x + z,
                             formula_imp = x ~ z,
                             family_moi = "gaussian",
                             error_type = c("berkson", "classical"),
                             prior.prec.moi = c(10, 9),
                             prior.prec.berkson = c(10, 9),
                             prior.prec.classical = c(10, 9),
                             prior.prec.imp = c(10, 9),
                             prior.beta.error = c(0, 1/1000),
                             initial.prec.moi = 1,
                             initial.prec.berkson = 1,
                             initial.prec.classical = 1,
                             initial.prec.imp = 1)

summary(simple_model)
A simulated dataset to demonstrate how to set up a model in the case where there are two variables with measurement error.
two_error_data
## 'two_error_data' A data frame with 1000 rows and 6 columns:
Response variable
Covariate measured with classical error, correlated with z
Covariate measured with classical error
Correct version of x1
Correct version of x2
Error free covariate, correlated with x1
The dataset is simulated.