Train models using different combinations of predictor variables based upon missing data patterns.

Usage

medley_train(
  data,
  formula,
  method = glm,
  var_sets = get_variable_sets(data = data, formula = formula, min_set_size =
    min_set_size),
  min_set_size = 0.1,
  exclusive_membership = TRUE,
  ...
)

# S3 method for class 'medley'
summary(object, ...)

# S3 method for class 'medley'
print(x, ...)

# S3 method for class 'medley'
predict(object, newdata, ...)

Arguments

data: data.frame used to estimate the models.
formula: a formula with all possible predictor variables to be considered.
method: the function used to train the models (e.g. glm, randomForest).
var_sets: a list of formulas to use for the predictive models.
min_set_size: the minimum set size, expressed as a proportion (e.g. 0.1 for 10%), required to include a variable set as a model.
exclusive_membership: whether an observation should be used only in the model for which the most predictor variables are available. If `FALSE` then observations may be used in training more than one model.
...: other parameters passed to the `predict()` function.
object: the results from `medley_train`.
x: the results of `medley_train`.
newdata: (optional) a new data.frame to get predictions for.
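
The idea behind `var_sets` and `min_set_size` can be illustrated with a minimal base R sketch. This is not the implementation of `get_variable_sets()`; the toy data frame `df`, the helper names, and the rule that a pattern's share is measured against all rows are assumptions made for illustration only. A threshold of 0.25 is used instead of the default 0.1 so that one pattern is filtered out.

```r
# Toy data (hypothetical): x2 and x3 are missing for some rows.
df <- data.frame(
  y  = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
  x1 = c(5, 3, 6, 2, 7, 1, 8, 4, 9, 2),
  x2 = c(1, NA, 2, NA, 3, NA, 4, 5, 6, 7),
  x3 = c(NA, NA, 1, NA, 2, NA, 3, 4, NA, 5)
)
predictors <- c("x1", "x2", "x3")

# Label each row by the set of predictors that are observed for it.
observed <- !is.na(df[, predictors])
pattern <- apply(observed, 1, function(row) {
  paste(predictors[row], collapse = " + ")
})

# Share of rows falling into each missing data pattern.
pattern_share <- prop.table(table(pattern))

# Keep only patterns that cover at least min_set_size of the rows
# (assumed interpretation of the threshold).
min_set_size <- 0.25
kept <- names(pattern_share)[as.vector(pattern_share) >= min_set_size]

# One candidate formula per retained pattern.
candidate_formulas <- lapply(kept, function(rhs) {
  as.formula(paste("y ~", rhs))
})
print(candidate_formulas)
```

A list of formulas built this way has the same shape as the `var_sets` argument described above (a list of formulas), although the sets the package itself derives may differ.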

Value

For `medley_train`, an object with the following elements:

n_models: the number of models trained.
formulas: the list of formulas used to train the models.
models: list of objects returned from the training method.
data: the data.frame used to train the models.
model_observations: a data.frame that specifies which observations are used for which model(s).

For `predict`, a vector of predictions.
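
A hedged usage sketch based only on the signature shown under Usage. The simulated data frame `toy` and its column names are hypothetical, and the default `method = glm` is used so that no extra arguments are needed. The call is guarded so that it only runs when the medley package is installed; the exact printed output is not shown because it depends on the package version.

```r
set.seed(42)

# Simulated data (hypothetical): x1 is always observed,
# x2 and x3 are each missing for roughly 30% of rows.
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 + rnorm(n)
toy$x2[sample(n, size = 0.3 * n)] <- NA
toy$x3[sample(n, size = 0.3 * n)] <- NA

if (requireNamespace("medley", quietly = TRUE)) {
  # Train one model per retained missing data pattern.
  fit <- medley::medley_train(
    data = toy,
    formula = y ~ x1 + x2 + x3
  )
  print(fit)

  # Predictions for the same rows; per the Value section this
  # should be a vector of predictions.
  preds <- predict(fit, newdata = toy)
  str(preds)
}
```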