Reads dataset files for running predictive models.

Information about the datasets that can be used in the predictive models is stored in Debian Control Files (dcf). This format is similar to the YAML metadata used in R Markdown documents.

Usage

read_ml_datasets(
  dir = c(paste0(find.package("mldash"), "/datasets")),
  cache_dir = dir,
  pattern = "*.dcf",
  use_cache = TRUE,
  check_for_missing_packages = interactive()
)

Arguments

dir: directory containing the dcf files for the datasets.
pattern: optional regular expression that is used when finding files to read in. It defaults to all dcf files in the dir, but could be a single filename to test a metadata file.
use_cache: whether to read data from the cache if available. If FALSE, the data will be retrieved by calling the function given in the data field of the dcf file.
check_for_missing_packages: if TRUE you will be prompted to install missing packages.
cache_dir: directory where rds data files will be stored. Defaults to dir.
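
Because pattern is a regular expression rather than a shell glob, it can help to try an expression on its own first. The sketch below assumes the files are found with something like base R's list.files(), which this page does not confirm; the directory and file names are hypothetical.

```r
# Hypothetical directory and file names, used only for illustration.
dcf_dir <- file.path(tempdir(), "dcf_pattern_demo")
dir.create(dcf_dir, showWarnings = FALSE)
invisible(file.create(file.path(dcf_dir, c("mtcars.dcf", "titanic.dcf", "notes.txt"))))

# Anchored regular expression: every file whose name ends in .dcf
all_dcf <- list.files(dcf_dir, pattern = "\\.dcf$")

# A single filename, e.g. to test one metadata file in isolation
one_dcf <- list.files(dcf_dir, pattern = "^mtcars\\.dcf$")

print(all_dcf)
print(one_dcf)
```

An expression that selects the intended files here could then be passed as the pattern argument, for example pattern = "^mtcars\\.dcf$" to read a single metadata file.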

Value

A data frame with the following fields:

id: The filename of the dataset.
title*: The name of the dataset from the dcf file.
type*: Whether this is for a regression or classification model.
description: Description of the dataset.
source: The source of the dataset.
reference: Reference for the dataset (APA format preferred).
model*: The model formula used for the predictive model.
note: Any additional information.

* denotes required fields.

Details

Each dcf file can contain the following fields:

name*: The name of the dataset.
type*: Whether this is for a regression, classification, timeseries, or spatial model.
description: Description of the dataset.
source: The source of the dataset.
reference: Reference for the dataset (APA format preferred).
data*: An R function that returns a data.frame.
model*: The formula used for the predictive model.
note: Any additional information.

* denotes required fields.
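
To make the field list concrete, here is a minimal sketch of a metadata file and one way it could be parsed with base R's read.dcf(). The field values are made up for illustration, and the parsing steps are an assumption about how such a file might be consumed, not necessarily what read_ml_datasets() does internally.

```r
# Hypothetical metadata file; the field values are made up for illustration.
dcf_lines <- c(
  "name: mtcars",
  "type: regression",
  "description: Fuel consumption and design characteristics of 32 cars.",
  "source: R datasets package",
  "data: function() { mtcars }",
  "model: mpg ~ wt + hp",
  "note: Example entry only."
)
dcf_file <- tempfile(fileext = ".dcf")
writeLines(dcf_lines, dcf_file)

# read.dcf() returns a character matrix with one column per field
meta <- as.data.frame(read.dcf(dcf_file), stringsAsFactors = FALSE)

# Check the required fields before using the entry
required <- c("name", "type", "data", "model")
missing_fields <- setdiff(required, names(meta))
if (length(missing_fields) > 0) {
  stop("Missing required fields: ", paste(missing_fields, collapse = ", "))
}

# Turn the text fields into R objects (assumed approach)
data_fun <- eval(parse(text = meta$data))       # function returning a data.frame
model_formula <- stats::as.formula(meta$model)  # formula for the predictive model
dataset <- data_fun()

str(dataset[, all.vars(model_formula)])
```

Every variable named in the model formula should exist in the data.frame returned by the data function, so checking all.vars(model_formula) against names(dataset) is a quick way to catch a mistyped metadata file.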