
Information about datasets that can be used in predictive models is stored in Debian Control Files (dcf). The format is similar to the YAML metadata header used in R Markdown documents.

Usage

read_ml_datasets(
  dir = c(paste0(find.package("mldash"), "/datasets")),
  cache_dir = dir,
  pattern = "*.dcf",
  use_cache = TRUE,
  check_for_missing_packages = interactive()
)
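Because pattern is treated as a regular expression, a single filename can match more files than intended. The sketch below assumes the files are found with base R's list.files(pattern = ...); that is an assumption about mldash internals, and the directory and filenames are hypothetical.

```r
# Hypothetical directory with a few files (names are illustrative only)
demo_dir <- tempfile("datasets_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir,
                                c("iris.dcf", "my_iris.dcf", "mtcars.dcf", "notes.txt"))))

# Every file whose name ends in ".dcf" (dot escaped, end of name anchored)
all_dcf <- list.files(demo_dir, pattern = "\\.dcf$")

# A bare filename is still a regular expression: it can match part of a
# longer name, and its unescaped "." matches any character
loose <- list.files(demo_dir, pattern = "iris.dcf")

# Anchored and escaped: selects exactly one file
exact <- list.files(demo_dir, pattern = "^iris\\.dcf$")
```

Here loose picks up both iris.dcf and my_iris.dcf, while exact returns only iris.dcf, so anchoring the pattern is the safer way to test a single metadata file.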

Arguments

dir

directory containing the dcf files for the datasets.

pattern

optional regular expression used to select the files to read. It defaults to all dcf files in dir, but it can also be a single filename, which is useful for testing one metadata file.

use_cache

whether to read data from the cache, if available. If FALSE, the data are retrieved by calling the function given in the data field of the dcf file.

check_for_missing_packages

if TRUE, you will be prompted to install any missing packages. The default, interactive(), prompts only in interactive sessions.

cache_dir

directory where rds data files will be stored. Defaults to dir.
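A minimal sketch of the caching behaviour described by use_cache and cache_dir. It assumes each dataset is cached as <id>.rds inside cache_dir; both that naming scheme and the helper get_dataset() are hypothetical and not part of mldash. A counter shows when the data function actually runs.

```r
# Hypothetical helper (not part of mldash): read a dataset from an .rds cache
# when allowed, otherwise call its data function and refresh the cache.
get_dataset <- function(id, data_fun, cache_dir, use_cache = TRUE) {
  # Assumed naming scheme: one <id>.rds file per dataset
  cache_file <- file.path(cache_dir, paste0(id, ".rds"))
  if (use_cache && file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  df <- data_fun()
  saveRDS(df, cache_file)
  df
}

# Fresh, empty cache directory for the demo
cache_dir <- tempfile("cache_demo")
dir.create(cache_dir)

# A data function that records how often it is called
calls <- 0
count_and_load <- function() {
  calls <<- calls + 1
  datasets::mtcars
}

first  <- get_dataset("mtcars", count_and_load, cache_dir)                     # runs the data function
second <- get_dataset("mtcars", count_and_load, cache_dir)                     # served from the cache
third  <- get_dataset("mtcars", count_and_load, cache_dir, use_cache = FALSE)  # runs it again
```

After these three calls, calls is 2. This sketch also rewrites the cache when use_cache = FALSE; whether read_ml_datasets() does the same is not stated here.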

Value

a data frame with the following fields:

  • id: The filename of the dataset.

  • title*: The name of the dataset from the dcf file.

  • type*: Whether this is for a regression or classification model.

  • description: Description of the dataset.

  • source: The source of the dataset.

  • reference: Reference for the dataset (APA format preferred).

  • model*: The model formula used for the predictive model.

  • note: Any additional information.

* denotes required fields.

Details

Each dcf file describes one dataset with the following fields:

  • name*: The name of the dataset.

  • type*: Whether this is for a regression, classification, timeseries, or spatial model.

  • description: Description of the dataset.

  • source: The source of the dataset.

  • reference: Reference for the dataset (APA format preferred).

  • data*: An R function that returns a data.frame.

  • model*: The formula used for the predictive model.

  • note: Any additional information.

* denotes required fields.
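To illustrate the fields above, here is a hypothetical dataset file built around R's bundled mtcars data and parsed with base R's read.dcf(). This is a sketch, not mldash's own loader: it assumes the data field holds R source text for a function (evaluated with parse() and eval()) and that the model field holds a formula as text.

```r
# Hypothetical dataset metadata using the fields listed above.
# Indented continuation lines belong to the preceding field.
dcf_text <- "name: Motor Trend Car Road Tests
type: regression
description: Fuel consumption and ten aspects of automobile design
  for 32 automobiles.
source: The datasets package that ships with R
data: function() {
    datasets::mtcars
  }
model: mpg ~ wt + hp
note: Illustrative example only.
"

dcf_file <- tempfile(fileext = ".dcf")
writeLines(dcf_text, dcf_file)

# read.dcf() returns a matrix with one row per record; take the single record
meta <- read.dcf(dcf_file)[1, ]

# Fail early if a required field is missing
required <- c("name", "type", "data", "model")
stopifnot(all(required %in% names(meta)))

# The data field is R source for a function: parse it, evaluate it, call it
data_fun <- eval(parse(text = meta[["data"]]))
df <- data_fun()

# The model field is a formula stored as text
f <- stats::as.formula(meta[["model"]])
fit <- stats::lm(f, data = df)
```

Storing the data field as code rather than as a file path lets each dataset fetch or clean its own data; the trade-off is that evaluating the field runs arbitrary R code, so dcf files should come from a trusted source.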