Introduction

Configuring a workstation to run mldash requires installing three major components: R, Python, and Java. mldash runs tidymodels natively in R, but it can also run Python models (e.g. Prophet) and Java models (e.g. Weka), so this guide outlines the steps to build a working environment.

Depending on your operating system, you can use various system tools or package managers to ease the installation process.

At the end of this guide, you should have a workstation that meets the following goals:

  • Runs R and RStudio
  • Has all the required R library dependencies installed
  • Has R pointing to a functioning Python environment with dependencies
  • Has R pointing to a functioning Java runtime environment
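
Once the remaining sections are complete, a quick check from the R console can show whether the Python and Java executables are visible at all. The following is a minimal sketch using only base R. The helper name `check_workstation` is hypothetical and not part of mldash, and it only reports what is on the PATH, not whether R is configured to use it.

```r
# Hypothetical helper (not part of mldash): report which tools are on the PATH.
check_workstation <- function(tools = c("python3", "java")) {
  found <- Sys.which(tools)  # named character vector; "" when a tool is not found
  data.frame(
    tool = names(found),
    path = unname(found),
    available = unname(!is.na(found) & nzchar(found)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

print(R.version.string)      # the R version in use
print(check_workstation())   # one row per tool
```

A tool listed as unavailable here may still work if you point R at it explicitly, as described in the Python and Java sections.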

Note: these directions have been tested on macOS 12.6 (Monterey on an M1 MacBook, ARM processor) and Red Hat Enterprise Linux (RHEL) 9. They have not been tested on other platforms.


Installing R & RStudio

You can download and install the R binary from the Comprehensive R Archive Network (CRAN) home page. Next, install RStudio as your integrated development environment.

macOS

In an RStudio session, you can install required libraries in the console window:

install.packages("glmnet")
install.packages("brulee")
install.packages("fastDummies")
install.packages("kknn")
install.packages("plsmod")
install.packages("remotes")
install.packages("baguette")
install.packages("libcoin")
install.packages("earth")
install.packages("dbarts")
install.packages("xgboost")
install.packages("forecast")
install.packages("modeltime")
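
If you re-run this setup, it can be convenient to install only the packages that are still missing. The sketch below uses base R only. The helper name `missing_packages` is an assumption for illustration, and the install call is left commented out so the snippet has no side effects until you enable it.

```r
# Hypothetical helper: return the entries of `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# The same packages as in the list above.
required <- c("glmnet", "brulee", "fastDummies", "kknn", "plsmod", "remotes",
              "baguette", "libcoin", "earth", "dbarts", "xgboost",
              "forecast", "modeltime")

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the missing packages
} else {
  message("All required packages are already installed.")
}
```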

Installing the mixOmics Library

The mixOmics library is distributed through Bioconductor, so you first need to install the BiocManager package.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("mixOmics")

Red Hat Enterprise Linux

Installing R dependencies is very similar:

install.packages(c("baguette","libcoin","earth","dbarts","xgboost","forecast","modeltime","glmnet","brulee","fastDummies","kknn","plsmod"))

install.packages(c("BiocManager"),type="binary")

install.packages(c("poissonreg","pscl","ranger","kernlab","mda","discrim","sda","sparsediscrim","klaR","LiblineaR","naivebayes","rules","MASS","baguette"),type="binary")


install.packages(c("dbarts","discrim","earth","fastDummies","glmnet","keras","kernlab","kknn","klaR","mda","mgcv","mixOmics","naivebayes","nnet","parsnip",
"plsmod","poissonreg","randomForest","ranger","rpart","rstanarm","rules","sda","sparsediscrim","stats","xgboost","xrf"),type="binary")

Installing Python - Method #1 with Miniconda

Many of the models require Python, which is executed through the reticulate package. I have found installing and configuring Python to be frustrating, especially on an M1 Mac. However, as of this writing, the following works on my system. First, install these packages from GitHub to get the latest versions.

remotes::install_github(sprintf("rstudio/%s", c("reticulate", "tensorflow", "keras", "torch")))

If you have previously installed Miniconda, it is helpful to start from a clean slate.

reticulate::miniconda_uninstall()

We can then install Miniconda using the following command:

reticulate::install_miniconda()

Once installed, we can create a conda environment:

reticulate::conda_create("mldash")

And then make it active (I am not sure whether it is necessary to do this for all three packages, but it doesn’t hurt):

reticulate::use_condaenv("mldash")
tensorflow::use_condaenv("mldash")
keras::use_condaenv("mldash")
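
To see which interpreter reticulate would pick up, you can ask it directly. This is a minimal sketch, assuming reticulate is installed. `report_python` is a hypothetical helper name, and `reticulate::py_discover_config()` is used so that Python is not initialized as a side effect. With the setup above, the reported path should point into the mldash environment, but that depends on your system.

```r
# Hypothetical helper: return the path of the Python interpreter reticulate
# would use, or NA if reticulate is missing or no Python can be found.
report_python <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    message("reticulate is not installed")
    return(NA_character_)
  }
  cfg <- tryCatch(reticulate::py_discover_config(), error = function(e) NULL)
  if (is.null(cfg) || is.null(cfg$python)) NA_character_ else cfg$python
}

print(report_python())
```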

Although the keras, tensorflow, and torch packages each provide utility functions to install their Python counterparts, I found that they do not always work as expected. The conda_install function ensures the Python packages are installed into the correct environment. Note that as of this writing, pytorch still does not have a native Mac M1 version, so some predictive models will not work on that platform.

reticulate::conda_install("mldash", 
              c("jupyterlab", "pandas", "statsmodels",
                "scipy", "scikit-learn", "matplotlib",
                "seaborn", "numpy", "pytorch", "tensorflow"))

Lastly, ensure that reticulate uses the correct Python by setting the RETICULATE_PYTHON environment variable (this can also be put in your .Renviron file to be used across sessions, though I avoid doing that so I can use different Python paths for different projects).

Sys.setenv("RETICULATE_PYTHON" = "~/miniforge3/envs/mldash/bin/python")
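
The path above depends on where conda was installed on your machine, so it is worth confirming the file exists before setting the variable. The sketch below uses base R only, and `set_reticulate_python` is a hypothetical helper name. If you are unsure of the path, `reticulate::conda_python("mldash")` should return the interpreter for the environment. RETICULATE_PYTHON needs to be set before reticulate initializes Python in the session.

```r
# Hypothetical helper: set RETICULATE_PYTHON only if the interpreter exists.
set_reticulate_python <- function(path) {
  full <- path.expand(path)  # expand "~" to the home directory
  if (!file.exists(full)) {
    message("Python not found at: ", full)
    return(invisible(FALSE))
  }
  Sys.setenv(RETICULATE_PYTHON = full)
  invisible(TRUE)
}

# Path taken from the example above; adjust it to match your installation.
set_reticulate_python("~/miniforge3/envs/mldash/bin/python")
```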

Installing Python - Method #2 with Mambaforge

On both macOS and RHEL, Python 3.x is pre-installed (either as python or python3), but you will also need many Python modules. You can get the Mambaforge installer from Mambaforge’s GitHub page: https://github.com/conda-forge/miniforge#mambaforge

Once Mambaforge is installed, create and activate an mldash environment, then install the model library dependencies.

mamba create -y -n mldash
mamba activate mldash
mamba install -y jupyterlab
mamba install -y pandas
mamba install -y statsmodels 
mamba install -y scipy
mamba install -y scikit-learn
mamba install -y matplotlib 
mamba install -y seaborn
mamba install -y numpy
mamba install -y pytorch
mamba install -y tensorflow

As in the previous method, set an environment variable for your Mamba-Python binary:

Sys.setenv("RETICULATE_PYTHON" = "~/mambaforge/envs/mldash/bin/python")
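
After the environment variable is set, you can confirm that the Python modules are importable from R. This is a sketch, assuming reticulate is installed. `check_python_modules` is a hypothetical helper name. Note that the import names differ from the package names for two entries: scikit-learn imports as `sklearn` and pytorch as `torch`.

```r
# Import names of the Python packages installed above.
modules <- c("jupyterlab", "pandas", "statsmodels", "scipy", "sklearn",
             "matplotlib", "seaborn", "numpy", "torch", "tensorflow")

# Hypothetical helper: TRUE/FALSE per module, or NULL if reticulate is missing.
check_python_modules <- function(modules) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    message("reticulate is not installed")
    return(NULL)
  }
  vapply(modules, function(m) {
    # Treat any error while probing a module as "not available".
    tryCatch(reticulate::py_module_available(m), error = function(e) FALSE)
  }, logical(1))
}

print(check_python_modules(modules))
```

Any module reported as FALSE can be installed into the environment with the same conda or mamba commands shown earlier.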

Installing Java (macOS)

You will need to install a Java version 8 development kit compiled for an ARM processor. One that I found to work is from Azul.

Follow the install directions and explicitly set an environment variable for JAVA_HOME, otherwise R will try to use the system’s default pre-installed Java virtual machine.

Sys.setenv(JAVA_HOME='/Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/')
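
To confirm that JAVA_HOME points at a usable Java, you can look for the java executable beneath it and ask for its version. The sketch below uses base R only, and `java_binary` is a hypothetical helper name.

```r
# Hypothetical helper: path to the java executable under a given JAVA_HOME.
# Returns NA when JAVA_HOME is empty.
java_binary <- function(java_home = Sys.getenv("JAVA_HOME")) {
  if (!nzchar(java_home)) return(NA_character_)
  file.path(sub("/+$", "", java_home), "bin", "java")  # drop trailing slashes
}

java <- java_binary()
if (!is.na(java) && file.exists(java)) {
  system2(java, "-version")  # prints the Java version information
} else {
  message("No java executable found under JAVA_HOME")
}
```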

Installing Java (RHEL)

On RHEL, you can use the yum package manager to install OpenJDK version 8 from the command line.

$ sudo yum install -y java-1.8.0-openjdk

Unlike with macOS, you shouldn’t need to set JAVA_HOME, since the java executable is normally on the system $PATH and linked to the correct version. If you have multiple Java versions installed, however, specify JAVA_HOME explicitly to be safe.
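
If you are unsure which Java the PATH resolves to, base R can show the real location behind any symbolic links. This is a minimal sketch. The resolved path will vary by system and installed version.

```r
# Locate java on the PATH; the result is "" when it is not found.
java_on_path <- Sys.which("java")

if (nzchar(java_on_path)) {
  # normalizePath() follows symbolic links to the actual installation.
  print(normalizePath(java_on_path))
} else {
  message("java was not found on the PATH")
}
```

If the resolved path is not the OpenJDK 8 installation, setting JAVA_HOME as in the macOS section is the safer option.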

Next Steps

Now that you have R, Python, and Java installed, you can start Running Predictive Models.