R Language Modules

 

Updated: April 26, 2017

Support for the R language in Azure Machine Learning makes it easier than ever to publish R models in production, and to use the experience of the R language community to solve real-world problems.

Before using R script in Azure Machine Learning Studio, be sure to understand the following requirements:

  • If you imported data in CSV or another format, an R module cannot read that data directly in its original format. Instead, use Convert to Dataset to prepare the data before using it as input to an R module.

  • When you attach any Azure ML dataset as input to an R module, the dataset is automatically loaded into the R workspace as a data frame with the variable name dataset. You can also define additional data frames, or change the name of the default dataset variable, within your R script.

  • The R modules run in a protected and isolated environment within your private workspace. Within your workspace, you can create data frames and variables for use by multiple modules.

    However, you cannot load R data frames from a different workspace or read variables created in a different workspace, even if that workspace is open in an Azure session. Also, you cannot use modules that have a Java dependency, or that require direct network access.

  • The implementation of R in the Azure Machine Learning Studio and workspace environment includes two principal components: one that coordinates script execution, and one that provides high-speed data access and scoring. The scoring component has been optimized to enhance scalability and performance.

    Accordingly, R workspaces in Azure Machine Learning Studio support two kinds of scoring tasks, each optimized for different requirements: file-by-file scoring, which is typically used when building an experiment, and the request-response service (RRS), which is typically used for very fast scoring as part of a web service.
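The input and output handling described in the requirements above can be sketched as follows. This is a minimal illustration, not the definitive pattern: it uses maml.mapInputPort and maml.mapOutputPort, which appear in the splitting sample later in this topic, and the names input_data and column_summary are hypothetical. Because the maml functions exist only inside Studio, the sketch defines local stand-ins (an assumption made purely so the code also runs elsewhere) when they are missing.

```r
# Local stand-ins so the sketch also runs outside Studio (hypothetical;
# inside Studio the real maml.* functions are already defined).
if (!exists("maml.mapInputPort")) {
  maml.mapInputPort <- function(port) head(iris)
  maml.mapOutputPort <- function(name) invisible(name)
}

# Read the dataset attached to the first input port as a data frame.
input_data <- maml.mapInputPort(1)

# Define an additional data frame derived from the input:
# the mean of every numeric column.
is_num <- vapply(input_data, is.numeric, logical(1))
numeric_cols <- input_data[is_num]
column_summary <- data.frame(
  column = names(numeric_cols),
  mean = unname(vapply(numeric_cols, mean, numeric(1))),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Send the result to the module's output port, referring to it by name.
maml.mapOutputPort("column_summary")
```

Note that the output port is given the name of the variable as a string, not the data frame itself, which matches how the samples in this topic call it.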

Azure Machine Learning Studio includes over 500 of the most popular R packages. The R packages that you can select from depend on which R version you select for your experiment:

  • CRAN R
  • Microsoft R Open (MRO 3.2.2)

Whenever you create an experiment, you must choose a single R version to run on, for all modules in your experiment.
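To confirm which R runtime a module is actually running on, a script can report the version details that base R exposes. The following is a small sketch; the name runtime_info is hypothetical, and the sketch does not assume that the version string distinguishes CRAN R from Microsoft R Open.

```r
# Collect details about the R runtime that is executing this script.
runtime_info <- data.frame(
  r_version = paste(R.version$major, R.version$minor, sep = "."),
  version_string = R.version.string,
  platform = R.version$platform,
  stringsAsFactors = FALSE
)

# In Studio, this data frame could be sent to the output port with
# maml.mapOutputPort("runtime_info"); here it is simply printed.
print(runtime_info)
```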

For a list of the packages that are currently supported in Azure Machine Learning, see R Packages Supported by Azure Machine Learning.

You can also add the following code to an Execute R Script module in your experiment, and run it to get a dataset containing package names and versions. Be sure to set the R version in the module properties to generate the correct list for your intended environment.

data.set <- data.frame(installed.packages())
maml.mapOutputPort("data.set")

The packages that are supported in Studio change frequently. If you have any doubts about whether an R package is supported, use the R code sample provided to get the complete list of packages in the current environment.
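When only a few specific packages matter, the full list can be reduced to a quick availability check. The sketch below uses the base R function installed.packages(); the helper has_packages and the variable pkgs are hypothetical names introduced for illustration.

```r
# Keep only the package name and version columns, dropping row names
# so that packages installed in more than one library cause no conflicts.
ip <- installed.packages()[, c("Package", "Version"), drop = FALSE]
pkgs <- data.frame(
  Package = unname(ip[, "Package"]),
  Version = unname(ip[, "Version"]),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each requested package, TRUE if it is installed.
has_packages <- function(wanted, installed = pkgs$Package) {
  wanted %in% installed
}

# Example: check one base package and one add-on package.
print(has_packages(c("stats", "splitstackshape")))
```

The result for an add-on package depends on the environment in which the script runs, so run the check with the same R version that the experiment uses.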

There are many ways that you can extend your experiment by using custom R script or by adding R packages. Here are some ideas to get you started.

  • Use R code to perform custom math operations. For example, there are R packages to solve differential equations, generate random numbers, or run Monte Carlo simulations.

  • Apply custom transformations for data. For example, you might use an R package to perform interpolation on time series data, or perform linguistic analysis.

  • Work with different data sources. The Execute R Script module supports an additional set of inputs, which can include data files in zipped format. You might use zipped data files along with R packages designed for such data sources to flatten hierarchical data into a flat data table, or to read data from Excel and other file formats.

  • Use custom metrics for evaluation. For example, rather than use the functions provided in Evaluate, you could import an R package and then apply its metrics.
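Several of these ideas need nothing beyond base R. The following sketch illustrates three of them under simple assumptions: a Monte Carlo estimate, linear interpolation of gaps in a series, and a custom error metric. All function names (estimate_pi, fill_gaps, mean_absolute_error) are hypothetical and chosen for illustration only.

```r
# 1. Custom math: Monte Carlo estimate of pi, using the fraction of
#    random points in the unit square that fall inside the quarter circle.
estimate_pi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

# 2. Custom transformation: fill missing values in a series by linear
#    interpolation between the neighboring known values.
fill_gaps <- function(time, value) {
  known <- !is.na(value)
  approx(time[known], value[known], xout = time)$y
}

# 3. Custom metric: mean absolute error between actual and predicted values.
mean_absolute_error <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

set.seed(42)                                        # reproducible random draws
pi_estimate <- estimate_pi(100000)
filled <- fill_gaps(1:5, c(10, NA, 30, NA, 50))     # 10 20 30 40 50
mae <- mean_absolute_error(c(1, 2, 3), c(1, 3, 5))  # 1
```

In an experiment, the inputs to these functions would come from the module's input data frame rather than from literal vectors.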

The following example demonstrates the overall process of installing new packages and using custom R code in your experiment.

Splitting Columns by using R

Sometimes the data requires extensive manipulation to extract features. Suppose you have a text file that contains an ID followed by values and notes, all separated by spaces, or a text file that contains characters that are not supported by Studio.

There are several R packages that provide specialized functions for such tasks. The splitstackshape package contains several useful functions for splitting multiple columns, even if each column has a different delimiter.

The following sample illustrates how to install the needed packages and split apart columns.

# Install the dependencies from the zipped bundle attached to the module (unpacked under src/)
install.packages("src/concat.split.multiple/data.table_1.9.2.zip", lib=".", repos = NULL, verbose = TRUE)
(success.data.table <- library("data.table", lib.loc = ".", logical.return = TRUE, verbose = TRUE))

install.packages("src/concat.split.multiple/plyr_1.8.1.zip", lib=".", repos = NULL, verbose = TRUE)
(success.plyr <- library("plyr", lib.loc = ".", logical.return = TRUE, verbose = TRUE))

install.packages("src/concat.split.multiple/Rcpp_0.11.2.zip", lib=".", repos = NULL, verbose = TRUE)
(success.Rcpp <- library("Rcpp", lib.loc = ".", logical.return = TRUE, verbose = TRUE))

install.packages("src/concat.split.multiple/reshape2_1.4.zip", lib=".", repos = NULL, verbose = TRUE)
(success.reshape2 <- library("reshape2", lib.loc = ".", logical.return = TRUE, verbose = TRUE))

# Install the package that provides the splitting function
install.packages("src/concat.split.multiple/splitstackshape_1.2.0.zip", lib=".", repos = NULL, verbose = TRUE)
(success.splitstackshape <- library("splitstackshape", lib.loc = ".", logical.return = TRUE, verbose = TRUE))

# Load the installed library
library(splitstackshape)

# Split the IP address column on "." and the email column on "@"
data <- concat.split.multiple(maml.mapInputPort(1), c("TermsAcceptedUserClientIPAddress", "EmailAddress"), c(".", "@"))

# Print column names to the console
colnames(data)

# Redirect the data to the output port
maml.mapOutputPort("data")
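For comparison, the same kind of split can be sketched in base R, which avoids installing any packages. This is not the splitstackshape implementation, only an illustrative alternative built on strsplit. The helper split_column, the sample data frame, and its column names ip and email are all hypothetical.

```r
# Hypothetical helper: split one character column on a literal delimiter
# and replace it with numbered columns (column_1, column_2, ...).
split_column <- function(df, column, sep) {
  parts <- strsplit(as.character(df[[column]]), sep, fixed = TRUE)
  width <- max(vapply(parts, length, integer(1)))
  # Pad shorter rows with NA so that every row has the same number of parts.
  padded <- lapply(parts, function(p) c(p, rep(NA_character_, width - length(p))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste(column, seq_len(width), sep = "_")
  cbind(df[setdiff(names(df), column)], out)
}

# Sample data standing in for the module input (invented for illustration).
sample_data <- data.frame(
  id = 1:2,
  ip = c("10.0.0.1", "192.168.1.20"),
  email = c("ann@example.com", "bob@example.org"),
  stringsAsFactors = FALSE
)

# Split each column with its own delimiter, one column at a time.
result <- split_column(split_column(sample_data, "ip", "."), "email", "@")
print(colnames(result))
```

Passing fixed = TRUE matters here: without it, strsplit treats "." as a regular expression that matches any character.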

Begin with this tutorial that describes how to build a custom R module.

This article discusses the differences between the two scoring engines in detail, and explains how you can choose a scoring method when you deploy your experiment as a web service.

This Gallery experiment demonstrates how you can create a custom R module that does training, scoring, and evaluation.

This article, published on R-Bloggers, demonstrates how you can create your own evaluation method in Azure Machine Learning.

More Help with R

This R documentation site provides a categorized list of packages that you can search by keywords:

For additional R code samples and help with R and its applications, see these resources:

The R Language Modules category includes the following modules:

Module             Description
Execute R Script   Executes an R script from an Azure Machine Learning experiment
Create R Model     Creates an R model using custom resources

