Data Format Conversions

 

Updated: June 14, 2017

This topic lists the modules provided in Azure Machine Learning Studio for converting data among various file formats used in machine learning.

The supported formats include:

  • The dataset format used throughout Azure Machine Learning
  • The ARFF format used by Weka, an open-source Java-based set of machine learning algorithms
  • The SVMLight format, which was developed for the SVMlight machine learning framework but can also be used by Vowpal Wabbit
  • The tab-separated and comma-separated flat file formats supported by most relational databases. These formats are also widely supported by R and Python.
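To make the flat file formats concrete, here is a minimal Python sketch that serializes one small table as both CSV and TSV using only the standard library. The column names and values are hypothetical sample data, and the helper `to_delimited` is an illustrative name, not part of Azure Machine Learning.

```python
import csv
import io

# Hypothetical sample data; names and values are for illustration only.
HEADER = ["sepal_length", "sepal_width", "species"]
ROWS = [
    [5.1, 3.5, "setosa"],
    [6.2, 2.9, "versicolor"],
]


def to_delimited(header, rows, delimiter):
    """Serialize a header and rows as delimited text (CSV or TSV)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_delimited(HEADER, ROWS, ",")   # comma-separated
tsv_text = to_delimited(HEADER, ROWS, "\t")  # tab-separated

print(csv_text)
print(tsv_text)
```

The only difference between the two outputs is the delimiter, which is why tools that read one format can usually read the other with a single option changed.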

By converting data to these formats, you can more easily move results and data between different machine learning frameworks or storage mechanisms.

Note

These data conversion modules only convert the complete dataset to a specified format. If you need to cast, truncate, convert date-time formats, or otherwise manipulate the values, use the modules in the Data Transformation category, or see the list of related tasks.

You would typically use the modules for data conversion if you need to move data from an Azure Machine Learning experiment to another machine learning tool or platform, or if you need to export data from Azure Machine Learning in a format that can be used by a database or other tools. For example:

Task | Use this
An intermediate dataset must be saved for use in Excel, or for import to a database | Use the Convert to CSV module or the Convert to TSV module to prepare the data in the correct format. Then, either download the data or save it to Azure storage.
You want to re-use data from your experiment in R or Python code | Use the Convert to CSV module or the Convert to TSV module to prepare the data. Then, right-click the converted dataset to get the Python code needed to access the dataset.
You are porting your experiment and data between Weka and Azure Machine Learning | Use the Convert to ARFF module to prepare the data. Then, download the results.
You need to prepare data for the SVMLight framework | Use the Convert to SVMLight module to prepare the data. Then, download the resulting data.
You need to create data for use with Vowpal Wabbit | Use the SVMLight format and then modify the files as described in the topic. Save the file in Azure blob storage to use with a Vowpal Wabbit module in Azure Machine Learning.
Data is not in a tabular format | Coerce it to a dataset format by using the Convert to Dataset module.
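The Vowpal Wabbit task above involves modifying SVMLight output. The exact changes the referenced topic describes are not reproduced here. The Python sketch below assumes the main difference is that Vowpal Wabbit expects a `|` separator between the label and the features, and the function name `svmlight_to_vw` is hypothetical. It does not handle SVMLight `qid:` tokens or Vowpal Wabbit namespaces.

```python
def svmlight_to_vw(line):
    """Convert one SVMLight line to a Vowpal Wabbit-style line.

    Assumes the simple form '<label> <index>:<value> ...' with an optional
    trailing '# comment', which is dropped. This is an illustrative sketch,
    not the procedure documented for the Azure Machine Learning modules.
    """
    body = line.split("#", 1)[0].strip()
    if not body:
        return ""  # blank or comment-only line
    label, *features = body.split()
    # Insert the '|' separator between the label and the feature list.
    return (f"{label} | " + " ".join(features)).rstrip()


sample = "1 1:0.5 3:1.2 # row 7"
print(svmlight_to_vw(sample))
```

Verify the converted file against the input requirements of the Vowpal Wabbit module you plan to use before uploading it to blob storage.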

If you need to import data into Azure Machine Learning, or transform data in individual columns, use these modules before performing data conversion:

Task | Use this
Import data from my computer into Azure Machine Learning | Upload datasets in CSV format as described in this article.
Import data from a cloud data source, including Hadoop or Azure | Use the Import Data module.
Save machine learning datasets out to Azure blob storage, a Hadoop cluster, or other cloud-based storage | Use the Export Data module.
Change the data type of columns; cast columns to a different format or type | In Azure Machine Learning, use these modules: Edit Metadata, Apply SQL Transformation. If you are proficient with R or Python, try these modules: Execute Python Script, Execute R Script.
Round, group, or normalize numerical data | Use Apply Math Operation, Group Data into Bins, or Normalize Data.

The Data Format Conversions category includes these modules:

Module | Description
Convert to ARFF | Converts data input to the attribute relation file format used by the Weka toolset
Convert to CSV | Converts a dataset to a comma-separated values format
Convert to Dataset | Converts a data input to the internal Dataset format used by Microsoft Azure Machine Learning
Convert to SVMLight | Converts data input to the format used by the SVM-Light framework
Convert to TSV | Converts data input to the tab-delimited format
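For readers unfamiliar with the ARFF and SVMLight layouts, the following Python sketch writes a tiny table in each format by hand. It only illustrates the general shape of the files. It is not the implementation of the Convert to ARFF or Convert to SVMLight modules, and the function names, relation name, and sample values are all hypothetical.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes: list of (name, kind) pairs, where kind is a type string
    such as 'NUMERIC' or a list of allowed nominal values.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        spec = kind if isinstance(kind, str) else "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {spec}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


def to_svmlight(labels, feature_rows):
    """Render labels and dense feature rows as SVMLight text.

    Feature indices are 1-based, and zero-valued features are omitted,
    which is what makes the format compact for sparse data.
    """
    lines = []
    for label, features in zip(labels, feature_rows):
        pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
        lines.append(" ".join([str(label)] + pairs))
    return "\n".join(lines) + "\n"


arff_text = to_arff(
    "iris_sample",
    [("sepal_length", "NUMERIC"), ("species", ["setosa", "versicolor"])],
    [[5.1, "setosa"], [6.2, "versicolor"]],
)
svm_text = to_svmlight([1, -1], [[0.5, 0, 1.2], [0, 2.0, 0]])

print(arff_text)
print(svm_text)
```

ARFF carries column names and types in its header, while SVMLight keeps only a numeric label and index:value pairs, so column names are lost when converting to SVMLight.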

Data Transformation
Module Categories and Descriptions
