Convert to ARFF
Updated: March 2, 2017
Converts data input to the attribute-relation file format used by the Weka toolset.
Category: Data Format Conversions
You can use the Convert to ARFF module to convert datasets and results in Azure Machine Learning to the attribute-relation file format used by the Weka toolset. This format is known as ARFF.
The ARFF data specification for Weka supports multiple machine learning tasks, including data preprocessing, classification, and feature selection. In this format, data is organized by entities and their attributes, and is contained in a single text file. You can find details of the Weka file format in the Technical Notes section.
In general, conversion to the Weka file format is required only if you want to use both Azure Machine Learning and Weka, and intend to move your training data back and forth between them.
For more information about the Weka toolset, see this Wikipedia article: Weka (machine learning)
You cannot overwrite an existing ARFF file in Azure Storage.
Add the Convert to ARFF module to your experiment. You can find this module in the Data Format Conversions group in the experiment items list in Azure Machine Learning Studio.
Connect it to any module that outputs a dataset.
Run the experiment, or click the Convert to ARFF module, and click Run selected.
Double-click the output of Convert to ARFF, and select the Download option to create a copy of the data in a local folder. If you do not specify a folder, a default file name is applied and the file is saved in the local Downloads library.
This format does not support other options, such as export to Python or R code.
Although there are no examples specific to these formats, you can see examples of how format conversion is used by exploring these sample experiments in the Model Gallery:
The Color-Based Image Compression sample exports the datasets used for each portion of the analysis to files for reproducibility and use on other analytics platforms.
The Cross Validation for Binary Classification sample exports the results of cross validation to files so that the results for multiple models can be compared by using a tool such as Excel.
This section provides an example of how a typical dataset would look when converted to ARFF.
Typically an ARFF data file consists of two sections: a header that defines the data source and schema, and a data section that contains the actual entities and their attributes.
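As a rough illustration of this two-part layout, the following Python sketch splits ARFF text into its attribute declarations and its data rows. The function name `split_arff` and the sample text are hypothetical, and the sketch assumes simple, unquoted, comma-separated content; it is not a full ARFF reader and is not part of Azure Machine Learning or Weka.

```python
def split_arff(text):
    """Split simple ARFF text into (relation, attributes, rows).

    Assumption: names and values are unquoted and contain no commas.
    """
    relation = None
    attributes = []   # (name, type) pairs declared in the header
    rows = []         # one list of string values per data line
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].upper()
        if keyword == "@RELATION":
            relation = line.split(None, 1)[1]
        elif keyword == "@ATTRIBUTE":
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif keyword == "@DATA":
            in_data = True  # everything after this line is data
    return relation, attributes, rows


# Hypothetical two-column sample used only for illustration.
sample = """% Source: Iris dataset, UCI
@RELATION iris
@ATTRIBUTE sepal_length NUMERIC
@ATTRIBUTE class {0, 1}
@DATA
5.1,0
"""

relation, attributes, rows = split_arff(sample)
print(relation)
print(attributes)
print(rows)
```

All values come back as strings; converting them according to the declared attribute types is left out to keep the sketch short.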
Header
Defines the list of the attributes (in columns) and their data types.
The header can also contain multiple comment lines that describe the data source or any other notes.
% Source: Iris dataset, UCI
% 0 = Iris-setosa, 1 = Iris-virginica
@RELATION iris
@ATTRIBUTE sepal_length NUMERIC
@ATTRIBUTE sepal_width NUMERIC
@ATTRIBUTE petal_length NUMERIC
@ATTRIBUTE petal_width NUMERIC
@ATTRIBUTE class {0, 1}
If the dataset you are converting does not have column names, use the Edit Metadata module to add column names before converting to ARFF.
Data
The data section consists of comma-separated values, and looks very much like a CSV file without column headings.
@DATA
5.1,3.5,1.4,0.2,0
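To make the relationship between the header and the data section concrete, here is a minimal Python sketch that assembles ARFF text from column definitions and rows. The helper `to_arff` and its parameters are hypothetical names chosen for this illustration; it does not reproduce how the Convert to ARFF module works internally, and it assumes values that need no quoting or escaping.

```python
def to_arff(relation, attributes, rows, comments=()):
    """Build ARFF text from (name, type) attributes and rows of values.

    Assumption: names and values contain no spaces, commas, or quotes.
    """
    lines = [f"% {comment}" for comment in comments]
    lines.append(f"@RELATION {relation}")
    for name, attr_type in attributes:
        lines.append(f"@ATTRIBUTE {name} {attr_type}")
    lines.append("@DATA")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("each row needs one value per attribute")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


arff_text = to_arff(
    relation="iris",
    attributes=[
        ("sepal_length", "NUMERIC"),
        ("sepal_width", "NUMERIC"),
        ("petal_length", "NUMERIC"),
        ("petal_width", "NUMERIC"),
        ("class", "{0, 1}"),
    ],
    rows=[[5.1, 3.5, 1.4, 0.2, 0]],
    comments=["Source: Iris dataset, UCI"],
)
print(arff_text)
```

The row-length check reflects the requirement that every data line supplies one value for each attribute declared in the header, in the same order.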
For additional information about this file format, see the Weka Wiki page: ARFF (developer version).
ARFF version
Azure Machine Learning Studio saves ARFF files by using the ARFF 3.0 format.
Expected input
| Name | Type | Description |
|---|---|---|
| Dataset | Data Table | Input dataset |

Output
| Name | Type | Description |
|---|---|---|
| Results dataset | Arff | Output dataset |