Convert to SVMLight

 

Updated: March 2, 2017

Converts data input to the format used by the SVM-Light framework

Category: Data Format Conversions

You can use the Convert to SVMLight module to convert your datasets to the format that is used by SVMLight. The SVM-Light framework was developed by researchers at Cornell University. The SVM-Light library implements Vapnik's Support Vector Machine, and it can be used for many machine learning tasks, including classification and regression.

For additional details about the format used by SVMLight, see SVMLight Support Vector Machine.

The conversion to SVMLight format entails converting each case into a row of data that begins with the label, followed by feature-value pairs expressed as colon-separated numbers. The conversion process will not automatically identify the correct columns, so it is important that you prepare the columns in your dataset before attempting conversion. For more information, see Preparing Data for Conversion.

  1. Add the Convert to SVMLight module to your experiment. You can find this module in the Data Format Conversions group in the experiment items list in Azure Machine Learning Studio.

  2. Connect the dataset or output that you want to convert to SVMLight format.

  3. Run the experiment.

  4. Right-click the output of the module, select Download, and save the data to a local file fo modification or for re-use with a program that supports SVMLight.

Preparing Data for Conversion

To understand the conversion process, consider the Blood Donor dataset. This sample dataset has the following format, in tabular form.

RecencyFrequencyMonetaryTimeClass
25012500981
0133250281
114000351
2205000451
1246000770

Note that the [Class] column is the last column in the table. If you convert the dataset to SVMLight without first indicating that the column is a label, it will be handled as a feature and the first column, [Recency], will be used as the label, like this:

2 1:50 2:12500 3:98 4:1
0 1:13 2:3250 3:28 4:1
1 1:16 2:4000 3:35 4:1

To make sure the labels are generated correctly at the beginning of the row for each case, you must add two instances of the Edit Metadata module.

  • In the first, select the label column ([Class]) and for Fields, select Label.

  • In the second, select all feature columns that you need in the converted file ([Recency], [Frequency], [Monetary], [Time]) and for Fields, select Features.

After the columns have been identified correctly, you can run the Convert to SVMLight module. After conversion, the first few rows of the Blood Donor dataset now have this format:

  • The label value precedes each entry, followed by the values for [Recency], [Frequency], [Monetary], and [Time], identified as features 1, 2, 3 and 4 respectively.

  • The label value of 0 in the fifth row has been converted to -1. This is because SVMLight supports only binary classification labels.

1 1:2 2:50 3:12500 4:98
1 1:0 2:13 3:3250 4:28
1 1:1 2:16 3:4000 4:35
1 1:2 2:20 3:5000 4:45
-1 1:1 2:24 3:6000 4:77

You can't directly use this text data for models in Azure ML, or visualize it, but you can download it to a local share. While you have the file open, we recommend that you add a comment line, prefaced by #, so that you can add notes about the source or the original feature column names.

System_CAPS_ICON_tip.jpg Tip

To use an SVMLight file in Vowpal Wabbit, and make additional modifications as described here: Conversion to Vowpal Wabbit Format. When the file is ready, upload it to Azure blob storage, and call it directly from one of the Vowpal Wabbit modules.

There are no examples in the Cortana inteligence Gallery that are specific to this format.

The executables provided in the SVM-Light framework require both an example file and a model file. However, this module creates only the example file. You must create the model file separately by using the SVMLight libraries.

The example file is the file that contains the training examples.

  • Optional header

    The first lines can contain comments. Comments must be prefixed with the number sign (#).

    The file format output by Convert to SVMLight does not create headers. You can edit the file to add comments, a list of column names, and so forth.

  • Training data

    Each case is on its own row. A case consists of a target value followed by a series of indices and the associated feature values.

    The response value must be 1 or -1 for classification, or a number for regression.

    The target value and each of the index-value pairs are separated by a space.

Training Data Example

The following table shows how the values in the columns of the Two-Class Iris dataset are converted to a representation in which each column is represented by an index, followed by a colon, and then the value in that column :

Iris DatasetIris Dataset Converted to SVMLight
1 6.3 2.9 5.6 1.81 1:6.3 2:2.9 3:5.6 4:1.8
0 4.8 3.4 1.6 0.2-1 1:4.8 2:3.4 3:1.6 4:0.2
1 7.2 3.2 6 1.81 1:7.2 2:3.2 3:6 4:1.8

Note that the names of the feature columns are lost in the conversion.

Conversion to Vowpal Wabbit Format

The SVMLight format is very similar to the format used by Vowpal Wabbit. To change the SVMLight output file to a format usable for training a Vowpal Wabbit model, just add a pipe symbol between the label and the list of features.

For example, compare these lines of input:

Vowpal Wabbit format, including optional comment

# features are [Recency], [Frequency], [Monetary], [Time]
1 | 1:2 2:50 3:12500 4:98
1 | 1:0 2:13 3:3250 4:28

SVMLight format, including optional comment

# features are [Recency], [Frequency], [Monetary], [Time]
1 1:2 2:50 3:12500 4:98
1 1:0 2:13 3:3250 4:28

NameTypeDescription
DatasetData TableInput dataset
NameTypeDescription
Results datasetSvmLightOutput dataset

Data Format Conversions
A-Z Module List

Show: