Enter Data Manually
Updated: February 23, 2017
Enables entering and editing small datasets by typing values
Category: Data Transformation / Manipulation
You can use the Enter Data Manually module to create a small dataset by typing values, rather than loading data from a source in Azure Machine Learning Studio or from a local file.
This can be helpful in scenarios such as:
Generating a short list of values for testing
Creating a short list of labels for use with the Group Categorical Values or Execute R Script modules
Entering values for use in Apply Math Operation or replacement values for use in Replace Discrete Values
Typing a list of column names to insert in a dataset
Add the Enter Data Manually module to your experiment.
For DataFormat, select one of the following options:
ARFF. The attribute-relation file format, used by Weka. For more information, see Convert to ARFF.
CSV. Comma-separated values format. For more information, see Convert to CSV.
SVMLight. The sparse format used by the SVMlight package and read by many other machine learning frameworks. For more information, see Convert to SVMLight.
TSV. Tab-separated values format. For more information, see Convert to TSV.
These options determine how data that you provide will be parsed. For example, if you choose the ARFF format, you must provide your data in a format that meets the ARFF specifications, or a run-time error will occur.
The requirements for each format differ greatly, so be sure to read the related topics.
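To see why the DataFormat choice matters, the following Python sketch parses the same text twice with the standard-library csv module, once with a comma delimiter and once with a tab delimiter. It only illustrates delimiter-driven parsing in general; it is not the module's own parser, and the helper name parse_text is hypothetical.

```python
import csv
import io

# The same three lines of text: a header row and two data rows.
text = "Label,Value\nred,1\nblue,2\n"

def parse_text(text, delimiter):
    """Split text into rows of fields using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

as_csv = parse_text(text, ",")   # three rows, two fields each
as_tsv = parse_text(text, "\t")  # three rows, one field each; commas stay in the data

print(as_csv)
print(as_tsv)
```

Choosing the wrong format rarely fails loudly for CSV versus TSV: the text still parses, but everything lands in a single column.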
Click inside the Data text box to start entering data.
The following formats require special considerations:
CSV
To create multiple columns, paste in comma-separated text, or type multiple columns using commas between fields.
If you select the HasHeader option, the first row of values is used as the column headings.
If you deselect this option, the column names Col1, Col2, and so forth are used. You can add or change column names later by using Edit Metadata.
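A minimal Python sketch of the HasHeader behavior described above, assuming the data is plain comma-separated text. The function load_csv and its has_header parameter are hypothetical stand-ins for the module's option, not part of Azure Machine Learning Studio.

```python
import csv
import io

def load_csv(text, has_header):
    """Parse CSV text. With has_header, the first row supplies the column
    names; otherwise columns are named Col1, Col2, and so on."""
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        names, rows = rows[0], rows[1:]
    else:
        width = max((len(r) for r in rows), default=0)
        names = [f"Col{i}" for i in range(1, width + 1)]
    return names, rows

data = "Label,Value\nred,1\nblue,2\n"
print(load_csv(data, has_header=True))   # names come from the first row
print(load_csv(data, has_header=False))  # first row is kept as data
```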
TSV
To create multiple columns, paste in tab-separated text, or type multiple columns using tabs between fields.
If you select the HasHeader option, the first row of values is used as the column headings.
If you deselect this option, the column names Col1, Col2, and so forth are used. You can add or change column names later by using Edit Metadata.
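Because tab characters can be awkward to type into a text box, one option is to generate the tab-separated text elsewhere and paste it in. The sketch below assumes Python and its standard csv module; to_tsv is a hypothetical helper name. It ends the text with a newline, which matches the advice to press ENTER after the final row.

```python
import csv
import io

def to_tsv(header, rows):
    """Build tab-separated text, header row first, ready to paste."""
    buf = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n".
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

tsv_text = to_tsv(["Label", "Value"], [["red", 1], ["blue", 2]])
print(tsv_text)
```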
ARFF
Paste in the contents of an existing ARFF file.
If you are typing values directly, be sure to add the optional header and required attribute fields at the beginning of the data. For example, the following header and attribute rows could be added to a simple list. The column heading would be SampleText.
% Title: SampleTextARFF
% Source: Enter Data module
@ATTRIBUTE SampleText STRING
@DATA
<type first data row here>
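When there are many values to type, the ARFF block can be generated instead. This Python sketch wraps a list of strings in the same comment, attribute, and data rows shown above. The helper to_arff is hypothetical, and quoting every value is an assumption based on the ARFF convention that string values containing spaces must be quoted.

```python
def to_arff(values, attribute="SampleText", title="SampleTextARFF"):
    """Wrap a list of strings in the header and attribute rows shown above."""
    lines = [
        f"% Title: {title}",
        "% Source: Enter Data module",
        f"@ATTRIBUTE {attribute} STRING",
        "@DATA",
    ]
    for value in values:
        # Escape backslashes and single quotes, then quote the whole value
        # so that spaces or commas stay inside one string value.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}'")
    return "\n".join(lines) + "\n"

print(to_arff(["hello world", "second row"]))
```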
SVMLight
Type or paste in values using the SVMLight format.
For example, the following sample represents the first two lines of the Blood Donation dataset, in SVMLight format:
# features are [Recency], [Frequency], [Monetary], [Time]
1 1:2 2:50 3:12500 4:98
1 1:0 2:13 3:3250 4:28
If you run the Enter Data Manually module, these lines are converted to a dataset with one column for each feature index and a Labels column, as follows:
Col1 Col2 Col3 Col4 Labels
0.00016 0.004 0.999961 0.00784 1
0 0.004 0.999955 0.008615 1
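The numbers in this output are not the raw feature values. They are consistent with each row having been divided by its Euclidean (L2) norm: for the first row, 12500 / sqrt(2² + 50² + 12500² + 98²) ≈ 0.999961. This page does not state that the module normalizes rows, so treat that as an inference from the sample. The following Python sketch parses the SVMLight lines (the first token is the label; each index:value pair is a 1-based feature) and reproduces the arithmetic. The function names are hypothetical.

```python
import math

def parse_svmlight(text):
    """Parse SVMLight lines into (labels, dense feature rows).
    Text after '#' is a comment; feature indices are 1-based."""
    labels, rows = [], []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        label, *pairs = line.split()
        features = {}
        for pair in pairs:
            index, value = pair.split(":")
            features[int(index)] = float(value)
        labels.append(float(label))
        rows.append(features)
    width = max((max(f) for f in rows if f), default=0)
    dense = [[f.get(i, 0.0) for i in range(1, width + 1)] for f in rows]
    return labels, dense

def l2_normalize(row):
    """Divide every value by the row's Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm if norm else 0.0 for x in row]

sample = """# features are [Recency], [Frequency], [Monetary], [Time]
1 1:2 2:50 3:12500 4:98
1 1:0 2:13 3:3250 4:28
"""

labels, dense = parse_svmlight(sample)
for label, row in zip(labels, dense):
    print([round(x, 6) for x in l2_normalize(row)], int(label))
```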
Press ENTER after each row to start a new line.
Be sure to press ENTER after the final row. If you press ENTER multiple times to add multiple empty trailing rows, the final empty row will be trimmed, but other empty rows will be treated as missing values.
You can enter rows with missing values, and then go back and edit the rows to add values later.
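The trailing-row rule above can be sketched in Python, assuming rows are separated by a single newline character. split_rows is a hypothetical helper that models the described behavior; it is not the module's implementation.

```python
def split_rows(text):
    """Drop only the final empty row; any other empty row becomes
    a missing value (None)."""
    rows = text.split("\n")
    if rows[-1] == "":
        rows = rows[:-1]  # the single trailing empty row is trimmed
    return [row if row != "" else None for row in rows]

print(split_rows("a\nb\n"))      # ENTER once after the final row
print(split_rows("a\nb\n\n\n"))  # ENTER three times: two missing rows remain
```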
Right-click the module and select Run selected to parse the data and load it into your workspace as a dataset.
You can view the dataset by clicking the output port and selecting Visualize.
For examples of how this module is used in machine learning, see this sample experiment in the Model Gallery:
The Download Data sample gets data from the UCI Machine Learning repository and then uses Enter Data Manually to create column names.
Sample R code is also provided, which you can use to merge the entered rows with the dataset.
Regardless of the format you select, data that you enter is implicitly converted to the dataset (Data Table) format for use in experiments. However, the data is not persisted as a saved dataset unless you explicitly choose the Save as Dataset option.
If you do not save the data in Enter Data Manually as a dataset, it will be removed from the workspace cache when you end the session. However, you can run the experiment again to make the data available.
If you combine the data from Enter Data Manually with another dataset, the combined dataset cannot have two columns with the same name. If there are duplicate column names, a numeric suffix is appended to the column from the right dataset to make the column names unique.
For example, assume that you have two instances of Enter Data Manually that contain the column TestData, and use the Add Columns module to merge them. The column from the left instance of Enter Data Manually would remain as TestData, and the column from the right instance of Enter Data Manually would be renamed TestData (2).
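The renaming rule can be sketched in Python, assuming names are compared exactly as typed. The page only documents the (2) suffix for a single clash; continuing with (3), (4), and so on for further clashes is an assumption, and merge_column_names is a hypothetical name.

```python
def merge_column_names(left, right):
    """Keep names from the left dataset; give a clashing name from the
    right dataset a numeric suffix such as ' (2)'."""
    result = list(left)
    taken = set(left)
    for name in right:
        candidate, n = name, 2
        while candidate in taken:
            candidate = f"{name} ({n})"
            n += 1
        result.append(candidate)
        taken.add(candidate)
    return result

print(merge_column_names(["TestData"], ["TestData"]))
```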