Export Data

 

Updated: April 25, 2017

Writes a dataset to various forms of cloud-based storage in Azure, such as tables, blobs, and Azure SQL databases.

Category: Data Input and Output

You can use the Export Data module to save results, intermediate data, and working data from your experiments into cloud storage destinations outside Azure Machine Learning Studio.

Export Data supports saving your data to the following cloud data services:

  • Hive Query. Write data to a Hive table in an HDInsight Hadoop cluster.

  • Azure SQL Database. Save data to Azure SQL Database or to Azure SQL Data Warehouse.

  • Azure Table. Save data to the table storage service in Azure. Table storage is good for storing large amounts of data. It provides a tabular format that is scalable, inexpensive, and highly available.

  • Azure Blob Storage. Save data to the Blob service in Azure. This option is useful for images, unstructured text, or binary data. Data in the Blob service can be shared publicly or saved in secured application data stores.

Related Tasks

  • Download data: If you need to download your data so that you can open it in Excel or another application, you can use a module such as Convert to CSV or Convert to TSV to prepare the data in a particular format, and then download the data.

    You can download the results of any module that outputs a dataset by right-clicking the output and selecting Download dataset. By default, the data is exported in CSV format.

  • Download a module definition or experiment graph: A new PowerShell library is available that lets you download the complete metadata for your experiment, or the details for a particular module. The PowerShell for Azure Machine Learning library is in beta, but has these and many other useful cmdlets:

    • Get-AmlExperiment lists all the experiments in a workspace.
    • Export-AmlExperimentGraph exports a definition of the complete experiment to a JSON file.
    • Download-AmlExperimentNodeOutput lets you extract the information provided on the output ports of any module.

    For more information, see PowerShell Module for Azure Machine Learning Studio.

  1. For Data destination, select the type of cloud storage where you'll save your data.

    Your selection determines which of the following options are shown. If you change it later, any entries you have already made are reset, so choose this option first.

  2. Configure any options required to access the specified storage account. Depending on the storage type and whether the account is secured, you might need to provide the account name, file type, access key, or container name. For sources that do not require authentication, generally it is sufficient to know the URL.

    The following topics provide examples and configuration details for each storage type:

  3. Select the Use cached results option if you want to avoid rewriting the results each time you run the experiment.

    • If you deselect this option, the results will be written to storage each time the experiment is run, regardless of whether the output data has changed.

    • If you select this option, Export Data will generate new results only when there is an upstream change that would affect the results.

  4. Run the experiment.
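
The behavior of the Use cached results option described in step 3 can be summarized as a small decision rule. The following Python sketch is illustrative only: the function name should_write and its parameters are hypothetical and are not part of Azure Machine Learning Studio. It simply encodes the two cases listed above.

```python
def should_write(use_cached_results: bool, upstream_changed: bool) -> bool:
    """Return True when the export step would write to storage on this run.

    Hypothetical helper that mirrors the documented behavior of the
    "Use cached results" option; it is not an Azure API.
    """
    if not use_cached_results:
        # Option deselected: results are written on every run,
        # whether or not the output data has changed.
        return True
    # Option selected: new results are generated only when an
    # upstream change would affect them.
    return upstream_changed


if __name__ == "__main__":
    # Print the outcome for every combination of the two inputs.
    for cached in (False, True):
        for changed in (False, True):
            print(f"cached={cached} changed={changed} -> write={should_write(cached, changed)}")
```

The only combination that skips the write is the option being selected with no upstream change.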

Not sure how or where you should store your data? See this guide to common data scenarios in the data science process: Scenarios for advanced analytics in Azure Machine Learning.

For examples of how to use the Export Data module, see these experiments in the Cortana Intelligence Gallery:

  • This module was previously named Writer. If you previously used the Writer module in an experiment, it will be renamed to Export Data when you refresh the experiment.

  • Not all modules produce output that is compatible with Export Data destinations. For example, Export Data cannot save a dataset that has been converted to the SVMLight format. Export Data supports these formats:

    • dataset (Azure ML internal format)
    • .NET DataTable
    • CSV with or without headers
    • TSV with or without headers

  • When you select Azure Table as the location to output your data, occasionally there might be an error when writing to the specified table, and instead the data is written to a blob. If this error happens and later you are unable to read from a table, try using an Azure storage utility to check the blobs in the specified container in your storage account.

  • The ability to save a blob into a specified Hive table is not working. If you need to write intermediate results, avoid using a Hive table in HDInsight, and use blob storage or table storage instead.

  • Currently, if you select HDFS as the location to save output data, this error message is returned: “Microsoft.Analytics.Exceptions.ErrorMapping+ModuleException.”
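
To make the text formats in the list of supported formats concrete (CSV or TSV, each with or without headers), here is a minimal Python sketch that serializes the same rows in each variant. It uses only the standard csv module. The function name serialize and the sample data are assumptions for illustration and do not describe how Export Data writes files internally.

```python
import csv
import io


def serialize(rows, header=None, delimiter=","):
    """Serialize rows as delimited text, optionally with a header row.

    Hypothetical helper: delimiter="," gives CSV, delimiter="\t" gives TSV.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    if header is not None:
        # "With headers" variant: column names go on the first line.
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [[1, "setosa"], [2, "virginica"]]   # sample data, made up
    columns = ["id", "species"]

    print(serialize(rows, header=columns))                   # CSV with headers
    print(serialize(rows))                                   # CSV without headers
    print(serialize(rows, header=columns, delimiter="\t"))   # TSV with headers
    print(serialize(rows, delimiter="\t"))                   # TSV without headers
```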

Name | Type | Description
Dataset | Data Table | The dataset to be written.

This table lists parameters that apply to all Export Data options. Other parameters are dynamic and change depending on the data destination you select.

Name | Range | Type | Default | Description
Please specify data destination | List | DataSourceOrSink | Blob service in Azure Storage | Indicate whether the data destination is a file in the Blob service, a file in the Table service, a SQL database in Azure, or a Hive table.
Use cached results | TRUE/FALSE | Boolean | FALSE | Select this option to avoid rewriting results unnecessarily. If anything changes upstream in the experiment, Export Data will always execute and write new results. However if nothing has changed, and you have selected this option, Export Data will not execute in order to avoid rewriting the same results.

Exception | Description
Error 0057 | An exception occurs when attempting to create a file or blob that already exists.
Error 0001 | An exception occurs if one or more specified columns of the dataset couldn't be found.
Error 0027 | An exception occurs when two objects have to be of the same size, but they are not.
Error 0079 | An exception occurs if the container name in Azure Storage is specified incorrectly.
Error 0052 | An exception occurs if the storage access key for the Azure account is specified incorrectly.
Error 0064 | An exception occurs if account name or storage access key for the Azure account is specified incorrectly.
Error 0071 | An exception occurs if the provided credentials are incorrect.
Error 0018 | An exception occurs if the input dataset is not valid.
Error 0029 | An exception occurs when an invalid URI is passed.
Error 0003 | An exception occurs if one or more inputs are null or empty.
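
When triaging failed runs from a script, the error codes above can be kept in a small lookup table. The Python sketch below is one possible way to do that. The names EXPORT_DATA_ERRORS and describe_error are hypothetical and belong to no Azure library; the descriptions are copied from the list above.

```python
# Hypothetical lookup table; descriptions are taken from the exceptions list.
EXPORT_DATA_ERRORS = {
    "0057": "An exception occurs when attempting to create a file or blob that already exists.",
    "0001": "An exception occurs if one or more specified columns of the dataset couldn't be found.",
    "0027": "An exception occurs when two objects have to be of the same size, but they are not.",
    "0079": "An exception occurs if the container name in Azure Storage is specified incorrectly.",
    "0052": "An exception occurs if the storage access key for the Azure account is specified incorrectly.",
    "0064": "An exception occurs if account name or storage access key for the Azure account is specified incorrectly.",
    "0071": "An exception occurs if the provided credentials are incorrect.",
    "0018": "An exception occurs if the input dataset is not valid.",
    "0029": "An exception occurs when an invalid URI is passed.",
    "0003": "An exception occurs if one or more inputs are null or empty.",
}


def describe_error(code: str) -> str:
    """Accept 'Error 0079', '0079', or '79' and return the description."""
    key = code.replace("Error", "").strip().zfill(4)
    return EXPORT_DATA_ERRORS.get(key, f"Unknown error code {key}")


if __name__ == "__main__":
    print(describe_error("Error 0079"))
    print(describe_error("52"))
```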

Import Data
Data Input and Output
Data Transformation
Comparing Azure Table Storage and Azure SQL Database
A-Z Module List
