Export to Azure Blob Storage

 

Updated: May 31, 2017

Use this option in the Export Data module to export data from a machine learning experiment to Azure blob storage. This is useful for sharing outputs with other applications, or for storing intermediate data or cleaned datasets for use in other experiments. Azure blobs can be accessed from anywhere, by using either HTTP or HTTPS.

Because Azure blob storage is an unstructured data store, you can export data in various formats: currently, CSV, TSV, and ARFF formats are supported.
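As an illustration, the three supported formats could be produced by a short serializer like the following sketch. The `serialize` helper is a hypothetical stand-in, not Studio's implementation; in particular, its ARFF branch declares every column as a string for simplicity.

```python
import csv
import io

def serialize(rows, header, file_format="CSV", write_header=False):
    """Serialize rows in one of the Export Data formats: CSV, TSV, or ARFF."""
    if file_format in ("CSV", "TSV"):
        buf = io.StringIO()
        delimiter = "," if file_format == "CSV" else "\t"
        writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
        if write_header:
            writer.writerow(header)
        writer.writerows(rows)
        return buf.getvalue()
    if file_format == "ARFF":
        # Minimal ARFF sketch: every column is declared as string for simplicity
        lines = ["@RELATION export"]
        lines += [f"@ATTRIBUTE {name} string" for name in header]
        lines.append("@DATA")
        lines += [",".join(str(v) for v in row) for row in rows]
        return "\n".join(lines) + "\n"
    raise ValueError(f"Unsupported format: {file_format}")
```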

If you are exporting data to Azure blob for use by other applications, you use the Export Data module to save the data to Azure blob storage. Then, provide the account information and blob URL to another tool that can read data from Azure storage (such as Excel, cloud storage utilities, or other cloud services), to read and use the data.
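For example, the blob URL that you hand to another tool can be assembled from the account name and the container/blob path. A minimal sketch, assuming the standard blob endpoint pattern; the `blob_url` helper and the account names are hypothetical:

```python
def blob_url(account, path, scheme="http"):
    """Build the full blob URL from an account name and a container/blob path,
    e.g. ("mymldata", "predictions/results01.csv")
    -> http://mymldata.blob.core.windows.net/predictions/results01.csv"""
    return f"{scheme}://{account}.blob.core.windows.net/{path.lstrip('/')}"
```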

Note

The Import Data and Export Data modules can read and write data only from Azure storage accounts created by using the Classic deployment model. In other words, the newer Azure Blob Storage account type, which offers hot and cool storage access tiers, is not yet supported.

Generally, any Azure storage accounts that you might have created before this service option became available should not be affected.

However, if you need to create a new account for use with Azure Machine Learning, we recommend that you either select Classic for the deployment model, or use Resource Manager and, for Account kind, select General purpose rather than Blob storage.

The Azure blob service is for storing large amounts of data, including binary data. There are two types of blob storage: public blobs and blobs that require login credentials.
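A public blob is typically shared through a SAS (shared access signature) URL, which embeds a time-limited access token in the query string. As a sketch, such a URL can be split into the blob address and its token with the standard library; the example URL, its token values, and the `split_sas_url` helper are all hypothetical:

```python
from urllib.parse import parse_qs, urlsplit

def split_sas_url(sas_url):
    """Split a SAS URL into the bare blob URL and its query parameters."""
    parts = urlsplit(sas_url)
    blob_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    token = parse_qs(parts.query)
    return blob_url, token

# Hypothetical SAS URL for illustration only; 'se' marks the expiry time
url, token = split_sas_url(
    "https://myshared.blob.core.windows.net/predictions/results01.csv"
    "?sv=2017-04-17&se=2017-06-30T00%3A00%3A00Z&sig=abc123"
)
```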

  1. Add the Export Data module to your experiment. You can find this module in the Data Input and Output group in the experiment items list in Azure Machine Learning Studio.

    Connect it to the module that produces the data that you want to export to Azure blob storage.

  2. For the data destination, select Azure Blob Storage.

  3. For Authentication type, choose Public (SAS URL) if you know that the storage supports access via a SAS URL. A SAS URL is a special type of URL that can be generated by using an Azure storage utility, and is available for only a limited time. It contains all the information that is needed for authentication and download.

    For URI, type or paste the full URI that defines the account and the public blob.

  4. If you are not using a SAS URL, choose Account to save your data to a private account. Specify the account name and provide the account key, so that the experiment can write to the storage account.

    • For Account name, type or paste the name of the account where you want to save the data.

      For example, if the full URL of the storage account is http://myshared.blob.core.windows.net, you would type myshared.

    • For Account key, paste the storage access key that is associated with the account.

      If you don’t know the access key, see the section on viewing and regenerating storage access keys in this article: About Azure Storage Accounts.

  5. For Path to container, directory, or blob, type the name of the blob where the exported data will be stored.

    For example, to save the results of your experiment to a new blob named results01.csv in the container predictions in an account named mymldata, the full URL for the blob would be http://mymldata.blob.core.windows.net/predictions/results01.csv

    Therefore, in the field Path to container, directory, or blob, you would specify the container and blob name as follows:

    predictions/results01.csv

  6. If you specify the name of a blob that does not already exist, Azure will create the blob for you.

    If you are writing to an existing blob, you can specify that current contents of the blob be overwritten by setting the property, Azure blob storage write mode.

    By default, the Azure blob storage write mode is set to Error, meaning that you will get an error whenever an existing blob file of the same name is found.

  7. For File format for blob file, select the format in which data should be stored.

    • CSV. Comma-separated values (CSV) is the default storage format. To export column headings together with the data, select the option, Write blob header row.

      For more information about the comma-delimited format used in Azure Machine Learning, see Convert to CSV.

    • TSV. Tab-separated values (TSV) is compatible with many machine learning tools. To export column headings together with the data, select the option, Write blob header row.

      For more information about the tab-separated format used in Azure Machine Learning, see Convert to TSV.

    • ARFF. This option saves data in the attribute-relation file format used by the Weka toolset. This format is not supported for files stored in a SAS URL.

      For more information about the ARFF format, see Convert to ARFF.

  8. Select the Use cached results option if you want to avoid rewriting the results to the blob file each time you run the experiment.

    When this option is selected, if there are no other changes to module parameters, the experiment will write the results only the first time the module is run, or when there are changes to the data.
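The write-mode behavior described in step 6 can be mimicked locally with ordinary files. This is an illustrative stand-in for the module's behavior, not its actual code; the `write_with_mode` helper is an assumption:

```python
import os

def write_with_mode(path, data, mode="Error"):
    """Write data to path, mimicking the Azure blob storage write mode:
    'Error' raises if the file already exists; 'Overwrite' replaces it."""
    if mode == "Error" and os.path.exists(path):
        raise FileExistsError(f"{path} already exists and write mode is 'Error'")
    with open(path, "w") as f:
        f.write(data)
```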
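The caching behavior in step 8 amounts to fingerprinting the module parameters and input data, and skipping the write when the fingerprint has not changed since the previous run. A minimal sketch; the helper names are assumptions, not Studio's implementation:

```python
import hashlib
import json

def fingerprint(params, data):
    """Hash the module parameters and input data into a stable digest."""
    payload = json.dumps({"params": params, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def should_write(params, data, last_fingerprint):
    """Skip the write when parameters and data match the previous run."""
    return fingerprint(params, data) != last_fingerprint
```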

For examples of how to use the Export Data module, see these experiments and templates in the Cortana Intelligence Gallery:

  • This tutorial demonstrates how you can use Azure Logic Apps to automate both the import of data used by experiments, and writing experiment results to blob storage.

    No-code batch scoring

  • This article describes a more complex data pipeline that sends data back to an on-premises SQL Server database, using blob storage as an interim stage. Use of an on-premises database requires configuration of a data gateway, but you can skip that part of the example, and just use blob storage.

    Operationalize Azure ML solution with On-premise SQL Server using Azure data factory

This section contains answers to commonly asked questions and information about advanced configuration options.

  • How can I avoid writing the data if the experiment hasn't changed?

    When your experiment results change, Export Data always saves the new dataset. However, if you are running the experiment repeatedly without making changes that affect the output data, you can select the Use cached results option. The module checks whether the experiment has run previously using the same data and the same options; if a previous run is found, the write operation is not repeated.

  • Can I save data to an account in a different geographical region?

    Yes, you can write data to accounts in different regions. However, if the storage account is in a different region from the compute node used for the machine learning experiment, data access will be slower. Further, there will be charges for data ingress and egress on the subscription.

General options

| Name | Range | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Data source | List | Data Source Or Sink | Azure Blob Storage | The destination can be a file in Azure blob storage, an Azure table, a table or view in an Azure SQL Database, or a Hive table. |
| Use cached results | TRUE/FALSE | Boolean | FALSE | The module executes only if a valid cache does not exist; otherwise, cached data from a prior execution is used. |
| Please specify authentication type | SAS/Account | AuthenticationType | Account | Indicates whether SAS or account credentials should be used for access authorization |

Public or SAS - Public storage options

| Name | Range | Type | Default | Description |
| --- | --- | --- | --- | --- |
| SAS URI for blob | any | String | none | The SAS URI of the blob to be written to (required) |
| File format for SAS file | ARFF, CSV, or TSV | LoaderUtils.FileTypes | CSV | Indicates whether the file is CSV, TSV, or ARFF (required) |
| Write SAS header row | TRUE/FALSE | Boolean | FALSE | Indicates whether column headings should be written to the file |

Account - Private storage options

| Name | Range | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Azure account name | any | String | none | Azure user account name |
| Azure account key | any | SecureString | none | Azure storage key |
| Path to blob beginning with container | any | String | none | Name of the blob file, beginning with the container name |
| Azure blob storage write mode | List: Error, Overwrite | enum:BlobFileWriteMode | Error | Choose the method of writing blob files |
| File format for blob file | ARFF, CSV, or TSV | LoaderUtils.FileTypes | CSV | Indicates whether the blob file is CSV, TSV, or ARFF |
| Write blob header row | TRUE/FALSE | Boolean | FALSE | Indicates whether the blob file should have a header row |
Exceptions

| Exception | Description |
| --- | --- |
| Error 0027 | An exception occurs when two objects have to be the same size, but they are not. |
| Error 0003 | An exception occurs if one or more inputs are null or empty. |
| Error 0029 | An exception occurs when an invalid URI is passed. |
| Error 0030 | An exception occurs when it is not possible to download a file. |
| Error 0002 | An exception occurs if one or more parameters could not be parsed or converted from the specified type to the type required by the target method. |
| Error 0009 | An exception occurs if the Azure storage account name or the container name is specified incorrectly. |
| Error 0048 | An exception occurs when it is not possible to open a file. |
| Error 0046 | An exception occurs when it is not possible to create a directory on the specified path. |
| Error 0049 | An exception occurs when it is not possible to parse a file. |

See also

Import Data
Export Data
Export to Azure SQL Database
Export to Hive Query
Export to Azure Table
