Supported Sources and Sinks

Important

This topic and its sub-topics provide the JSON format supported by older versions of Azure PowerShell. If you are using the July 2015 release of Azure PowerShell or later, see Supported Sources and Sinks for the latest JSON format. You can convert JSON from the old format to the new format by using the JSON Upgrade Tool.

The Copy Activity supports the following source/sink combinations at this time. An X in the following table means the combination is supported.

| Source/Sink | Azure Blob | Azure Table | Azure SQL Database | Azure DocumentDB | SQL Server on Azure VM | On-premises SQL Server |
| --- | --- | --- | --- | --- | --- | --- |
| Azure Blob | X | X | X | X | X | X |
| Azure Table | X | X | X | X | X | X |
| Azure SQL Database | X | X | X | X | X | X |
| Azure DocumentDB | X | X | X | | | |
| SQL Server on Azure VM | X | X | X | | X | X |
| On-premises SQL Server Database | X | X | X | | X | X |
| On-premises Oracle Database | X | X | X | | X | X |
| On-premises File System | X | X | X | | X | X |
| On-premises MySQL Database | X | X | X | | X | X |
| On-premises DB2 Database | X | X | X | | X | X |
| On-premises Teradata Database | X | X | X | | X | X |
| On-premises Sybase Database | X | X | X | | X | X |
| On-premises PostgreSQL Database | X | X | X | | X | X |

The following table provides the source type and sink type for various data sources that you can use in the JSON script.

| Data store | Source type | Sink type |
| --- | --- | --- |
| Azure Blob | BlobSource properties | BlobSink properties |
| Azure Table | AzureTableSource properties | AzureTableSink properties |
| Azure SQL Database | SqlSource properties | SqlSink properties |
| Azure DocumentDB | DocumentDbCollectionSource properties | DocumentDbCollectionSink properties |
| SQL Server on Azure VM | SqlSource properties | SqlSink properties |
| On-premises SQL Server | SqlSource properties | SqlSink properties |
| On-premises Oracle | OracleSource properties | |
| On-premises File System | FileSystemSource | |
| On-premises MySQL | RelationalSource properties | |
| On-premises DB2 | RelationalSource properties | |
| On-premises Teradata | RelationalSource properties | |
| On-premises Sybase | RelationalSource properties | |
| On-premises PostgreSQL | RelationalSource properties | |

The following sections provide properties supported by these sources and sinks.

Note

The FileSystemSource does not support any properties at this time.

BlobSource properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| treatEmptyAsNull | Specifies whether to treat a null or empty string as a null value. | TRUE, FALSE | N |
| skipHeaderLineCount | Indicates how many lines to skip. Applicable only when the input dataset uses TextFormat. | Integer from 0 to Max. | N |
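
For example, a BlobSource that treats empty strings as nulls and skips one header line might look like the following sketch (the property values shown are illustrative, not defaults):

"source":
{
    "type": "BlobSource",
    "treatEmptyAsNull": true,
    "skipHeaderLineCount": 1
}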

AzureTableSource properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| azureTableSourceQuery | Use the custom query to read data. | Azure table query string. Sample: "ColumnA eq ValueA" | N |
| azureTableSourceIgnoreTableNotFound | Indicates whether to swallow the exception when the table does not exist. | TRUE, FALSE | N |
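
As a sketch, an AzureTableSource can combine both properties; the filter string below is the sample from the table above and is illustrative only:

"source":
{
    "type": "AzureTableSource",
    "azureTableSourceQuery": "ColumnA eq ValueA",
    "azureTableSourceIgnoreTableNotFound": false
}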

DocumentDbCollectionSource properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| query | Specify the query to read data. | Query string supported by DocumentDB. Example: SELECT c.BusinessEntityID, c.PersonType, c.NameStyle, c.Title, c.Name.First AS FirstName, c.Name.Last AS LastName, c.Suffix, c.EmailPromotion FROM c WHERE c.ModifiedDate > \"2009-01-01T00:00:00\". If not specified, the SQL statement that is executed is: select <columns defined in structure> from mycollection. | N |
| nestingSeparator | Special character to indicate that the document is nested. | Any character. See the example below. | N |

Example

Sample JSON document in the Person collection in a DocumentDB database:

{
  "PersonId": 2,
  "Name": {
    "First": "Jane",
    "Middle": "",
    "Last": "Doe"
  }
}

DocumentDB supports querying documents using a SQL-like syntax over hierarchical JSON documents. Example:

SELECT Person.PersonId, Person.Name.First AS FirstName, Person.Name.Middle as MiddleName, Person.Name.Last AS LastName FROM Person

This is the query we will use when defining a pipeline later. For now, let’s create an input table that represents a document in the Person collection. Note that the location type is set to DocumentDbCollectionLocation, collectionName is set to Person, and linkedServiceName is set to a linked service of type DocumentDbLinkedService.

{
    "name": "PersonDocumentDbTable",
    "properties": {
        "location": {
            "type": "DocumentDbCollectionLocation",
            "collectionName": "Person",
            "linkedServiceName": "DocumentDbLinkedService"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1,
            "waitOnExternal": {}
        }
    }
}

Now, let’s create an output table of type: AzureBlob.

{
    "name": "PersonBlobTableOut",
    "properties": {
        "location": {
            "type": "AzureBlobLocation",
            "folderPath": "docdb",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ",",
                "nullValue": "NULL"
            },
            "linkedServiceName": "StorageLinkedService"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

The following pipeline copies data from the Person collection in the DocumentDB database to an Azure blob in the docdb container.

{
    "name": "DocDbToBlobPipeline",
    "properties": {
        "activities": [
            {
                "type": "CopyActivity",
                "transformation": {
                    "source": {
                        "type": "DocumentDbCollectionSource",
                        "query": "SELECT Person.PersonId, Person.Name.First AS FirstName, Person.Name.Middle as MiddleName, Person.Name.Last AS LastName FROM Person",
                        "nestingSeparator": "."
                    },
                    "sink": {
                        "type": "BlobSink",
                        "blobWriterAddHeader": true,
                        "writeBatchSize": 1000,
                        "writeBatchTimeout": "00:00:59"
                    }
                },
                "inputs": [
                    {
                        "name": "PersonDocumentDbTable"
                    }
                ],
                "outputs": [
                    {
                        "name": "PersonBlobTableOut"
                    }
                ],
                "policy": {
                    "concurrency": 1
                },
                "name": "CopyFromDocDbToBlob"
            }
        ],
        "start": "2015-04-01T00:00:00Z",
        "end": "2015-04-02T00:00:00Z"
    }
}

SqlSource properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| sqlReaderQuery | Use the custom query to read data. | SQL query string. For example: select * from MyTable. If not specified, the SQL statement that is executed is: select <columns defined in structure> from MyTable. | N |
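
For example, a SqlSource with a custom reader query might look like the following sketch (the query is the sample from the table above):

"source":
{
    "type": "SqlSource",
    "sqlReaderQuery": "select * from MyTable"
}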

OracleSource properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| oracleReaderQuery | Use the custom query to read data. | SQL query string. For example: select * from LOG. If not specified, the SQL statement that is executed is: select <columns defined in the structure> from MyTable. | N |
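
A minimal OracleSource sketch using the sample query from the table above:

"source":
{
    "type": "OracleSource",
    "oracleReaderQuery": "select * from LOG"
}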

RelationalSource properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| query | Use the custom query to read data. | SQL query string. For example: select * from MyTable. | Y |
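
Because query is required for RelationalSource, a minimal sketch looks like the following (the query string is the sample from the table above):

"source":
{
    "type": "RelationalSource",
    "query": "select * from MyTable"
}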

AzureTableSink properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| azureTableDefaultPartitionKeyValue | Default partition key value that can be used by the sink. | A string value. | N |
| azureTablePartitionKeyName | User-specified column name whose values are used as the partition key. If not specified, azureTableDefaultPartitionKeyValue is used as the partition key. | A column name. | N |
| azureTableRowKeyName | User-specified column name whose values are used as the row key. If not specified, a GUID is used for each row. | A column name. | N |
| azureTableInsertType | The mode to insert data into the Azure table. | "merge", "replace" | N |
| writeBatchSize | Inserts data into the Azure table when writeBatchSize or writeBatchTimeout is hit. | Integer from 1 to 100 (unit = row count). | N (default = 100) |
| writeBatchTimeout | Inserts data into the Azure table when writeBatchSize or writeBatchTimeout is hit. | Timespan. Sample: "00:20:00" (20 minutes). | N (defaults to the storage client default timeout of 90 seconds) |
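
An illustrative AzureTableSink sketch; the column name PersonId and the batch values are examples, not defaults:

"sink":
{
    "type": "AzureTableSink",
    "azureTablePartitionKeyName": "PersonId",
    "azureTableInsertType": "merge",
    "writeBatchSize": 100,
    "writeBatchTimeout": "00:20:00"
}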

SqlSink properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| sqlWriterStoredProcedureName | User-specified stored procedure name to upsert (update/insert) data into the target table. | Name of the stored procedure. | N |
| sqlWriterTableType | User-specified table type name to be used in the above stored procedure. | A table type name. | N |
| writeBatchTimeout | Wait time for the batch insert operation to complete before it times out. | Timespan. Example: "00:30:00" (30 minutes). | N |
| writeBatchSize | Inserts data into the SQL table when the buffer size reaches writeBatchSize. | Integer (unit = row count). | N (default = 10000) |

Example:

"sink":
{
    "type": "SqlSink",
    "sqlWriterTableType": "MarketingType",
    "sqlWriterStoredProcedureName": "spOverwriteMarketing", 
    "storedProcedureParameters":
    {
        "stringData": 
        {
            "value": "str1"     
        }
    }
}

BlobSink properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| blobWriterAddHeader | Specifies whether to add a header row of column definitions. | TRUE, FALSE (default) | N |
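
A minimal BlobSink sketch that writes a header row; the same property appears in the DocDbToBlobPipeline example earlier:

"sink":
{
    "type": "BlobSink",
    "blobWriterAddHeader": true
}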

DocumentDbCollectionSink properties

| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| nestingSeparator | A special character in the source column name that indicates a nested document is needed. For example, Name.First in the output table produces the following JSON structure in the DocumentDB document: "Name": { "First": "John" } | Character that is used to separate nesting levels. Default value is . (dot). | N |
| writeBatchSize | Number of parallel requests to the DocumentDB service to create documents. You can fine-tune performance when copying data to/from DocumentDB by using this property. Performance improves as you increase writeBatchSize, because more parallel requests are sent to DocumentDB; however, you need to avoid throttling, which can throw the error message "Request rate is large". Throttling is determined by a number of factors, including the size of documents, the number of terms in documents, and the indexing policy of the target collection. For copy operations, you can use a higher-tier (e.g. S3) collection to get the most throughput available (2,500 request units/second), or reduce writeBatchSize for a more robust though slower copy activity. | Integer value. | N |

Example

Sample blob input:

1,John,,Doe
2,Jane,,Doe

The input Azure blob table is defined as follows:

{
    "name": "PersonBlobTableIn",
    "properties": {
        "structure": [
            {
                "name": "Id",
                "type": "Int"
            },
            {
                "name": "FirstName",
                "type": "String"
            },
            {
                "name": "MiddleName",
                "type": "String"
            },
            {
                "name": "LastName",
                "type": "String"
            }
        ],
        "location": {
            "type": "AzureBlobLocation",
            "fileName": "input.csv",
            "folderPath": "docdb",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ",",
                "nullValue": "NULL"
            },
            "linkedServiceName": "StorageLinkedService"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1,
            "waitOnExternal": {}
        }
    }
}

The output Azure DocumentDB table is defined as follows:

{
    "name": "PersonDocumentDbTableOut",
    "properties": {
        "structure": [
            {
                "name": "Id",
                "type": "Int"
            },
            {
                "name": "Name.First",
                "type": "String"
            },
            {
                "name": "Name.Middle",
                "type": "String"
            },
            {
                "name": "Name.Last",
                "type": "String"
            }
        ],
        "location": {
            "type": "DocumentDbCollectionLocation",
            "collectionName": "Person",
            "linkedServiceName": "DocumentDbLinkedService"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

The pipeline is defined as follows:

{
    "name": "BlobToDocDbPipeline",
    "properties": {
        "activities": [
            {
                "type": "CopyActivity",
                "transformation": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "DocumentDbCollectionSink",
                        "nestingSeparator": ".",
                        "writeBatchSize": 2,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "ColumnMappings": "Id: Id, FirstName: Name.First, MiddleName: Name.Middle, LastName: Name.Last"
                    }
                },
                "inputs": [
                    {
                        "name": "PersonBlobTableIn"
                    }
                ],
                "outputs": [
                    {
                        "name": "PersonDocumentDbTableOut"
                    }
                ],
                "policy": {
                    "concurrency": 1
                },
                "name": "CopyFromBlobToDocDb"
            }
        ],
        "start": "2015-04-14T00:00:00Z",
        "end": "2015-04-15T00:00:00Z"
    }
}

The output JSON in DocumentDB will look like this:

{
  "Id": 1,
  "Name": {
    "First": "John",
    "Middle": null,
    "Last": "Doe"
  },
  "id": "a5e8595c-62ec-4554-a118-3940f4ff70b6"
}

DocumentDB is a NoSQL store for JSON documents, where nested structures are allowed. Azure Data Factory lets you denote the hierarchy via nestingSeparator, which is “.” in this example. With that separator, the copy activity generates the “Name” object with three child elements, First, Middle, and Last, according to “Name.First”, “Name.Middle”, and “Name.Last” in the table definition.