Migrating Data to Azure Blob Storage
Updated: May 6, 2014
The Azure Blob service enables applications to store large amounts of unstructured text or binary data, such as video, audio, and image files. A storage account contains zero or more containers, and a container contains zero or more blobs. A blob is a single entity composed of binary data, such as a file or an image.
The storage service offers two types of blobs: Block Blob and Page Blob.
A block blob is composed of blocks, each of which is identified by a block ID. You create or modify a block blob by writing a set (or list) of blocks and committing them by their block IDs. Each block can be a different size, up to a maximum of 4 MB. The maximum size for a block blob is 200 GB, and a block blob can include no more than 50,000 blocks. Block blobs let you insert, delete, and reorder blocks within a blob, and upload multiple blocks of a blob simultaneously. They are designed for efficient uploading and downloading of large blobs. Consider using block blobs if the application stores large files that multiple readers access concurrently.
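As a local sketch of the scheme described above (no Azure calls involved), the following Python splits a payload into blocks of at most 4 MB and assigns each a base64-encoded block ID. The fixed-width zero-padded counter is one common convention, since all block IDs within a blob must have the same length; the function name and helper structure are illustrative, not part of any library.

```python
import base64

MAX_BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB per block, the service limit described above

def split_into_blocks(data: bytes, block_size: int = MAX_BLOCK_SIZE):
    """Split a payload into blocks, assigning each a base64-encoded block ID.

    All block IDs in a blob must be the same length, so a fixed-width
    zero-padded counter is used here as the pre-encoding ID.
    """
    blocks = []
    for i in range(0, len(data), block_size):
        block_id = base64.b64encode(f"{i // block_size:08d}".encode()).decode()
        blocks.append((block_id, data[i:i + block_size]))
    return blocks
```

In an actual upload, each (ID, data) pair would be sent as a separate Put Block request, followed by a single commit of the full ID list.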
Page blobs are a collection of 512-byte pages optimized for random read and write operations. Each page in a page blob is referenced by an offset from the beginning of the blob. To add or update the contents of a page blob, you write a page or pages by specifying an offset and a range that align to 512-byte page boundaries. A write to a page blob can overwrite just one page, some pages, or up to 4 MB of the page blob. Writes to a page blob occur in place and are immediately committed to the blob. The maximum size of a page blob is 1 TB, and the blob size must be a multiple of 512 bytes.
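The alignment rules above can be captured in a small validation helper; this is an illustrative sketch of the constraints, not a library API:

```python
PAGE_SIZE = 512                  # page blobs are addressed in 512-byte pages
MAX_WRITE = 4 * 1024 * 1024      # largest single write to a page blob

def validate_page_write(offset: int, length: int) -> None:
    """Check that a page-blob write targets a valid, page-aligned range."""
    if offset % PAGE_SIZE != 0:
        raise ValueError("offset must start on a 512-byte page boundary")
    if length == 0 or length % PAGE_SIZE != 0:
        raise ValueError("length must be a non-zero multiple of 512 bytes")
    if length > MAX_WRITE:
        raise ValueError("a single write may cover at most 4 MB")
```

A write that passes these checks maps directly onto one in-place, immediately committed page-range update.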
For a detailed overview of Blob storage, see Azure Portal.
Authors: Sreedhar Pelluru
Contributors: James Podgorskiani
Reviewers: Christian Martinez, Valery Mizonov, Kun Cheng, Steve Howard
Block Blob vs. Page Blob
Block blobs let you upload large blobs, up to 200 GB, efficiently. They are optimized with features that help you manage large files over networks. One such feature is the ability to upload and download multiple blocks in parallel and determine the block sequence when you commit the blob. Page blobs, on the other hand, are optimized for random read and write access, where pages are aligned on a 512-byte boundary.
The following are some of the scenarios where Page Blobs are used:
An application that accesses files with range-based updates. The application treats a page blob as a file and uses ranged writes to update parts of the blob that have changed. Exclusive write access can be obtained for page blob updates.
Custom logging for applications that treat a page blob as a circular buffer. When the page blob is filled, the application can start writing data from the beginning of the blob structure.
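For the circular-buffer scenario, the wrap-around arithmetic can be sketched as follows (assuming, for illustration, that each log record is a whole number of 512-byte pages):

```python
PAGE_SIZE = 512

def write_offset(current: int, write_len: int, blob_size: int) -> int:
    """Return the offset for the next log write in a page blob used as a
    circular buffer, wrapping to the start when the write would overrun
    the end of the blob. All values must be multiples of 512 bytes."""
    assert current % PAGE_SIZE == 0 and write_len % PAGE_SIZE == 0
    if current + write_len > blob_size:
        return 0  # wrap around: start overwriting from the beginning
    return current
```

Each returned offset would then be used as the page-aligned target of the next ranged write.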
Consider factors such as the following when migrating your applications to use Azure Blob storage:
What type of data can be stored in Blob storage?
How can the data stored in Blob storage be accessed from the migrated application?
Does the storage support high availability, scalability, disaster recovery, and security requirements of the migrated application?
How can the existing data be uploaded to Blob storage?
Before redesigning your application to use Blob storage, first evaluate whether Blob storage is a good fit for the data you are trying to store. Blob storage is designed for storing large amounts of unstructured text or binary data such as documents, pictures, audio, and video.
Blob storage can also be used to store files or binaries that your application depends on. By storing dependent files in blobs, you can update them without updating or uploading the entire application package (.cspkg) file. It also lets you keep different versions of dependent files in separate blobs, so that the application can dynamically load the files for a specific version.
Blob storage vs. Azure SQL Database
Azure SQL Database supports varbinary(max) data type to support storing large objects in the database. If your application stores and accesses binary large objects such as pictures, audio, and video in a SQL Server database, determine whether to use SQL Database or Blob storage when you migrate your application to the Azure Platform.
If you are using the FILESTREAM attribute on a varbinary column to store files larger than 2 GB in a SQL Server database, consider using Blob storage when you migrate to the Azure Platform, because SQL Database does not support FILESTREAM at this time. Even when the file size is less than 2 GB and you do not use the FILESTREAM feature of SQL Server, consider using Blob storage because, depending on the nature of your application, it may be cheaper and more scalable, and because any client can access it by using the REST API.
After you store large objects in Blob storage, you can store a reference to each blob in a column of a table in your SQL Database instance. The maximum size of a SQL Database instance is currently 150 GB, so if you store large objects in a SQL Database instance, you might run out of space. The maximum capacity of Blob storage is 200 TB, which is the size limit of an Azure storage account. The maximum size of each blob is 200 GB for a block blob or 1 TB for a page blob.
For example, if you are migrating an on-premises web application which has graphic resources such as images, you can store the URL to the image in SQL Database (or Table Storage) and have the client program retrieve the URL and display the image from the URL.
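For that pattern, the value stored in the database column is simply the blob's public URL. The helper below builds one from the standard Blob storage endpoint format; the account, container, and blob names are hypothetical placeholders:

```python
def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the public URL of a blob. This is the string you would store
    in a SQL Database (or Table storage) column instead of the binary data,
    letting the client fetch the image directly from Blob storage."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob_name}"
```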
Moving blobs out of SQL Database and storing only a reference to them might affect the performance of your application: the client application queries the SQL Database instance first to determine the location of the blob, and then queries Blob storage to get the blob data, such as images or other large objects. Also consider that it is not possible to back up and restore data from both SQL Database and Blob storage together, so backups of Blob storage and SQL Database are not guaranteed to be transactionally consistent.
Another thing to consider is the number of transactions that the application performs against the data store. SQL Database has no separate charge for transactions performed against it, whereas transactions performed against Azure Storage are charged. Data that is accessed less frequently may be a good candidate for Azure Storage, whereas data that is accessed more frequently may be more economically stored in SQL Database.
Data Access Considerations
Client applications written in any programming language and running on any operating system can access Azure Blob storage by using the HTTP(S) REST API. Blob storage can also be accessed by using client libraries that target specific operating systems and programming languages. Libraries exist for .NET, Node.js, Java, and PHP, and are available on the Azure Developer Center. For example, the .NET Storage Client Library provides strongly typed .NET wrappers around the REST API to make development easier for .NET developers.
If you decide to store your application's unstructured data in Blob storage on the Azure Platform, expect to rewrite the part of the code that accesses the data so that it uses the REST API or the Storage Client Library.
Benefits of Blob storage
When you store your data in Azure Blob storage, you automatically get several important benefits, including the following:
Scalability: Azure Blob storage supports a massively scalable blob distribution system via the Azure CDN, which serves hot blobs from many servers to scale out and meet the traffic needs of your application. Furthermore, the system is highly available and durable. You can find more information in the “Azure Storage Scalability and Performance Targets” article.
High Availability/Fault tolerance: Blobs stored on Azure are replicated to three locations in the same data center for resiliency against hardware failures. Additionally, your data is replicated across different fault domains to increase availability as with all Azure storage services.
Disaster recovery: Azure allows you to choose between geo-redundant and locally redundant replication in case of a major disaster. To learn more, read the “How to: Manage storage account replication” article.
Security: Every request you make to the Azure Storage services must be authenticated unless it is an anonymous request against a public container resource. See Authenticating Access to Your Storage Account for more details.
Data access from any client, anywhere: Azure Blob storage can be accessed using the REST API via HTTP. Any client application on any operating system can access the Blob storage using REST.
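For authenticated (non-anonymous) requests, the Authorization header is computed with the account key using Shared Key authentication. The sketch below shows only the HMAC-SHA256 signing step; constructing the canonicalized string-to-sign (verb, headers, resource path) is defined in the Storage REST documentation and is omitted here, so the string-to-sign and account name in the usage are placeholders:

```python
import base64
import hashlib
import hmac

def sign_request(account: str, key_b64: str, string_to_sign: str) -> str:
    """Compute a Shared Key Authorization header value for a Storage request.

    The account key is base64-encoded; the signature is an HMAC-SHA256 of
    the canonicalized string-to-sign, re-encoded as base64.
    """
    key = base64.b64decode(key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode()
    return f"SharedKey {account}:{signature}"
```

Anonymous requests against a public container skip this step entirely, which is what makes blobs directly addressable by URL from any client.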
Migrating Existing Data to Azure Blob storage
After you redesign your application to take advantage of the massively scalable Blob storage, you might need to migrate existing data from a file system or a SQL Server database. To do so, you can write code that uses the HTTP(S) REST API or the .NET Storage Client Library, or use tools such as Cloud Storage Studio from Red Gate Software.
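Whichever upload mechanism you choose, a file-system migration starts by mapping each local file to a blob name. The dry-run planner below is one illustrative way to build that mapping, preserving the relative directory structure with forward slashes (the separator used in blob names); the actual uploads would then be performed per file via the REST API, a client library, or a tool:

```python
import os

def plan_migration(root: str):
    """Walk a local directory tree and map each file to a blob name that
    preserves the relative path, using '/' as the separator as blob
    names do. Returns a list of (local_path, blob_name) pairs."""
    plan = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            local = os.path.join(dirpath, name)
            blob_name = os.path.relpath(local, root).replace(os.sep, "/")
            plan.append((local, blob_name))
    return plan
```

Separating the plan from the upload also makes it easy to resume an interrupted migration or verify the mapping before transferring any data.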