November 2010

Volume 25 Number 11

Forecast: Cloudy - SQL Azure and Microsoft Azure Table Storage

By Joseph Fultz | November 2010

A common scenario that plays out in my family is that we decide to take it easy for the evening and enjoy a night out eating together. Everyone likes this and enjoys the time to eat and relax. We have plenty of restaurant choices in our area, so, unless someone has a particularly strong preference, we get stuck in the limbo land of deciding where to eat.

It’s this same problem of choosing between seemingly equally good choices that I find many of my customers and colleagues experience when deciding what storage mechanism to use in the cloud. Often, the point of confusion is understanding the differences between Azure Table Storage and SQL Azure.

I can’t tell anyone which technology choice to make, but I will provide some guidelines for making the decision when evaluating the needs of the solution and the solution team against the features and the constraints for both Azure Table Storage and SQL Azure. Additionally, I’ll add in a sprinkle of code so you can get the developer feel for working with each.


Data Processing

SQL Azure and other relational databases usually provide data-processing capabilities on top of a storage system. Generally, RDBMS users are more interested in data processing than the raw storage and retrieval aspects of a database.

For example, if you want to find out the total revenue for the company in a given period, you might have to scan hundreds of megabytes of sales data and calculate a SUM. With a database, you instead send a single query (a few bytes) that causes the database to retrieve the data (possibly many gigabytes) from disk into memory, filter it to the appropriate time range (down to several hundred megabytes in common scenarios), calculate the sum of the sales figures and return the number to the client application (a few bytes).

To do this with a pure storage system requires the machine running the application code to retrieve all of the raw data over the network from the storage system, and then the developer has to write the code to execute a SUM on the data. Moving a lot of data from storage to the app for data processing tends to be very expensive and slow.

SQL Azure provides data-processing capabilities through queries, transactions and stored procedures that are executed on the server side, and only the results are returned to the app. If you have an application that requires data processing over large data sets, then SQL Azure is a good choice. If you have an app that stores and retrieves (scans/filters) large datasets but does not require data processing, then Azure Table Storage is a superior choice.

—Tony Petrossian, Principal Program Manager, Microsoft Azure
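To make Petrossian’s revenue example concrete, here’s a minimal ADO.NET sketch of pushing the aggregation to the server; the table and column names are hypothetical, and only the computed total crosses the wire:

using System;
using System.Data.SqlClient;

using (SqlConnection conn = new SqlConnection("[connection string]")) {
  conn.Open();
  // The database scans and filters the sales data server-side;
  // only the single SUM value returns to the client.
  SqlCommand cmd = new SqlCommand(
    "SELECT SUM(amount) FROM sales " +
    "WHERE ord_date BETWEEN @start AND @end", conn);
  cmd.Parameters.AddWithValue("@start", new DateTime(2010, 1, 1));
  cmd.Parameters.AddWithValue("@end", new DateTime(2010, 12, 31));
  object total = cmd.ExecuteScalar();
  Console.WriteLine("Total revenue: {0}", total);
}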


Reviewing the Options

Expanding the scope briefly to include the other storage mechanisms helps convey a bit of the big picture. At a high level, it’s easy to separate the storage options into these big buckets:

  • Relational data access: SQL Azure
  • File and object access: Azure Storage
  • Disk-based local cache: role local storage

However, to further qualify the choices, you can start asking some simple questions such as:

  • How do I make files commonly available to all roles?
  • How can I make files available and easily update them?
  • How can I provide structured access semantics, but also provide sufficient storage and performance?
  • Which provides the best performance or best scalability?
  • What are the training requirements?
  • What is the management story?

Here the path to a clear decision starts to muddy, and it’s easy to get lost in a benefit-versus-constraint comparison of features. Focusing back on SQL Azure and Azure Table Storage, I’m going to describe some ideal usage patterns and give some code examples using each.

SQL Azure Basics

SQL Azure provides the base functionality of a relational database for use by applications. If an application has data that needs to be hosted in a relational database management system (RDBMS), then this is the way to go. It provides all of the common semantics for data access via SQL statements. In addition, SQL Server Management Studio (SSMS) can hook up directly to SQL Azure, which provides an easy-to-use and well-known means of working with the database outside of code.

For example, setting up a new database happens in a few steps that require both the SQL Azure Web Management Console and SSMS. Those steps are:

  1. Create the database via the Web console
  2. Create a firewall rule in order to access the database from your local computer
  3. Connect to the Web database via local SSMS
  4. Run DDL within the context of the database container (a brief code sketch follows this list)
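For that last step, the DDL can run from SSMS or just as easily from code once the firewall rule is in place. Here’s a minimal ADO.NET sketch; the server name, credentials and table definition are placeholders:

using System.Data.SqlClient;

using (SqlConnection conn = new SqlConnection(
  "Data Source=[server].database.windows.net;" +
  "Initial Catalog=pubs;User ID=[user];Password=[password]")) {
  conn.Open();
  // DDL executes in the context of the database named in the connection.
  // Note that SQL Azure requires a clustered index on every table.
  SqlCommand ddl = new SqlCommand(
    "CREATE TABLE SampleNotes (" +
    "Id int PRIMARY KEY CLUSTERED, " +
    "Note nvarchar(256))", conn);
  ddl.ExecuteNonQuery();
}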

If an application currently uses SQL Server or a similar RDBMS back end, then SQL Azure will be the easiest path in moving your data to the cloud.

SQL Azure is also the best choice for providing cloud-based access to structured data. This is true whether the app is hosted in Azure or not. If you have a mobile app or even a desktop app, SQL Azure is the way to put the data in the cloud and access it from those applications.

Using a database in the cloud is not much different from using one that’s hosted on-premises—with the one notable exception that authentication needs to be handled via SQL Server Authentication. You might want to take a look at Microsoft Project Code-Named “Houston,” which is a new management console being developed for SQL Azure, built with Silverlight. Details on this project are available at sqlazurelabs.cloudapp.net/houston.aspx.

SQL Azure Development

Writing a quick sample application that’s just a Windows Form hosting a datagrid that displays data from the Pubs database is no more complicated than it was when the database was local. I fire up the wizard in Visual Studio to add a new data source and it walks me through creating a connection string and a dataset. In this case, I end up with a connection string in my app.config that looks like this:

<add name="AzureStrucutredStorageAccessExample.Properties.Settings.pubsConnectionString"
     connectionString="Data Source=gfkdgapzs5.database.windows.net;Initial Catalog=pubs;Persist Security Info=True;User ID=jofultz;Password=[password]"
     providerName="System.Data.SqlClient" />

Usually Integrated Authentication is the choice for database security, so it feels a little awkward using SQL Server Authentication again. SQL Azure minimizes the exposure by enforcing an IP access list: you’ll need to add an entry for each range of IPs that might connect to the database.
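The access list is normally managed through the Web console, but once you can connect, rules can also be scripted. A hedged sketch against the master database using the sp_set_firewall_rule stored procedure; the rule name and address range are placeholders:

using System.Data.SqlClient;

// Connect to the master database, not the user database.
using (SqlConnection conn = new SqlConnection(
  "Data Source=[server].database.windows.net;" +
  "Initial Catalog=master;User ID=[user];Password=[password]")) {
  conn.Open();
  // Adds (or updates) a named firewall rule covering an IP range.
  SqlCommand cmd = new SqlCommand(
    "EXEC sp_set_firewall_rule N'HomeOffice', " +
    "'131.107.0.1', '131.107.0.255'", conn);
  cmd.ExecuteNonQuery();
}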

Going back to my purposefully trivial example, by choosing the titleview view out of the Pubs database, I also get some generated code in the default-named dataset pubsDataSet, as shown in Figure 1.


Figure 1 Automatically Generated Code for Accessing SQL Azure

I do some drag-and-drop operations, dragging a DataGridView onto the form and configuring the connection to wire it up. Once it’s wired up, I run it and end up with a quick grid view of the data, as shown in Figure 2.


Figure 2 SQL Azure Data in a Simple Grid

I’m not attempting to sell the idea that you can create an enterprise application via wizards in Visual Studio, but rather that the data access is more or less equivalent to SQL Server and behaves and feels as expected. This means that you can generate an entity model against the database and use LINQ just as you would if it were local instead of hosted (see Figure 3).


Figure 3 Using an Entity Model and LINQ
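Figure 3 appears as a screenshot, but the flavor of the code is roughly the following sketch, assuming a generated object context named pubsEntities whose entity and property names follow the pubs tables (the exact names depend on how the model is generated):

using (pubsEntities ctx = new pubsEntities()) {
  // Standard LINQ to Entities; nothing cloud-specific here.
  var texasAuthors = from a in ctx.authors
                     where a.state == "TX"
                     orderby a.au_lname
                     select a;
  foreach (var a in texasAuthors) {
    Console.WriteLine(a.au_fname + " " + a.au_lname);
  }
}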

A great new feature, beyond what’s normally available with a local SQL Server-based database, is the option (currently available via sqlazurelabs.com) to expose the data as an OData feed. You get REST queries such as this:

https://odata.sqlazurelabs.com/OData.svc/v0.1/gfkdgapzs5/pubs/authors?$top=10

This results in either an OData response or, using the $format=JSON parameter, a JSON response. This is a huge upside for application developers: not only do you get the standard SQL Server behavior, but you also get additional access methods via configuration, without writing a single line of code. That allows the focus to be placed on the service or application layers that add business value, rather than on the plumbing to get data in and out of the store across the wire.
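Consuming the feed is just as light. A minimal sketch that pulls down the JSON form of the query shown above with WebClient:

using System;
using System.Net;

WebClient client = new WebClient();
// Ask for JSON instead of the default OData XML payload.
string json = client.DownloadString(
  "https://odata.sqlazurelabs.com/OData.svc/v0.1/" +
  "gfkdgapzs5/pubs/authors?$top=10&$format=JSON");
Console.WriteLine(json);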

If an application needs traditional relational data access, SQL Azure is most likely the better and easier choice. But there are a number of other reasons to consider SQL Azure as the primary choice over Azure Table Storage.

The first reason is a high transaction rate, meaning frequent queries (all operation types) against the data store; unlike Azure Table Storage, SQL Azure carries no per-transaction charges.

SQL Azure also gives you the option of setting up SQL Azure Data Sync (sqlazurelabs.com/SADataSync.aspx) between various SQL Azure databases, along with the ability to synchronize data between local databases and SQL Azure installations (microsoft.com/windowsazure/developers/sqlazure/datasync/). I’ll cover the design and use of SQL Azure and Data Sync with local storage in a future column on branch node architecture.

Azure Table Storage

Now you’ve seen the advantages of using SQL Azure for your storage. So when is it more beneficial to rely on Azure Table Storage? Here are a number of scenarios where SQL Azure might not be the right choice.

If an application is being overhauled for a move to the Web, or its data storage layer hasn’t been implemented yet, you’ll probably want to take a look at Azure Table Storage. Likewise, Azure Table Storage makes sense if you don’t need a relational store, or if access is limited to a single table at a time and doesn’t require joins. In that case, the data sets would be small and any joins could be handled client-side with LINQ.

You’ll also want to take a look at Azure Table Storage if you have more data than the maximum amount supported by SQL Azure (currently 50GB for a single instance). The size limitation can be overcome with some data partitioning, but that could drive up the SQL Azure costs. The same space in Azure Table Storage would probably be less expensive, and partitioning is built in via a declared partition key.

In addition, because of the per-transaction charges for Azure Table Storage, it’s a good choice for data with a lower access frequency or data that can easily be cached.

Some other things make Azure Table Storage appealing: the application needs some structured-style access such as indexed lookup, but primarily stores objects or Binary Large Objects (BLOBs)/Character Large Objects (CLOBs); the app would benefit from type variability for the data going into the table; or the existing data structure (or lack thereof) in your SQL Server installation makes it difficult to migrate.

Using Azure Table Storage

At first, working with Azure Table Storage may seem a little unwieldy because of the assumptions that come from relating “table storage” to a SQL database; the use of “table” in the name doesn’t help. When thinking about Azure Table Storage, I suggest that you think of it as object storage.

As a developer, don’t focus on the storage structure or mechanism; instead, focus on the object and what you intend to do with it. Getting the objects set up in Azure Table Storage is often the biggest hurdle for the developer, but accessing Azure Table Storage via objects is natural, particularly if you employ LINQ.

To work with Azure Table Storage, start by adding a reference to System.Data.Services.Client to your project. In addition, add a reference to Microsoft.WindowsAzure.StorageClient.dll if you aren’t working in a Visual Studio Cloud template (which provides this reference for you).

Next, create an object/entity with which you can work (stealing from the Authors table):

public class TableStorageAuthor :
  Microsoft.WindowsAzure.StorageClient.TableServiceEntity {
  // PartitionKey, RowKey and Timestamp are inherited
  // from TableServiceEntity.
  public string Id { get; set; }
  public string LastName { get; set; }
  public string FirstName { get; set; }
  public string Phone { get; set; }
  public string Address { get; set; }
  public string City { get; set; }
  public string State { get; set; }
  public string Zip { get; set; }
}

You can define a data service client context using TableServiceContext to handle connecting to the store and to perform the Create/Read/Update/Delete (CRUD) operations, as shown in Figure 4. The TableStorageAuthor class is used as the type parameter for the AuthorData property, which returns a query against the Authors table, and as the parameter type for the implemented Add operation.

Figure 4 Accessing Azure Table Storage

public class AuthorDataServiceContext : TableServiceContext {
  // Query root for the Authors table.
  public IQueryable<TableStorageAuthor> AuthorData {
    get {
      return this.CreateQuery<TableStorageAuthor>("Authors");
    }
  }
  public AuthorDataServiceContext(
    Uri baseAddress, StorageCredentials credentials)
    : base(baseAddress.AbsoluteUri, credentials) {}
  public void Add(TableStorageAuthor author) {
    this.AddObject("Authors", author);
    DataServiceResponse dsResponse = SaveChanges();
  }
}
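Figure 4’s context, and the snippets that follow, assume a CloudStorageAccount is already in hand. A minimal sketch of that setup; the connection-string setting name here is a placeholder:

using System.Configuration;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Parse the account from configuration and create the table client.
CloudStorageAccount StorageAccount =
  CloudStorageAccount.Parse(
    ConfigurationManager.AppSettings["DataConnectionString"]);
CloudTableClient TableClient =
  StorageAccount.CreateCloudTableClient();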

Create the target table:

TableClient.CreateTableIfNotExist("Authors");

Using a familiar object creation and property assignment paradigm, create some data and add it to the table that was created in storage (see Figure 5).

Figure 5 Adding Data to Azure Table Storage

var TableClient = StorageAccount.CreateCloudTableClient();
TableStorageAuthor author = new TableStorageAuthor();
author.FirstName = "Joseph";
author.LastName = "Fultz";
// RowKey must be unique within a partition; a GUID is an easy way there.
author.RowKey = System.Guid.NewGuid().ToString();
author.Id = author.RowKey;
author.State = "TX";
// Entities sharing a PartitionKey are stored and scaled together.
author.PartitionKey = "TX";
AuthorDataServiceContext ctx =
  new AuthorDataServiceContext(
  StorageAccount.TableEndpoint,
  StorageAccount.Credentials);
ctx.Add(author);

Once all of the data is there, it can be manipulated with LINQ. For example, a select for the entities would be:

AuthorDataServiceContext ctx = 
  new AuthorDataServiceContext(
  StorageAccount.TableEndpoint, 
  StorageAccount.Credentials);
var authors = 
  from a in ctx.AuthorData
  select a;
foreach (TableStorageAuthor ta in authors) {
  Debug.WriteLine(ta.FirstName + " " + ta.LastName);
}

I didn’t implement update and delete, but they would be similar. The only thing that might be a little different for those who have used LINQ with the Entity Framework is the code to create the TableServiceContext and the subsequent code to construct and use it. If you’ve been working with REST and the DataServiceContext, this work will feel quite natural.
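As a hedged sketch of what those would look like, using the standard DataServiceContext methods and assuming the author entity added earlier is still tracked by the same context:

// Update: change the tracked entity, flag it as modified and save.
author.Phone = "555-0100"; // hypothetical new value
ctx.UpdateObject(author);
ctx.SaveChanges();

// Delete: flag the tracked entity for removal and save.
ctx.DeleteObject(author);
ctx.SaveChanges();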

By using the TableServiceContext, TableServiceEntity and LINQ, you get about the same feel as using the Entity Framework and LINQ with SQL Azure—albeit with a little more hand-coding on the Azure Table Storage side.

Solution-Based Evaluation

As mentioned before, if the application has a relational store already established, it’s likely best to migrate it to SQL Azure with a little help from a tool like the SQL Azure Migration Wizard. However, if that’s not the case, or if the cloud piece of the application doesn’t need the full functionality of an RDBMS, then take a look at the matrix in Figure 6 and see which column meets most of the needs of the solution requirements and architecture.

Figure 6 Comparing SQL Azure and Azure Table Storage

  • Select semantics. SQL Azure: cross-table queries. Common benefit: primary key-based lookup. Azure Table Storage: single-key lookup (by partition).
  • Performance and scale. SQL Azure: high performance via multiple indices, normalized data structures and so on; scalable via manual partitioning across SQL Azure instances. Azure Table Storage: automatic mass scale by partition and consistent performance even at large scale.
  • User experience. SQL Azure: well-known management tools and traditional database design. Common benefit: familiar high-level developer experience. Azure Table Storage: direct serialization; no ORM necessary; simplified design model by removing the relational model.
  • Storage style. SQL Azure: traditional relational design model. Common benefit: data storage for all types of data. Azure Table Storage: type variability in a single table.
  • Cost factors. SQL Azure: no transaction cost, pay by database size. Common cost: network traffic outside of the same datacenter. Azure Table Storage: no space overhead cost, pay for what is used.
  • Data loading and sync. SQL Azure: synchronizing between local and cloud-based stores; data easily moved in and out by traditional extract, transform and load (ETL) mechanisms; synchronizing between SQL Azure databases in different datacenters.

It’s important to note that for some of the items in Figure 6 (for example, those related to management and data loading for Azure Table Storage) there are already third-party solutions entering the market to provide the missing functionality. As such, the cost and functionality of such tools will need to be considered for significant projects.

I expect that many applications will need a hybrid data approach to make the best use of technology. For example, Azure Table Storage would be used to optimize fetch times while still providing mass scale for resources such as documents, videos, images and other such media. However, to facilitate searching metadata for the item, related data and object pointers would be stored in SQL Azure. Such a design would also reduce the transaction traffic against Azure Table Storage. This type of complementary design would provide the following benefits (a code sketch of the pointer pattern follows the list):

  • Keep the throughput high for queries to find resources
  • Keep the SQL Azure database size down, so its cost remains at a minimum
  • Minimize storage cost by keeping the large files in Azure Table Storage rather than SQL Azure (though BLOB storage is preferred for files)
  • Maintain fast retrieval performance by having such resources fetched by key and partition, offloading the retrieval query from the SQL Azure database
  • Allow for automatic and mass scale for the data kept in Azure Table Storage
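Here is that pointer pattern as a minimal sketch; the entity, the metadata class and the FindByTitle query helper are all hypothetical, and tableCtx is a TableServiceContext like the one in Figure 4:

// Hypothetical media entity stored in Azure Table Storage.
public class MediaEntity :
  Microsoft.WindowsAzure.StorageClient.TableServiceEntity {
  public string Url { get; set; }
}

// Hypothetical metadata row kept in SQL Azure: the searchable
// fields plus the table storage keys of the media entity.
public class MediaMetadata {
  public string Title { get; set; }
  public string PartitionKey { get; set; }
  public string RowKey { get; set; }
}

// Search in SQL Azure (FindByTitle stands in for a
// LINQ-to-Entities query), then retrieve from Azure Table
// Storage with a cheap key-based lookup.
MediaMetadata meta = FindByTitle("product demo");
MediaEntity media =
  (from m in tableCtx.CreateQuery<MediaEntity>("Media")
   where m.PartitionKey == meta.PartitionKey
      && m.RowKey == meta.RowKey
   select m).FirstOrDefault();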

Simply put: Your design should allow each storage mechanism to provide the part of the solution that it’s best at performing, rather than trying to have one do the job of both. Either way, you’re still looking to the cloud for an answer.


Joseph Fultz is an architect at the Microsoft Technology Center in Dallas, where he works with both enterprise customers and ISVs designing and prototyping software solutions to meet business and market demands. He has spoken at events such as Tech·Ed and similar internal training events.

Thanks to the following technical experts for reviewing this article: Jai Haridas, Tony Petrossian and Suraj Puri