7 – Moving to Microsoft Azure Table Storage

Article
05/06/2014

On this page:	Download:
The Premise \| Goals and Requirements \| Overview of the Solution \| Why Use Azure Table Storage? \| Profile Data \| The Data Export Process - Initiating the Export Process, Generating the Export Data, Exporting the Report Data \| Inside the Implementation \| Storing Business Expense Data in Azure Table Storage - How Many Tables?, Partition Keys and Row Keys, Defining the Schemas, Retrieving Records from a Multi-Entity Schema Table, Materializing Entities, Query Performance, Working with Development Storage \| Storing Profile Data \| Generating and Exporting the Expense Data - Generating the Expense Report Table, Exporting the Expenses Data \| Performance Testing, Tuning, To-Do Items \| Initializing the Storage Tables, Blobs, and Queues \| Implementing Paging with Azure Table Storage \| Preventing Users from Uploading Large Images \| Validating User Input \| System.Net Configuration Changes \| WCF Data Service Optimizations \| More Information

On this page:

Download:

The Premise | Goals and Requirements | Overview of the Solution | Why Use Azure Table Storage? | Profile Data | The Data Export Process - Initiating the Export Process, Generating the Export Data, Exporting the Report Data | Inside the Implementation | Storing Business Expense Data in Azure Table Storage - How Many Tables?, Partition Keys and Row Keys, Defining the Schemas, Retrieving Records from a Multi-Entity Schema Table, Materializing Entities, Query Performance, Working with Development Storage | Storing Profile Data | Generating and Exporting the Expense Data - Generating the Expense Report Table, Exporting the Expenses Data | Performance Testing, Tuning, To-Do Items | Initializing the Storage Tables, Blobs, and Queues | Implementing Paging with Azure Table Storage | Preventing Users from Uploading Large Images | Validating User Input | System.Net Configuration Changes | WCF Data Service Optimizations | More Information

This chapter describes Adatum’s final step in its migration process to the cloud for the aExpense application. It discusses the advantages of using Microsoft Azure storage instead of a relational database for expense items, the design of a suitable schema for storage, and how the developers at Adatum adapted the data access functions of the application to use Azure storage instead of a relational database. The chapter also walks through the data export feature that Adatum added to the aExpense application, and some of the changes the developers at Adatum made following performance testing.

The Premise

Adatum has now completed the migration of the aExpense application to the cloud, and added functionality that was missing during the initial migration so that users can upload and view scanned receipt images. However, as Adatum discovered when revisiting the costs of running the application in the cloud, there is one more opportunity to minimize these costs by switching to use Azure storage for expense items instead of a relational database.

Adatum also wants to add a final piece of functionality in the application. The aExpense application must generate a file of data that summarizes the approved business expense submissions for a period. Adatum's on-premises payments system imports this data file and then makes the payments to Adatum employees.

In addition to implementing these changes to the aExpense application, Adatum also needs to perform final performance testing and tuning to ensure that the application provides an optimum user experience whilst minimizing its resource usage.

Goals and Requirements

In this phase, Adatum has several specific goals. A simple cost analysis of the existing solution has revealed that Azure SQL Database would account for about one quarter of the annual running costs of the application (see Chapter 6, “Evaluating Cloud Hosting Costs,” for details of the cost calculations). Because the cost of using Azure storage is less than using Azure SQL Database, Adatum is keen to investigate whether it can use Azure storage instead.

Adatum must evaluate whether the aExpense application can use Azure storage. Data integrity is critical, so Adatum wants to use transactions when a user submits multiple business expense items as a part of an expense submission.

You should evaluate whether Azure storage can replace relational database storage in your application.

Also in this phase of the aExpense migration the project the team at Adatum will create the data export feature for integration with its on-premises systems. The on-premises version of aExpense uses a scheduled SQL Server Integration Services job to generate the output file and sets the status of an expense submission to “processing” after it is exported. The on-premises application also imports data from the payments processing system to update the status of the expense submissions after the payment processing system makes a payment. This import process is not included in the current phase of the migration project.

Figure 1 summarizes the export process in the original on-premises application.

Figure 1

The aExpense export process

The design of the export process for the cloud version of aExpense must meet a number of goals. First, the cost of the export process should be kept to a minimum while making sure that it does not have a negative impact on the performance of the application for users. The export process must also be robust and be able to recover from a failure without compromising the integrity of aExpense's data or the accuracy of the exported data.

Markus Says:
	`Approved business expense submissions could be anywhere in the table. We want to try to avoid the performance impact that would result from scanning the entire table.</td>`

The solution must also address the question of how to initiate the export by evaluating whether it should be a manually initiated operation or run on a specific schedule. If it is the latter, the team at Adatum must design a mechanism for initiating the task, such as using a Timer instance to execute it at regular intervals or by using a third party scheduler such as Quartz.

The final requirement is to include a mechanism for transferring the data from the cloud-environment to the on-premises environment where the payment processing application can access it.

Poe Says:
	`Adatum must automate the process of downloading the expense submission report data for input into the payments processing system.</td>`

Adatum has also evaluated the results from performance testing the application, and needs to implement a number of changes based on those results. For example, the developers discovered that constantly checking for the existence of a queue or table before accessing it was causing unnecessary processing overhead, and decided that the application should initialize storage requirements only once during startup, removing the need to check for the existence on each call that reads or writes data.

The developers at Adatum also explored whether they should implement a paging mechanism, for displaying expense items, and how they could improve performance by fine tuning the configuration and the Windows Communication Foundation (WCF) Data Service code.

Overview of the Solution

In this section you will see how the developers at Adatum considered the options available for meeting their goals in this stage of the migration process, and the decisions they made.

Why Use Azure Table Storage?

As you saw in Chapter 5, “Executing Background Tasks,” Adatum already uses Azure storage blobs for storing the scanned receipt images and Azure storage queues for transferring data between the web and worker roles. This functionality was added to the aExpense application during the migration step described in Chapter 5.

However, for storing data that is fundamentally relational in nature, such as the expense items currently stored in Azure SQL Database, the most appropriate Azure storage mechanism is tables. Azure tables provide a non-relational table-structured storage mechanism. Tables are collections of entities that do not have an enforced schema, which means a single table can contain entities that have different sets of properties.

Even though the underlying approach is different from a relational database table, because each row is an entity that contains a collection of properties rather than a set of data rows containing columns of predefined data types, Azure tables can provide an equivalent storage capability.

In Chapter 6, “Evaluating Cloud Hosting Costs,” of this guide you discovered that Azure table storage is less expensive per gigabyte stored than using Azure SQL Database. For example, in Adatum’s specific scenario, the running costs for the SQL Database are around $ 800.00 per year, which is 26% of the total cost. The calculated cost of the equivalent storage using Azure table storage is only around $ 25.00 per year, which is less than 1% of the total running costs. Therefore, it makes sense financially to consider moving to table storage, as long as the development and testing costs are not excessive and performance can be maintained.

Bharath Says:
	`Using Azure storage instead of a relational database can considerably reduce costs, but you must take into account the development effort required to modify an existing application, and evaluate whether it can provide the equivalent capabilities and performance in your own scenario. </td>`

Bharath Says:

                Using Azure storage instead of a relational database can considerably reduce costs, but you must take into account the development effort required to modify an existing application, and evaluate whether it can provide the equivalent capabilities and performance in your own scenario. </td>

In addition to the cost advantage, Azure tables also offer other useful capabilities. They can be used to store huge volumes of data (a single Azure storage account can hold up to 100 TB of data), and can be accessed using a managed API or directly using REST queries. You can use Shared Access Signatures to control access to tables, partitions, and rows.

In some circumstances table storage can also provide better scalability. The data is also protected through automatic geo-replication across multiple datacenters unless you disable this function (for example, if legal restrictions prevent data from being co-located in other regions).

Poe Says:
	`You can have Azure automatically replicate storage data across multiple data centers. SQL Database creates replicas within a datacenter.</td>`

Profile Data

By moving the expenses data from Azure SQL Database to Azure table storage, Adatum will be able to remove the dependency of the aExpense application on a relational database. The justification for using table storage assumes that Adatum will no longer need to pay for a cloud hosted SQL Server or Azure SQL Database.

However, when reviewing this decision, Adatum realized that the aExpense application still uses the ASP.NET profile provider, which stores user profile data in Azure SQL Database. Therefore Adatum must find an alternative method for storing profile data.

Adatum uses Azure Caching to store session data for users, but this is not suitable for storing profile data that must be persisted between user sessions. The developers at Adatum could write a custom profile provider that stores its data in Azure storage. However, after investigation, they decided to use the Azure ASP.NET Providers sample. This provider can be used to store membership, profile, roles, and session data in Azure tables and blobs.

The Data Export Process

There are three elements of the export process to consider: how to initiate the process, how to generate the data, and how to download the data from the cloud.

Initiating the Export Process

The simplest option for initiating the data export is to have a web page that returns the data on request, but there are some potential disadvantages to this approach. First, it adds to the web server's load and potentially affects the other users of the system. In the case of aExpense, this will probably not be significant because the computational requirements for producing the report are low. Second, if the process that generates the data is complex and the data volumes are high, the web page must be able to handle timeouts. Again, for aExpense, it is unlikely that this will be a significant problem.

The most significant drawback to this solution in aExpense is that the current storage architecture for expense submission data is optimized for updating and retrieving individual expense submissions by using the user ID. The export process will need to access expense submission data by date and expense state. Unlike Azure SQL Database where you can define multiple indexes on a table, Azure table storage only has a single index on each table.

Jana Says:
	`Choosing the right Partition Key and Row Key for your tables is crucial for the performance of your application. Any process that needs a “table scan” across all your partitions will be slow.</td>`

Jana Says:

                Choosing the right Partition Key and Row Key for your tables is crucial for the performance of your application. Any process that needs a “table scan” across all your partitions will be slow.</td>

Figure 2 illustrates the second option for initiating the data export. Each task has a dedicated worker role, so the image compression and thumbnail generation would be handled by Task 1 in Worker 1, and the data export would be performed by Task 2 in Worker 2. This would also be simple to implement, but in the case of aExpense where the export process will run twice a month, it's not worth the overhead of having a separate role instance. If your task ran more frequently and if it was computationally intensive, you might consider an additional worker role.

Figure 2 - Separate worker roles for each task

Figure 2

Separate worker roles for each task

Bharath Says:
	`You should try to use your compute nodes to the fullest. Remember, you pay for a deployed role instance whether or not it's doing any work. You can opt for a larger compute instance if you want to do more work in a single role instance.</td>`

Bharath Says:

                You should try to use your compute nodes to the fullest. Remember, you pay for a deployed role instance whether or not it's doing any work. You can opt for a larger compute instance if you want to do more work in a single role instance.</td>

Figure 3 illustrates the third option where an additional task inside an existing worker role performs the data export process. This approach makes use of existing compute resources and makes sense if the tasks are not too computationally intensive. At the present time, the Azure SDK does not include any task abstractions, so you need to either develop or find a framework to handle task-based processing for you. The team at Adatum will use the plumbing code classes described in Chapter 5, “Executing Background Tasks,” to define the tasks in the aExpense application. Designing and building this type of framework is not very difficult, but you do need to include all your own error handling and scheduling logic.

Figure 3 - Multiple tasks in a single worker role

Figure 3

Multiple tasks in a single worker role

Adatum already has some simple abstractions that enable them to run multiple tasks in a single worker role.

Markus Says:
	`Azure can only monitor at the level of a worker, and it tries to restart a failed worker if possible. If one of your task processing threads fails, it's up to you to handle the situation as described in Chapter 5. </td>`

Markus Says:

                Azure can only monitor at the level of a worker, and it tries to restart a failed worker if possible. If one of your task processing threads fails, it's up to you to handle the situation as described in Chapter 5. </td>

Generating the Export Data

The team at Adatum decided to split the expense report generation process into two steps. The first step “flattens” the data model and puts the data for export into a Azure table. This table uses the expense submission's approval date as the partition key, the expense ID as the row key, and it stores the expense submission total. The second step reads this table and generates a Azure blob that contains the data ready for export as a comma-separated values (CSV) file. Adatum implemented each of these two steps as a task by using the plumbing code described in Chapter 5, “Executing Background Tasks.” Figure 4 illustrates how the task that adds data to the Azure table works.

Figure 4 - Generating the Expense Report table

Figure 4

Generating the Expense Report table

First, a manager approves a business expense submission. This places a message that contains the expense submission's ID and approval date onto a queue (1), and updates the status of the submission in table storage (2). The task retrieves the message from the queue, calculates the total value of the expense submission from the expense detail items, and stores this as a single line in the Expense Export table. The task also updates the status of the expense submission to be "in process" before it deletes the message from the queue.

Bharath Says:
	`Azure table storage does not have all the features of a relational database, and so complex querying is more challenging. You may need multiple tables that present the same data in different ways based on the needs of the application. Table storage is cheap!</td>`

Bharath Says:

                Azure table storage does not have all the features of a relational database, and so complex querying is more challenging. You may need multiple tables that present the same data in different ways based on the needs of the application. Table storage is cheap!</td>

Exporting the Report Data

To export the data, Adatum considered two options. The first was to have a web page that enables a user to download the expense report data as a file. This page would query the expense report table by date and generate a CSV file that the payments processing system can import. Figure 5 illustrates this option.

Figure 5 - Downloading the expense report from a web page

Figure 5

Downloading the expense report from a web page

The second option, shown in Figure 6, was to create another job in the worker process that runs on a schedule to generate the file in blob storage ready for download. Adatum will modify the on-premises payment processing system to download this file before importing it. Adatum selected this option because it enables them to schedule the job to run at a quiet time in order to avoid any impact on the performance of the application for users. The on-premises application can access the blob storage directly without involving either the Azure web role or worker role.

Poe Says:
	`This approach makes it easier for us to automate the download and get the data in time for the payments processing run.</td>`

Figure 6 - Generating the expense report in blob storage

Figure 6

Generating the expense report in blob storage

Adatum had to modify slightly the worker role plumbing code to support this process. In the original version of the plumbing code, a message in a queue triggered a task to run, but the application now also requires the ability to schedule tasks.

Markus Says:
	`We had to modify our plumbing code classes slightly to accommodate scheduled tasks.</td>`

Inside the Implementation

Now is a good time to walk through these changes in more detail. As you go through this section, you may want to download the Visual Studio solution from https://wag.codeplex.com/. This solution (in the Azure-TableStorage folder) contains the implementation of aExpense after the changes made in this phase. If you are not interested in the mechanics, you should skip to the next section.

The Hands-on Labs that accompany this guide provide a step-by-step walkthrough of parts of the implementation tasks Adatum carried out on the aExpense application at this stage of the migration process.

Storing Business Expense Data in Azure Table Storage

Moving from Azure SQL Database to Azure table storage meant that the developers at Adatum had to re-implement the data access layer (DAL) in the application. The original version of aExpense used LINQ to SQL as the technology in the data access layer to communicate with Azure SQL Database. The DAL converted the data that it retrieved using LINQ to SQL to a set of domain-model objects that it passed to the user interface (UI).

Jana Says:
	`Keeping all your data access code in a data access layer restricts the scope of the code changes required if you need to change your storage solution (the code changes take place only in the DAL).</td>`

Jana Says:

                Keeping all your data access code in a data access layer restricts the scope of the code changes required if you need to change your storage solution (the code changes take place only in the DAL).</td>

The new version of aExpense that uses Azure table storage uses the managed Azure storage client to interact with Azure table storage. Because Azure table storage uses a fundamentally different approach to storage, this was not simply a case of replacing LINQ to SQL with the .NET Client Library.

Markus Says:
	`Use the Azure storage client in the Azure Managed Library to access Azure table storage. Note that the Azure table service only supports a subset of the functionality defined by the .NET Client Library for WCF Data Services. You can find more details in the topic "<a href="https://msdn.microsoft.com/en-us/library/dd894032.aspx">Table Service Support for .NET Client Library Constructs</a>."</td>`

Markus Says:

                Use the Azure storage client in the Azure Managed Library to access Azure table storage. Note that the Azure table service only supports a subset of the functionality defined by the .NET Client Library for WCF Data Services. You can find more details in the topic "<a href="https://msdn.microsoft.com/en-us/library/dd894032.aspx">Table Service Support for .NET Client Library Constructs</a>."</td>

How Many Tables?

The most important thing to understand when transitioning to Azure table storage is that the storage model is different from what you may be used to. In the relational world, the obvious data model for aExpense would have two tables, one for expense header entities and one for expense detail entities, with a foreign-key constraint to enforce data integrity. This reflects the schema that Adatum used in SQL Server and Azure SQL Database in previous steps of the migration process.

Bharath Says:
	`You have to stop thinking in relational terms when you are dealing with Azure table storage.</td>`

However, the best data model to use is not so obvious with Azure table storage for a number of reasons:

You can store multiple entity types in a single table in Azure.
Entity Group Transactions are limited to a single partition in a single table (partitions are discussed in more detail later in this chapter).
Azure table storage is relatively cheap, so you shouldn't be so concerned about normalizing your data and eliminating data redundancy.

Bharath Says:
	`You can use an Entity Group Transaction to group multiple table data modification operations on entities in the same table and partition group into a single, atomic transaction.</td>`

Adatum could have used two Azure storage tables to store the expense header and expense detail entities. The advantage of this approach is simplicity because each table has its own, separate, schema. However, because transactions cannot span tables in Azure storage, there is a possibility that orphaned detail records could be left if there was a failure before the aExpense application saved the header record.

For example, the developers would need to use two transactions to save an expense if Adatum had used two separate tables. The following code sample shows the outline of the SaveExpense method that would be required in the ExpenseRepository class — each call to the SaveChanges method is a separate transaction, one of which may fail leading to the risk of orphaned detail records.

// Example code when using two tables for expenses data.
public void SaveExpense(Expense expense)
{
  // create an expense row.
  var context = new ExpenseDataContext(this.account);
  ExpenseRow expenseRow = expense.ToTableEntity();
 
  foreach (var expenseItem in expense.Details)
  {
    // create an expense item row.
    var expenseItemRow = expenseItem.ToTableEntity();
    expenseItemRow.PartitionKey = expenseRow.PartitionKey;
    expenseItemRow.RowKey = 
      string.Format(CultureInfo.InvariantCulture, "{0}_{1}", 
                    expense.Id, expenseItemRow.Id);

    context.AddObject(ExpenseDataContext.ExpenseItemTable,                        expenseItemRow);
    ...
  }
 
  // save the expense item rows.  context.SaveChanges(SaveChangesOptions.Batch);
 
  // save the expense row.  context.AddObject(ExpenseDataContext.ExpenseTable,       expenseRow);  context.SaveChanges();
  ...
}

To resolve this situation the developers would need to write code that implements a compensating transaction mechanism so that a failure when saving a header or detail row does not affect the integrity of the data. This is possible, but adds to the complexity of the solution. For example, to resolve the potential issue of orphaned detail records after a failure, the developers could implement an “orphan collector” process that will regularly scan the details table looking for, and deleting, orphaned records.

However, because the developers at Adatum chose to implement a multi-schema table for expense data, they can use a single transaction for saving both header and detail records. This approach enables them to use Entity Group Transactions to save an expense header entity and its related detail entities to a single partition in a single, atomic transaction. The following code sample from the ExpenseRepository class shows how the application saves an expense to table storage.

// Actual code used to save expenses data from a single table.
public void SaveExpense(Expense expense)
{
  var context = new ExpenseDataContext(this.account);
  IExpenseRow expenseRow = expense.ToTableEntity();
  expenseRow.PartitionKey = ExpenseRepository
      .EncodePartitionAndRowKey(expenseRow.UserName);
  expenseRow.RowKey = expense.Id.ToString();

  context.AddObject(ExpenseDataContext.ExpenseTable,       expenseRow);

  foreach (var expenseItem in expense.Details)
  {
    // Create an expense item row.
    var expenseItemRow = expenseItem.ToTableEntity();
    expenseItemRow.PartitionKey = expenseRow.PartitionKey;
    expenseItemRow.RowKey = string.Format(
        CultureInfo.InvariantCulture, "{0}_{1}", expense.Id, 
        expenseItemRow.ItemId);
    context.AddObject(ExpenseDataContext.ExpenseTable,         expenseItemRow);

    // save receipt image if any
    if (expenseItem.Receipt != null 
    && expenseItem.Receipt.Length > 0)
    {
      this.receiptStorage.AddReceipt(
           expenseItemRow.ItemId.ToString(),
           expenseItem.Receipt, string.Empty);
    }
  }

  // Save expense and the expense items row in the same  // batch transaction using a retry policy.  this.storageRetryPolicy.ExecuteAction(    () => context.SaveChanges(SaveChangesOptions.Batch);
  ...
}

Markus Says:
	Notice how the two overloaded versions of the ToTableEntity extension method return either an IExpenseRow or an IExpenseItemRow instance. The data access layer code is in the DataAccessLayer folder of the aExpense.Shared project in the example solution.

You can also see in the second example how Adatum chose to use the Enterprise Library Transient Fault Handling Application Block to retry the SaveChanges operation if it fails due to a temporary connectivity. The Azure storage client API includes support for custom retry policies, but Adatum uses the Transient Fault Handling Application Block to take advantage of its customization capabilities and to implement a standard approach to all the retry logic in the application. See Chapter 4, “Moving to Azure SQL Database,” for information about using the Transient Fault Handling Application Block.

There are some additional restrictions on performing Entity Group Transactions: each entity can appear only once in the transaction, there must be no more than 100 entities in the transaction, and the total size of the request payload must not exceed 4 megabytes (MB). Adatum assumes that no one will submit more than 100 business expense items as part of a single submission, but will consider adding some additional validation to the application’s code to prevent this.

Partition Keys and Row Keys

The second important decision about table storage is the selection of keys to use. Azure table storage uses two keys: a partition key and a row key. Azure uses the partition key to implement load balancing across storage nodes. The load balancer can identify “hot” partitions (partitions that contain data that is accessed more frequently than the data in other partitions) and run them on separate storage nodes in order to improve performance. This has deep implications for your data model design and your choice of partition keys:

The partition key forms the first part of the tuple that uniquely identifies an entity in table storage. The row key is a unique identifier for an entity within a partition and forms the second part of the tuple that uniquely identifies an entity in table storage.

You can only use Entity Group Transactions on entities in the same table and in the same partition. You may want to choose a partition key based on the transactional requirements of your application. Don't forget that a table can store multiple entity types.

Bharath Says:
	Each entry in a table is simply a property bag. Each property bag can represent a different entity type; this means that a single partition can hold multiple entities of the same or different types.

- You can optimize queries based on your knowledge of partition keys. For example, if you know that all the entities you want to retrieve are located on the same partition, you can include the partition key in the **where** clause of the query. In a single query, accessing multiple entities from the same partition is much faster than accessing multiple entities on different partitions. If the entities you want to retrieve span multiple partitions, you can split your query into multiple queries and execute them in parallel across the different partitions.

> [!NOTE]
> If you want to create parallel queries, you should plan to use Parallel LINQ (PLINQ) instead of creating your own threads in the web role.<BR>

Jana Says:
	Choosing the right partition key is the most important decision you make that affects the performance of your storage solution. The partition key and row key together make up a tuple that uniquely identifies any entity in table storage.

Adatum determined that reverse chronological order is the most likely order in which the expense items will be accessed because users are typically interested in the most recent expenses. Therefore, it decided to use a row key that guarantees the expense items are stored in this order to avoid the need to sort them.

The following code sample from the ExpenseKey class shows how the static Now property generates an inverted tick count to use in its InvertedTicks property.

public static ExpenseKey Now
{
  get
  {
    return new ExpenseKey(string.Format("{0:D19}", 
        DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks));
  }
}

For the partition key, Adatum decided to use the UserName property because the vast majority of queries will filter based on a user name. For example, the website displays the expense submissions that belong to the logged on user.

This also enables the application to filter expense item rows by ExpenseID as if there was a foreign key relationship. The following code in the SaveChanges method in the ExpenseRepository class shows how the application creates this row key value for an expense item entity from the Id property of the expense header entity and the Id property of the expense item entity.

expenseItemRow.RowKey = string.Format(
               CultureInfo.InvariantCulture, 
                 "{0}_{1}", expense.Id, expenseItemRow.Id);

The following code example shows how you could query for ExpenseItem rows based on ExpenseID by including the partition key in the query.

char charAfterSeparator = 
       Convert.ToChar((Convert.ToInt32('_') + 1));
var nextId = expenseId.ToString() + charAfterSeparator;

var expenseItemQuery = 
  (from expenseItem in context.ExpensesAndExpenseItems
   where
     expenseItem.RowKey.CompareTo(expenseId.ToString()) >= 0 &&
     expenseItem.RowKey.CompareTo(nextId) < 0 &&
     expenseItem.PartitionKey.CompareTo(expenseRow.PartitionKey)
       == 0
   select expenseItem).AsTableServiceQuery();

Markus Says:
	A more natural way of writing this query would be to use StartsWith instead of CompareTo. However, StartsWith is not supported by the Azure table service. You also get performance benefits from this query because the where clause includes the partition key.

Azure places some restrictions on the characters that you can use in partition and row keys. Generally speaking, the restricted characters are ones that are meaningful in a URL. For more information, see “Understanding the Table Service Data Model.” In the aExpense application, it's possible that these illegal characters could appear in the UserName used as the partition key value for the Expense table.

Note

If there is an illegal character in your partition key, Azure will return a Bad Request (400) message.

To avoid this problem, the aExpense application encodes the UserName value using a base64 encoding scheme before using the UserName value as a row key. Implementing base64 encoding and decoding is very easy.

public static string EncodePartitionAndRowKey(string key)
{
  if (key == null)
  {
    return null;
  }
  return Convert.ToBase64String(
      System.Text.Encoding.UTF8.GetBytes(key));
}

public static string DecodePartitionAndRowKey(string encodedKey)
{
  if (encodedKey == null)
  {
    return null;
  }
  return System.Text.Encoding.UTF8.GetString(
      Convert.FromBase64String(encodedKey));
}

The team at Adatum first tried to use the UrlEncode method because it would have produced a more human readable encoding, but this approach failed because it does not encode the percent sign (%) character.

Note

According to the documentation, the percent sign character is not an illegal character in a key, but Adatum's testing showed that entities with a percent sign character in the key could not be deleted.

Another approach would be to implement a custom escaping technique.

Markus Says:
	A custom method to transform the user name to a legal character sequence could leave the keys human-readable, which would be useful during debugging or troubleshooting.

Defining the Schemas

In the aExpense application, two types of entity are stored in the Expense table: expense header entities (defined by the IExpenseRow interface) and expense detail entities (defined by the IExpenseItemRow interface). The following code sample shows these two interfaces and the IRow interface that defines the entity key.

Markus Says:
	Both entity types share the same key structure defined in the IRow interface.

public interface IExpenseRow : IRow
{
  // NOTE: DateTime bool and Guid types must be Nullable 
  // in order to run in the storage emulator.
  string Id { get; set; }
  string UserName { get; set; }
  bool? Approved { get; set; }
  string ApproverName { get; set; }
  string CostCenter { get; set; }
  DateTime? Date { get; set; }
  string ReimbursementMethod { get; set; }
  string Title { get; set; }
}

public interface IExpenseItemRow : IRow
{
  Guid? ItemId { get; set; }
  string Description { get; set; }
  double? Amount { get; set; }
  string ReceiptUrl { get; set; }
  string ReceiptThumbnailUrl { get; set; }
}

public interface IRow
{
  string PartitionKey { get; set; }
  string RowKey { get; set; }
  DateTime Timestamp { get; set; }
  string Kind { get; set; }
}

Adatum had to make a change to the data type that the application uses to store the business expense amount. In Azure SQL Database, this field was stored as a decimal. This data type is not supported in Azure table storage and the amount is now stored as a double.

Adatum uses the ExpenseAndExpenseItemRow and Row classes to implement the IRow, IExpenseRow, and IExpenseItemRow interfaces, and to extend the TableServiceEntity class from the StorageClient namespace. The following code sample shows the Row and ExpenseAndExpenseItemRow classes. The Row class defines a Kind property that is used to distinguish between the two types of entity stored in the table (see the TableRows enumeration in the DataAccessLayer folder of the aExpense.Shared project).

public abstract class Row : TableServiceEntity, IRow
{
  protected Row()
  { }

  protected Row(string kind) : this(null, null, kind)
  { }

  protected Row(
      string partitionKey, string rowKey, string kind)
      : base(partitionKey, rowKey)
  {
    this.Kind = kind;
  }

  public string Kind { get; set; }
}

public class ExpenseAndExpenseItemRow 
    : Row, IExpenseRow, IExpenseItemRow
{
  public ExpenseAndExpenseItemRow()
  { }

  public ExpenseAndExpenseItemRow(TableRows rowKind)
  {
    this.Kind = rowKind.ToString();
  }

  // Properties from ExpenseRow
  public string Id { get; set; }
  public string UserName { get; set; }
  public bool? Approved { get; set; }
  public string ApproverName { get; set; }
  public string CostCenter { get; set; }
  public DateTime? Date { get; set; }
  public string ReimbursementMethod { get; set; }
  public string Title { get; set; }

  // Properties from ExpenseItemRow
  public Guid? ItemId { get; set; }
  public string Description { get; set; }
  public double? Amount { get; set; }
  public string ReceiptUrl { get; set; }
  public string ReceiptThumbnailUrl { get; set; }
}

The following code example shows how the ExpenseDataContext class maps the ExpenseAndExpenseItemRow class to a Azure storage table named multientityschemaexpenses.

public class ExpenseDataContext : TableServiceContext
{
  public const string ExpenseTable = "multientityschemaexpenses";
    
  ...

  public IQueryable<ExpenseAndExpenseItemRow> 
         ExpensesAndExpenseItems
  {
    get
    {
      return this.CreateQuery<ExpenseAndExpenseItemRow>(
             ExpenseTable);
     }
  }

  ...
}

Retrieving Records from a Multi-Entity Schema Table

Storing multiple entity types in the same table does add to the complexity of the application. The aExpense application uses LINQ to specify what data to retrieve from table storage. The following code example shows how the application retrieves expense submissions for approval by approver name.

Use the AsTableServiceQuery method to return data from Azure table storage.

var query = (from expense in context.ExpensesAndExpenseItems
      where expense.ApproverName.CompareTo(approverName) == 0
    select expense).AsTableServiceQuery();
return this.storageRetryPolicy.ExecuteAction(
            () => query.Execute()).SingleOrDefault();

The AsTableServiceQuery method converts the standard IQueryable result to a CloudTableQuery result. Using a CloudTableQuery object offers the following benefits to the application:

Data can be retrieved from the table in multiple segments instead of getting it all in one go. This is useful when dealing with a large set of data.
You can specify a retry policy for cases when the query fails. However, as you saw earlier, Adatum chose to use the Transient Fault Handling Block instead.

The query methods in the ExpenseRepository class use the ExpenseAndExpenseItemRow entity class when they retrieve either header or detail entities from the expense table. The following code example from the GetExpensesByUser method in the ExpenseRespository class shows how to retrieve a header row (defined by the IExpenseRow interface).

var context = new ExpenseDataContext(this.account) 
              { MergeOption = MergeOption.NoTracking };

var query = (from expense in context.ExpensesAndExpenseItems
             where
               expense.UserName.CompareTo(userName) == 0 &&
               expense.PartitionKey.CompareTo(
                 EncodePartitionAndRowKey(userName)) == 0
             select expense).Take(10).AsTableServiceQuery();

try
{
  return this.storageRetryPolicy.ExecuteAction(
     () => query.Execute()).Select(e => e.ToModel()).ToList();
}
...

Markus Says:
	The use of the Take method is not intended as a paging mechanism. It is included in order to improve the performance of the code. Using the partition key in the query improves the performance because the partition key is indexed. This example does not need to use the Kind property because only header entities have a UserName property.

The following code sample from the GetExpensesById method in the ExpenseRepository class uses the Kind property to select only detail entities.

var expenseAndItemRows = query.Execute().ToList();
...
expenseAndItemRows.
    Where(e => e.Kind == TableRows.ExpenseItem.ToString()).
    Select(e => (e as IExpenseItemRow).ToModel()).
    ToList().ForEach(e => expense.Details.Add(e));

Materializing Entities

In the aExpense application, all the methods in the ExpenseRepository class that return data from queries call the ToList method before returning the results to the caller.

Jana Says:
	Try to handle all data access issues within your data access layer.

public IEnumerable<Expense> GetExpensesForApproval(string approverName)
{
  ExpenseDataContext context = new 
        ExpenseDataContext(this.account);

  var query = (from expense in context.ExpensesAndExpenseItems
           where
             expense.ApproverName.CompareTo(approverName) == 0
           select expense).AsTableServiceQuery();

  try
  {
    return this.storageRetryPolicy.ExecuteAction(() => 
          query.Execute()).Select(e => e.ToModel()).ToList();
  }
  catch (InvalidOperationException)
  {
    Log.Write(EventKind.Error, 
      "By calling ToList(), this exception can be handled 
       inside the repository.");
    throw;
  }
}

The reason for this is that calling the Execute method does not materialize the entities. Materialization does not happen until someone calls MoveNext on the IEnumerable collection. Without ToList, the first call to MoveNext happens outside the repository. The advantage of having the first call to the MoveNext method inside the ExpenseRepository class is that you can handle any data access exceptions inside the repository.

Query Performance

As mentioned earlier, the choice of partition key can have a big impact on the performance of the application. This is because Azure tracks activity at the partition level, and can automatically migrate a busy partition to a separate storage node in order to improve data access performance for the application.

Adatum uses partition keys in queries to improve the performance. For example, the following query to retrieve stored business expense submissions for a user by using this query would work, even though it does not specify a partition key.

var query = (from expense in context.ExpensesAndExpenseItems
             where 
               expense.UserName.CompareTo(userName) == 0
             select expense).AsTableServiceQuery();

However, this query must scan all the partitions of the table to search for matching records. This is inefficient if there are a large number of records to search, and its performance may be further affected if it has to scan data across multiple storage nodes sequentially.

Jana Says:
	It's important to understand the impact that partitions can have on query performance.

Adatum’s test team did performance testing on the application using queries that do not include the partition key, and then evaluated the improvement when the partition key is included in the where clause. The testers found that there was a significant performance improvement in the aExpense application using a query that includes the partition key, as shown here.

var query = (from expense in context.ExpensesAndExpenseItems
             where
               expense.UserName.CompareTo(userName) == 0 
               && expense.PartitionKey.CompareTo(
                  EncodePartitionAndRowKey(userName)) == 0
             select expense).Take(10).AsTableServiceQuery();

Note

If a table query does not include the partition key in its where clause, you should re-evaluate your choice of row key and partition key for the table to avoid the potential performance problems associated with scanning multiple partitions.

Working with Development Storage

There are some differences between development table storage and Azure table storage documented at “Differences Between the Storage Emulator and Azure Storage Services.” The team at Adatum encountered the error “One of the request inputs is not valid” that occurs when testing the application with empty tables in development storage.

The solution that Adatum adopted was to insert, and then delete, a dummy row into the Azure tables if the application is using the local storage emulator. During the initialization of the web role, the application calls the CreateTableIfNotExist<T> extension method in the TableStorageExtensionMethods class to check whether it is running against local development storage. If this is the case it adds and then deletes a dummy record in the application's Azure tables.

Markus Says:
	Don't assume that local development storage will work in exactly the same way as Azure storage. You should consider adding dummy records to all tables in local development storage.

The following code from the TableStorageExtensionMethods class (defined in the Source\Shared\aExpense folder) demonstrates how the aExpense application determines whether it is using development storage and how it adds and deletes a dummy record to the table.

public static bool CreateTableIfNotExist<T>(
    this CloudTableClient tableStorage, string entityName)
    where T : TableServiceEntity, new()
{
  bool result = tableStorage.CreateTableIfNotExist(entityName);

  // Execute conditionally for development storage only
  if (tableStorage.BaseUri.IsLoopback)
  {
    InitializeTableSchemaFromEntity(tableStorage,
          entityName, new T());
  }
  return result;
}

private static void InitializeTableSchemaFromEntity(
    CloudTableClient tableStorage, string entityName,
    TableServiceEntity entity)
{
  TableServiceContext context =
        tableStorage.GetDataServiceContext();
  DateTime now = DateTime.UtcNow;
  entity.PartitionKey = Guid.NewGuid().ToString();
  entity.RowKey = Guid.NewGuid().ToString();
  Array.ForEach(
    entity.GetType().GetProperties(BindingFlags.Public | 
    BindingFlags.Instance),
    p =>
    {
      if ((p.Name != "PartitionKey") &&
          (p.Name != "RowKey") && (p.Name != "Timestamp"))
      {
        if (p.PropertyType == typeof(string))
        {
          p.SetValue(entity, Guid.NewGuid().ToString(), null);
        }
        else if (p.PropertyType == typeof(DateTime))
        {
          p.SetValue(entity, now, null);
        }
      }
    });

  context.AddObject(entityName, entity);
  context.SaveChangesWithRetries();
  context.DeleteObject(entity);
  context.SaveChangesWithRetries();
}

Storing Profile Data

Until now Adatum has used the built-in ASP.NET profile mechanism to store each user’s preferred reimbursement method. In Azure, the ASP.NET profile provider communicates with either SQL Server or Azure SQL Database (depending on the previous migration stage) where the ASPNETDB database resides. However, during this final migration step Adatum will move away from using a relational database in favor of storing all of the application data in Azure table and blob storage. Therefore it makes no sense to continue to use a relational database just for the profile data.

Instead, Adatum chose to use a sample provider that utilizes Azure table storage to store profile information. You can download this provider from “Azure ASP.NET Providers Sample.” The only change required for the application to use a different profile provider is in the Web.config file.

Markus Says:
	Using a profile provider to access profile data minimizes the code changes required in the application.

<profile defaultProvider="TableStorageProfileProvider">
  <providers>
    <clear />
    <add name="TableStorageProfileProvider"
         type="AExpense.Providers.TableStorageProfileProvider …"
         applicationName="aExpenseProfiles" />
  </providers>

  <properties>
    <add name="PreferredReimbursementMethod" />
  </properties>
</profile>

Using the TableStorageProfileProvider class does raise some issues for the application:

The table storage profile provider is unsupported sample code.
You must migrate your existing profile data from SQL Server to Azure table storage.

Note

In the example provided for this guide, the table is populated by the Initialize method of the ProfileInitializer class defined in the WebRole class, which is executed when the application starts. In a real application, users would have the ability to set their own preferences.
Adatum should also migrate users existing preferences from the SQL Database tables to Azure table storage.
You need to consider whether, in the long run, Azure table storage is suitable for storing profile data.

Even with these considerations to taken into account, using the table storage profile provider enabled Adatum to get rid of the need for a relational database; which helps to minimize the running costs of the application.

Note

Chapter 6, “Evaluating Cloud Hosting Costs,” describes the relative costs of using Azure storage and Azure SQL Database.

Generating and Exporting the Expense Data

The developers at Adatum added functionality to the aExpense application to export a summary of the approved expenses data to use with Adatum’s existing on-premises reimbursement system.

Generating the Expense Report Table

The task that performs this operation uses the worker role plumbing code described in Chapter 5, “Executing Background Tasks.” The discussion here will focus on the task implementation and table design issues; it does not focus on the plumbing code.

Markus Says:
	For this task, we were able to use our worker role plumbing code without modification.

This task is the first of two tasks that generate the approved expense data for export. It is responsible for generating the "flattened" table of approved expense data in Azure table storage. The following code sample shows how the expense report export process begins in the ExpenseRepository class (in the DataAccessLayer folder of the aExpense.Shared project) where the UpdateApproved method adds a message to a queue and updates the Approved property of the expense header record.

public void UpdateApproved(Expense expense)
{
  var context = new ExpenseDataContext(this.account);

  ExpenseRow expenseRow =
      GetExpenseRowById(context, expense.Id);
  expenseRow.Approved = expense.Approved;

  var queue = new AzureQueueContext(this.account);
  this.storageRetryPolicy.ExecuteAction(
    () => queue.AddMessage(new ApprovedExpenseMessage {
                  ExpenseId = expense.Id.ToString(), 
                  ApproveDate = DateTime.UtcNow }));

  context.UpdateObject(expenseRow);
  this.storageRetryPolicy.ExecuteAction(
    () => context.SaveChanges());
}

This code uses a new message type named ApprovedExpenseMessage that derives from the plumbing code class named BaseQueueMessage. The following code sample shows the two properties of the ApprovedExpenseMessage class.

[DataContract]
public class ApprovedExpenseMessage : BaseQueueMessage
{
  [DataMember]
  public string ExpenseId { get; set; }

  [DataMember]
  public DateTime ApproveDate { get; set; }
}

The following code shows how the ProcessMessage method in the ExpenseExportJob class (located in the Jobs folder of the aExpense.Workers project) retrieves the message from the queue and creates a new ExpenseExport entity to save to table storage.

Jana Says:
	We “flatten” the data and calculate the expense submission total before saving the data into an intermediate table. This table contains the data structured in exactly the format we need to export.

public override bool ProcessMessage(
    ApprovedExpenseMessage message)
{
  try
  {
    Expense expense = this.expenses.GetExpenseById(
            new ExpenseKey(message.ExpenseId));

    if (expense == null)
    {
      return false;
    }

    // If the expense was not updated but a message was 
    // persisted, we need to delete it.
    if (!expense.Approved)
    {
      return true;
    }

    double totalToPay = expense.Details.Sum(x => x.Amount);
    var export = new ExpenseExport
      {
        ApproveDate = message.ApproveDate,
        ApproverName = expense.ApproverName,
        CostCenter = expense.CostCenter,
        ExpenseId = expense.Id,
        ReimbursementMethod = expense.ReimbursementMethod,
        TotalAmount = totalToPay,
        UserName = expense.User.UserName
      };
    this.expenseExports.Save(export);
  }
  catch (InvalidOperationException ex)
  {
    var innerEx =
          ex.InnerException as DataServiceClientException;
    if (innerEx != null &&
        innerEx.StatusCode == (int)HttpStatusCode.Conflict)
    {
      // The data already exists, so we can return true 
      // because we have processed this before.
      return true;
    }
    Log.Write(EventKind.Error, ex.TraceInformation());
    return false;
  }

  return true;
}

If this method fails for any reason other than a conflict on the insert, the plumbing code classes ensure that message is left on the queue. When the ProcessMessage method tries to process the message from the queue a second time, the insert to the expense report table fails with a duplicate key error and the inner exception reports this as a conflict in its StatusCode property. If this happens, the method can safely return a true result.

Jana Says:
	We need to ensure that this process is robust. We don't want to lose any expense submissions, or pay anyone twice.

If the Approved property of the Expense object is false, this indicates a failure during the UpdateApproved method after it added a message to the queue, but before it updated the table. In this circumstance, the ProcessMessage method removes the message from the queue without processing it.

The partition key of the Expense Export table is the expense approval date, and the row key is the expense ID. This optimizes access to this data for queries that use the approval date in the where clause, which is what the export process requires.

Bharath Says:
	Choose partition keys and rows keys to optimize your queries against the data. Ideally, you should be able to include the partition key in the where block of the query.

Exporting the Expenses Data

This task is the second of two tasks that generate the approved expense data for export. It is responsible for creating a Azure blob that contains a CSV file of approved expense submissions data.

The task that generates the blob containing the expense report data is slightly different from the two other tasks in the aExpense application. The other tasks poll a queue to see if there is any work for them to do. The export task is triggered by a schedule, which sets the task to run at fixed times. The team at Adatum had to modify their worker role plumbing code classes to support scheduled tasks.

The worker role plumbing code classes now support scheduled tasks in addition to tasks that are triggered by a message on a queue.

Markus Says:
	We could extend the application to enable an on-premises application to generate an ad-hoc expense data report by allowing an on-premises application to place a message onto a Azure queue. We could then have a task that generated the report data when it received a message on the queue.

You can use the abstract class JobProcessor, which implements the IJobProcessor interface, to define new scheduled tasks. The following code example shows the JobProcessor class.

public abstract class JobProcessor : IJobProcessor
{
  private bool keepRunning;

  protected JobProcessor(int sleepInterval)
  {
    if (sleepInterval <= 0)
    {
      throw new ArgumentOutOfRangeException("sleepInterval");
    }

    this.SleepInterval = sleepInterval;
  }

  protected int SleepInterval { get; set; }

  public void Run()
  {
    this.keepRunning = true;
    while (this.keepRunning)
    {
      Thread.Sleep(this.SleepInterval);
      this.RunCore();
    }
  }

  public void Stop()
  {
    this.keepRunning = false;
  }

  protected abstract void RunCore();
}

This implementation does not make it easy to specify the exact time that scheduled tasks will run. The time between tasks will be the value of the sleep interval, plus the time taken to run the task. If you need the task to run at a fixed time, you should measure how long the task takes to run and subtract that value from the sleep interval.

Note

The BaseJobProcessor class that defines tasks that read messages from queues extends the JobProcessor class.

In the aExpense application, the ExpenseExportBuilderJob class extends the JobProcessor class to define a scheduled task. The ExpenseExportBuilderJob class, shown in the following code example, defines the task that generates the expense report data and stores it as a blob. In this class, the expenseExports variable refers to the table of approved expense submissions, and the exportStorage variable refers to the report data in blob storage that will be downloaded. The call to the base class constructor specifies the interval at which the job runs.

Note

The following code sets the scheduled interval to a low number for testing and demonstration purposes. You should change this interval for a "real" schedule.

public class ExpenseExportBuilderJob : JobProcessor
{
  private readonly ExpenseExportRepository expenseExports;
  private readonly ExpenseExportStorage exportStorage;

  public ExpenseExportBuilderJob() : base(100000)
  {
    this.expenseExports = new ExpenseExportRepository();
    this.exportStorage = new ExpenseExportStorage();
  }

In the RunCore method, the code first retrieves all the approved expense submissions from the export table based on the job date. Next, the code appends a CSV record to the export data in blob storage for each approved expense submission. Finally, the code deletes from the table all the records it copied to blob storage.

  protected override void RunCore()
  {
    DateTime jobDate = DateTime.UtcNow;
    string name = jobDate.ToExpenseExportKey();

    IEnumerable<ExpenseExport> exports = 
           this.expenseExports.Retreive(jobDate);
    if (exports == null || exports.Count() == 0)
    {
      return;   
    }

    string text = this.exportStorage.GetExport(name);
    var exportText = new StringBuilder(text);
    foreach (ExpenseExport expenseExport in exports)
    {
      exportText.AppendLine(expenseExport.ToCsvLine());
    }

    this.exportStorage.AddExport(name,
         exportText.ToString(), "text/plain");

    // Delete the exports.
    foreach (ExpenseExport exportToDelete in exports)
    {
      try
      {
        this.expenseExports.Delete(exportToDelete);
      }
      catch (InvalidOperationException ex)
      {
        Log.Write(EventKind.Error, ex.TraceInformation());
      }    
    }
  }
}

If the process fails before it deletes all the approved expense submissions from the export table, any undeleted approved expense submissions will be exported a second time when the task next runs. However, the exported CSV data includes the expense ID and the approval date of the expense submission, so the on-premises payment processing system will be able to identify duplicate items.

The following code shows the methods that the RunCore method invokes to retrieve approved expense submissions and delete them after it copies them to the export blob. These methods are defined in the ExpenseExportRepoisitory class located in the DataAccessLayer folder of the aExpense.Shared project. Because they use the job date to identify the partitions to search, these queries are fast and efficient.

public IEnumerable<ExpenseExport> Retreive(DateTime jobDate)
{
  var context = new ExpenseDataContext(this.account);
  string compareDate = jobDate.ToExpenseExportKey();
  var query = (from export in context.ExpenseExport
           where export.PartitionKey.CompareTo(compareDate) <= 0
           select export).AsTableServiceQuery();

  var val = query.Execute();
  return val.Select(e => e.ToModel()).ToList();
}

public void Delete(ExpenseExport expenseExport)
{
  var context = new ExpenseDataContext(this.account);
  var query = (from export in context.ExpenseExport
      where export.PartitionKey.CompareTo(
        expenseExport.ApproveDate.ToExpenseExportKey()) == 0 &&
        export.RowKey.CompareTo(
        expenseExport.ExpenseId.ToString()) == 0
      select export).AsTableServiceQuery();
  ExpenseExportRow row = query.Execute().SingleOrDefault();
  if (row == null)
  {
    return;
  }

  context.DeleteObject(row);
  context.SaveChanges();
}

Performance Testing, Tuning, To-Do Items

As part of the work for this phase, the team at Adatum evaluated the results from performance testing the application and, as a result, made a number of changes to the aExpense application. They also documented some of the key “missing pieces” in the application that Adatum should address in the next phase of the project.

Adatum made changes to the aExpense application following performance testing.

Initializing the Storage Tables, Blobs, and Queues

During testing of the application, the team at Adatum discovered that the code that creates the expenses storage repository and the job that processes receipt images were affecting performance. They isolated this to the fact that the code calls the CreateIfNotExist method every time the repository is instantiated, which requires a round-trip to the storage server to check whether the receipt container exists. This also incurs an unnecessary storage transaction cost. To resolve this, the developers realized that they should create the receipt container only once when the application starts.

Originally, the constructor for the ExpenseReceiptStorage class was responsible for checking that the expense receipt container existed, and creating it if necessary. This constructor is invoked whenever the application instantiates an ExpenseRepository object or a ReceiptThumbnailJob object. The CreateIfNotExist method that checks whether a container exists requires a round-trip to the storage server and incurs a storage transaction cost.

To avoid these unnecessary round-trips, Adatum moved this logic to the ApplicationStorageInitializer class defined in the WebRole class. This class prepares all of the tables, blobs, and queues required by the application when the role first starts.

public static class ApplicationStorageInitializer
{
  public static void Initialize()
  {
    CloudStorageAccount account =
      CloudConfiguration.GetStorageAccount(
                               "DataConnectionString");

    // Tables – create if they do not already exist.
    var cloudTableClient = 
      new CloudTableClient(account.TableEndpoint.ToString(),
                           account.Credentials);
      cloudTableClient.CreateTableIfNotExist<
                       ExpenseAndExpenseItemRow>(
                            ExpenseDataContext.ExpenseTable);
      cloudTableClient.CreateTableIfNotExist<ExpenseExportRow>(
                       ExpenseDataContext.ExpenseExportTable);

    // Blobs – create if they do not already exist.
    var client = account.CreateCloudBlobClient();
    client.RetryPolicy = RetryPolicies.Retry(3,
                              TimeSpan.FromSeconds(5));
    var container = client.GetContainerReference(
                    ExpenseReceiptStorage.ReceiptContainerName);
    container.CreateIfNotExist();
    container = client.GetContainerReference(
              ExpenseExportStorage.ExpenseExportContainerName);
    container.CreateIfNotExist();

    // Queues – remove any existing stored messages
    var queueContext = new AzureQueueContext(account);
    queueContext.Purge<NewReceiptMessage>();
    queueContext.Purge<ApprovedExpenseMessage>();
  }
}

The Application_Start method in the Global.asax.cs file and the OnStart method of the worker role invoke the Initialize method in this class.

Note

You may find that you can improve performance when making small requests to Azure storage queues, tables, and blobs by changing the service point settings. See “Nagle’s Algorithm is Not Friendly towards Small Requests” for more information.

Implementing Paging with Azure Table Storage

During performance testing, the response times for Default.aspx degraded as the test script added more and more expense submissions for a user. This happened because the current version of the Default.aspx page does not include any paging mechanism, so it always displays all the expense submissions for a user. As a temporary measure, Adatum modified the LINQ query that retrieves expense submissions by user to include a Take(10) clause, so that the application only requests the first 10 expense submissions. In a future phase of the project, Adatum will add paging functionality to the Default.aspx page.

Adatum has not implemented any paging functionality in the current phase of the project, but this section gives an outline of the approach it intends to take. The ResultSegment class in the Azure StorageClient library provides an opaque ContinuationToken property that you can use to access the next set of results from a query if that query did not return the full set of results; for example, if the query used the Take operator to return a small number of results to display on a page. This ContinuationToken property will form the basis of any paging implementation.

Markus Says:
	By implementing paging, we can improve the performance of the application by returning just the data the user needs to see.

The ResultSegment class only returns a ContinuationToken object to access the next page of results, and not the previous page, so if your application requires the ability to page backward, you must store ContinuationToken objects that point to previous pages. A stack is a suitable data structure to use. Figure 7 shows the state of a stack after a user has browsed to the first page and then paged forward as far as the third page.

Figure 7 - Displaying page 3 of the data from a table

Figure 7

Displaying page 3 of the data from a table

If a user clicks the Next hyperlink to browse to page 4, the page peeks at the stack to get the continuation token for page 4. After the page executes the query with the continuation token from the stack, it pushes a new continuation token for page 5 onto the stack.

If a user clicks the Previous hyperlink to browse to page 2, the page will pop two entries from the stack, and then peek at the stack to get the continuation token for page 2. After the page executes the query with the continuation token from the stack, it will push a new continuation token for page 3 onto the stack.

The following code examples show how Adatum could implement this behavior in an asynchronous ASP.NET page.

Note

Using an asynchronous page frees up the pages thread from the thread pool while a potentially long-running I/O operation takes place. This improves throughput on the web server and increases the scalability of the application.

The following two code examples show how to create an asynchronous ASP.NET page. First, add an Async="true" attribute to the page directive in the .aspx file.

<%@ Page Language="C#" AutoEventWireup="true"
         CodeBehind="Default.aspx.cs"
         Inherits="ContinuationSpike._Default"
         Async="true"%>

Second, register begin and end methods for the asynchronous operation in the load event for the page.

protected void Page_Load(object sender, EventArgs e)
{
  AddOnPreRenderCompleteAsync(
      new BeginEventHandler(BeginAsyncOperation),
      new EndEventHandler(EndAsyncOperation)
  );
}

The following code example shows the definition of the ContinuationStack class that the application uses to store continuation tokens in the session state.

Markus Says:
	We need to store the stack containing the continuation tokens as a part of the session state.

public class ContinuationStack
{
  private readonly Stack stack;

  public ContinuationStack()
  {
    this.stack = new Stack();
  }

  public bool CanMoveBack()
  {
    if (this.stack.Count >= 2) return true;
    return false;
  }

  public bool CanMoveForward()
  {
    return this.GetForwardToken() != null;
  }

  public ResultContinuation GetBackToken()
  {   
    if (this.stack.Count == 0) return null;
    // We need to pop twice and then return the next token.
    this.stack.Pop();
    this.stack.Pop();
    if (this.stack.Count == 0) return null;
    return this.stack.Peek() as ResultContinuation;
  }

  public ResultContinuation GetForwardToken()
  {
    if (this.stack.Count == 0) return null;
    return this.stack.Peek() as ResultContinuation;
  }

  public void AddToken(ResultContinuation result)
  {
    this.stack.Push(result);
  }
}

The following code example shows the BeginAsyncOperation method that starts the query execution for the next page of data. The ct value in the query string specifies the direction to move.

private IAsyncResult BeginAsyncOperation(object sender, EventArgs e, AsyncCallback cb, object extradata)
{
  var query =
    new MessageContext(CloudConfiguration.GetStorageAccount())
      .Messages.Take(3).AsTableServiceQuery();
  if (Request["ct"] == "forward")
  {
    var segment = this.ContinuationStack.GetForwardToken();
    return query.BeginExecuteSegmented(segment, cb, query);
  }

  if (Request["ct"] == "back")
  {
    var segment = this.ContinuationStack.GetBackToken();
    return query.BeginExecuteSegmented(segment, cb, query);
  }
  return query.BeginExecuteSegmented(cb, query);
}

The EndAsyncOperation method puts the query results into the messages list and pushes the new continuation token onto the stack.

private List<MessageEntity> messages;

private void EndAsyncOperation(IAsyncResult result)
{
  var cloudTableQuery =
        result.AsyncState as CloudTableQuery<MessageEntity>;
  ResultSegment<MessageEntity> resultSegment = 
        cloudTableQuery.EndExecuteSegmented(result);
  this.ContinuationStack.AddToken(
        resultSegment.ContinuationToken);
  this.messages = resultSegment.Results.ToList();
}

Preventing Users from Uploading Large Images

To prevent users from uploading large images of receipt scans to aExpense, Adatum configured the application to allow a maximum upload size of 1,024 kilobytes (KB) to the AddExpense.aspx page. The following code example shows the setting in the Web.config file.

<location path="AddExpense.aspx">
  <system.web>
    <authorization>
      <allow roles="Employee" />
      <deny users="*"/>
    </authorization>

    <!—
      Maximum request allowed to send a big image as a receipt. 
      -->
    <httpRuntime maxRequestLength="1024"/>
  </system.web>
</location>

Validating User Input

The cloud-based version of aExpense does not perform comprehensive checks on user input for invalid or dangerous items. The AddExpense.aspx file includes some basic validation that checks the length of user input, but Adatum should add additional validation checks to the OnAddNewExpenseItemClick method in the AddExpense.aspx.cs file.

System.Net Configuration Changes

The following code example shows two configuration changes that Adatum made to the aExpense application to improve its performance.

<system.net>
  <settings>
    <servicePointManager expect100Continue="false" />
  </settings>
  <connectionManagement>
    <add address = "*" maxconnection = "24" />
  </connectionManagement>
</system.net>

The first change switches off the “Expect 100-continue” feature. If this feature is enabled, when the application sends a PUT or POST request, it can delay sending the payload by sending an “Expect 100-continue” header. When the server receives this message, it uses the available information in the header to check whether it could make the call, and if it can, it sends back a status code 100 to the client. The client then sends the remainder of the payload. This means that the client can check for many common errors without sending the payload.

If you have tested the client well enough to ensure that it is not sending any bad requests, you can turn off the “Expect 100-continue” feature and reduce the number of round trips to the server. This is especially useful when the client sends many messages with small payloads; for example, when the client is using the table or queue service.

The second configuration change increases the maximum number of connections that the web server will maintain from its default value of two. If this value is set too low, the problem manifests itself through “Underlying connection was closed” messages.

Note

The exact number to use for this setting depends on your application. The page “Contention, poor performance, and deadlocks when you make Web service requests from ASP.NET applications” has useful information about how to set this for server side applications. You can also set it for a particular URI by specifying the URI in place of “*”.

WCF Data Service Optimizations

Because of a known performance issue with WCF Data Services, Adatum defined a ResolveType delegate on the ExpenseDataContext class in the aExpense application. Without this delegate, query performance degrades as the number of entities that the query returns increases. The following code example shows the delegate definition.

Markus Says:
	We made a number of changes to our WCF Data Services code to improve performance.

private static Type ResolveEntityType(string name)
{
  var tableName = name.Split(new[] { '.' }).Last();
  switch (tableName)
  {
    case ExpenseTable:
      return typeof(ExpenseRow);
    case ExpenseItemTable:
      return typeof(ExpenseItemRow);
    case ExpenseExportTable:
      return typeof(ExpenseExportRow);
  }

  throw new ArgumentException(
      string.Format(
          CultureInfo.InvariantCulture,
          "Could not resolve the table name '{0}'
          to a known entity type.", name));
}

Note

Instead of using the ResolveType delegate, you can avoid the performance problem by ensuring that your entity class names exactly match the table names.

Adatum added a further optimization to the WCF Data Services client code by setting the MergeOption to NoTracking for the queries in the ExpenseRepository class. If you are not making any changes to the entities that WCF Data Services retrieve, there is no need for the DataContext object to initialize change tracking for entities.

More Information

“Blobs, Queues, and Tables” discusses the use of Azure blobs, tables, and queues.

“Data Management” explores the options for storing data in Azure SQL Database and blob storage.

The Azure Managed Library includes detailed reference information for the Microsoft.WindowsAzure.StorageClient namespace.

“Azure Storage Services REST API Reference” explains how you can interact with Azure storage using scripts and code.

Next Topic | Previous Topic | Home | Community

7 – Moving to Microsoft Azure Table Storage

The Premise

Goals and Requirements

Overview of the Solution

Why Use Azure Table Storage?

Profile Data

The Data Export Process

Initiating the Export Process

Generating the Export Data

Exporting the Report Data

Inside the Implementation

Storing Business Expense Data in Azure Table Storage

How Many Tables?

Partition Keys and Row Keys

Defining the Schemas

Retrieving Records from a Multi-Entity Schema Table

Materializing Entities

Query Performance

Working with Development Storage

Storing Profile Data

Generating and Exporting the Expense Data

Generating the Expense Report Table

Exporting the Expenses Data

Performance Testing, Tuning, To-Do Items

Initializing the Storage Tables, Blobs, and Queues

Implementing Paging with Azure Table Storage

Preventing Users from Uploading Large Images

Validating User Input

System.Net Configuration Changes

WCF Data Service Optimizations

More Information

Additional resources