3 – Choosing a Multi-Tenant Data Architecture

Article
03/07/2014

On this page:	Download:
Storing Data in Windows Azure Applications \| Windows Azure Table Storage \| Windows Azure Blob Storage \| Windows Azure SQL Database \| Other Storage Options \| Storage Availability \| Multi-Tenant Data Architectures \| Partitioning to Isolate Tenant Data - Shared Access Signatures \| Data Architecture Extensibility \| Data Architecture Scalability \| An Example - Option 1 — Using a Single Table, Option 2 — Table per Tenant, Option 3 — Table per Base Entity Type, Option 4 — Table per Entity Type, Option 5 — Table per Entity Type per Tenant, Comparing the Options \| Goals and Requirements \| Isolation of Tenants' Data \| Application Scalability \| Extensibility \| Paging through Survey Results \| Exporting Survey Data to SQL Database for Analysis \| Overview of the Solution \| Storage Accounts \| The Surveys Data Model - Storing Survey Definitions, Storing Tenant Data, Storing Survey Answers, Storing Survey Answer Summaries \| Comparing Paging Solutions - Paging with Table Storage, Paging with Blob Storage, Comparing the Solutions \| The SQL Database Design \| Inside the Implementation \| The Data Store Classes - SurveyStore Class, SurveyAnswerStore Class, SurveyAnswersSummaryStore Class, SurveySqlStore Class, SurveyTransferStore Class, TenantStore Class \| Accessing Custom Data Associated with a Survey - Defining a Tenant’s Custom Fields, Writing Custom Fields to the Surveys Table, Reading Custom Fields from the Surveys Table \| Implementing Paging - Maintaining the Ordered List of Survey Responses, Implementing the Paging \| Implementing the Data Export \| Displaying Questions \| Displaying the Summary Statistics \| More Information

On this page:

Download:

Storing Data in Windows Azure Applications | Windows Azure Table Storage | Windows Azure Blob Storage | Windows Azure SQL Database | Other Storage Options | Storage Availability | Multi-Tenant Data Architectures | Partitioning to Isolate Tenant Data - Shared Access Signatures | Data Architecture Extensibility | Data Architecture Scalability | An Example - Option 1 — Using a Single Table, Option 2 — Table per Tenant, Option 3 — Table per Base Entity Type, Option 4 — Table per Entity Type, Option 5 — Table per Entity Type per Tenant, Comparing the Options | Goals and Requirements | Isolation of Tenants' Data | Application Scalability | Extensibility | Paging through Survey Results | Exporting Survey Data to SQL Database for Analysis | Overview of the Solution | Storage Accounts | The Surveys Data Model - Storing Survey Definitions, Storing Tenant Data, Storing Survey Answers, Storing Survey Answer Summaries | Comparing Paging Solutions - Paging with Table Storage, Paging with Blob Storage, Comparing the Solutions | The SQL Database Design | Inside the Implementation | The Data Store Classes - SurveyStore Class, SurveyAnswerStore Class, SurveyAnswersSummaryStore Class, SurveySqlStore Class, SurveyTransferStore Class, TenantStore Class | Accessing Custom Data Associated with a Survey - Defining a Tenant’s Custom Fields, Writing Custom Fields to the Surveys Table, Reading Custom Fields from the Surveys Table | Implementing Paging - Maintaining the Ordered List of Survey Responses, Implementing the Paging | Implementing the Data Export | Displaying Questions | Displaying the Summary Statistics | More Information

This chapter discusses important factors you must consider when designing the data architecture for multi-tenant applications, and explores how the Tailspin Surveys application uses data. It describes the data model used by the Surveys application, and then discusses why the team at Tailspin chose this data model with reference to a number of specific scenarios in the application. Finally, it describes how and why the application also uses Windows Azure SQL Database.

Storing Data in Windows Azure Applications

Windows Azure offers several options for storing application data. In a multi-tenant application your data architecture typically requires you use a partitioning scheme that ensures each tenant’s data is isolated, and that the application is scalable. In addition, you may need to consider how to make your storage solution extensible in order to support per tenant customization.

There are many factors to consider when selecting the type of storage to use in your application, such as features, cost, supported programming models, performance, scalability, and reliability. This section outlines the main options that are available, and identifies the key features that relate specifically to multi-tenancy. For a more general discussion of the pros and cons of the data storage options, see the associated patterns & practices guide “Moving Applications to the Cloud.”

Windows Azure Table Storage

Windows Azure tables contain large collections of state stored as property bags. Each property bag is called an entity, and each entity in a table can contain a different set of properties. You can filter and sort the entities in a table.

Bharath Says:
	`Windows Azure table storage is often referred to as schema-less because every entity in a table could have different set of properties. However, when all the entities in a table have the same set of properties (they have the same schema) a Windows Azure table is much like a table in a traditional database.</td>`

Bharath Says:

                Windows Azure table storage is often referred to as schema-less because every entity in a table could have different set of properties. However, when all the entities in a table have the same set of properties (they have the same schema) a Windows Azure table is much like a table in a traditional database.</td>

Each table can be subdivided into partitions by using a partition key, and the scalability of a solution that uses Windows Azure table storage is primarily determined by the use of appropriate partition keys. Searching for and accessing entities that are stored in the same partition is much faster than searching for and accessing entities across multiple partitions, and a query that specifies a partition key and row key is typically the most performant. In addition, Windows Azure table storage only supports transactions across entities that reside on the same partition.

The choice of keys can also help to define your multi-tenant data architecture. In a multi-tenant application you typically need to access only the data associated with a single tenant in a query, so you should use partition keys based on tenant IDs. You can store different entity types, such as tenant header and detail records, in the same table partition. Therefore, any queries that combine data from different entities associated with the same tenant will run efficiently.

Windows Azure tables are associated with a Windows Azure storage account that is accessed using an account key, and each storage account is tied to a specific Windows Azure data center.

Jana Says:
	`Partitioning by tenant is a very natural boundary to choose in a multi-tenant application. Almost all of your queries will be scoped to a single tenant.</td>`

Windows Azure Blob Storage

Windows Azure blob storage is for storing individual items such as documents, media items, XML data, or binary data. Blobs are ideal for unstructured data, so each tenant can easily store any customized data in blob storage.

You place blobs in containers, which you can use to control the visibility of your blobs. Windows Azure blobs and blob containers are associated with a Windows Azure storage account that is accessed using an account key.

Note

A single storage account can contain tables, blobs, and queues. You can have multiple storage accounts in a Windows Azure subscription.

Windows Azure SQL Database

Windows Azure SQL Database is a scalable, relational database management system for Windows Azure. It is based on SQL Server and is very similar in functionality. It is made available using the Platform as a Service (PaaS) model, so you are billed based on your usage.

SQL Database also supports federation to enable greater scalability. Federation makes use of a technique called sharding that splits tables horizontally by row across multiple databases. This allows you to take advantage of the database resources in the cloud on demand, remove the risk of having a single point of failure, and minimize I/O bottlenecks and database throttling. For a detailed discussion of SQL Database Federation see the article “Scaling Out with SQL Azure Federation” in MSDN Magazine.

Other Storage Options

Other storage options for a Windows Azure application include running a relational database such as SQL Server or MySQL in a Windows Azure Virtual Machine (VM), or running a no-SQL database such as MongoDB in a Windows Azure VM.

Storage Availability

One additional aspect you should consider when choosing a storage method is availability. Storage availability is mainly governed by two factors: whether the storage mechanism responds without fail to every request, and whether the behavior of the network connection between the application and storage constrains or even prevents access. Application performance and user experience will suffer if there is a problem when accessing storage, even if it is only a short delay while the attempt is retried, although this can be minimized in some circumstances by the judicious use of caching.

Each type of data store has a guaranteed availability and a specific maximum throughput. For example, the Service Level Agreement (SLA) for Windows Azure storage indicates that the expected availability is 99.9% (anything less than this will result in a partial refund of hosting cost). Windows Azure blob storage has a throughput limit of 60 MB per second for a single blob, while Windows Azure table storage has a throughput limit of 500 entities per second for a single partition. Careful storage design and the use of appropriate partitioning strategies can help to avoid hitting these limits.

Windows Azure SQL Database availability is also guaranteed to be 99.9%, though actual response time may be affected by throttling that is applied automatically when the underlying system detects overloading of the database server or a specific database. Throttling is applied initially by increasing response times for the affected database. However, if the situation continues Windows Azure will begin to refuse connections until the load decreases. Good query design, promptly closing connections, and the appropriate use of caching can minimize the chances of encountering database throttling.

Microsoft SQL Server 2012 supports new availability features through AlwaysOn Availability Groups. You can configure multiple SQL Server 2012 instances in Windows Azure Virtual Machines as an availability group to provide instant failover, plus the capability to read from replicas as well as from the primary instance. You can also use both a synchronous and an asynchronous commit approach to maximize performance and availability. The SLA for hosted service roles guarantees an uptime of 99.9% as long as two or more roles are deployed, and 99.9% availability of connectivity to the roles.

The second main factor that affects storage availability is the performance of the network between the application and the data source. Application and data store should, wherever possible, be located in the same datacenter to minimize network latency. The use of affinity groups can also help by causing the resources to be located in the same sector of the datacenter.

Where the application and the data it uses must be geographically separated, consider using replicas of the primary data store at the same location as the application. The geo-replication feature of Windows Azure storage can create multiple copies of the data in several datacenters, but you cannot specify which datacenters are used. However, you can create storage accounts in the appropriate datacenters and copy the data between them, and use Windows Azure Caching or the Content Delivery Network (CDN) to cache data at locations closer to the application.

When using SQL Database or SQL Server consider placing database replicas in locations close to the application, and using SQL Database Sync to synchronize the data.

Note

You can download the SLAs for Window Azure services from “Service Level Agreements.” For more information about maximizing performance and availability in multi-tenant applications see Chapter 5, “Maximizing Availability, Scalability, and Elasticity."

Multi-Tenant Data Architectures

Your data architecture must ensure that a subscriber’s data is kept private from other subscribers, and that your solution is scalable. Your application may also need to support customized data storage.

Note

For more information about multi-tenant data architectures, see “Multi-Tenant Data Architecture” and “Architecture Strategies for Catching the Long Tail.”

Partitioning to Isolate Tenant Data

The perceived risk of either accidental or malicious data disclosure is greater in a multi-tenant model. It will be harder to convince subscribers that their private data is safe if they know they are physically sharing the application with other subscribers. However, a robust design that logically isolates each tenant’s data can provide a suitable level of protection. This type of design might use database schemas where each tenant’s tables are in a separate schema, database security features that enable you to use access control mechanisms within the database, a partitioning scheme to separate tenants’ data, or a combination of these approaches.

Note

For a more detailed exploration of data security issues in multi-tenant applications, see Chapter 6, “Securing Multi-Tenant Applications,” of this guide.

In all multi-tenant applications the design must ensure that tenants can access only their own data. To achieve proper isolation you must be sure not to reveal any storage account keys, and be sure that all queries in your code access and return the correct tenant’s data.

For all the data storage mechanisms described in this section, if the tenant provides the subscription this makes clear that the tenant owns, and is responsible for, the data stored in any storage account or database in the subscription.

The following table shows the partitioning schemes you could use based on Windows Azure subscriptions. These partitioning schemes can be used with Windows Azure storage, SQL Database, and hosting a database in a VM.

Partitioning scheme	Applies to	Notes
One subscription per tenant	Table storage Blob storage SQL Database Database hosted in VM	Makes it easy to bill individual tenants for the storage resources they consume. Enables tenants to provide their own storage and then manage it. During the provisioning process, the tenant would need to provide access details such as storage account keys or database passwords to the provider. You need to be careful about the location of the storage account in relation to the location of the cloud services roles to control data transfer costs and minimize latency. Provisioning a new Windows Azure subscription is a manual process.
Group multiple tenants in a subscription	Table storage Blob storage SQL Database Database hosted in VM	If your tenants can subscribe to different levels of functionality for the application (such as light, standard, and premium), using a different subscription for each level but grouping all the tenants for each level in the same subscription makes it easier to track the costs of providing each level of functionality. You must still partition the data that belongs to different tenants within a subscription using one of the other partitioning schemes.

One subscription per tenant

Table storage

Blob storage

SQL Database

Database hosted in VM

Makes it easy to bill individual tenants for the storage resources they consume.

Enables tenants to provide their own storage and then manage it.

During the provisioning process, the tenant would need to provide access details such as storage account keys or database passwords to the provider.

You need to be careful about the location of the storage account in relation to the location of the cloud services roles to control data transfer costs and minimize latency.

Provisioning a new Windows Azure subscription is a manual process.

Group multiple tenants in a subscription

Table storage

Blob storage

SQL Database

Database hosted in VM

If your tenants can subscribe to different levels of functionality for the application (such as light, standard, and premium), using a different subscription for each level but grouping all the tenants for each level in the same subscription makes it easier to track the costs of providing each level of functionality.

You must still partition the data that belongs to different tenants within a subscription using one of the other partitioning schemes.

The following table shows the partitioning schemes you could use with Windows Azure storage, in addition to partitioning by Windows Azure subscription.

Partitioning scheme	Applies to	Notes
One storage account per tenant	Table storage Blob storage	Five storage accounts per subscription is a soft limit. You can request to extend this up to the hard limit of 20 storage accounts per subscription; however, this may limit the usefulness of this partitioning approach. Each storage account appears as a line item on your Windows Azure bill, so this approach can be useful if you want to identify the precise costs per tenant.
Group multiple tenants in a storage account	Table storage Blob storage	Enables you to group tenants by geographic region, by regulatory requirements, and by replication requirements. You must still partition the data that belongs to different tenants within a storage account using one of the other partitioning schemes.
One table per tenant	Table storage	There is no practical limit to the number of tables you can have in a Windows Azure storage account. You can automate creating a table within the provisioning process. Include the tenant’s ID in the table name.
Single table with one partition key per tenant	Table storage	There is no practical limit on the number of partitions in a table. Include the tenant’s ID in the partition key.
One container per tenant	Blob storage	There is no practical limit on the number of containers you can have in a Windows Azure storage account. Enables you to store all the blobs associated with a single tenant in a single container, much like using a folder on the file system. This makes it easy to manage tenant specific data. For example, provisioning and de-provisioning tenants, backup, archiving, and setting access policies. You can create containers automatically during the provisioning process. Include the tenant’s ID in the container name.
Blob naming convention	Blob storage	There is no practical limit to the number of blobs you can have in a container. Include the tenant’s ID in the blob name whenever you create a new blob.

Windows Azure storage is billed by the amount of storage used and by the number of storage transactions, so from a cost perspective it doesn’t matter how many separate storage accounts or containers you have.

Jana Says:
	`Using blob containers instead of a blob naming convention is the simplest solution to identify who the blob belongs to. </td>`

The following table shows the partitioning schemes you could use with Windows Azure SQL Database, or with a database such as SQL Server or MySQL hosted in a Windows Azure VM. These are in addition to partitioning by Windows Azure subscription (as described in the previous tables).

Partitioning scheme	Applies to	Notes
One server per tenant	SQL Database	Each SQL Database server can be hosted in a different geographic region. There is a limit on the number of SQL Database servers for a subscription.
One VM per tenant	Database hosted in VM	For each tenant you will incur the costs associated with running a Windows Azure VM. You can give the tenant access to the VM. You are limited to 25 VMs in a Windows Azure IaaS deployment.
One database per tenant	SQL Database Database hosted in VM	Each database you create has a specified size that determines the monthly cost. You can create each database in a different logical server, or host several databases in a single server. You have more databases to manage. There is a limit on the number of databases you can install on a SQL Database server.
Multiple tenants per database with per tenant tables	SQL Database Database hosted in VM	Enabling multiple tenants to share a database helps to reduce per tenant costs by efficiently using the storage you are paying for. You isolate tenant data by using separate tables for each tenant. You can use a naming convention that includes the tenant ID in the table name, or use a different database schema for each tenant.
Multiple tenants per database with shared tables	SQL Database Database hosted in VM	You must have a partitioning scheme to identify each tenant’s records in each table, such as using the tenant’s ID as part of the key.

For more information about managing multi-tenant data in relational databases such as SQL Server or SQL Database, see the article “Multi-Tenant Data Architecture” and the blog post “Full Scale 180 Tackles Database Scaling with Windows Azure.”

Jana Says:
	SQL Federation, described later in this chapter as a technique for scaling your SQL Database instance both in size and performance, uses the “multiple tenants per database with shared tables” approach to enable it to scale out your database. To support a multi-tenant application, SQL Federation typically uses the tenant ID to determine which database instance in the federation should store a particular tenant’s record. SQL Federation also supports filtered connections; you can use these to isolate tenant data and ensure that only individual tenant’s data can be accessed over a connection.</td>

Jana Says:

                SQL Federation, described later in this chapter as a technique for scaling your SQL Database instance both in size and performance, uses the “multiple tenants per database with shared tables” approach to enable it to scale out your database. To support a multi-tenant application, SQL Federation typically uses the tenant ID to determine which database instance in the federation should store a particular tenant’s record. SQL Federation also supports filtered connections; you can use these to isolate tenant data and ensure that only individual tenant’s data can be accessed over a connection.</td>

Windows Azure SQL Database cost is based on the number and size of databases you have, so it makes sense from a cost perspective to have as many tenants as possible sharing each instance.

At the time of writing you are limited to six Windows Azure SQL Database servers per Windows Azure subscription and 150 databases per server, although these limits may be extended on request and may change in the future.

Poe Says:
	`You are billed for Windows Azure SQL Database based on the number of databases you have, and the size of the databases. If you transfer data in and out of Windows Azure SQL Database from within the same data center there’s no data transfer cost, but if you transfer out of the data center you’ll be charged for the data transfer.</td>`

Poe Says:

                You are billed for Windows Azure SQL Database based on the number of databases you have, and the size of the databases. If you transfer data in and out of Windows Azure SQL Database from within the same data center there’s no data transfer cost, but if you transfer out of the data center you’ll be charged for the data transfer.</td>

Shared Access Signatures

Both Windows Azure table storage and Windows Azure blob storage support shared access signatures as a mechanism for controlling access to data. You can use shared access signatures to ensure isolation of tenant data.

Typically, all table data in a storage account is available for read and write access to any client that has access to the storage account key. With blob storage, a client that has access to the storage account key also has read and write access to all blobs in all containers in the storage account. In addition, you can grant public read access to a blob so that anyone who knows the blob’s URL will be able to read its content.

A shared access signature is a way to grant temporary access to a resource using a token. Your application can generate a shared access signature token for a blob container, an individual blob, or for a range of entities in a table. A shared access signature grants the holder of the token specific access rights such as read, write, update, and delete for a fixed time. You could use a role instance to generate shared access signatures for a specific tenant’s data and then issue those tokens to another role instance, possibly in another Windows Azure subscription. In this way only specific roles need to have access to the storage account keys that grant full access to the data in the storage account.

For more information about shared access signatures see Chapter 6, “Securing Multi-tenant Applications,” in this guide, and the blog post “Introducing Table SAS (Shared Access Signature), Queue SAS and update to Blob SAS.”

Data Architecture Extensibility

There are a number of ways you can design your data storage to enable tenants to extend the data model so that it includes their own custom data. These approaches range from each tenant having a separate schema, to providing a set of pre-defined custom columns, to more flexible schemas that enable a tenant to add an arbitrary number of custom fields to a table.

If you use Windows Azure SQL Database, much of the application’s complexity will result from having to work within the constraints of fixed data schemas. If you are using Windows Azure table storage, the complexity will arise from working with variable schemas. Windows Azure table storage allows records in the same table to have completely different structures, which allows for a great deal of flexibility at the cost of more complexity in the code.

Note

Microsoft SharePoint is an example of an application with a fixed schema database that looks extremely flexible.

Custom extensions to the application’s data model should not require changes to the application code. To enable the application’s business logic and presentation logic to integrate with the data model extensions, you will require either a set of configuration files that describe the extensions, or write code that can dynamically discover the extensions. However, if you enable tenants to extend the application through some predefined extension points or through an API, an extension could include both changes to the data model and to the code.

Jana Says:
	`You should aim to have a single codebase for your application, and avoid the situation where custom data extensions require different codebases.</td>`

The following table summarizes the options for implementing an extensible data architecture in Windows Azure table storage:

Extensibility approach	Notes
Separate table per tenant	Each table can use custom schemas for that particular tenant.
Single table with multiple schemas	Each tenant can use custom schemas for the entities it stores in the table.
Single schema with separate tables holding custom data	In Windows Azure table storage, transactions are only supported within a partition on a table. With this approach it is not possible to save all the data associated with an entity in a single transaction.

Figure 1 illustrates these alternatives using two of Tailspin’s subscribers, Adatum and Fabrikam, as examples. Each subscriber is storing different data as part of a survey definition.

Figure 1 - Examples showing per tenant customizations

Figure 1

Examples showing per tenant customizations

Markus Says:
	`A slug name stored in the Survey table is a string where all whitespace and invalid characters are replaced with a hyphen (-). The term comes from the newsprint industry and has nothing to do with those things in your garden!</td>`

Markus Says:

                A slug name stored in the Survey table is a string where all whitespace and invalid characters are replaced with a hyphen (-). The term comes from the newsprint industry and has nothing to do with those things in your garden!</td>

Using different schemas for different tenants—either in the same table or in different tables—enables a great deal of flexibility for extending and customizing the data architecture. In Windows Azure table storage you don’t need to predefine the schemas you will use before adding entities to a table. However, managing multiple schemas will add to the complexity of your solution. By limiting the customizability of your application you can limit the complexity of your solution.

Jana Says:
	`You can use multiple schemas in a Windows Azure table to enable you to store different types of entity or to support per tenant customizations in a single table. However, this approach can make your code more complex. Using one schema per table, and having multiple tables, can simplify your code but it does mean you have more tables to manage.</td>`

Jana Says:

                You can use multiple schemas in a Windows Azure table to enable you to store different types of entity or to support per tenant customizations in a single table. However, this approach can make your code more complex. Using one schema per table, and having multiple tables, can simplify your code but it does mean you have more tables to manage.</td>

The following table summarizes the options for implementing an extensible data architecture in SQL Database, and in relational databases such as SQL Server and MySQL that can run in a Windows Azure VM.

Extensibility approach	Notes
Separate database with custom schema per tenant	Each database can use a different schema to accommodate the requirements of each tenant. Typically, the custom schema must be defined during the provisioning process.
Shared database with separate schema or tables for each tenant	For relational databases that support multiple schemas within a database, each tenant can use a custom schema. Otherwise each tenant can have its own set of tables, identified using a naming convention. Typically, the custom schema must be defined during the provisioning process.
Single fixed schema with a set of columns available for custom data	Limits the amount of customization that is possible because there are a limited number of custom columns available.
Single fixed schema with separate tables holding custom data	Allows slightly more flexibility than using custom columns.

Using custom schemas will add to the complexity of the solution, especially because the schema must be defined before you can use the database. It is difficult to change a schema after you have added data to a table.

Jana Says:
	SQL Federation, described later in this chapter as a technique for scaling your SQL Database instance, assumes that tenants share a database table. It uses the tenant ID to determine which database instance in the federation should store a particular tenant’s record. Any solution that uses separate tables for each tenant will need to use a different scheme for partitioning the data across federated database instances. It should also keep related records on the same instance in order to minimize query response times.</td>

Jana Says:

                SQL Federation, described later in this chapter as a technique for scaling your SQL Database instance, assumes that tenants share a database table. It uses the tenant ID to determine which database instance in the federation should store a particular tenant’s record. Any solution that uses separate tables for each tenant will need to use a different scheme for partitioning the data across federated database instances. It should also keep related records on the same instance in order to minimize query response times.</td>

Data Architecture Scalability

If you can partition your data horizontally you will be able to scale out your data storage. In the case of Windows Azure SQL Database, if you decide that you need to scale out you should be able to move all of an individual tenant’s data to a new database instance. The partitioning scheme you choose will also affect the scalability of your solution.

For Windows Azure table storage, the most significant decision that affects scalability is the choice of partition key for a table. Queries that operate on a single partition are much more efficient than queries that access entities that exist on multiple partitions. In addition, you can only use transactions when all the entities involved reside on the same partition in the same table. Typically, a partition key that includes the tenant ID will help to make your Windows Azure table storage design scalable because the majority of your queries will need to access only a single partition to retrieve their data. For more information, see Chapter 7, “Moving to Windows Azure Table Storage” of the related patterns & practices guide “Moving Applications to the Cloud.”

For SQL Database, federation helps you to scale out across multiple databases by partitioning your data horizontally. If you decide to partition your tenant data by including the tenant ID in the primary key, this can be combined with SQL Database federation to achieve scalability.

Note

Partitioning data horizontally, also known as sharding, implies taking some of the records in a table and moving them to a new table. Partitioning data vertically implies taking some fields from every row and placing them in a different table. For a discussion of federation and sharding in Windows Azure SQL Database, see “Federations in Windows Azure SQL Database.”

An Example

This section shows a set of alternative data architectures in order to illustrate some of the key issues you should consider, such as isolation, extensibility, and scalability. This simple example makes the following assumptions about the application and the data:

A multi-tenant application stores the data.
You are storing the data in Windows Azure table storage.
There are two basic record types: a header record and a detail record where there is a one-to-many relationship between them.
All queries in the application access records for a specific month in a specific year.
Tenants with a premium subscription use an extended version of the detail record. Tenant B is an example of a tenant with a premium subscription; tenant A has a standard subscription.

The following is a list of five alternatives, and it describes the entity types stored in each table in each case. This is not an exhaustive list of the possible options, but it does illustrate a range of possibilities that you might consider. All of the options are designed to ensure that the application can keep each tenant’s data isolated. You can find a discussion of some of the advantages and limitations of the different approaches at the end of this section.

Option 1 — Using a Single Table

Application Data

Partition Key	Row Key	Entity Type
Tenant ID, Month, Year	Header Entity ID	Header record
Tenant ID, Month, Year	Detail Entity ID	Detail record (standard schema)
Tenant ID, Month, Year	Detail Entity ID	Detail record (extended schema)

Option 2 — Table per Tenant

Tenant A (uses standard detail record schema)

Partition Key	Row Key	Entity Type
Month, Year	Header Entity ID	Header record
Month, Year	Header Entity ID, Detail Entity ID	Detail record (standard schema)

Tenant B (uses extended detail record schema)

Partition Key	Row Key	Entity Type
Month, Year	Entity ID	Header record
Month, Year	Header Entity ID, Detail Entity ID	Detail record (extended schema)

Option 3 — Table per Base Entity Type

Header Records

Partition Key	Row Key	Entity Type
Tenant ID, Month, Year	Header Entity ID	Header record

Detail Records

Partition Key	Row Key	Entity Type
Tenant ID, Month, Year	Header Entity ID, Detail Entity ID	Detail record (standard schema, standard tenants)
Tenant ID, Month, Year	Header Entity ID, Detail Entity ID	Detail record (extended schema, premium tenants)

Option 4 — Table per Entity Type

Header Records

Partition Key	Row Key	Entity Type
Tenant ID, Month, Year	Header Entity ID	Header record

Detail Records (standard tenants)

Partition Key	Row Key	Entity Type
Tenant ID, Month, Year	Header Entity ID, Detail Entity ID	Detail record (standard schema, standard tenants)

Detail Records (premium tenants)

Partition Key	Row Key	Entity Type
Tenant ID, Month, Year	Header Entity ID, Detail Entity ID	Detail record (extended schema, premium tenants)

Option 5 — Table per Entity Type per Tenant

Tenant A Header Records

Partition Key	Row Key	Entity Type
Month, Year	Header Entity ID	Header record

Tenant B Header Records

Partition Key	Row Key	Entity Type
Month, Year	Header Entity ID	Header record

Tenant A Detail Records (standard schema)

Partition Key	Row Key	Entity Type
Month, Year	Header Entity ID, Detail Entity ID	Detail record (standard schema, standard tenants)

Tenant B Detail Records (extended schema)

Partition Key	Row Key	Entity Type
Month, Year	Header Entity ID, Detail Entity ID	Detail record (extended schema, premium tenants)

Comparing the Options

There is no right or wrong choice from the options listed above; the specific requirements of your application will determine which one you choose. There are many considerations that might affect your choice, some of which include the following:

Transactional behavior. Windows Azure table storage only supports transactions within a partition. If there is a requirement to support transactions that span the header and detail records, options one and two provide this functionality.
Query Performance. Queries against Windows Azure table storage perform best when you can specify the partitions that contain the data in the query. You need to analyze the queries in your application to decide on a partition scheme that can optimize their performance.
Code complexity. Dealing with multi-schema tables is more complex than single-schema tables. However, if you plan to have additional schema extensions or allow per tenant schema customizations, it could be more complex to manage many different tables in addition to supporting multiple schemas.
Managing the data. Performing management operations such as backing up, creating and deleting tenants, and logging may be easier if each tenant has its own set of tables.
Scale out. For very large volumes of data and transactions, you may want to scale out to use multiple Windows Azure storage accounts. You should chose an architecture that makes it easy to divide your data across storage accounts, most likely by placing some tenants in one account and others in a different one. For more information about scaling multi-tenant applications, see Chapter 5, “Maximizing Availability, Scalability, and Elasticity.”
Geo location. To reduce latency and improve performance you may want to store the data belonging to a particular tenant in a particular datacenter. Again, your architecture should support this type of partitioning.

The options shown above illustrate alternative approaches to storing multi-tenant data and do not specifically address the issue of scalability. There is an anti-pattern for Windows Azure table storage where you only append or prepend entities to a specific partition: all writes then go to the same partition, limiting the scalability of the solution. A common way to implement this anti-pattern is to use the current date as the table partition key, so in the options shown above you should verify whether the anticipated volume of transactions means that a choice of month and year for the partition key is sub optimal. For more information, see the presentation Windows Azure Storage Deep Dive on Channel 9.

Goals and Requirements

This section describes the specific goals and requirements that Tailspin has for the Surveys application with respect to the architecture and design of the data storage elements of the application.

Isolation of Tenants' Data

Tailspin wants to ensure that each tenant’s data is fully isolated from every other tenant’s data. When a tenant publishes a survey it becomes publically available, but each tenant must be able to manage its survey definitions privately. For example, a tenant can control when its surveys become publically visible. In addition, the survey response data must be confidential to the tenant who created the survey.

Application Scalability

As Tailspin signs up more tenants for the Surveys application it must be able to scale out the storage. This is particularly important for Survey response data because some Surveys might result in a large number of responses. Furthermore, Tailspin would like to be able to scale the Surveys application out (and in) automatically because Tailspin cannot predict when a tenant will create a survey that elicits a large number of responses.

Extensibility

As a feature for premium subscribers, Tailspin plans to allow tenants to store additional, tenant-specific information as part of each survey. Tenants must be able to store one or more custom fields as part of a survey definition. Tenants can use this information to store metadata for each survey, such as the product the survey relates to or the person who owns the survey.

Tailspin plans to add support for new question types in the future. However these will be made available to all subscribers and will not be treated as a per tenant customization.

Jana Says:
	`Adding support for a new question type will affect many areas of Tailspin Surveys such as storage, the UI, exporting to SQL Azure, and data summarization. Therefore, Tailspin will develop any new question types as extensions available to all subscribers.</td>`

Jana Says:

                Adding support for a new question type will affect many areas of Tailspin Surveys such as storage, the UI, exporting to SQL Azure, and data summarization. Therefore, Tailspin will develop any new question types as extensions available to all subscribers.</td>

Paging through Survey Results

The owner of a survey must be able to browse through the survey results, displaying a single survey response at a time. This feature is in addition to being able to view summary statistical data, and being able to analyze the results using Windows Azure SQL Database. The Surveys application contains a Browse Responses page for this function.

The design of this feature of the application must address two specific requirements. The first requirement is that the application must display the survey responses in the order they were originally submitted. The second requirement is to ensure that this feature does not adversely affect the performance of the web role.

Exporting Survey Data to SQL Database for Analysis

The Surveys application uses Windows Azure storage to store survey definitions and survey responses. Tailspin chose to use Windows Azure storage because of its lower cost and because the cost depends on the amount of usage—both in terms of capacity used and the number of storage transactions per month. However, to control the costs associated with storage, the Surveys application does not offer a great deal of flexibility in the way that subscribers can analyze the survey responses. A subscriber can browse through the responses to a survey in the order that users submitted their responses, and a subscriber can view a set of fixed design summaries of the statistical data for each survey.

Windows Azure SQL Database allows subscribers to perform complex analysis on their survey results. Subscribers can also create custom reports using Windows Azure SQL Reporting. For complex datasets subscribers can use a Windows Azure Big Data solution based on the Apache Hadoop software library.

To extend the analysis capabilities of the Surveys application, Tailspin allows subscribers to dump their survey responses into a Windows Azure SQL Database instance. They can then perform whatever detailed statistical analysis they want using tools of their choosing, or they can use this as a mechanism to download their survey results to an on-premise application by exporting the data from Windows Azure SQL Database.

The application must be able to export all survey data to Windows Azure SQL Database, including the question definitions as well as the survey responses.

This feature is included in the monthly fee for a Premium subscription. Subscribers at other levels can purchase this feature as an add-on to their existing subscription. Subscribers who choose to use this feature have their own private instance of Windows Azure SQL Database to ensure that they are free to analyze and process the data in any way that they see fit. For example, they may choose to create new tables of summary data, design complex data analysis queries, or design custom reports. This data must be kept confidential for each tenant.

Overview of the Solution

This section describes some specific features of the Tailspin Surveys application related to data architecture, and discusses the reasons for the solution adopted by Tailspin. It identifies any alternatives Tailspin considered and the trade-offs they imply.

Storage Accounts

Tailspin considered using separate Windows Azure subscriptions for its premium and standard subscribers, but the additional complexity of managing separate subscriptions did not provide any significant benefits. The billing information provided by Windows Azure for a single subscription gives Tailspin enough detailed information to understand its costs.

Tailspin does plan to use separate storage accounts for the different regions where it hosts the Surveys application. This is discussed in more detail in Chapter 5, “Maximizing Availability, Scalability, and Elasticity.”

Note

A storage account is tied to a specific Windows Azure data center, so Tailspin must use multiple storage accounts to store data in different regions.

The Surveys Data Model

This section describes the data model in the Surveys application and explains how the table design partitions the data by tenant.

The Surveys application uses a mix of table storage and blob storage to store its data. The section, “Options for Saving Survey Responses” in Chapter 5, “Maximizing Availability, Scalability, and Elasticity,” and the following section in this chapter discuss why the application uses blob storage for some data. Figure 2 shows, at a high level, which data is stored in the different types of storage.

The Surveys application uses blob and table storage.

Figure 2 - Data storage in the Surveys application

Figure 2

Data storage in the Surveys application

Storing Survey Definitions

The Surveys application stores the definition of surveys in two Windows Azure tables. This section describes these tables and explains why Tailspin adopted this design.

Tailspin chose to store survey definition in Windows Azure table storage to simplify the implementation of this part of the application. Every survey has some header data that describes the survey, and an ordered list of questions. It is easy to model this using a parent/child relationship between the two types of entity stored in the tables.

Survey definitions are read whenever a user responds to a survey, and Tailspin minimizes the number of storage transactions involved by caching survey definitions in memory. Although Tailspin cannot use a transaction when it saves survey definitions because it is using two tables, it maintains data consistency by saving the question entities before the corresponding survey entity. This can lead to orphaned question entities if a failure occurs before the survey entity is saved and so Tailspin will regularly scan the Questions table for such orphaned rows.

A transaction cannot be used to ensure full consistency of data when saving it in two separate Windows Azure storage tables. By saving the data in the child table first, and then saving the linked row in the parent table only if the child rows were saved successfully, Tailspin can prevent inconsistencies in the data. However, it does mean that some child rows could be saved without a matching row in the parent table if the second save operation fails, and so Tailspin will need to create and run a separate process that periodically sweeps the child table for orphaned rows. This will add to the development and runtime cost.

The following table describes the fields in the Surveys table. This table holds a list of all of the surveys in the application.

Field name	Notes
PartitionKey	This field contains the tenant name. Tailspin chose this value because they want to be able to filter quickly by tenant name, and ensure the isolation of survey definitions by tenant.
RowKey	This field contains the tenant name from the PartitionKey field concatenated with the slug name version of the survey name. This makes sure that a subscriber cannot create two surveys with the same name. Different subscribers can use the same name for a survey; these surveys are differentiated by the tenant name part of the ID.
Timestamp	Windows Azure table storage automatically maintains the value in this field.
CreatedOn	This field indicates when the subscriber created the survey. This will differ from the Timestamp value if the subscriber edits the survey.
SlugName	The slug name version of the survey name.
Title	The survey name.

The following table describes the fields in the Questions table. The application uses this table to store the question definitions and to render a survey.

Field name	Notes
PartitionKey	This field contains the row key from the Surveys table, which is the tenant name from the PartitionKey field in the Surveys table concatenated with the slug name version of the survey name. This enables the application to insert all questions for a survey in a single transaction and to retrieve all the questions in a survey quickly from a single partition.
RowKey	This field contains a formatted tick count concatenated with the position of the question within the survey. This guarantees a unique RowKey value and defines the ordering of the questions.
Timestamp	Windows Azure table storage automatically maintains the value in this field.
PossibleAnswers	This field contains a list of the possible answers if the question is a multiple-choice question.
Text	The question text.
Type	The question type: Simple text, multiple choice, or five stars (a numeric range).

Jana Says:
	`Remember that Windows Azure table storage only supports transactions within a single partition in a single table, and that a transaction cannot modify more than 100 rows at a time.</td>`

Each of these tables uses the tenant ID in the partition key. This helps to isolate tenant data because all of the queries in the Tailspin Surveys application include a partition key value. Using a single table to store all of the tenants’ data makes it easier for Tailspin to manage this data. For example, Tailspin can easily back up all of the survey definitions.

Premium tenants can add their own custom metadata to enable linking with their own applications and services. This level of customization requires Tailspin to extend the schema of the Surveys table in Windows Azure storage and to add code to the application that recognizes this custom data. To extend the survey table schema Tailspin considered two alternative approaches:

Store the custom data in the existing survey table, using different custom fields for each tenant.
Store the custom data in a separate table.

Tailspin chose the first option. Windows Azure table storage allows you to use multiple schemas in the same table; therefore, each tenant can have its own custom fields. Furthermore, if a tenant changes the type of custom data that it needs to store, it can itself have multiple schemas. The following table illustrates how Adatum added a new custom field before it added its second survey, and how Fabrikam has different custom fields from Adatum.

Partition key	Row key	Slug name	Title	Product name	Owner	Promotion
adatum	adatum_survey-1	survey-1	Survey 1	Widgets
adatum	adatum_survey-2	survey-2	Survey 2	Gadgets	Mary
fabrikam	fabrikam_survey-1	survey-1	Survey 1			Promo 1
fabrikam	fabrikam_survey-2	survey-2	Survey 2			Promo 2

The first Adatum survey only has the Product name custom field in the surveys table; the second Adatum survey has both the Product name and Owner custom fields in the surveys table. The two Fabrikam surveys only have the Promotion custom field in the surveys table.

Jana Says:
	`The Surveys table is a multi-schema table. Each tenant can specify its own custom fields. </td>`

The Surveys application must be able to read from and write to the custom fields in the Surveys table. The developers at Tailspin considered two approaches to this. The first was to use a custom DLL for each tenant that was responsible for accessing the custom fields. The second approach was to store details of the custom fields as part of the tenant configuration data, and use this configuration data to determine how to access the custom fields at runtime.

Tailspin selected the second approach for two reasons: there is no need for Tailspin to create custom code for each tenant to support that tenant’s custom fields. It is also easier to support different versions of the customization for a tenant. In the table of custom fields shown earlier, you can see that Adatum changes the set of custom fields it uses after it creates the first survey.

To read a more detailed discussion of RowKeys and PartitionKeys in Windows Azure table storage, see Chapter 7, “Moving to Windows Azure Table Storage” of the guide “Moving Applications to the Cloud.”

Storing Tenant Data

The application collects most of the subscriber data during the on-boarding process. The Logo property of the Tenant class, shown below, contains the URL for the subscriber’s logo. The application stores logo images in a public blob container named logos.

[Serializable]
public class Tenant
{
  public string ClaimType { get; set; }
  public string ClaimValue { get; set; }
  public string HostGeoLocation { get; set; }
  public string IssuerThumbPrint { get; set; }
  public string IssuerUrl { get; set; }
  public string IssuerIdentifier { get; set; }
  public string Logo { get; set; }
  [Required(ErrorMessage = 
  "* You must provide a Name for the subscriber.")]
  public string Name { get; set; }
  public string WelcomeText { get; set; }
  public SubscriptionKind SubscriptionKind { get; set; }
  public Dictionary<string, List<UDFMetadata>>
           ExtensionDictionary { get; set; }
  public string SqlAzureConnectionString { get; set; }
  public string DatabaseName { get; set; }
  public string DatabaseUserName { get; set; }
  public string DatabasePassword { get; set; }
  public string SqlAzureFirewallIpStart { get; set; }
  public string SqlAzureFirewallIpEnd { get; set; }
}

Markus Says:
	We chose to store tenant data in Windows Azure blob storage to simplify the implementation of this part of the application. Tenant data has a very simple structure that can easily be stored in a blob, and it does not require any of the features provided by table storage. Tenant data is read very frequently and so, to minimize the number of storage transactions and to improve performance, the application caches tenant data in memory.

Markus Says:

We chose to store tenant data in Windows Azure blob storage to simplify the implementation of this part of the application. Tenant data has a very simple structure that can easily be stored in a blob, and it does not require any of the features provided by table storage. Tenant data is read very frequently and so, to minimize the number of storage transactions and to improve performance, the application caches tenant data in memory.

Storing Survey Answers

The Surveys application saves survey answers in blob storage. The application creates a blob container for each survey with a name that follows this pattern: **surveyanswers-<tenant name>-<survey slug name>****. This guarantees a unique container name for every survey and ensures that the application can easily identify the answers that belong to a specific survey or tenant.

Tailspin chose to save each complete survey to blob storage rather than as a set of answer entities in a table because it found that saving to blob storage was faster in this particular scenario. The developers at Tailspin ran timing tests to compare saving to a blob and saving the same data as a set of rows in an Entity Group Transaction to table storage and found a significant difference between the two approaches. However, when using blob storage to persist survey responses Tailspin must also consider how subscribers can browse their survey responses in the order that they were submitted. For a more detailed explanation of how Tailspin evaluated the pros and cons of saving survey response data to blob or table storage, see Chapter 5, “Maximizing Availability, Scalability, and Elasticity.”

For each completed survey response, the Surveys application saves a blob into the survey’s container. The content of each blob is a SurveyAnswer object serialized in the JavaScript Object Notation (JSON) format. A SurveyAnswer object encapsulates a respondent’s complete survey response, with all the respondent’s answers to individual questions. The following code example shows the SurveyAnswer and QuestionAnswer classes. The QuestionAnswers property of the SurveyAnswer class is a list of QuestionAnswer objects.

public class SurveyAnswer
{
  ...

  public string SlugName { get; set; }
  public string Tenant { get; set; }
  public string Title { get; set; }
  public DateTime CreatedOn { get; set; }
  public IList<QuestionAnswer>
            QuestionAnswers { get; set; }
}

public class QuestionAnswer
{
  public string QuestionText { get; set; }
  public QuestionType QuestionType { get; set; }
  [Required(ErrorMessage = "* You must provide an answer.")]
  public string Answer { get; set; }
  public string PossibleAnswers { get; set; }
}

The name of the blob used to store each survey response is a GUID, which ensures that each blob in the container has a unique name. This means that the application does not save the survey responses with blob names that provide any useful ordering. Tailspin chose this approach in preference to saving the blobs using a naming convention that reflects the order in which the system received the survey answers in order to avoid the append/prepend anti pattern described in the presentation Windows Azure Storage Deep Dive on Channel 9.

However, Tailspin does have a requirement to enable subscribers to view the survey responses in the order they were submitted. To achieve this, the Surveys application also uses blob storage to store an ordered list of the responses to each survey. For each survey, the application stores a blob that contains a serialized List object containing the ordered names of all the survey response blobs (each of which contains a serialized SurveyAnswer object) for that survey. The List object is serialized in the JSON format. The section “Implementing Paging” later in this chapter explains how the Surveys application uses these List objects to enable paging through the survey results.

However, it’s possible that a very large number of answers to a survey will affect performance because the process of updating this list will take longer as the size of the list grows. See the section “Maintaining a List of Survey Answers” in Chapter 7, “Managing and Monitoring Multi-tenant Applications,” for more details of how Tailspin plans to resolve this issue.

Storing Survey Answer Summaries

The Surveys application uses blob storage to save the summary statistical data for each survey. For each survey, it creates a blob named <tenant name>-<survey slug name>** in the surveyanswerssummaries container. The application serializes a SurveyAnswersSummary object in JSON format to save the data. A SurveyAnswersSummary object contains summary data for a survey, such as the total number of responses received that is stored in the TotalAnswers property. There is one SurveyAnswersSummary object for every survey.

The QuestionAnswersSummaries property of a SurveyAnswersSummary object contains a list of the questions in the survey. A QuestionAnswerSummary object contains the summary data for an individual question, such as an average for numeric questions. There is one QuestionAnswerSummary object for each question in a survey.

The following code example shows the SurveyAnswersSummary and QuestionAnswersSummary classes that define this summary data.

public class SurveyAnswersSummary
{
  ...

  public string Tenant { get; set; }
  public string SlugName { get; set; }
  public int TotalAnswers { get; set; }
  public IList<QuestionAnswersSummary> 
    QuestionAnswersSummaries { get; set; }

  ...
}

public class QuestionAnswersSummary
{
  public string AnswersSummary { get; set; }
  public QuestionType QuestionType { get; set; }
  public string QuestionText { get; set; }
  public string PossibleAnswers { get; set; }
}

Notice that the summary is stored as a string for all question types, including numeric. This helps to minimize the number of changes that would be required to add a new question type to the Surveys application.

Comparing Paging Solutions

The developers at Tailspin considered two solutions to enable tenants to browse through survey responses, each based on a different storage model. The first option assumed that the application stored the survey response data in table storage. The second option, which was the one chosen, assumed that the application stored the survey response data in blob storage.

Paging with Table Storage

The developers at Tailspin looked at two features of the Windows Azure table storage API to help them design this solution. The first feature is the continuation token that you can request from a query, which enables you to execute a subsequent query that starts where the previous query finished. You can use a stack data structure to maintain a list of continuation tokens that you can use to go forward one page or back one page through the survey responses. You must then keep this stack of continuation tokens in the user's session state to enable navigation for the user.

Note

For an example of this approach, see the section, “Implementing Paging with Windows Azure Table Storage” in Chapter 7, “Moving to Windows Azure Table Storage,” of the guide “Moving Applications to the Cloud.”

The second useful API feature is the ability to run asynchronous queries against Windows Azure table storage. This can help avoid thread starvation in the web server's thread pool in the web role by offloading time-consuming tasks to a background thread.

Paging with Blob Storage

The assumption behind this solution is that each survey answer is stored in a separate blob. To access the blobs in a predefined order, you must maintain a list of all the blobs. You can then use this list to determine the identity of the previous and next blobs in the sequence and enable the user to navigate backward and forward through the survey responses. To support alternative orderings of the data, you must maintain additional lists.

Comparing the Solutions

Chapter 5, “Maximizing Availability, Scalability, and Elasticity,” identifies cost savings as a reason to store survey responses directly in blob storage. In addition, paging with table storage is complex because you must manage the continuation stack in the user's session state.

However, you must consider the costs and complexity associated with maintaining the ordered list of blobs in the second of the two alternative solutions. This incurs two additional storage transactions for every new survey response; one as the list is retrieved from blob storage, and one as it is saved back to blob storage. However, this still results in fewer transactions per survey response than the table-based solution. Furthermore, it's possible to avoid using any session state by embedding the links to the next and previous blobs directly in the web page.

Bharath Says:
	What at first seems like the obvious solution (in this case, to use table storage) may not always turn out to be the best.

The SQL Database Design

During the on-boarding process, the application will provision a new Windows Azure SQL Database instance for those subscribers who have access to this feature. This enables tenants to customize the database to their own requirements, and to manage their own custom reporting requirements using Windows Azure SQL Reporting. A private instance of Windows Azure SQL Database for each tenant also helps to ensure that survey response data remains confidential.

The provisioning process will create the necessary tables in the database. As part of the on-boarding process, the Surveys application saves in blob storage (as part of the subscriber’s details) the information that the application and the subscriber require to access the Windows Azure SQL Database instance.

Poe Says:
	Giving every subscriber a separate instance of Windows Azure SQL Database allows the subscribers to customize the database to their own requirements. It also simplifies the security model, making it easier for Tailspin to ensure that survey response data is kept isolated.

A subscriber can use the UI to request the application export the survey data to a SQL Database instance. The UI notifies the worker role by placing a message on a queue. A task in a worker role monitors the queue for messages that instruct it to dump a subscriber’s survey results to tables in Windows Azure SQL Database. Figure 3 shows the table structure in Windows Azure SQL Database.

Figure 3 - Surveys table structure in Windows Azure SQL Database

Figure 3

Surveys table structure in Windows Azure SQL Database

Inside the Implementation

Now is a good time to walk through some of the code in the Tailspin Surveys application in more detail. As you go through this section, you may want to download the Visual Studio solution for the Tailspin Surveys application from https://wag.codeplex.com/.

The Data Store Classes

The Surveys application uses store classes to manage storage. This section briefly outlines the responsibilities of each of these store classes.

SurveyStore Class

This class is responsible for saving survey definitions to table storage and retrieving the definitions from table storage.

Markus Says:
	This class uses caching to reduce latency when retrieving survey definitions for the public web site.

SurveyAnswerStore Class

This class is responsible for saving survey answers to blob storage and retrieving survey answers from blob storage. This class creates a new blob container when it saves the first response to a new survey, so there is one container per survey. It uses a queue to track new survey responses; the application uses this queue to calculate the summary statistical data for surveys.

This class also provides support for browsing sequentially through survey responses.

SurveyAnswersSummaryStore Class

This class is responsible for saving summary statistical data for surveys to blobs in the surveyanswerssummaries container, and for retrieving this data.

SurveySqlStore Class

This class is responsible for saving survey response data to Windows Azure SQL Database. For more information, see the section “Implementing the Data Export” later in this chapter.

SurveyTransferStore Class

This class is responsible for placing a message on a queue when a subscriber requests the application to dump survey data to Windows Azure SQL Database.

TenantStore Class

This class is responsible for saving and retrieving subscriber data and saving uploaded logo images. In the sample code, this class generates some default data for the Adatum and Fabrikam subscribers. This class also uses caching to reduce latency when retrieving tenant information from blob storage.

Markus Says:
	We don’t use asynchronous calls to write data to Windows Azure storage, although this is an option that would allow us to write to more than one store concurrently. It would also free up threads for other tasks, which could improve performance. However, as you will see in the section “Synchronous and Asynchronous Calls to Windows Azure Storage” in Chapter 7, “Managing and Monitoring Multi-tenant Applications,” there are no obvious areas where concurrent access to storage is practical in the Surveys application, and our tests revealed no worthwhile gain in performance in our own scenario.

Markus Says:

We don’t use asynchronous calls to write data to Windows Azure storage, although this is an option that would allow us to write to more than one store concurrently. It would also free up threads for other tasks, which could improve performance. However, as you will see in the section “Synchronous and Asynchronous Calls to Windows Azure Storage” in Chapter 7, “Managing and Monitoring Multi-tenant Applications,” there are no obvious areas where concurrent access to storage is practical in the Surveys application, and our tests revealed no worthwhile gain in performance in our own scenario.

Accessing Custom Data Associated with a Survey

Tenants with premium subscriptions can choose to define additional properties for their surveys. When a tenant creates a new survey, the UI allows that tenant to add user-defined fields and specify the values for the new survey. When the tenant views a list of surveys, the list includes the custom data values for each survey.

Defining a Tenant’s Custom Fields

As part of the configuration data for premium tenants, Tailspin stores a dictionary of that tenant’s custom fields. For example, if Adatum had chosen to use two custom fields, ProductName and Owner with its surveys, the blob that contains the Adatum configuration, would contain the following information:

"ExtensionDictionary":{"SurveyRow":[
  {
   "Name":"ProductName",
   "Type":5,
   "Display":"Product Name",
   "Mandatory":false,
   "DefaultValue":null
  },
  {
   "Name":"Owner",
   "Type":5,
   "Display":"Owner",
   "Mandatory":false,
   "DefaultValue":null
  }
]}

Every custom field has a name, a data type such as string or integer, a display value to use as a label in the UI, a Boolean flag to specify whether it is a required field, and an optional default value. A subscriber can add or delete custom field definitions on the Model extensions tab in the private tenant web site.

Writing Custom Fields to the Surveys Table

When a tenant creates a new survey, the private tenant web site UI reads the custom field definitions from the tenant configuration and adds appropriate UI elements to enable the tenant to add values to these fields. The Surveys application then persists these values in the custom fields to the surveys table when the tenant saves a survey.

The following code sample shows the definition of the Survey class that includes a list of user-defined fields. The IUDFModel interface defines the UserDefinedFields property.

[Serializable]
public class Survey : IUDFModel
{
  private readonly string slugName;

  ...

  public string Tenant { get; set; }

  [Required(ErrorMessage = 
    "* You must provide a Title for the survey.")]
  public string Title { get; set; }

  public DateTime CreatedOn { get; set; }

  public List<Question> Questions { get; set; }

  public IList<UDFItem> UserDefinedFields { get; set; }
}

Figure 4 shows an overview of the way that the mechanism for saving a new survey includes the capability to save user-defined fields, and the core classes that are involved in the process. The following section of this chapter describes the process in more detail.

Figure 4 - Overview of the mechanism for saving user-defined fields in a new survey

Figure 4

Overview of the mechanism for saving user-defined fields in a new survey

The SaveSurvey method in the SurveyStore class takes a Survey object as a parameter, and is responsible for creating a SurveyRow object and adding it as a new row to the Survey table. The SurveyStore class takes as parameters to its constructor instances of objects that implement the IAzureTable interface, one each for the Survey table and the Question table.

Objects that implement the IAzureTable interface are generic types that accept a table row type, such as SurveyRow and QuestionRow, and they expose a property named ReadWriteStrategy of type IAzureTableRWStrategy that is populated by a class that exposes the ReadEntity and WriteEntity methods. The following code sample from the ContainerBootstraper class in the Tailspin.Web project shows how the application registers the required types for dependency injection into the SurveyStore class, including an instance of the SurveyRowRWStrategy class for the ReadWriteStrategy property of the AzureTable class.

container.RegisterType<IUDFDictionary, UDFDictionary>();

container.RegisterType<IAzureTableRWStrategy,
      SurveyRowRWStrategy>(typeof(SurveyRow).Name);

var readWriteStrategyProperty = new InjectionProperty(
      "ReadWriteStrategy",
      new ResolvedParameter(
          typeof(IAzureTableRWStrategy), 
          typeof(SurveyRow).Name));

container
  .RegisterType<IAzureTable<SurveyRow>,
    AzureTable<SurveyRow>>(
      new InjectionConstructor(cloudStorageAccountType,
            AzureConstants.Tables.Surveys),
      readWriteStrategyProperty,
      retryPolicyFactoryProperty)
...

The SaveSurvey method of the SurveyStore class saves a SurveyRow instance to table storage by calling the Add method of the AzureTable instance. This method creates a new TableServiceContext to use to access the Windows Azure table by calling the CreateContext method defined in the AzureTable class. The CreateContext method hooks up the ReadEntity and WriteEntity methods of the SurveyRowRWStrategy class to the ReadingEntity and WritingEntity events of the TableServiceContext, as shown in the following code sample.

private TableServiceContext CreateContext()
{
  ...

  if (this.ReadWriteStrategy != null)
  {
    context.ReadingEntity += (sender, args)
      => this.ReadWriteStrategy.ReadEntity(context, args);
    context.WritingEntity += (sender, args)
      => this.ReadWriteStrategy.WriteEntity(context, args);
  }

  return context;
}

The SurveyRowRWStrategy class inherits from the UDFModelRWStrategy class and does not override the WriteEntity method. The following code sample shows the WriteEntity method from the UDFModelRWStrategy class, which creates the user-defined fields and adds them to the table schema.

public virtual void WriteEntity(
      TableServiceContext context,
      ReadingWritingEntityEventArgs args)
{
  var ns = XNamespace.Get(DATASERVICESNS);
  var nsmd = XNamespace.Get(DATASERVICESMETADATANS);
  var survey = args.Entity as SurveyRow;
  if (survey != null && survey.UserDefinedFields != null
      && survey.UserDefinedFields.Count > 0)
  {
    var properties = args.Data
        .Descendants(nsmd + "properties").First();
    foreach (var udfItem in survey.UserDefinedFields)
    {
      var udfField = new XElement(ns + udfItem.Name,
                                    udfItem.Value);
      udfField.Add(new XAttribute(nsmd + "type",
                                    udfItem.GetEdmType()));
      properties.Add(udfField);
    }
  }
}

For more information about this method of saving entities to Azure table storage, see the blog post “Entities in Azure Tables.”

Reading Custom Fields from the Surveys Table

Saving a survey definition to Windows Azure table storage always includes the custom fields that are currently defined in the tenant’s configuration data. When a survey definition is read it must include the custom fields that were defined when the survey was originally saved.

Figure 5 shows an overview of the way that the mechanism for reading a survey definition includes the capability to retrieve the user-defined fields and add them to the SurveyRow that is returned, and the core classes that are involved in the process.

Figure 5 - Overview of the mechanism for retrieving user-defined fields for a survey

Figure 5

Overview of the mechanism for retrieving user-defined fields for a survey

The SurveyStore class executes a Select query against the AzureTable instance that was injected into it when it was instantiated. The ReadingEntity event that occurs when reading entities from the TableServiceContext causes the ReadEntity method in the SurveyRowRWStrategy class to execute.

The SurveyRowRWStrategy class holds a reference to an instance of a class that implements the IUDFDictionary interface. The dependency injection registration you saw earlier causes this to be populated with an instance of the UDFDictionary class.

The ReadEntity method of the SurveyRowRWStrategy class calls the InstanceFieldsFor method of the UDFDictionary class to discover the names of the fields from the table metadata, and then calls the ReadEntity method of the UDFModelRWStrategy class to get the field values from the table itself. The SurveyRowRWStrategy class then assigns the collection of user-defined fields to the UserDefinedFields property of the SurveyRow instance.

Implementing Paging

The code walkthrough in this section is divided into two parts. The first describes how the application maintains an ordered list of blobs. The second describes how the application uses this list to page through the responses.

Maintaining the Ordered List of Survey Responses

Tailspin Surveys saves and processes survey response data using two asynchronous tasks hosted in a worker role. For more information about the implementation that Tailspin chose for saving and processing survey response data, see Chapter 5, “Maximizing Availability, Scalability, and Elasticity.” This section focuses on how the data architecture in Tailspin Surveys supports paging through Survey responses stored in blobs.

The PostRun method in the UpdatingSurveyResultsSummaryCommand class in the worker role calls the AppendSurveyAnswerIdsToAnswerList method for the collection of new survey responses that the task processed in the Run method. The following code example shows how the AppendSurveyAnswerIdsToAnswerList method in the SurveyAnswerStore class retrieves the list of survey responses from a blob, adds the new survey responses to the list, and saves the list back to blob storage.

public void AppendSurveyAnswerIdsToAnswersList(
                   string tenant, string slugName,
                   IEnumerable<string> surveyAnswerIds)
{
  OptimisticConcurrencyContext context;
  string id = string.Format(CultureInfo.InvariantCulture,
                              "{0}-{1}", tenant, slugName);
  var answerIdList = this.surveyAnswerIdsListContainer
             .Get(id, out context) ?? new List<string>(1);
  answerIdList.AddRange(surveyAnswerIds);
  this.surveyAnswerIdsListContainer
             .Save(context, answerIdList);
}

The application stores the list of survey responses in a List object, which it serializes in the JSON format and stores in a blob. There is one blob for every survey and the application stores all of these blobs in the same blob container.

Markus Says:
	The application adds new responses to the queue in the order that they are received. When it retrieves messages from the queue and adds the blob IDs to the list, it preserves the original ordering.

For more information about the concurrency management that the Surveys application uses when it saves the list of survey answers, see the section “Pessimistic and Optimistic Concurrency Handling” in Chapter 5, “Maximizing Availability, Scalability, and Elasticity.”

Implementing the Paging

When the Surveys application displays a survey response, it finds the blob that contains the survey response by using a blob ID. It can use the ordered list of blob IDs to create navigation links to the next and previous survey responses.

The following code example shows the BrowseResponses action method in the SurveysController class in the TailSpin.Web project.

public ActionResult BrowseResponses(string tenant,
    string surveySlug, string answerId)
{
  SurveyAnswer surveyAnswer = null;
  if (string.IsNullOrEmpty(answerId))
  {
    answerId = this.surveyAnswerStore
        .GetFirstSurveyAnswerId(tenant, surveySlug);
  }

  if (!string.IsNullOrEmpty(answerId))
  {
    surveyAnswer = this.surveyAnswerStore
        .GetSurveyAnswer(tenant, surveySlug, answerId);
  }

  var surveyAnswerBrowsingContext = this.surveyAnswerStore
        .GetSurveyAnswerBrowsingContext(tenant,
            surveySlug, answerId);

  var browseResponsesModel = new BrowseResponseModel
      {
        SurveyAnswer = surveyAnswer,
        PreviousAnswerId =
            surveyAnswerBrowsingContext.PreviousId,
        NextAnswerId = surveyAnswerBrowsingContext.NextId
      };

  var model = this.CreateTenantPageViewData
                  (browseResponsesModel);
  model.Title = surveySlug;
  return this.View(model);
}

This action method uses the GetSurveyAnswer method in SurveyAnswerStore class to retrieve the survey response from blob storage and the GetSurveyAnswerBrowsingContext method to retrieve a SurveyBrowsingContext object that contains the blob IDs of the next and previous blobs in the sequence. It then populates a model object with this data to forward on to the view.

Implementing the Data Export

The following code example shows the task in the worker role that executes when it is triggered by a message in a queue. Chapter 4, “Partitioning Multi-Tenant Applications,” describes the message queues and the worker role in Tailspin Surveys in more detail. This section focuses on how the application handles data and the data export process.

Markus Says:
	This task is part of the worker role described in Chapter 4, “Partitioning Multi-Tenant Applications.” A message in a queue triggers this task.

You can find the Run method that performs the data export in the TransferSurveysToSqlAzureCommand class in the worker role project. The SurveyTransferMessage class identifies the tenant who owns the data, and the survey to export.

public bool Run(SurveyTransferMessage message)
{
  Tenant tenant = 
        this.tenantStore.GetTenant(message.Tenant);
  this.surveySqlStore.Reset(
        tenant.SqlAzureConnectionString, message.Tenant, 
        message.SlugName);

  Survey surveyWithQuestions = this.surveyStore
        .GetSurveyByTenantAndSlugName(message.Tenant, 
             message.SlugName, true);
            
  IEnumerable<string> answerIds = this.surveyAnswerStore
        .GetSurveyAnswerIds(message.Tenant, 
             surveyWithQuestions.SlugName);

  SurveyData surveyData = surveyWithQuestions.ToDataModel();

  foreach (var answerId in answerIds)
  {
    SurveyAnswer surveyAnswer = this.surveyAnswerStore
        .GetSurveyAnswer(surveyWithQuestions.Tenant, 
             surveyWithQuestions.SlugName, answerId);

    var responseData = new ResponseData 
      {
        Id = Guid.NewGuid().ToString(),
        CreatedOn = surveyAnswer.CreatedOn
      };

    foreach (var answer in surveyAnswer.QuestionAnswers)
    {
      QuestionAnswer answerCopy = answer;
      var questionResponseData = new QuestionResponseData
        {
          QuestionId = (
            from question in surveyData.QuestionDatas
            where question.QuestionText == 
                  answerCopy.QuestionText
            select question.Id).FirstOrDefault(),
          Answer = answer.Answer
        };
      responseData.QuestionResponseDatas
                .Add(questionResponseData);
    }
    if (responseData.QuestionResponseDatas.Count > 0)
    {
      surveyData.ResponseDatas.Add(responseData);
    }
  }

  this.surveySqlStore
      .SaveSurvey(tenant.SqlAzureConnectionString,
                  surveyData);
  return true;
}

The method first resets the survey data in Windows Azure SQL Database before it iterates over all the responses to the survey and saves the data to the tenant’s SQL Database instance. The application does not attempt to parallelize this operation; for subscribers who have large volumes of data, the dump operation may run for some time.

The application uses LINQ to SQL to manage the interaction with Windows Azure SQL Database. The following code from the SurveySqlStore class shows how the application uses the SurveyData and SurveySqlDataContext classes. The SurveySql.dbml designer creates these classes.

public void SaveSurvey(string connectionString, 
  SurveyData surveyData)
{
  this.ConnectionRetryPolicy.ExecuteAction(() =>
  {
    using (var dataContext = 
      new SurveySqlDataContext(connectionString))
    {
      dataContext.SurveyDatas.InsertOnSubmit(surveyData);
      try
      {
        this.CommandRetryPolicy.ExecuteAction(
          () => dataContext.SubmitChanges());
      }
      catch (SqlException ex)
      {
        TraceHelper.TraceError(ex.TraceInformation());
        throw;
      }
    }
  });
}

This code uses the Transient Fault Handling Application Block to handle any transient errors when it tries to save the data to SQL Database.

Displaying Questions

The application stores the definition of a survey and its questions in table storage. To render the questions in a page in the browser from the Tailspin.Web.Survey.Public web application project, the application uses the MVC EditorExtensions class.

Markus Says:
	Tailspin chose this mechanism to render the questions because it makes it easier to extend the application to support additional question types.

When the Display action method in the SurveysController class in the TailSpin.Web.Survey.Public project builds the view to display the survey, it retrieves the survey definition from table storage, populates a model, and passes the model to the view. The following code example shows this action method.

[HttpGet]
public ActionResult Display(string tenant, 
                            string surveySlug)
{
  var surveyAnswer = CallGetSurveyAndCreateSurveyAnswer(
        this.surveyStore, tenant, surveySlug);

  var model = 
      new TenantPageViewData<SurveyAnswer>(surveyAnswer);
  if (surveyAnswer != null)
  {
    model.Title = surveyAnswer.Title;
  }
  return this.View(model);
}

The view uses the EditorExtensions class to render the questions. The following code example shows how the Display.aspx page uses the Html.EditorFor element that is defined in the System.Web.Mvc.EditorExtensions class.

<% for (int i = 0; 
   i < this.Model.ContentModel.QuestionAnswers.Count; i++) 
{ %>
  ...
<%: Html.EditorFor(m=>m.ContentModel.QuestionAnswers[i], 
      QuestionTemplateFactory.Create(
        Model.ContentModel.QuestionAnswers[i].QuestionType)) %>
  ...
<% } %>

This element iterates over all the questions that the controller retrieved from storage and uses the QuestionTemplateFactory utility class to determine which user control (an .ascx file) to use to render the question. The user controls FiveStar.ascx, MultipleChoice.ascx, and SimpleText.ascx are in the EditorTemplates folder in the project.

Displaying the Summary Statistics

The asynchronous task that generates the summary statistics from survey responses (this task is described in Chapter 5, “Maximizing Availability, Scalability, and Elasticity”) stores the summaries in blob storage. It uses a separate blob for each survey. The Surveys application displays these summary statistics in the same way that it displays questions. The following code example shows the Analyze action method in the SurveysController class in the TailSpin.Web project that reads the results from blob storage and populates a model.

public ActionResult Analyze(string tenant, 
                            string surveySlug)
{
  var surveyAnswersSummary = 
      this.surveyAnswersSummaryStore
        .GetSurveyAnswersSummary(tenant, surveySlug);

  var model = 
      this.CreateTenantPageViewData(surveyAnswersSummary);
  model.Title = surveySlug;
  return this.View(model);
}

The view uses the Html.DisplayFor element to render the questions. The following code example shows a part of the Analyze.aspx file.

<% for (int i = 0;
        i < this.Model.ContentModel
                .QuestionAnswersSummaries.Count; i++)
{ %>
<li>
<%: Html.DisplayFor(m => 
      m.ContentModel.QuestionAnswersSummaries[i],
      "Summary-" + TailSpin.Web.Survey.Public.Utility
        .QuestionTemplateFactory.Create
          (Model.ContentModel.QuestionAnswersSummaries[i]
            .QuestionType))%>
</li>
<% } %>

The user control templates for rendering the summary statistics are named Summary-FiveStar.ascx (which displays an average for numeric range questions), Summary-MultipleChoice.ascx (which displays a histogram), and Summary-SimpleText.ascx (which displays a word cloud). You can find these templates in the DisplayTemplates folder in the TailSpin.Web project. To support additional question types you must add additional user control templates to this folder.

More Information

For more information about Windows Azure storage services, see “Data Services” and the Windows Azure Storage Team Blog.

For a comparison of Windows Azure Table Storage and Windows Azure SQL Database, see “Windows Azure Table Storage and Windows Azure SQL Database - Compared and Contrasted.”

For further information about continuation tokens and Windows Azure table storage, see the section, “Implementing Paging with Windows Azure Table Storage” in Chapter 7, "Moving to Windows Azure Table Storage" of the guide “Moving Applications to the Cloud.”

Next Topic | Previous Topic | Home | Community

3 – Choosing a Multi-Tenant Data Architecture

Storing Data in Windows Azure Applications

Windows Azure Table Storage

Windows Azure Blob Storage

Windows Azure SQL Database

Other Storage Options

Storage Availability

Multi-Tenant Data Architectures

Partitioning to Isolate Tenant Data

Shared Access Signatures

Data Architecture Extensibility

Data Architecture Scalability

An Example

Option 1 — Using a Single Table

Option 2 — Table per Tenant

Option 3 — Table per Base Entity Type

Option 4 — Table per Entity Type

Option 5 — Table per Entity Type per Tenant

Comparing the Options

Goals and Requirements

Isolation of Tenants' Data

Application Scalability

Extensibility

Paging through Survey Results

Exporting Survey Data to SQL Database for Analysis

Overview of the Solution

Storage Accounts

The Surveys Data Model

Storing Survey Definitions

Storing Tenant Data

Storing Survey Answers

Storing Survey Answer Summaries

Comparing Paging Solutions

Paging with Table Storage

Paging with Blob Storage

Comparing the Solutions

The SQL Database Design

Inside the Implementation

The Data Store Classes

SurveyStore Class

SurveyAnswerStore Class

SurveyAnswersSummaryStore Class

SurveySqlStore Class

SurveyTransferStore Class

TenantStore Class

Accessing Custom Data Associated with a Survey

Defining a Tenant’s Custom Fields

Writing Custom Fields to the Surveys Table

Reading Custom Fields from the Surveys Table

Implementing Paging

Maintaining the Ordered List of Survey Responses

Implementing the Paging

Implementing the Data Export

Displaying Questions

Displaying the Summary Statistics

More Information

Additional resources