Windows Azure Cost Assessment
This paper provides a way to frame and then understand the costs of a Windows Azure application. The subject of "pricing a Windows Azure application" stops many in their tracks. Typically, a developer does not spend much time thinking about the things that Azure puts a price on—compute, storage, bandwidth. And the matrix of options and schedules makes it appear even more daunting. But these fears can be allayed with some reasoning and a methodical approach. This article contains three main messages.
First is an argument to focus on Compute and Windows Azure SQL Database costs. Although there are various costs associated with various Windows Azure platform services, the most expensive are for Windows Azure Compute and SQL Database. Therefore, you can make an initial best-estimate by calculating Compute and SQL Database usage. In most application designs, all other costs (for example, bandwidth, transactions) are a smaller proportion than compute and SQL Database. (The exceptions to this general rule are called out in this article.)
Second, to begin estimating costs, several distributed application models that work well on Windows Azure are presented. Begin a pricing exercise by mapping a new or existing application to one of the models. When you have done that, assessing the cost becomes feasible.
Finally, several fundamental strategies are presented, and links to topics of greater depth or specificity on these topics are included.
|Because the architecture of cloud applications differs greatly from that of on-premises applications, a one-to-one mapping is not always possible. In this light, this paper can only describe a methodology for organizing and preparing for a new development project. The actual development is naturally far more variable for every project.|
Before beginning a cost assessment, ensure that Windows Azure is the right platform. For guidance on that decision, see Evaluate Architecture and Development.
Authors: Jaime Alva Bravo, Sidney Higa, and Ralph Squillace
Acknowledgements and Reviewers: Tony Meleg, Paolo Salvatori.
Assessing the cost of an application should be balanced with the significant cost savings of the cloud. For example, Microsoft IT has documented a cost reduction from $1500 to $45 for one month. (See Microsoft's IT Journey to the Cloud.) A large part of the savings is due to the reduction of infrastructure costs. In short, you save money on: computers (hardware), system software, electricity, the cost of housing the physical machines, and the cost of maintaining and patching physical or virtual machines. Add to the fiscal savings the reduction in time expenditures.
That stated, if you are a developer or software architect asked to estimate the cost of a new Azure application, several things immediately strike you:
"I've never had to think of the things that Azure charges for." For example, CPU sizes and memory requirements, data storage costs, and bandwidth.
"The number of options is overwhelming." To name a few, there are different packages for CPUs, a pay-as-you-go plan or a six-month plan, Service Bus costs, as well as SQL Database costs.
"How can I price what I haven't created yet?" The Azure architecture requires you to learn a new way of doing things, which takes some time to assimilate.
There is also the cost of developing a new application. However, development expenses are not the subject of this paper. The present focus is on predicting the cost of running the Azure application.
The Major Costs: Compute and SQL Database
In the past year, through many meetings with customers, it has been found that the single largest cost is Azure Compute. The size and number of Compute instances is the greatest determinant of the bottom line. For most applications, at least one instance runs 24 hours a day. Ensuring durability requires at least two instances. And if you incorporate scaling, the number of instances can spontaneously multiply. Also, other Azure services (such as Service Bus) generally run only in response to a running Compute instance. In short, because Compute instances are the one constant factor, they are also the item that affects costs most.
|Always delete a hosted service from the Management Portal if you do not want to be charged for it. Even if a hosted service is in a suspended state (deployed but not running), you are charged for it. Many customers have been surprised to find a charge for a nonrunning service. You can always redeploy a service from Visual Studio.|
The second major cost is for SQL Database, for a few reasons. First, a SQL Database can run independently of any Compute instance. Second, SQL Database charges for database capacity, and the capacity charge is independent of time or usage.
Other costs, such as Windows Azure Storage, typically do not comprise a large percentage of the running cost. (For example, 100 GB of data in Windows Azure storage costs $21.50 a month.) If you do not require the relational database capabilities of SQL Database, then you can reduce the cost by using Azure storage services. The exception to this common case occurs when streaming gigabytes of data out of a Windows Azure data center, in which case bandwidth becomes a cost factor.
|There is a charge for data transfers out of a Windows Azure datacenter (also called "bandwidth" or "data egress"). Measuring data egress is potentially the most unpredictable cost because it depends on the application and the end user's request. If the application allows it, a long report might result from a particular request.|
Windows Azure Compute Notes
Windows Azure Compute cost is based on size and instances. The formula is simple:

size × total instances
The critical factor is size. Choices span from "extra small" to "extra large." (For a list of size details, see Windows Azure Compute, and How To Configure Virtual Machine Sizes.) To complicate this, running many instances of an extra small machine can be more effective than running one instance of an extra large machine. The final choice depends on the application. If the application does intensive processing, then you require an extra large (or large, or medium) instance. In contrast, you can have an application that does little processing but serves many users. In that case, a small or extra small instance that scales out to many instances in response to demand may be all that is needed.
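The size-versus-instance-count tradeoff can be sketched numerically. Every hourly rate below is a hypothetical placeholder, not a published Azure price; only the shape of the calculation is the point:

```python
# Illustrative sketch of the Compute cost formula: size rate x instances x hours.
# All rates are hypothetical placeholders, not published Azure prices.
HOURS_PER_MONTH = 31 * 24  # 744 hours in a 31-day billing month

RATES = {  # hypothetical per-hour rate for each instance size
    "extra small": 0.02,
    "small": 0.12,
    "medium": 0.24,
    "large": 0.48,
    "extra large": 0.96,
}

def monthly_compute_cost(size, instances, hours=HOURS_PER_MONTH):
    """Estimate one month of Compute cost for a size and instance count."""
    return RATES[size] * instances * hours

# Many extra small instances versus one extra large instance:
print(round(monthly_compute_cost("extra small", 8), 2))  # 119.04
print(round(monthly_compute_cost("extra large", 1), 2))  # 714.24
```

Under these assumed rates, eight extra small instances cost far less than one extra large instance, which is why the final choice depends on whether the workload needs raw per-instance power or scaled-out capacity.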
Compute Size and Scalability
The most important benefit of the Windows Azure cloud platform is its ability to scale up or down on an as-needed basis. Once the application is deployed, the number of running instances can be automatically increased or decreased. However, the size of the instance cannot be dynamically altered after deployment. This immutability means testing and measuring performance before settling on a size. The main consideration in setting the optimal size is whether the instance can handle the computing load the application demands of it. Another consideration is the performance expectations of users. If performance matters to users, ensure that the size and instance count are able to satisfy the requirements. You can quickly increase computing power by deploying many more instances. But the cost of the total deployment is based on the unchangeable size of the instances. So select the smallest possible compute size, and increase the instances as needed.
You specify the size of the Compute instance in the XML service definition of the project; the number of instances is set in the service configuration, which can also be edited from the Azure Management Portal. This freedom allows experimentation with different sizes and instance counts before final deployment. Change the instance size by setting the vmsize attribute of the WebRole configuration element. By minimizing the compute size, you leverage the scalability benefit of Windows Azure.
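As a minimal sketch, the size is declared in the service definition file (ServiceDefinition.csdef). The service and role names below are hypothetical:

```xml
<!-- ServiceDefinition.csdef: the vmsize attribute sets the Compute size. -->
<ServiceDefinition name="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebRole1" vmsize="Small">
    <!-- endpoints, imports, and other settings omitted -->
  </WebRole>
</ServiceDefinition>
```

The instance count, by contrast, lives in the service configuration (.cscfg) as an `<Instances count="2" />` element and can be changed without redeploying the service definition.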
Maximizing the Compute Instance
When developing and testing the final application, consider whether you are making the most of each Compute instance. For example, each instance is a virtual machine with a file system that you can save data to. Because no instance is guaranteed to be persistent or durable, the data saved on each instance can disappear if the instance is recycled. However, if your application can survive that uncertainty, then you may save some cost by using the file system for temporary data storage. Another example is to compare how two different instance sizes (for example, "small" and "medium") perform under load. It may be that one medium instance proves more economical under load than two small instances.
SQL Database Considerations
SQL Database is a relational database that is based on SQL Server. Someone familiar with SQL Server can reuse their skills on SQL Database. SQL Database is priced according to the maximum capacity, measured in gigabytes, you specify. Listed below are items to consider.
Rearchitect or Use As-Is
The choice of SQL Database in an application may depend on how much it costs to rearchitect existing data. The choice is between:
Refactoring data and using a combination of Azure storage and SQL Database.
Migrating the data as-is to SQL Database.
In most new applications, developing the application to use Azure Storage is cost-effective. But if you have an existing SQL Server database that must be used, the cost of rearchitecting the data may be prohibitive. In that case, simply migrating the data to SQL Database may cost less. Using SQL Server databases with no change may also be a stage in migration. In other words, if developing the application as soon as possible is a priority, using the SQL Server database unchanged is expedient.
Minimizing SQL Database Server Instances
The cost of SQL Database can be minimized by using as few SQL Database instances as possible. For example, you can develop an application that uses 10 SQL Database instances, each with 10 GB of capacity, each costing $46 a month. Or you can use one SQL Database instance with 100 GB of capacity at $176 per month. The total cost for the first configuration is $460 a month; the cost for the second remains at $176. In the second case, you must change the structure of the database to maintain the semantics of having 10 separate databases.
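The arithmetic of this comparison can be spelled out directly, using the prices quoted in this section:

```python
# Comparing the two database configurations described above.
PRICE_10GB = 46    # monthly price quoted for one 10 GB database
PRICE_100GB = 176  # monthly price quoted for one 100 GB database

many_small = 10 * PRICE_10GB  # ten separate 10 GB databases
one_large = PRICE_100GB       # one consolidated 100 GB database

print(many_small)              # 460
print(one_large)               # 176
print(many_small - one_large)  # 284 saved per month by consolidating
```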
The values of some Azure features become more apparent as you develop an application. One of these features is Azure Caching. As on an on-premises system, caching increases performance by storing recently used data for immediate reuse. This increase helps to satisfy end users of the application. It may also help the cost of the deployment by decreasing the need for more SQL Database capacity.
SQL Database Processing Performance
Behind the scenes in the Azure datacenter, SQL Database runs in a multi-tenant environment. This means that several instances of SQL Database are running on the same hardware. And the sharing of resources affects performance. As resources are stretched, all tenants are affected. Because of this circumstance, SQL Database employs an algorithm that limits the number of connections per role: each Compute instance is allotted four connections. This limitation mitigates the over-utilization of all resources by one tenant.
The multi-tenancy effect on resources comes into play when SQL Database processes a complex query. A query that pulls data from several tables and uses several joins taxes the SQL engine to some extent. This may be a consideration for an application that requires intensive SQL Database resources. When designing a new application, keeping the data resources on-premises may be a viable option.
To begin development, select an appropriate architecture. The following list shows a few models to use. These models progress from least complex to most complex.
Standard Scalable Web Application with Storage
This model shows a typical web application that consists of scalable web roles. Each role accesses data from a central storage repository, in this case SQL Database.
An example scenario: a "store front" application that features an online catalog of goods. Users query or browse the selection and add items to a shopping cart. At checkout, the application processes the order. Data for customer accounts, item information, and cart state is kept in a combination of Azure tables, blobs (for images) and SQL Database (for customer data).
This model scales the web roles up and down as needed. The storage here is SQL Database. Depending on the application, the storage can be any combination of Azure tables, Azure blobs, and SQL Database as well.
Hybrid On-Premises Application Using Azure Storage and SQL Database
This model only uses Azure storage and SQL Database. In this case, the application cannot scale out since it is not hosted on a web or worker role. However, the data can be accessed from other applications (either on-premises, or in the cloud). In this scenario, the on-premises application is used to manage the data. This model is also a hybrid, blending an on-premises application and a cloud database. As such, it is possibly a "first step" in migrating an existing application to Azure.
Background Processing Application
This model shows an application that does intensive processing by scaling up the number of worker roles. There is only one web role instance that allows users to log in and start an intensive processing job.
An example scenario: a service that encodes video from a high-resolution format to many lower bit-rate formats. The customer uploads a large video file to blob storage. The application distributes the encoding job to many worker roles. The processing is coordinated by using either Service Bus Queues or Azure Queues. The new files are deposited into blob storage. The customer can then pick up the blobs, or stream the data using the blob information.
For a primer on queues, see How to Use Service Bus Queues.
Scalable Web Application With Scalable Background Processing
This model is similar to the previous "Background Processing Application." The difference is the scalability of the web role instances, which enables large numbers of users to access the service. The model adds a set of Azure tables that track state (of orders, or other information) for each user.
An example scenario is a travel service that queries many other sites and returns ticket options.
Scalable Hybrid Web Application
A hybrid is an application that uses services that are still sited on-premises. There are many reasons for such an architecture. A common reason is to keep the data on-premises because of security or regulatory concerns. Another is the inability to move an existing service to the cloud; for example, a service that is not owned but is still needed by the application. Another reason is simply time: a complex, existing application migrates to Azure in stages, and as it migrates, some of its services remain on-premises. This model allows for scalable web roles, serving many users. SQL Database (or Azure Storage) is used to store data. The Windows Azure Service Bus relay service enables the application to connect to an on-premises database (or other services).
|When using an on-premises database, the cost of bandwidth may become a significant cost.|
One caveat: Windows Azure and the benefits of cloud architecture are not automatically suitable for every application. For instance, deploying ten web sites on ten web roles using Azure may not be cheaper than using a service provider that charges a minimal amount for hosting each site.
The models can be easily used to begin a pricing exercise.
The simple scenario: a "storefront" application where users peruse a catalog and order items. The catalog consists of images and text. The images are stored in blobs, and the text is stored as tables. Customer data is stored in a SQL Database, along with order data. The web application runs on a web role that can be increased with more instances. For a base configuration (minimal use), the application uses the following:
Two small web roles.
50 GB of catalog data in blobs and tables (blobs and tables are charged at the same rate).
10 GB on SQL Database for customer data.
10 GB of bandwidth.
These numbers are used with the Azure calculator to produce the following snapshot. The total for a minimal configuration is about $250.
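As a rough cross-check of that snapshot, the line items can be summed directly. Every rate below is an illustrative placeholder, not a published price; use the online Azure calculator for real figures:

```python
# Rough monthly estimate for the base "storefront" configuration.
# All rates are hypothetical placeholders; consult the Azure calculator.
HOURS = 744  # hours in a 31-day month

compute = 2 * 0.12 * HOURS  # two small web roles at an assumed $0.12/hour
storage = 50 * 0.125        # 50 GB of blob/table data at an assumed $0.125/GB
sql_db = 45.96              # assumed monthly price for a 10 GB SQL Database
bandwidth = 10 * 0.12       # 10 GB of egress at an assumed $0.12/GB

total = compute + storage + sql_db + bandwidth
print(round(total, 2))  # 231.97, in the same range as the ~$250 snapshot
```

Note how the two Compute instances dominate the total, consistent with the earlier observation that Compute is the largest cost.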
In this model, the data size remains the same since the catalog does not increase in size. The number of instances does scale out, and bandwidth increases as well.
A simplistically derived average of the two costs amounts to $394 for a month. Of course, the average can be calculated using more configurations and with different durations.
The Development/Pricing Cycle
The development and final cost of an application is an iterative process. As the application matures, one discovers more possibilities for features, which add costs. As understanding of the technology grows, new ways of optimizing performance also become apparent (which reduce costs). In sum, there is no model that can absolutely predict the cost of the final product. The final cost is a result of the final design, which is not known at the start. However, there is a way to begin costing—here is a starting framework:
Understand how Azure works. You should have a basic understanding of the web/worker role models, and the storage options, as well as bandwidth.
Select a model (as outlined later in this document).
Determine whether the model gives enough performance to satisfy end users.
For example, a quick response may be essential to users, who will otherwise abandon the application. But if users expect long processing, your options increase.
Do a capacity planning estimate to determine what size web or worker roles you require.
Price your application with the model. Assign numbers to the model with size and instances, storage and other service costs. You can use the online Azure calculator for this step.
Calculate the costs for both minimum and maximum usage periods.
The period depends on your application. For example, a reporting application is used by all managers solely at the end of a day. Or a retail application is used only during the daylight hours, during sales, or during holiday shopping periods, and much less at night.
Estimate the cost over a month; use the minimum and maximum figures.
The list is a simple start to the process. As the application develops, performance and capacity requirements become clearer. As you deploy and test the application, more areas of optimization become apparent as well.
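The minimum/maximum estimate in the steps above amounts to a weighted sum of the hours spent in each configuration. A minimal sketch, with hypothetical hourly costs and a hypothetical peak window:

```python
# Blend minimum and maximum usage periods into one monthly estimate.
# The hourly costs and peak window are hypothetical; substitute your own.
HOURS = 744  # hours in a 31-day month

min_cost_per_hour = 0.24  # e.g., two small instances at an assumed $0.12/hour
max_cost_per_hour = 0.96  # e.g., eight small instances during peak load
peak_hours = 10 * 31      # assume ten peak hours per day

monthly = (max_cost_per_hour * peak_hours
           + min_cost_per_hour * (HOURS - peak_hours))
print(round(monthly, 2))  # 401.76
```

Adjust the peak window to match the usage pattern of the application, for example daylight retail hours or end-of-day reporting.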
Before you launch a development project, here are a few good practices and strategies to apply.
Capacity planning has two goals. The first is to determine what resources are necessary for an application to meet its service level agreement (SLA). The second is to estimate the actual cost of those resources. There are three parts to the exercise: (1) setting performance goals (the SLA), (2) determining final resource needs by testing and reconfiguring, and (3) calculating the cost of those resources. The planning can be accomplished by following the process shown in the graphic.
Determine Cost Basics, or "Where are my costs?"
In this preliminary stage, one establishes which parts of Windows Azure will accrue the most costs. For example, you settle on a model that includes scalable web roles and SQL Database, but no worker roles, or other services.
Factor the workload.
As you refine the model, you determine where the work occurs. For example, specify an architecture that requires scaling out worker roles to do an intensive computing job for 80% of the time.
Set Service Level Requirements.
Formulate the "right" levels of service availability and performance over time. Basic questions include: will the service run 24 hours a day, every day? Also, what are the minimum and maximum usage periods and durations? Keep in mind that performance affects the satisfaction level of end users and their acceptance of the application.
Configure Compute and Services.
One starts with a preliminary stake in the ground, a best guess at the compute size and number of instances needed. With those numbers, one can start to use the Azure Calculator to estimate final costs.
Test Low/High Scenarios.
Test the working system in maximum and minimum usage scenarios with the given Compute size and instances, in all (high and low) configurations. Costs for those configurations can be calculated more accurately with the working system. If the service levels are met, the system can move to production. If not, reconfigure the Compute parameters and retest. Also note that diagnostics and performance counters have an adverse effect on the tested system's performance. When the system is near production quality, removal of the tracing and counters increases performance.
If you are developing a multi-tenant application, and need to monitor each tenant's usage, see the metering block sample.
Costing an Existing Line-Of-Business Application
What if you are migrating an existing application to Azure?
In an enterprise that is running several Line-of-Business applications, the Information Technology (IT) department handles the actual deployment and maintenance of the computers. The IT department also monitors the service and its performance. To move an existing application to Windows Azure, begin with your IT department and find out how much the application costs in specific terms. The questions to ask are:
How many machines (virtual or real) does the service run on?
What is the configuration of each machine? How many cores? What speed does it run at? How much memory does it have?
What is the average use? Is the application used constantly? Or sporadically? When are the peak hours?
What is the average of all peaks? Use the average to configure a reasonable Compute instance. The size and number of instances should accommodate the peaks and ensure service availability.
What is the absolute peak usage? Allocate enough computing power to handle the absolute peak if it is significantly beyond the average peak.
Even if you have not migrated an existing application yet, the answers provide a starting point for your Compute resources.
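The average-peak and absolute-peak questions above reduce to simple statistics over usage samples. A sketch, using hypothetical hourly request-rate figures of the kind an IT department could supply:

```python
# Derive the average peak and absolute peak from usage samples.
# The samples and threshold are hypothetical illustration data.
samples = [120, 180, 420, 450, 350, 90, 470, 440, 130, 520]  # requests/hour

threshold = 400  # call any sample above this level a "peak"
peaks = [s for s in samples if s > threshold]

average_peak = sum(peaks) / len(peaks)
absolute_peak = max(samples)

print(average_peak)   # 460.0: size Compute instances for this figure
print(absolute_peak)  # 520: check whether it is far beyond the average peak
```

If the absolute peak is significantly beyond the average peak, allocate extra computing power (or autoscaling headroom) to cover it.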
For a set of tools and guidance on migration, see the Microsoft Assessment and Planning Toolkit.
Cloud computing is a new endeavor for most developers, support engineers, testers, and marketing professionals. One obstacle that comes with a new approach is a tendency to do business "as usual"—using familiar methods and processes. While employing standard methods is usually harmless, you may need to rethink some practices. One area is computer resource planning for organizations in an enterprise.
For one Microsoft customer, using customary habits added extra cost.
The product support, development, test, and marketing teams from the customer used separate sets of computers as they wished. The development and test teams were accustomed to rebuilding the computers on a regular basis to start with a clean slate. The marketing and support team, on the other hand, were accustomed to keeping the same software to do repeatable demos (marketing) and issue reproduction (support). At the onset, all four teams created different Azure subscriptions to accommodate their practices. The result was four separate subscriptions. After some time, the teams realized only two subscriptions were needed. One subscription could be used for support and marketing and a second for development and test.
Refactor Data Using Azure Storage and SQL Database
Data is a major component of every application. For Azure, you have two technologies to consider: SQL Database and Azure Storage. Azure Storage is affordable at several dollars a month for gigabytes of data. However, the data is stored as blobs, in tables, or in queues. The best (and worst) thing about Azure tables is that they are not relational. The benefit is that they can scale out automatically without any work on your part. If you are used to programming in SQL, this also forces you to rethink how you are using data and how you are storing it. Experience has shown that many developers automatically use SQL Server for even trivial (non-relational) purposes.
However, SQL Database is a relational database, and has features that many applications require – especially enterprise applications. While it is more expensive, if you need the capabilities of a relational database, then budget for it. The main reason for using SQL Database is that you already have an existing database that contains your strategic data. Another prime reason is that you create relational queries on the fly.
Given that Azure Storage is inexpensive, many customers have found it feasible to use it for increasingly large portions of their data, with SQL Database holding the remaining data, typically metadata. While not a hard-and-fast rule, it is a useful guideline. To sum up: use Windows Azure Storage for most of your data, and use SQL Database to store metadata. Query SQL Database and retrieve the data from Azure Storage to minimize data costs.
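This split can be sketched with in-memory stand-ins for the two stores. Real code would query SQL Database and Azure blob storage; the item names and structure here are hypothetical:

```python
# "Metadata in SQL Database, bulk data in storage" sketch.
# Dictionaries stand in for the two stores; names are hypothetical.
sql_metadata = {  # small relational rows: cheap to query
    "video-42": {"title": "Intro", "blob": "videos/video-42.mp4"},
}
blob_storage = {  # large payloads in inexpensive storage
    "videos/video-42.mp4": b"...gigabytes of encoded video...",
}

def fetch(item_id):
    """Query the relational store for metadata, then pull the bulk bytes."""
    meta = sql_metadata[item_id]       # stands in for a SQL Database query
    data = blob_storage[meta["blob"]]  # stands in for a blob download
    return meta["title"], data

title, payload = fetch("video-42")
print(title)  # Intro
```

Because only the small metadata rows live in SQL Database, the capacity you must provision (and pay for) there stays small.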
Diagnostics, Troubleshooting and Performance Considerations
Troubleshooting and performance testing require the enabling of performance counters and tracing. The few points mentioned here may help save time or effort. For basic information about diagnostics, see Enabling Diagnostics in Windows Azure.
Verbose logging of diagnostic traces during development is necessary for debugging. However, most of that logging should be switched off before deploying the application to production. But after deployment, some strategic logging data is needed in the event of an error. Informative data is critical because debugging a running Azure service is only possible by perusing logs generated during a failure. The application itself determines what the key data points are and the possible error conditions that may occur. So the deployed application must include some tracing. Determining what, where, and when to log the critical traces is a task carried out during development.
Because logging affects performance, you should switch off the majority of diagnostic logging before final deployment. But recompiling and redeploying an application is time consuming. Therefore, it is preferable to switch the logging mode off using application configuration.
To autoscale a service correctly, you must embed key performance counters into the application. The actual counters depend on your application. For example, if your application autoscales based on web page metrics, use one of the IIS counters. Or if the application autoscales based on a process, use an application performance counter.
Diagnostics and performance data can accumulate quickly, especially when you monitor many counters at high frequencies. That data is typically saved on an Azure Storage service. Therefore, be sure to monitor the storage and have a plan to manage the data, as it can fill your capacity quota. For example, the plan may be to save the data at set intervals to an on-premises location for later analysis.
For more information about monitoring an application, see Monitoring a Windows Azure Application.
Autoscaling is one of the primary benefits of developing a Windows Azure cloud application. To make the most of Azure, you can automatically increase and decrease the number of running Compute instances. To implement autoscaling most efficiently, use the Autoscaling Application Block.
The Autoscaling block uses a rules engine to specify when and how to scale. The engine also allows for rules to be overridden when certain criteria are in effect, such as time of day. For more information, see How to use the Autoscaling Application Block.
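The flavor of such a rule can be sketched as follows. The function, counter, and thresholds are hypothetical illustrations, not the Autoscaling Application Block's actual API:

```python
# Minimal reactive scaling rule: scale out when a counter is high,
# scale in when it is low. Names and thresholds are hypothetical.
def desired_instances(current, cpu_percent, low=40, high=70,
                      minimum=2, maximum=8):
    """Return the instance count a simple threshold rule would request."""
    if cpu_percent > high and current < maximum:
        return current + 1  # scale out one instance at a time
    if cpu_percent < low and current > minimum:
        return current - 1  # scale in, but never below the minimum
    return current

print(desired_instances(2, 85))  # 3
print(desired_instances(3, 20))  # 2
print(desired_instances(2, 20))  # 2 (already at the minimum)
```

The floor of two instances preserves the durability requirement noted earlier, while the ceiling caps the maximum Compute spend.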
Planning for Transient Faults, or Retry Logic
Transient network failures are inevitable. Such failures happen on any network, but the occurrence is multiplied on the Internet. Often, if you try again later, the connection works. Instead of using an ad hoc solution, implement the Transient Fault Handling Application Block from the start. Network failures will occur after you deploy the service to Windows Azure. It is much better to implement the framework from the start than to plumb it into your solution at a later date.
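The Transient Fault Handling Application Block is a .NET library; the retry-with-backoff idea behind it can be sketched generically. The helper below is an illustration of the pattern, not the block's API:

```python
import random
import time

def with_retries(operation, attempts=4, base_delay=0.5):
    """Call operation(); on a transient failure, back off and retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the fault to the caller
            # exponential backoff (0.5 s, 1 s, 2 s, ...) plus a little jitter
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# Simulated flaky call: fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network failure")
    return "ok"

print(with_retries(flaky))  # ok (after two backed-off retries)
```

Centralizing the policy in one helper means retry counts and delays can be tuned in a single place rather than at every call site.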
Each attempt to connect to a service that charges for connections like ACS or Service Bus costs a small amount of money, even if the attempt fails. These costs are small, and usually just a fraction of processing or data costs. If you are connecting and reconnecting enough to cost any large amount of money, it's likely that your application is not meeting your performance requirements for its clients. Cost, in that case, is not the real problem.
If you are new to Windows Azure, take time to learn how the Service Bus relays, queues, and topics and subscriptions work. The Service Bus was originally conceived of as a relay service: an intermediary between endpoints that is easy to negotiate and handle. The Service Bus easily negotiates security and traverses firewalls. But there are other technologies included with the Service Bus that facilitate or increase the performance of Azure applications. In particular, you can use the Publish/Subscribe capabilities.
The Service Bus is not often the largest cost of an application—but your application could be the exception. These behaviors may be necessary to accomplish your goals: a high level of chatter, a large number of queues, or tremendous and constant data transfer outbound from the data center. As a rule, testing a design that is metered properly gives you an idea whether the Service Bus accounts for a larger part of costs.
As the application nears its maturity, it is necessary to perform load testing on the application, especially if you expect the application to have many users. Load testing is also concurrent with experimenting with web/worker role configurations. For example, you may try to increase the number of instances of smaller Compute sizes. Or you may do the converse: increase the size of Compute instances. Larger sizes imply that the instances will not scale out at the same rate.
For more information about using Visual Studio to conduct load tests on Azure applications, see Using Visual Studio Load Tests in Windows Azure Roles.
The value of diagnostics and monitoring is that the data supplies baseline performance measurements. As the application is deployed and measured, performance can then be improved, and improved performance entails lower costs. The subject of performance is a separate topic. However, understanding the Windows Azure Content Delivery Network (CDN) or the Caching service and using them in your architecture can also optimize performance and improve customer satisfaction.
This topic is meant to supply a starting methodology for costing an Azure cloud service. It is meant for novices who are beginning to investigate a move to the Windows Azure platform. The guidance is not exhaustive. But it is meant to avoid errors, and to raise awareness of items that may be new to someone just starting out. As the Azure platform matures, and as new Azure services arise, this document will be kept updated and relevant.