Real World: Considerations When Choosing a Web Role Instance Size: How small should I go?
Last updated: February 2012
The Windows Azure Platform lets developers build web applications that scale both horizontally (scale out), by adding virtual machines, and vertically (scale up), by adding resources to a single virtual machine. A developer can deploy a web application as a web role in Windows Azure and specify the number of web role instances to run. Each instance is a separate virtual machine running IIS, configured to receive requests through a provisioned load balancer. To scale horizontally, the number of web role instances can be increased (or decreased) either manually through a management portal or automatically via REST APIs or auto-scaling tools.
The Windows Azure Platform also supports vertical scaling: you can specify the size of the role instance, or virtual machine. The table below (taken from http://msdn.microsoft.com/en-us/library/windowsazure/ee814754.aspx) describes the sizes that are currently available (as of December 2011):
Figure 1: Compute Instance Sizes
The cost per instance increases with the size of the instance. Current pricing details can be found at http://www.windowsazure.com/pricing.
When deploying your web application to Windows Azure, you must select a single role size for all of your web role instances. Currently you cannot vary your instance sizes within a deployment, meaning that if you run four web role instances, they must all be the same size. You do have the ability to change sizes, but this applies to all instances in your deployment.
What Size Should I Start With?
One question that frequently comes up when building applications on the Windows Azure Platform is “what role size should I select?” The answer, as with most questions like this, is “it depends on your application.” Some application owners will have a good sense of the number of cores, amount of memory, or network capacity their application will require. Others take a more “wait and see” approach. For those folks, let’s dive a little deeper into this question and try to come up with some guidance and principles for making an initial decision.
The default role instance size that the Windows Azure Platform will start you on is the small instance size. Some customers feel uncomfortable starting there. There seems to be a notion that one should “start bigger” to avoid issues and go from there. Some might just make an arbitrary choice, such as medium, and revisit the decision later. But are there principles that can help guide this decision for developers unsure about which instance size to use?
In this article we will provide guidance on sizing decisions for web roles by:
Comparing performance of a web application under load for different instance sizes
Factoring in a hypothetical usage pattern to understand scaling implications for instance sizes
Evaluating cost when scaling for different instance sizes
Comparing Small vs. Medium Instance Sizes in a Load Test
In this article we will take a close look at the small instance size in comparison to the medium instance size for web roles in a single web application scenario. We will argue that you should start with more small instances as opposed to fewer medium instances. We will show how four small instances perform comparably to two medium instances under similar conditions. Based on the pricing noted above, there is no difference in cost between four small instances and two medium instances. However, we believe there are numerous advantages to the four small instance deployment that make it a favorable option. Using a hypothetical scenario of a web application with a typical load ranging up to 400 concurrent users, we will show that scaling out with the small instance size has cost benefits when compared to the medium instance size.
In order to test our hypothesis, we need a way to measure the performance of a small Windows Azure role instance against a medium Windows Azure role instance. For our tests, we developed a sample web application, ran load tests against it on the different instance sizes under different scenarios, and evaluated the data.
For the sample web application and load test tool, we relied on an existing article from the Windows Azure Real World Guidance collection entitled Simulating Load on a Windows Azure Application. This article is a very good read and sets the stage well for executing load on a Windows Azure application. The article outlines a simple calculation to perform in a sample web application. It also describes how to use a worker role to execute requests against that sample application to exercise load.
As suggested in this article, we instrumented the sample application to collect performance counters via Windows Azure Diagnostics. Here is a snippet from our application which collects several different counters and stores the data periodically to a Windows Azure Storage account.
Figure 2: Windows Azure Diagnostics Setup
We analyzed the counters collected using Cerebrata’s Azure Diagnostics Manager. We also added a web interface for starting and stopping a load test. Starting a test involves entering a URL for the worker role instances to execute requests against and the number of concurrent users you wish to simulate during the test.
Figure 3: Starting a Load Test
Once a test is in progress, worker role instances continue to execute requests until the test is stopped. The nice part about this interface is that it can be reused to simulate load against any URL, not just our test one, and for future load test scenarios.
Additionally, to support a more distributed scenario, we made a few enhancements to the load-generating worker role provided in the Simulating Load on a Windows Azure Application article. Each worker role instance spins up a number of threads in parallel to simulate concurrent users executing requests on our sample application. The number of threads used within each worker role instance was determined by dividing the total number of concurrent users for the test by the number of worker role instances running. Each thread pauses for a random amount of time, one to five seconds, between requests to better simulate an actual user and their typical behavior when navigating a site. For example, a test running 200 concurrent users could be made up of four worker role instances each running 50 threads in parallel, each executing at least one request every one to five seconds against the web application. Each worker role instance reports the response time for every request in the load test to a SQL Azure database for later analysis.
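The thread allocation described above can be sketched as follows. This is a minimal Python illustration, not the worker role's actual code (which this article does not show). The function names threads_per_instance and next_pause_seconds are hypothetical, and handing any leftover users to the first instances is an assumption for cases where the user count does not divide evenly.

```python
import random


def threads_per_instance(total_users: int, instance_count: int) -> list[int]:
    """Split the simulated users across worker role instances.

    Assumption: when the division is not even, the leftover users are
    assigned one each to the first instances.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    base, remainder = divmod(total_users, instance_count)
    return [base + (1 if i < remainder else 0) for i in range(instance_count)]


def next_pause_seconds(rng: random.Random) -> float:
    """Pick the pause between two requests: one to five seconds."""
    return rng.uniform(1.0, 5.0)


# The example from the text: 200 concurrent users on four worker instances.
allocation = threads_per_instance(200, 4)
print(allocation)  # [50, 50, 50, 50]
```

Each simulated user thread would then loop: issue a request, record the response time, and sleep for next_pause_seconds before the next request.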
It is important to note the difference between what we define as a concurrent user and a single request. Over a ten minute period, a test running 200 concurrent users each making a request at least every five seconds could yield over 50,000 requests on the system.
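To make the distinction concrete, the request volume can be bounded with simple arithmetic. The sketch below ignores response time and assumes each user issues exactly one request per pause interval, which is a simplification; the function name requests_in_test is hypothetical.

```python
def requests_in_test(users: int, duration_s: int, interval_s: float) -> int:
    """Requests issued when every user sends one request per interval."""
    return int(users * duration_s / interval_s)


users = 200
duration_s = 10 * 60  # ten-minute test

slowest = requests_in_test(users, duration_s, 5.0)  # one request every 5 s
fastest = requests_in_test(users, duration_s, 1.0)  # one request every 1 s

print(slowest, fastest)  # 24000 120000
```

Under these assumptions the total falls between 24,000 and 120,000 requests, and a total above 50,000 corresponds to an average interval of less than 2.4 seconds per user.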
Looking at the Results
Given the hardware specifications of the different role instance sizes from above, we decided to compare the performance of two medium role instances against four small role instances. We also thought it would be interesting to see how three small role instances performed for comparison. We executed load tests simulating load up to 500 concurrent users for a minimum of ten minutes.
As expected, when we look at the average number of requests per second, we can see an upward trend that corresponds to the increasing number of concurrent users for our scenarios. It’s no surprise that each of our two medium instances has to handle more requests than each instance in the three or four small instance scenarios.
Figure 4: Average Number of Requests per Second
Let’s see if we can find differences in performance between the instance sizes as the load increases.
First, let’s see how the response time, as captured by our load test agents (worker role instances), was affected. The response time of each request during a load was captured and aggregated to derive an average response time for each scenario.
Figure 5: Average Response Time
As you can see, average response time (in milliseconds) increased as load increased. The increase was more dramatic in our scenario running three small instances. In comparing two medium instances vs. four small instances, the results were similar, with two mediums slightly outperforming four smalls.
Next, let’s examine our performance counter measuring CPU of our role instances. Here is a chart which shows average CPU across role instances for these different scenarios.
Figure 6: CPU
In terms of average CPU across role instances, you can see that two medium instances and four small instances performed very similarly, while three small instances ran higher, as expected. Some people would recommend a threshold of 75% CPU as an indication of the need to scale. In applying that threshold here, you can see that in our load test for 500 concurrent users we approached that threshold for our scenarios running two medium and four small instances, and we exceeded it for our three small instance scenario.
We were also able to correlate a rise in the number of requests queued with the number of concurrent users and the level of CPU during a test. In this grid, you can see that the maximum value of the Requests Queued counter recorded during a load test was much higher during our 500 concurrent user test than during the 400 concurrent user test.
Figure 7: Requests Queued
From this data, it appears that running more instances reduced the amount of queuing occurring on any one instance.
We also captured performance counters for Requests Rejected, Request Wait Time and Available Memory (MBytes). We did not see much variance across these counters in our tests. Memory stayed fairly constant as load increased. The Requests Rejected counter value was 0 for all of our tests, which is an indication that we did not overload our application in these scenarios.
We could have taken this load test much further, either by extending the duration of the test or by increasing the number of users beyond 500. With the increase in Requests Queued and CPU at 500 users, we were getting close to the point of needing to scale. However, the goal of this exercise is not to find the breaking point of this sample application, but rather to compare instance sizes under a few different scenarios.
Cerebrata’s Azure Diagnostics Manager allowed us to also visualize performance counters at the role instance level. For example, here we can compare CPU at the role instance level in two of our scenarios.
Figure 8: CPU across 4 Small Instances at 400 Concurrent Users
Figure 9: CPU across 2 Medium Instances at 400 Concurrent Users
From this analysis, we can see that running four small instances yields very similar performance results compared to two medium instances under constant load. Both scenarios yielded similar increases in response time and processor time under heavier load. In some places you could argue that having more small instances is better than having fewer, larger instances for this simple scenario, while in others the medium instances performed slightly better. For the most part, our assumption that four small instances and two medium instances would yield similar performance results for this test was validated.
Considering Usage and Cost
Where things now become more interesting is when you consider instance sizing around a usage pattern. Most web applications do not have constant load. For this example, we are going to examine things using the following usage pattern. We are suggesting this as just a sample for doing further analysis around scaling and cost.
Figure 10: Sample Daily Load Pattern
Our usage pattern represents the load pattern for a typical day expressed as the number of concurrent users by hour of the day. In our example, we have a peak load of 400 concurrent users towards the middle of the day but have numerous hours where the load is well below 400 concurrent users.
If you consider this typical usage pattern, it is clear that there are hours of the day where either two medium or four small instances provide more than enough compute resources to accommodate the load. This is where you would see the benefit of utilizing four small instances over two medium instances. You are required to run at least two instances of your solution to maintain availability of your application, as a single instance can be taken offline at any time by the platform (see the Windows Azure SLA guidelines). Running more small instances gives you the opportunity to reduce the number of small instances to three or even two for some hours of the day, and essentially create a capacity curve that more closely follows your load curve. Running two medium instances does not allow you to scale down, even though these instances would be underutilized at times.
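One way to picture a capacity curve that follows the load curve is to compute the instance count needed for each hour of the day. The sketch below is illustrative only: the hourly load values are made up rather than taken from the sample load pattern figure, and the value of 100 concurrent users per small instance is an assumption loosely based on four small instances handling the 400-user test. Measure your own application before relying on such a number.

```python
import math

MIN_INSTANCES = 2               # two instances are needed for availability
USERS_PER_SMALL_INSTANCE = 100  # assumption, not a measured limit


def instances_needed(concurrent_users: int) -> int:
    """Small instances required for a load, never fewer than the minimum."""
    needed = math.ceil(concurrent_users / USERS_PER_SMALL_INSTANCE)
    return max(MIN_INSTANCES, needed)


# Hypothetical load by hour: quiet night, ramp-up, midday peak, evening.
hourly_load = [50] * 8 + [250] * 6 + [400] * 4 + [300] * 6

schedule = [instances_needed(users) for users in hourly_load]
instance_hours = sum(schedule)

print(schedule.count(2), schedule.count(3), schedule.count(4))  # 8 12 4
print(instance_hours)  # 68
```

With this made-up pattern, the deployment uses 68 small instance-hours per day instead of the 96 that four small instances running around the clock would consume.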
As of December 2011, the cost of a small role instance ($0.12/hour) is half the cost of a medium role instance ($0.24/hour). In our example, let’s assume we need only two small instances for eight hours of the day, three small instances for 12 hours of the day, and four small instances for four hours of the typical day. This translates into a daily cost of $8.16, calculated as (2 instances * $0.12/hour * 8 hours) + (3 instances * $0.12/hour * 12 hours) + (4 instances * $0.12/hour * 4 hours). The cost to run two medium instances for one day is $11.52 (2 instances * $0.24/hour * 24 hours). While this represents a daily savings of only $3.36, that is roughly 29% of the daily cost. Savings of this kind can become very significant for larger, more complex solutions.
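The daily cost comparison can be reproduced with a few lines of Python. The rates and the instance schedule come from the example above; the variable names are only illustrative.

```python
SMALL_RATE = 0.12   # USD per small instance-hour (December 2011)
MEDIUM_RATE = 0.24  # USD per medium instance-hour (December 2011)

# (number of small instances, hours per day at that count)
small_schedule = [(2, 8), (3, 12), (4, 4)]

small_daily = sum(count * SMALL_RATE * hours for count, hours in small_schedule)
medium_daily = 2 * MEDIUM_RATE * 24

savings = medium_daily - small_daily
savings_pct = savings / medium_daily * 100

print(f"small:   ${small_daily:.2f}/day")    # small:   $8.16/day
print(f"medium:  ${medium_daily:.2f}/day")   # medium:  $11.52/day
print(f"savings: ${savings:.2f}/day ({savings_pct:.1f}%)")
```

Changing the schedule tuples or the rates shows how sensitive the savings are to the shape of the load curve: the more hours spent at the two-instance minimum, the larger the gap.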
One of the benefits of the Windows Azure Platform is the concept of “infinite capacity” and the ability to easily provision additional resources to support your needs. Equally compelling is the ability to easily deprovision those resources when they are no longer needed. As mentioned previously in this article, the platform provides both horizontal and vertical scaling options. Additionally, you can automate the process by which decisions to scale are made. For example, in our case, you can set up rules that govern when additional small instances are added and removed, given your site’s traffic patterns. The ability to do this is well described in another Guidance article entitled Dynamically Scaling a Windows Azure Application. This is extremely powerful when your usage patterns warrant it.
We are not saying that a small web role instance size fits every application scenario. There are many cases where larger role instance sizes are needed, such as resource-intensive web applications. As for which size is right for you, the answer still is, and always will be, “it depends.” But smaller instance sizes offer benefits in terms of horizontal scalability. The example provided in this article is relatively simple, but it supports our basic premise: determine the smallest instance size that your application can run on, and run more instances of that size rather than using larger instance sizes that more than meet your needs. In our case, running small instances gave us more flexibility to scale both up and down and ultimately yielded cost savings during our non-peak hours. Being able to adjust capacity to load more precisely will allow you to realize meaningful cost savings.
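A scaling rule of this kind can be expressed as a small decision function. The following Python sketch is not the API of any auto-scaling tool or of the Guidance article mentioned above. The 75% scale-out threshold echoes the CPU threshold discussed earlier, while the 40% scale-in threshold, the maximum of four instances, and the function name desired_instance_count are assumptions chosen for illustration.

```python
MIN_INSTANCES = 2  # keep at least two instances for availability


def desired_instance_count(current: int,
                           avg_cpu_percent: float,
                           scale_out_above: float = 75.0,
                           scale_in_below: float = 40.0,
                           max_instances: int = 4) -> int:
    """Return the instance count a simple CPU-based rule would ask for."""
    if avg_cpu_percent > scale_out_above:
        # Under pressure: add one instance, up to the configured ceiling.
        return min(current + 1, max_instances)
    if avg_cpu_percent < scale_in_below:
        # Underutilized: remove one instance, but never go below the floor.
        return max(current - 1, MIN_INSTANCES)
    return current


print(desired_instance_count(3, 80.0))  # 4
print(desired_instance_count(3, 20.0))  # 2
print(desired_instance_count(2, 20.0))  # 2
```

Keeping a gap between the two thresholds avoids adding and removing instances in quick succession when CPU hovers around a single cutoff.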