Reducing TechNet and MSDN hosting costs and environmental impact with autoscaling on Microsoft Azure
This content and the technology described are outdated and are no longer being maintained. For more information, see Transient Fault Handling.
In June 2011 the Enabling Platform Experience (EPX) group, part of Microsoft's Developer Division, began a journey that would see two of its largest developer and IT professional websites migrated from an entirely on-premises infrastructure to take advantage of the reliability, scalability, and availability of the cloud. Part of the solution included using the Autoscaling Application Block (which you may have heard referred to as Wasabi) from the Microsoft patterns & practices group to automatically scale the websites based on current levels of traffic. The goal was to move both the Microsoft Developer Network (MSDN) and Microsoft TechNet to Microsoft Azure, and achieve a number of significant benefits:
- To maximize resource utilization. The requirement to meet highly variable traffic patterns meant that server utilization in the original on-premises solution was often as low as 20% overall. However, this over-provisioning was necessary to meet demand during busy periods. A cloud-hosted solution can provide elasticity through the easy addition and removal of servers to meet demand, thereby significantly reducing the need to over-provision resources. The initial target was to achieve server utilization levels of 67% in the cloud.
- To reduce infrastructure and running costs. Every organization is looking for ways to reduce energy usage and cost, and to minimize investment in infrastructure. The use of hosted virtual servers can minimize initial and ongoing hardware, infrastructure, and maintenance costs, and can achieve significant savings in day-to-day running costs.
- To "green" the MSDN and TechNet services. A significant environmental (and often regulatory) focus for all companies today is to minimize their carbon footprint. Hosted solutions that support dynamic resource scaling can help developers achieve a significant reduction in energy usage associated with the operation of their application and help companies to reduce their direct and indirect carbon emissions.
- To act as a learning exercise. Microsoft will move many of its web sites and applications to Azure over time. The knowledge and experience gained during the initial phase of migrating Microsoft TechNet will be invaluable for the migration of other applications.
According to the IT Energy Efficiency Imperative, the biggest potential for improving the IT energy efficiency in a data center lies with increasing server utilization. This is because idle servers continue to consume typically between 30% and 60% of the power they consume when fully loaded.
"Most applications are provisioned with far more IT resources than they need, as a buffer to ensure acceptable performance and to protect against hardware failure. Most often, the actual needs of the application are simply never measured, analyzed, or reviewed," says Mark Aggar, Senior Director, Environmental Sustainability at Microsoft.
The team working on the migration also had some requirements that the migration process must meet:
- No code or architecture changes. The migration should be accomplished with minimum changes to configuration, and with no changes at all to the architecture or code of the applications. By eliminating the need to re-architect the application to run on the cloud, this approach would enable a much faster and less expensive migration.
- Equivalent or better performance. Performance of the migrated application must be equivalent to, or better than, the on-premises solution. In particular, it must be able to scale dynamically to meet demand, while minimizing running costs.
- Ease of operation. The migrated application must be easy to operate, monitor, manage, and maintain.
- Reduced on-premises requirements. The result must provide opportunities to minimize on-premises infrastructure requirements and costs.
This case study describes how the Autoscaling Application Block contributed to achieving these benefits and to meeting these requirements.
The MSDN and TechNet websites
The Enabling Platform Experience (EPX) group at Microsoft is responsible for managing a number of Microsoft Developer online and offline experiences, including the Microsoft Developer Network (MSDN) and Microsoft TechNet. Usage of both of these sites is highly variable, with significant spikes when new products launch, or during training or conferences.
The MSDN and TechNet websites have a similar architecture and are hosted on the same hardware, although TechNet receives less traffic overall than MSDN. Therefore, the decision was made to perform the migration for TechNet first, and then to apply the lessons learned to the MSDN website. The midterm target for EPX is to move all applicable websites and applications to Azure by the end of 2014.
The result of the migration process from the original on-premises solution is a hybrid cloud solution: the web front-ends are now hosted in Windows worker role instances; the data tier, which comprises multi-terabyte SQL Server databases, remains on-premises. The team uses the Autoscaling Application Block to enable elasticity in the number of worker role instances running in Azure.
For more information about the migration from a fully on-premises to a hybrid cloud solution, see Migrating Microsoft TechNet and MSDN to Hybrid Cloud using Azure on TechNet.
The Role of the Autoscaling Application Block
"Although Azure enables elasticity, without the Autoscaling Application Block there would have been only three options for dynamically scaling the MSDN and TechNet websites: manually adding and removing role instances, handing the Management API key to a third-party scaling service, or writing the scaling infrastructure from scratch," says Dr. Grigori Melnik, Sr. Program Manager, Microsoft patterns & practices group.
The Autoscaling Application Block is designed to enable automatic scaling behavior in Azure applications and to make it easy to benefit from the elastic supply of resources in Azure: in this case the worker role instances hosting the MSDN and TechNet websites. The Autoscaling Application Block can scale Azure applications either in or out based on two different sets of rules: constraint rules that set limits on the maximum and minimum number of role instances, and reactive rules that dynamically add or remove role instances based on metrics that the Autoscaling Application Block collects from the running role instances. Through these configurable rules, the block can balance specific application performance requirements against running costs by dynamically controlling the number of running role instances.
Rules may be modified while an Azure application is running in order to fine-tune the autoscaling behavior and make adjustments in response to changes in access patterns. The Autoscaling Application Block generates comprehensive diagnostic information that helps with the analysis of its behavior and with further optimization of the rule set.
"Applications that are designed to dynamically grow and shrink their resource use in response to actual and anticipated demand are not only less expensive to operate, but are significantly more efficient with their use of IT resources than traditional applications," says Mark Aggar, Senior Director, Environmental Sustainability at Microsoft.
Adding the Autoscaling Application Block to the TechNet and MSDN websites
To add the Autoscaling Application Block to the TechNet and MSDN websites, the team completed the following steps as part of the migration process:
- They configured the Azure worker roles to write CPU Usage performance counter data to the Azure log files and to persist those log files to Azure storage where the Autoscaling Application Block can then access the performance counter data. For information about this configuration task, see How to Use the Azure Diagnostics Configuration File on MSDN.
- They created an Azure worker role to host the Autoscaling Application Block. This enables the Autoscaling Application Block to run in Azure, from where it can access the performance counter data that the reactive rules use and use the Azure Management API to dynamically add or remove the Azure worker role instances that host the MSDN and TechNet sites.
- They added an Azure web role that hosts a monitoring console that enables the team to see, in real time, a graphical representation of the activities of the Autoscaling Application Block. This monitoring tool is based on sample code that ships with the Autoscaling Application Block.
- They also updated the mechanism that the Autoscaling Application Block uses to send notifications.
The original on-premises MSDN and TechNet web applications were designed to be "web-farm friendly," that is, stateless and fault-tolerant: because the web applications did not rely on node affinity, each user request could be handled by any of the servers. When these applications are running in Azure worker roles, and the Autoscaling Application Block rules cause the number of running instances to be scaled back, this characteristic of the design means that the scale-down operation can take place safely. The instance of IIS running in the worker role that is stopping goes through its standard shut-down sequence and completes any active requests, and the Azure load balancer routes new requests to IIS instances running in other worker role instances. Because any active state is stored on the client, the scale-down event is not noticed by the end users of the site.
"Our experience implementing Wasabi has been remarkable, after a quick learning curve (mainly by playing with the demo) we were able to implement and deploy to production within hours," says Hugo Salcedo, Premier Field Engineer, Microsoft Services.
These steps did not require any changes to the code or architecture of the original applications, and adding the Autoscaling Application Block took the team just a few hours.
"The monitoring console, based on the sample code shipped with Wasabi, gives us a better view of what's going on CPU-wise with our Azure role instances than just about any other Azure tool out there right now," says Jay Jensen, Sr. Systems Engineer, EPX group at Microsoft.
The Autoscaling Application Block rules and their effects
Prior to the migration, the team analyzed their historical traffic data to try to identify any usage patterns that would enable them to pre-emptively scale the websites in anticipation of bursts in traffic by using constraint rules. However, the primary cause of bursts on these sites is crawlers from search engines, and it proved impossible to predict when these crawlers would be active. The team will continue to monitor traffic and analyze it for patterns in case the activities of the search engines become more predictable. The current constraint rules set minimum and maximum role instance counts for the worker roles. For example, the minimum instance count for the primary MSDN worker role is 15, and the maximum instance count is 50. The values are chosen to ensure that performance requirements are met, that resource utilization is kept high, and that there is a cap on the potential running costs of the systems. For some of the other worker roles, the minimum instance count is set to two: this helps to ensure the reliability of the system and is necessary if the Azure SLAs are to apply to these roles.
The reactive rules are grouped: one or more rules scale a worker role out when average CPU usage across the running role instances exceeds a certain threshold; the other rules scale the worker role in when average CPU usage across the running role instances falls below a certain threshold. An example of such a group of rules for the primary MSDN worker role is shown below:
Scale up rule
<rule enabled="true" rank="3" name="MSDN_Scale_Up_On_High_CPU">
  <actions>
    <scale target="MSDN_Service" by="5"/>
  </actions>
  <when>
    <greaterOrEqual than="73" operand="MSDN_CPU"/>
  </when>
</rule>
Scale down rules
<rule enabled="true" rank="3" name="MSDN_Scale_Down_CPU_Below_60%">
  <actions>
    <scale target="MSDN_Service" by="-1"/>
  </actions>
  <when>
    <lessOrEqual than="60" operand="MSDN_CPU"/>
  </when>
</rule>
Other scale down rules reduce the number of instances by two if the average CPU utilization falls below 55%, by three if the average CPU utilization falls below 50%, and by five if the average CPU utilization falls below 45%.
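The combined effect of the rule group and the constraint rule can be illustrated with a short Python sketch. This is not the Autoscaling Application Block's implementation: the function and class names are hypothetical, the thresholds and step sizes are the ones quoted in this case study, and the sketch assumes that the most aggressive matching scale-down step applies, without modeling how the block resolves rules by rank. It also treats every threshold as inclusive, as in the two rules shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """Instance-count limits, as set by a constraint rule (hypothetical type)."""
    min_instances: int
    max_instances: int


# Values quoted in the case study for the primary MSDN worker role.
SCALE_OUT_THRESHOLD = 73   # average CPU % at or above which instances are added
SCALE_OUT_STEP = 5         # instances added per scale-out action

# (average CPU % threshold, instance delta), ordered from lowest threshold up,
# so that the first match is the most aggressive applicable scale-down step.
SCALE_IN_STEPS = [(45, -5), (50, -3), (55, -2), (60, -1)]


def reactive_delta(avg_cpu: float) -> int:
    """Return the instance-count change suggested by the reactive rules."""
    if avg_cpu >= SCALE_OUT_THRESHOLD:
        return SCALE_OUT_STEP
    for threshold, delta in SCALE_IN_STEPS:
        if avg_cpu <= threshold:
            return delta
    return 0  # CPU is inside the target band, so no change is suggested


def next_instance_count(current: int, avg_cpu: float, constraint: Constraint) -> int:
    """Apply the reactive delta, then clamp to the constraint rule's limits."""
    proposed = current + reactive_delta(avg_cpu)
    return max(constraint.min_instances, min(constraint.max_instances, proposed))


if __name__ == "__main__":
    msdn = Constraint(min_instances=15, max_instances=50)
    for cpu in (80, 65, 58, 40):
        print(cpu, next_instance_count(25, cpu, msdn))
```

In this sketch the constraint always wins: a scale-out from 48 instances at high CPU stops at 50, and a scale-in from 16 instances at low CPU stops at 15.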
These rules show how the team aims to keep average CPU utilization between 60% and 73%. This is well above the average CPU utilization of 20% that was recorded for the servers that hosted the websites on-premises. This is where many of the benefits of the Autoscaling Application Block are realized:
- The Autoscaling Application Block has helped the team to achieve significantly higher resource utilization: the reactive rules ensure that average CPU usage across all the running worker role instances stays between 60% and 73%. This may be improved further as the team experiments with raising the thresholds. The team is doing this gradually in order to verify the stability of the system: as the team gains confidence in the system, they plan to reduce the amount of buffer capacity in running role instances and push utilization rates higher. Based on the initial analysis performed by the team, the level of server utilization has the biggest impact on the ROI for the project.
- According to the paper "Cloud Computing and Sustainability: The Environmental Benefits of Moving to the Cloud," higher server utilization is a key indicator of energy savings. Although servers running at higher utilization rates consume more power, the resulting increase in power consumption is more than offset by the relative performance gains. For example, increasing a utilization rate from 5% to 20% means the server can handle four times the load, but with an increase in power consumption of only 10% to 20%. This will lead to reduced infrastructure and running costs and help to reduce the carbon footprint of the MSDN and TechNet websites.
- The Autoscaling Application Block has added additional resilience to the system. The block can automatically add additional cloud resources in response to problems in the on-premises data centers as the load balancers route traffic to the worker roles in Azure instead of to on-premises servers.
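The utilization example quoted from the sustainability paper can be worked through as simple arithmetic. The Python sketch below assumes that the "10% to 20%" figure is a relative increase in the server's power draw and that the work done scales directly with utilization; the function name is hypothetical.

```python
def relative_energy_per_unit_of_work(load_factor: float, power_increase: float) -> float:
    """Energy per unit of work after the change, relative to before (1.0 = unchanged).

    load_factor    -- how many times more load the server handles (4.0 for 5% -> 20%)
    power_increase -- relative increase in power draw (0.10 for +10%)
    """
    if load_factor <= 0:
        raise ValueError("load_factor must be positive")
    return (1.0 + power_increase) / load_factor


if __name__ == "__main__":
    load_factor = 4.0  # utilization rising from 5% to 20%
    for power_increase in (0.10, 0.20):
        ratio = relative_energy_per_unit_of_work(load_factor, power_increase)
        print(f"power +{power_increase:.0%}: energy per unit of work x{ratio:.3f}")
```

Under these assumptions, each unit of work costs roughly 0.275 to 0.30 of the energy it cost at 5% utilization, which is the sense in which the extra power is "more than offset."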
Figure 1 shows an example of how the reactive rules managed the instance count of the MSDN service role type during a single day during June 2012:
Table 1 shows the average instance counts of the MSDN service role type that was managed by the Autoscaling Application Block for eight days in June 2012:
Average role instance count with the Autoscaling Application Block
Average role instance count without the Autoscaling Application Block*
* This is the average number of instances that would have been used if the Autoscaling Application Block had not been installed, based on having 25 instances provisioned around the clock.
Figure 2 shows how the number of instances varied over a five-day period, during which there was a large amount of activity from a web crawler. It shows how the reactive rules adjusted the number of instances to keep the average CPU utilization at around 60%.
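The comparison behind Table 1 can be sketched in Python as follows. The 25-instance baseline comes from the footnote above; the hourly instance counts in the example are purely hypothetical sample data, not the measured values, and the function names are illustrative.

```python
from typing import Sequence

FIXED_BASELINE = 25  # instances provisioned around the clock without autoscaling


def average_instances(hourly_counts: Sequence[int]) -> float:
    """Average number of running instances over the sampled hours."""
    if not hourly_counts:
        raise ValueError("hourly_counts must not be empty")
    return sum(hourly_counts) / len(hourly_counts)


def instance_hours_saved(hourly_counts: Sequence[int], fixed: int = FIXED_BASELINE) -> int:
    """Instance-hours avoided compared with running `fixed` instances every hour."""
    return fixed * len(hourly_counts) - sum(hourly_counts)


if __name__ == "__main__":
    # Hypothetical day: 12 quiet hours, 8 normal hours, 4 busy hours.
    sample_day = [15] * 12 + [20] * 8 + [30] * 4
    print("average instances:", round(average_instances(sample_day), 2))
    print("instance-hours saved:", instance_hours_saved(sample_day))
```

A negative result from `instance_hours_saved` would indicate a period in which autoscaling ran more instances than the fixed baseline, which can happen during sustained bursts.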
The see-saw pattern in the instance count graph is due to the behavior of the web crawler. The Autoscaling Application Block is responding to the changes in the load from the crawler.
This initial implementation of autoscaling for the worker roles hosting the web front-ends of the TechNet and MSDN websites provided EPX and other teams at Microsoft with many pointers that will be useful for future implementations. These lessons include:
- Make your web application web-farm friendly: If a web application is already web-farm friendly it won't require any changes to support scale-down operations from the Autoscaling Application Block.
- Monitor traffic: It is important to monitor website traffic to be able to analyze it and identify any usage patterns. Constraint rules can pre-emptively add or remove instances based on usage patterns. The team looked at historical data over a month and were unable to identify any clear, repeating patterns: therefore, they are relying on reactive rules to handle changes in traffic volumes.
- Fine-tune the rules: Revisit the rules regularly to ensure that they are performing optimally. Usage patterns and volumes on websites change over time and we want our rules to reflect these changes.
- Use email notifications: When we initially used the Autoscaling Application Block, we configured it to send email messages describing the changes it would make, but without actually making the change. This enabled us to verify that the Autoscaling Application Block was behaving as expected.
- Leave some spare capacity: Because of the complexity of the system, the team found that it can take some time for Azure to spin up new role instances. For the MSDN and TechNet sites, this can be as long as 20 to 40 minutes because these applications require some custom build steps that add to the time Azure typically takes to create a new instance. Starting the scale out when CPU usage reaches 73% means that there is some headroom available to handle the increased workload before the new instances come online. The team expects the startup time for new role instances to improve as they make changes to the system, and when they see these improvements they plan to adjust the scale out threshold upwards.
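The headroom argument in the last lesson can be made concrete with a small calculation. The Python sketch below assumes, as a simplification, that average CPU usage grows in proportion to load while the instance count stays fixed during the 20 to 40 minute startup window; the function name is hypothetical.

```python
def load_growth_headroom(cpu_at_trigger_pct: float) -> float:
    """Fractional load growth the running instances can absorb before
    average CPU reaches 100%, starting from the trigger threshold."""
    if not 0 < cpu_at_trigger_pct <= 100:
        raise ValueError("cpu_at_trigger_pct must be in (0, 100]")
    return 100.0 / cpu_at_trigger_pct - 1.0


if __name__ == "__main__":
    for trigger in (73, 80):
        headroom = load_growth_headroom(trigger)
        print(f"trigger at {trigger}% CPU: load can grow about {headroom:.0%}")
```

Under this assumption, triggering at 73% leaves room for load to grow by roughly 37% while new instances start, compared with 25% if the trigger were 80%.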
"We chose 73% CPU usage as the scale out trigger because if we were in the middle of a rising CPU spike, we didn't want to wait until we were at, say, 80% CPU usage to kick-off a scale-out that might take 20 to 40 minutes to complete," says Jay Jensen, Sr. Systems Engineer, EPX group at Microsoft.
At the time this case study was written, the EPX development team was still experimenting with the autoscaling rules to determine optimum values for the different worker roles. This will be an ongoing task, but the changes to the rules are likely to become smaller over time as they are fine-tuned.
The Autoscaling Application Block is already actively scaling the MSDN and TechNet websites, ensuring that there is adequate capacity to meet varying levels of demand around the clock, and making sure that spare capacity is released as soon as demand falls. In doing this, the Autoscaling Application Block is managing the trade-off between the need for capacity to meet peak demand, and the goals of increasing server utilization levels and reducing running costs and power consumption.
"After getting TechNet and MSDN into Azure, it was the icing on the cake to be able to implement this autoscaling behavior," says Jay Jensen, Sr. Systems Engineer, EPX group at Microsoft.
While raising the minimum server utilization from 20% to 60% for the MSDN and TechNet websites is an impressive achievement, it’s worth noting that this is only possible because of the fundamental nature of cloud computing. Unlike a dedicated server infrastructure where capacity is sized for peak demand plus a substantial buffer, a cloud infrastructure enables the utilization of any given component to be driven higher because of the simple fact that more capacity is available at relatively short notice should demand increase.
Running at such high average utilization levels is simply not possible with dedicated infrastructure unless a large part of the infrastructure is turned off during periods of lower demand. In practice this is almost never done due to a combination of poor incentives and the perceived technical risk of power cycling servers (which, it should be noted, is no longer a significant risk when using modern hardware and virtualization).
However, because more resources are always available in the cloud at any given moment, driving higher levels of utilization with autoscaling has finally become viable and economically advantageous. Developers who design their applications to autoscale are truly taking advantage of the potential of cloud computing.