Azure Web Sites: Architect for the Cloud Using Azure Web Sites
There’s one important thing to keep in mind when designing a cloud solution—always design for failure. Many applications, however, aren’t architected this way. The primary reason for this is a lack of awareness of how to design an architecture using Microsoft Azure Web Sites that’s sufficiently resilient. So how do you build a cloud solution that’s robust enough to handle failure? This article will discuss techniques you can employ to design such a system.
Single Web Site Instance
Azure Web Sites provides hosting plans on several tiers: Free, Shared, Basic and Standard. The Free and Shared tiers provide sites with a shared infrastructure, meaning your sites share resources with other sites. In the Basic and Standard tiers, your sites are provided with a dedicated infrastructure, meaning that only the site or sites you choose to associate with your plan will run on those resources.
In the Basic and Standard tiers, you can configure your Web hosting plan to use one or more virtual machine (VM) instances. These tiers support small, medium and large instance sizes. The service manages these VMs on your behalf, so you never need to worry about OS updates or security and configuration updates.
For running a production-level Web site on Azure Web Sites, the Basic or Standard tier is recommended based on your application size and the amount of traffic to your Web site. Understanding your application requirements is a good starting point:
- What components does your application need—database, e-mail provider, cache and so on?
- What components are at risk of failure and need to be replicated?
- What features do you need—SSL, staging slots and so on?
- How much traffic should your Web site be able to manage?
With these answers, you can create a single Web site and add components such as database and cache to your application as needed. In Figure 1, you can see how you might architect a single Web site and its dependent components.
In the Single Web Site scenario, the biggest caveat is that if any component (Web site, database or cache service) suffers a service-related outage, your Web site will be unavailable for the duration of the outage, with a direct impact on your customers and your business.
Figure 1 Standard Architecture for a Single Web Site
This design doesn’t consider the risks involved with cloud solutions, nor does it include a way to mitigate them. In a cloud environment, your design goal should be to create a highly available Web site that will minimize downtime and expedite recovery during an outage.
The Goal Is High Availability
A typical Web application stack in Azure Web Sites will consist of a Web application, database, Azure Storage and some form of cache. These components should be loosely coupled and redundant so that no single entity or component becomes a single point of failure (SPOF). That’s a key design criterion for architecting any cloud solution. The goals are:
- Avoid SPOF in your design
- Build redundancy into each layer in your design
Azure Web Sites provides an SLA of 99.9 percent. This means you can expect up to about 10 minutes of downtime per week (0.1 percent of the 10,080 minutes in a week) due to scheduled deployment or upgrades conducted by the Azure Web Sites service. Still, 10 minutes of downtime can have a great impact on your business. Your goal should be to design your solution to mitigate this downtime and keep serving your customers, reducing the risk to your business. You should strive to build a highly available Web architecture, as shown in Figure 2.
Figure 2 Example of a Highly Available Web Architecture
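To make the SLA arithmetic concrete, here is a minimal Python sketch that converts an availability percentage into a downtime budget. The function names are hypothetical, and the combined-availability helper assumes that failures in redundant deployments are independent, which real outages do not always honor.

```python
# Downtime budget implied by an availability SLA.
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def allowed_downtime_minutes(sla_percent: float, period_minutes: float) -> float:
    """Return the downtime, in minutes, that the SLA permits over the period."""
    return period_minutes * (1 - sla_percent / 100)


def combined_availability(*sla_percents: float) -> float:
    """Availability (percent) of redundant deployments.

    Assumption: failures are independent, so the system is down only
    when every deployment is down at the same time.
    """
    all_down = 1.0
    for sla in sla_percents:
        all_down *= 1 - sla / 100
    return 100 * (1 - all_down)


if __name__ == "__main__":
    weekly = allowed_downtime_minutes(99.9, MINUTES_PER_WEEK)
    print(f"99.9% SLA allows about {weekly:.2f} minutes of downtime per week")
    print(f"Two independent regions: {combined_availability(99.9, 99.9):.4f}%")
```

Under the independence assumption, a second region shrinks the expected downtime by several orders of magnitude, which is the motivation for the redundancy discussed in the rest of this article.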
High Availability at the Web Application Layer
Azure Web Sites creates a stateless cloud solution for your Web application. You need to consider the following points when designing for high availability at the application level:
- Use Standard mode for your Web sites, and configure it to use at least two Web site instances.
- Based on your traffic patterns, simulate loads for testing through various tools like Visual Studio and Apache JMeter (jmeter.apache.org) to identify the instance size and how many instances would be needed for your Web site to manage actual traffic levels.
- Ensure user and session data are stored on a centralized system such as a database or a distributed cache layer. In Azure Web Sites, the File server is shared across all VMs when running in Standard mode.
- Replicate the Web application layer in at least two regions supported by Azure Web Sites.
- User content such as media and documents that your Web site uploads or manages should be stored in an Azure Storage account. Make sure the storage account is in the same geographic region as the Web site to reduce latency.
- Always follow secure coding practices to make your application resilient to malicious attacks.
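Once a load test has told you how many requests per second one instance can sustain, turning that figure into an instance count is simple arithmetic. The sketch below is one way to do it. The function name, the 25 percent headroom and the sample throughput figures are all assumptions for illustration; the minimum of two instances reflects the first recommendation in the list above.

```python
import math


def instances_needed(peak_rps: float,
                     rps_per_instance: float,
                     headroom: float = 0.25,
                     minimum: int = 2) -> int:
    """Estimate how many Web site instances are needed for peak traffic.

    peak_rps:         highest request rate you expect (from traffic patterns)
    rps_per_instance: sustained rate one instance handled in your load test
    headroom:         extra capacity kept in reserve (assumed 25 percent)
    minimum:          never go below this count, so one instance can fail
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    required = math.ceil(peak_rps * (1 + headroom) / rps_per_instance)
    return max(required, minimum)


if __name__ == "__main__":
    # Hypothetical numbers: 900 requests/s at peak, 250 requests/s per instance.
    print(instances_needed(900, 250))
```

Repeat the calculation for each instance size you test, because a larger instance may handle enough additional load to reduce the count.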
High Availability at the Database Layer
Data is what truly generates value for any application, so you have to manage it efficiently. For your application to function properly, it’s critical to avoid a single point of failure at the database layer. Azure Web Sites supports the Azure SQL Database and ClearDB MySQL services. Both provide options from low-range to premium solutions.
Azure SQL Database Premium offers more predictable performance and greater capacity for cloud applications using Azure Web Sites. It dedicates a fixed amount of reserved capacity to a database, including its built-in database replication features. Reserved capacity is ideal for:
- High Peak Load: An application that requires a lot of CPU, memory or I/O to complete its operations is a good candidate for using a Premium database.
- Many Concurrent Requests: Some database applications service many concurrent requests. The normal Web and Business editions of Azure SQL Database have a limit of 180 concurrent requests. Applications requiring more connections should use a Premium database with an appropriate reservation size to handle the maximum number of requests.
- Predictable Latency: Some applications need to guarantee a response from the database in minimal time. If a given stored procedure is called as part of a broader customer operation, there might be a requirement to return from that call in no more than 20 ms 99 percent of the time. This kind of application will benefit from a Premium database to ensure that computing power is available.
ClearDB offers high-availability SQL routers (or CDBRs), which are custom-built, intelligent traffic managers that monitor your database clusters. When a database node is unhealthy, the CDBR automatically redirects traffic to your secondary master. This helps ensure uptime for your database and, in turn, the Web application. When building this design, keep in mind there are recommended pairings of regions ClearDB supports for such scenarios, such as:
- If you choose one database in the eastern United States, the paired database should be in the western United States.
- If you choose one database in Northern Europe, the paired database should be in Southern Europe.
High Availability Across Regions
Currently, Azure Web Sites supports multiple regions, and Microsoft is working on expanding its infrastructure, as well. Architectures that use multiple regions can be classified into active-active Web sites and active-passive Web sites.
Active-Active Web sites: In an active-active Web architecture, you’ll have multiple Web sites across regions serving the same application. In this case, traffic is managed across all Web sites, hence they’re all considered active.
Active-Passive Web sites: In an active-passive architecture, you’ll have a single Web site that will act as a primary Web site. This will serve up the content for all customer traffic. During a failure on this site, customers will be redirected to another site in a different datacenter, configured identically and kept in sync with the primary Web site, which mitigates the failure.
In designing your architecture to operate across multiple regions, there are a few challenges you need to consider:
- Data Synchronization: This refers to the ability to make a real-time copy of the database across the regions. Any complex system will have a database, file server, external storage and cache, just to name a few components. When designing your architecture, you need to ensure data replicated across multiple regions is in sync to keep your Web application from breaking. This may require some changes to your application code to support this scenario. Azure Web Sites supports running background processes called WebJobs, which let you build custom tools to manage data synchronization.
- Network Flow: This is the ability to manage network traffic across multiple regions. Azure Traffic Manager lets you load balance incoming traffic across multiple hosted Web sites whether they’re running in the same datacenter or across different datacenters. By effectively managing traffic, you can ensure high performance, availability and resiliency for your applications.
You can use Traffic Manager to ensure high availability and responsiveness for your applications. Traffic Manager provides three load balancing methods:
- Failover: Use this method when you want to use a primary endpoint for all your traffic, but provide one or more backup endpoints in case a primary or backup endpoint becomes unavailable.
- Round Robin: With this method, you can distribute traffic equally across a set of endpoints in the same datacenter or across different datacenters.
- Performance: This method lets requesting clients/users use the closest endpoint to reduce latency by providing endpoints in different geographic locations.
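The three load balancing methods differ only in how they choose among healthy endpoints. The Python sketch below models those decisions; it is an illustration of the selection logic, not how Traffic Manager is implemented. The Endpoint class, its fields and the function names are hypothetical, and latency_ms stands in for whatever measurement of closeness the service actually uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str
    healthy: bool = True
    latency_ms: float = 0.0  # hypothetical latency as seen by the client


def pick_failover(endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Failover: the list is in priority order; use the first healthy entry."""
    for ep in endpoints:
        if ep.healthy:
            return ep
    return None


class RoundRobin:
    """Round Robin: rotate through the endpoints, skipping unhealthy ones."""

    def __init__(self, endpoints: List[Endpoint]) -> None:
        self._endpoints = endpoints
        self._next = 0

    def pick(self) -> Optional[Endpoint]:
        n = len(self._endpoints)
        for _ in range(n):
            ep = self._endpoints[self._next % n]
            self._next += 1
            if ep.healthy:
                return ep
        return None  # no healthy endpoint available


def pick_performance(endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Performance: choose the healthy endpoint with the lowest latency."""
    healthy = [ep for ep in endpoints if ep.healthy]
    return min(healthy, key=lambda ep: ep.latency_ms, default=None)
```

An active-passive design corresponds to the failover choice with the primary site listed first, while an active-active design corresponds to the round robin or performance choice.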
High Availability for Cache Layer
To improve the performance of your Web sites, the caching layer is critical. When handling data with your application, you need to understand how caching may affect both your data and application. When choosing your cache layer, consider the need to maintain consistency of data stored across cache instances. To avoid data inconsistencies, you should use a distributed caching mechanism. A distributed cache can span multiple servers, so it’s scalable, and it stores application data that resides in the database, which reduces the number of calls made to the database.
There are many cloud-based caching solutions you can integrate with Azure Web Sites, such as Azure Cache and Memcached Cloud. These are available through the Azure Store in the Azure Management Portal.
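A common way for an application to use a cache to reduce database calls is the cache-aside pattern: check the cache first, and query the database only on a miss. The Python sketch below shows the idea. InMemoryCache is a hypothetical stand-in for a real distributed cache client, and its get/set interface is an assumption, not the API of Azure Cache or Memcached Cloud.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Stand-in for a shared cache client (hypothetical get/set interface)."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # entry expired; treat it as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (time.monotonic() + ttl_seconds, value)


def get_with_cache(cache: InMemoryCache,
                   key: str,
                   load_from_db: Callable[[str], Any],
                   ttl_seconds: float = 60.0) -> Any:
    """Cache-aside read: only call the database when the cache misses.

    Note: a value of None is indistinguishable from a miss in this sketch,
    so None results are reloaded from the database every time.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value, ttl_seconds)
    return value
```

Because every Web site instance reads and writes the same shared cache, instances see consistent data. The expiry time bounds how long a stale entry can survive after the database changes.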
Now that you’ve figured out high-availability strategies for your Web sites, you’ll need to evaluate your monitoring options. Azure Web Sites provides two types of monitoring in the Azure Management Portal:
- Monitoring with Web site Metrics: Each Web site dashboard has a Monitor page. This provides performance statistics for each site, such as CPU usage, number of requests, data sent by Web site and so on.
- Endpoint Monitoring: Endpoint monitoring lets you configure Web tests from geo-distributed locations that test response time and uptime for Web URLs. Each configured location runs a test every five minutes. Uptime is monitored with HTTP response codes. Response time is measured in milliseconds. Uptime is considered 100 percent when the response time is less than 30 seconds and the HTTP status code is less than 400. Uptime is 0 percent when the response time is greater than 30 seconds or the HTTP status code is 400 or greater. After you configure endpoint monitoring, you can drill down into the individual endpoints to view details of response time and uptime status over the monitoring interval from each of the test locations.
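The uptime rule for endpoint monitoring can be expressed in a few lines of Python. This is a sketch of the rule as described, not the portal's implementation. The function names are hypothetical, and the sketch treats a response time of exactly 30 seconds and a status code of exactly 400 as failures, which is an assumption about the boundary cases.

```python
from typing import Iterable, Tuple

MAX_RESPONSE_SECONDS = 30.0
FIRST_FAILING_STATUS = 400


def is_up(status_code: int, response_seconds: float) -> bool:
    """One test counts as 'up' when it is both fast enough and not an error."""
    return (response_seconds < MAX_RESPONSE_SECONDS
            and status_code < FIRST_FAILING_STATUS)


def uptime_percent(results: Iterable[Tuple[int, float]]) -> float:
    """Percentage of (status_code, response_seconds) tests that were up."""
    results = list(results)
    if not results:
        raise ValueError("no test results to summarize")
    up = sum(1 for status, seconds in results if is_up(status, seconds))
    return 100.0 * up / len(results)


if __name__ == "__main__":
    # Hypothetical results from one location, one test every five minutes.
    samples = [(200, 0.4), (200, 1.2), (503, 0.1), (200, 31.0)]
    print(f"uptime: {uptime_percent(samples):.0f}%")
```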
You can perform more detailed and flexible monitoring with the Azure Preview portal. Here, you can build a full-blown DevOps dashboard. Figure 3 shows how a DevOps dashboard looks when running on Azure Web Sites.
Figure 3 A Typical DevOps Dashboard
When developing for the cloud, application planning, development and testing are usually conducted in one place, such as Visual Studio Online. Monitoring the health of those applications and troubleshooting problems might be done in yet another portal, such as with Application Insights. Billing is displayed on a separate page. Notice a pattern here?
This new portal brings all of the cloud resources, team members, and lifecycle stages of your application together. It gives you a centralized place to plan, develop, test, provision, deploy, scale and monitor those applications. This approach can help teams embrace a DevOps culture by bringing both development and operations capabilities and perspectives together in a meaningful way. You can learn more about this from the blog post, “Building Your Dream DevOps Dashboard with the New Azure Preview Portal,” at bit.ly/1sYNRtK.
Sometimes bad things happen to good applications. Azure Web Sites can help you automatically recover from application failures. Typically, when your monitoring system detects an issue, it alerts an Ops guy in some fashion. The Ops guy then goes ahead and restarts the Web site to get things up and running again.
You can configure the web.config file in your application to detect and act on such situations automatically. For example, if X number of requests take Y amount of time to execute in Z amount of time, then perform action ABC (which could be to restart the Web site, log an event or run a custom action). Another example might be: if the process consumes X amount of memory, then perform action ABC. Learn more about recovery procedures from the Microsoft Azure Blog at bit.ly/LOSEvS.
While geo-redundant architectures provide protection from infrastructure-related failures, features such as automated recovery help build application-layer resiliency. In the world of cloud computing, customers are always keen to recover first rather than perform full-blown diagnostics while the Web site is down. On the other hand, there are other situations and scenarios where diagnostics are essential.
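To see what a rule of the form "X requests taking Y time within Z time" means in practice, the Python sketch below evaluates one over a sliding time window. It models the logic of the rule only; it does not reproduce the web.config syntax or the service's implementation, and the class and parameter names are hypothetical.

```python
from collections import deque
from typing import Deque


class SlowRequestTrigger:
    """Fire when `count` requests, each taking at least `time_taken`
    seconds, finish within any window of `interval` seconds."""

    def __init__(self, count: int, time_taken: float, interval: float) -> None:
        self.count = count            # X: number of slow requests
        self.time_taken = time_taken  # Y: what counts as slow, in seconds
        self.interval = interval      # Z: window length, in seconds
        self._slow: Deque[float] = deque()  # finish times of slow requests

    def record(self, finished_at: float, duration: float) -> bool:
        """Record one completed request; return True if the rule fires."""
        if duration >= self.time_taken:
            self._slow.append(finished_at)
        # Forget slow requests that have fallen out of the window.
        while self._slow and finished_at - self._slow[0] > self.interval:
            self._slow.popleft()
        if len(self._slow) >= self.count:
            self._slow.clear()  # start counting afresh after firing
            return True
        return False


if __name__ == "__main__":
    trigger = SlowRequestTrigger(count=3, time_taken=5.0, interval=60.0)
    requests = [(0.0, 6.0), (10.0, 1.0), (20.0, 7.0), (30.0, 9.0)]
    for finished_at, duration in requests:
        if trigger.record(finished_at, duration):
            print(f"rule fired at t={finished_at}: perform the recovery action")
```

A memory-based rule follows the same shape with a simpler condition: compare the process's current memory use against a threshold and perform the action when it is exceeded.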
Diagnostics is similar to peeling away a bad onion. You never know how many layers you need to peel. Azure Web Sites is a fully managed, multi-tenant Platform as a Service (PaaS) offering and as such, does not support remote diagnostics in the VM. This brings a new set of challenges. Here’s a list of available tools and diagnostics scenarios for which they’re useful.
By default, this service provides various kinds of logging that help with troubleshooting. These include Web Server Logging, Detailed Error Logging, Application Logging, Failed Request Tracing and so on. Learn more about logging services at bit.ly/1i0MSou.
The Kudu console is another diagnostic tool. This is our multipurpose software configuration management endpoint, which customers frequently use for diagnostic purposes. Upon logging on to the Kudu endpoint, you’ll see different diagnostic options in the top ribbon (see Figure 4).
Figure 4 The Kudu Console Provides Diagnostic Tools
The Environment tab gives you a read-only view of things like System Info, App Settings, Connection Strings, Environment variables, System Paths, HTTP headers, and Server Variables specific to your Web site and VMs. This always works well as an aid to ongoing investigation with different data points.
The Debug Console tab gives you a CMD- or Windows PowerShell-based remote execution console and file browser access to your sites. You can use the console to perform most standard console operations and run arbitrary external commands, such as Git commands. You can also navigate the folder UI, download files and folders, upload files and folders using drag and drop, view and edit text files, and so on. Learn more about this by watching the YouTube video, “The Azure Web Sites Diagnostic Console,” at bit.ly/1h0ZZoR.
The Process explorer tab gives you a list of the currently active processes your application is able to see and communicate with inside the Azure Web Sites sandbox. This view provides information about each process’s memory and CPU usage. You can also look at detailed information about a given process, generate and download full memory dumps for offline diagnostics, and so on.
The Diagnostic dump tab will give you a memory dump in a .zip file to download. You can then use a dump analyzer such as DebugDiag to quickly analyze the memory dump and get some prescriptive guidance.
The Log stream tab lets you live stream HTTP or application logs directly to the console. This is useful for troubleshooting an active issue. Learn more about this by reading Scott Hanselman’s blog post, “Streaming Diagnostics Trace Logging from the Azure Command Line (plus Glimpse!),” at bit.ly/1jXwy7q.
This isn’t a comprehensive list of debugging tools by any means. Microsoft plans on building more sophisticated diagnostics, where some of the most commonly used scenarios will require only a single click. Until then, enjoy building good applications that are resistant to failure.
Apurva Joshi is a senior program manager with Microsoft, working on the Azure Web Sites team. He joined Microsoft in 2002 and has worked on various Web technologies like IIS and ASP.NET.
Sunitha Muthukrishna is a program manager at Microsoft, working on the Azure Web Sites team. She joined Microsoft in 2011 and has been working on Windows Web Application Gallery and Azure Web Sites. Previously, she contributed to IT projects for a couple of non-profit organizations such as the Community Empowerment network and the Institute of Systems Biology, both located in Seattle, Wash.
Thanks to the following technical experts for reviewing this article: Microsoft Azure Production Management Team