Improving Application Availability in Windows Azure
For various reasons, the role instances that host Windows Azure applications are restarted or taken offline, which affects availability if the application is hosted on a single role instance. This can happen because of a problem with the application itself or as part of normal Windows Azure operations, such as service healing or an automatic upgrade of the guest operating system.
The best way to improve the availability of your Windows Azure application is to configure it to use a minimum of two role instances in at least two upgrade domains. This ensures that at least one instance remains running when instances are restarted or taken offline as part of normal operations.
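The reasoning behind the two-instance, two-domain recommendation can be checked with a small calculation. The sketch below is illustrative only: the function name `min_available` is hypothetical, and it assumes instances are spread round-robin across upgrade domains and that only one upgrade domain is offline at a time.

```python
def min_available(instance_count: int, upgrade_domains: int) -> int:
    """Return the fewest instances still running while one upgrade domain is offline."""
    if instance_count < 1 or upgrade_domains < 1:
        raise ValueError("instance_count and upgrade_domains must be at least 1")

    # Assumption: instances are distributed round-robin across upgrade domains.
    per_domain = [0] * upgrade_domains
    for index in range(instance_count):
        per_domain[index % upgrade_domains] += 1

    # Worst case: the most heavily populated domain is the one taken offline.
    return instance_count - max(per_domain)


if __name__ == "__main__":
    for instances, domains in [(1, 1), (2, 1), (2, 2), (3, 2)]:
        remaining = min_available(instances, domains)
        print(f"{instances} instance(s), {domains} domain(s): {remaining} still running")
```

Under these assumptions, a single instance, or two instances in a single upgrade domain, leaves nothing running during a restart, while two instances in two upgrade domains always leave one running.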
The number of instances for each role in a Windows Azure application is controlled by the Instances element in the service configuration (cscfg) file:
<Role name="<role-name>">
  <Instances count="<number-of-instances>" />
  <ConfigurationSettings>
    <Setting name="<setting-name>" value="<setting-value>" />
  </ConfigurationSettings>
</Role>
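As a rough illustration of how this setting could be checked before deployment, the following sketch parses a configuration file and reports roles that have fewer than two instances. The helper names (`local_name`, `roles_below_minimum`), the sample service name, and the role names are hypothetical, and the sample document is a trimmed-down assumption of what a cscfg file looks like rather than a complete one.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix that ElementTree adds to a tag name."""
    return tag.rsplit("}", 1)[-1]


def roles_below_minimum(cscfg_text: str, minimum: int = 2) -> dict:
    """Map role name to instance count for every role below `minimum`."""
    root = ET.fromstring(cscfg_text)
    low = {}
    for element in root.iter():
        if local_name(element.tag) != "Role":
            continue
        for child in element:
            if local_name(child.tag) == "Instances":
                count = int(child.get("count", "0"))
                if count < minimum:
                    low[element.get("name", "")] = count
    return low


# Hypothetical sample configuration used only for illustration.
SAMPLE = """<ServiceConfiguration serviceName="ExampleService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <Instances count="1" />
  </Role>
  <Role name="WorkerRole1">
    <Instances count="2" />
  </Role>
</ServiceConfiguration>"""

if __name__ == "__main__":
    print(roles_below_minimum(SAMPLE))
```

Matching on the local tag name rather than a fixed namespace keeps the check working even if the file declares a different schema version.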
If you choose to use one role instance for your application, the following table lists the reasons for a role instance to go offline or be restarted and the ways you can mitigate downtime and improve availability of your application:
| Reason a role instance goes offline or is restarted | Way to improve availability |
|---|---|
| Guest operating system (OS) auto-upgrade: if the application is configured to be upgraded automatically, the role instance is restarted automatically when the guest OS upgrade happens (roughly once a month). | Do one of the following: |
| Application upgrade: if you upgrade the application in place (manual or auto-walk), Windows Azure restarts the role instance to deploy the new application, because there is only a single upgrade domain and instance. | Do one of the following: |
| Modifying the application configuration: when configuration settings are updated, the role instance running the application restarts. | Do one of the following: |
| Adding, deleting, or updating a certificate. | Do one of the following: |
| The role reports a “Busy” status in its StatusCheck event handler, which causes the load balancer to take the instance offline. | Modify your application so that it does not report the “Busy” status in the StatusCheck event handler. |
| The application requests a restart by calling RoleEnvironment.RequestRecycle(). | Modify your application so that it does not request restarts. |
| The host computer is updated, which causes all VMs on that node to restart. | Make the application start as quickly as possible. |
| The Windows Azure fabric performs service healing on the host computer running the VM for the role instance. | Modify the application to be resilient to unexpected restarts. |
| The application crashes. | Make the application code more robust by using logging and diagnostic tools such as WADS, IntelliTrace, and RDP. For more information, see Troubleshooting and Debugging in Windows Azure and Troubleshooting Hosted Service Deployment States. |