Export (0) Print
Expand All
27 out of 53 rated this helpful - Rate this topic

High Availability and Disaster Recovery for SQL Server in Azure Virtual Machines

Updated: August 8, 2013

Azure virtual machines (VMs) with SQL Server can help database administrators to lower the cost of a high availability and disaster recovery (HADR) database system. Most HADR solutions available in SQL Server are supported in Azure virtual machines, both as cloud solutions and as hybrid-IT solutions. In a cloud database solution, the entire HADR system runs in Azure, and client applications can connect to it from within Azure or from the internet, such as from an on-premise network. In a hybrid-IT database solution, part of the HADR system runs in Azure and part of the system runs on-premise in your organization. The flexibility of the Azure environment enables you to move partially or completely to the cloud to satisfy the budget and HADR requirements of your SQL Server database systems.

Need for SQL Server HADR Solution in Azure Virtual Machines

When running SQL Server in Azure VMs, it is up to you as the database administrator to ensure that your database system possesses the HADR capabilities that the service-level agreement requires. The fact that Azure provides high availability mechanisms, such as service healing for cloud services and failure recovery detection for the Virtual Machines, does not itself guarantee the desired HADR capabilities of SQL Server. These mechanisms protect the high availability of the VMs and not SQL Server, which runs on the VMs. It is possible for the SQL Server instance to fail while the VM is online and healthy. Moreover, even the high availability mechanisms provided by Azure allow for occasional and possibly lengthy downtime of the VMs due to events such as recovery from software or hardware failures or operating system upgrades.

In addition, Geo Redundant Storage (GRS, also known as geo-replication) in Azure may not be an adequate disaster recovery solution for your databases. With geo-replication, you do not have control over the recovery time and data loss of the Azure disks after a disk failure. In other words, you cannot control the RTO and RPO in a disaster as you can with one of the SQL Server disaster recovery solutions. More information regarding geo-replication limitations are covered in the Geo-replication not supported for data and log files on separate disks section.

HADR Deployment Architectures

HADR Technologies in the SQL Server product that are supported in Azure include:

It is possible to combine the technologies together to implement a SQL Server solution that has both high availability and disaster recovery capabilities, whether in a cloud-only or hybrid IT environment. Depending on the technology you use, a hybrid IT deployment may require a VPN tunnel with the Azure virtual network. The sections below show you some of the example deployment architectures.

Azure-Only: High Availability Solutions

You can have a high availability solution for your SQL Server databases in Azure using AlwaysOn Availability Groups or database mirroring.

 

Technology Example Architectures

AlwaysOn Availability Groups

All availability replicas running in Azure VMs for high availability within the same Azure datacenter. You need to configure a domain controller in addition to the SQL Server virtual machines because Windows Server Failover Clustering (WSFC) requires an Active Directory domain.

AlwaysOn Availability Groups in Windows Azure

For more information, see Tutorial: AlwaysOn Availability Groups in Azure (GUI).

Database Mirroring

Principal, mirror, and witness servers all running in the same Azure datacenter for high availability. You can deploy using a domain controller.

Database Mirroring in Windows Azure

You can also deploy the same database mirroring configuration without a domain controller by using server certificates instead.

Database Mirroring with Cert in Windows Azure

For more information, see Tutorial: Database Mirroring for High Availability in Azure.

Azure-Only: Disaster Recovery Solutions

You can have a disaster recovery solution for your SQL Server databases in Azure using database mirroring or backup and restore with storage blobs.

You cannot have an Azure-only availability group across Azure datacenters because cross-datacenter virtual networks are not currently supported in Azure, and an active directory domain cannot span multiple Azure datacenters.

 

Technology Example Architectures

Database Mirroring

Principal and mirrorand servers running in different Azure datacenters for disaster recovery. You must deploy using server certificates because an active directory domain cannot span multiple Azure datacenters.

Database Mirroring (DR) in Windows Azure

For more information, see Tutorial: Database Mirroring for Disaster Recovery in Azure.

Backup and Restore with Azure Blob Storage Service

Production databases backed up directly to blob storage in a different Azure datacenter for disaster recovery.

Backup to Blog in Windows Azure

For more information, see Backup and Restore for SQL Server in Azure Virtual Machines.

Hybrid IT: Disaster Recovery Solutions

You can have a disaster recovery solution for your SQL Server databases in a hybrid-IT environment using AlwaysOn Availability Groups, database mirroring, log shipping, and backup and restore with Azure blog storage.

 

Technology Example Architectures

AlwaysOn Availability Groups

Some availability replicas running in Azure VMs and other replicas running on-premise for cross-site disaster recovery. The production site can be either on-premise or in an Azure datacenter.

AlwaysOn Availability Groups in Hybrid IT

Because all availability replicas must be in the same WSFC cluster, the WSFC cluster must span both networks (a multi-subnet WSFC cluster). This configuration requires a VPN connection between Azure and the on-premise network.

For successful disaster recovery of your databases, you should also install a replica domain controller at the disaster recovery site.

For more information, see Tutorial: AlwaysOn Availability Groups in Hybrid IT (PowerShell).

Database Mirroring

  • One partner running in an Azure VM and the other running on-premise for cross-site disaster recovery using server certificates. Partners do not need to be in the same Active Directory domain, and no VPN connection is required.

    Database Mirroring in Hybrid IT

    For more information, see Tutorial: Database Mirroring for Disaster Recovery in Hybrid IT.

  • One partner running in an Azure VM and the other running on-premise in the same Active Directory domain for cross-site disaster recovery. A VPN connection between the Azure virtual network and the on-premise network is required.

    For successful disaster recovery of your databases, you should also install a replica domain controller at the disaster recovery site.

Log Shipping

One server running in an Azure VM and the other running on-premise for cross-site disaster recovery. Log shipping depends on Windows file sharing, so a VPN connection between the Azure virtual network and the on-premise network is required.

Log Shipping in Hybrid IT

For successful disaster recovery of your databases, you should also install a replica domain controller at the disaster recovery site.

For more information, see Tutorial: Log Shipping for Disaster Recovery in Hybrid IT.

Backup and Restore with Azure Blob Storage Service

On-premise production databases backed up directly to Azure blob storage for disaster recovery.

Backup to Blog in Hybrid IT

For more information, see Backup and Restore for SQL Server in Azure Virtual Machines.

Important Considerations for SQL Server HADR in Azure

Azure VM, storage, and networking, have different operational characteristics than an on-premise, non-virtualized IT infrastructure. A successful implementation of a HADR SQL Server solution in Azure requires that you understand these differences and design your solution to accommodate them.

High availability nodes in an availability set

When you implement a high availability solution in Azure, the availability set in Azure enables you to place the high availability nodes into separate fault domains and upgrade domains. To be clear, the availability set is an Azure concept. For more information, see Manage the Availability of Virtual Machines. It is a best practice that you should follow to make sure that your databases are indeed highly available, whether you are using AlwaysOn Availability Groups, database mirroring, or otherwise. If you do not follow this best practice, you may be under the false assumption that your system is highly available, but in reality your nodes can all fail simultaneously because they happen to be placed in the same fault domain in the Azure datacenter. This recommendation is not as applicable with log shipping since as a disaster recovery feature, you should ensure that the servers are running in separate Azure datacenter locations. These datacenter locations are by definition separate fault domains.

For Azure VMs to be placed in the same availability set, you must deploy them in the same cloud service. Only nodes in the same cloud service can participate in the same availability set.

WSFC cluster behavior in Azure networking

The non-RFC-compliant DHCP service in Azure can cause the creation of certain WSFC cluster configurations to fail, due to the cluster network name being assigned a duplicate IP address (the same IP address as one of the cluster nodes). This is an issue when you implement AlwaysOn Availability Groups, which depends on the WSFC feature.

Consider the scenario when a two-node cluster is created and brought online:

  1. The cluster comes online, then NODE1 requests a dynamically assigned IP address for the cluster network name.

  2. No IP address other than NODE1’s own IP address is given by the DHCP service, since the DHCP service recognizes that the request comes from NODE1 itself.

  3. Windows detects that a duplicate address is assigned both to NODE1 and to the cluster network name, and the default cluster group fails to come online.

  4. The default cluster group moves to NODE2, which treats NODE1’s IP address as the cluster IP address and brings the default cluster group online.

  5. When NODE2 attempts to establish connectivity with NODE1, packets directed at NODE1 never leave NODE2 because it resolves NODE1’s IP address to itself. NODE2 cannot establish connectivity with NODE1, then loses quorum and shuts down the cluster.

  6. In the meantime, NODE1 can send packets to NODE2, but NODE2 cannot reply. NODE1 loses quorum and shuts down the cluster.

Different WSFC cluster configurations exhibit different behaviors based on the way a specific configuration interact with the Azure DHCP service and network. You can work around the behaviors by assigning an unused static IP address, such as a link-local IP address like 169.254.1.1, to the cluster network name in order to bring the cluster network name online. Afterward, the cluster network name can be deleted because it is not used by the availability groups. To simplify this process, see Configuring Windows Failover Cluster in Azure for AlwaysOn Availability Groups.

The following tutorials for AlwaysOn Availability Groups in Azure demonstrate how to configure an availability group end-to-end in different scenarios.

Availability group listener support

Availability group listeners are supported on Azure VMs running Windows Server 2012. This support is made possible by the use of load-balanced endpoints with direct server return (DSR) enabled on the Azure VMs that are availability group nodes. You must follow special configuration steps for the listeners in Azure to work for both client applications that are running in Azure as well as those running on-premise in hybrid IT. For instructions on setting up a listener for a cloud-only availability group deployment, see Tutorial: Listener Configuration for AlwaysOn Availability Groups in Azure. Azure

AzureAvailability group listeners are not yet supported for Windows Server 2008 R2. However, you can still connect to each availability replica separately by connecting directly to the service instance. Also, since AlwaysOn Availability Groups are backward compatible with database mirroring clients, you can connect to the availability replicas like database mirroring partners as long as the replicas are configured similar to database mirroring:

  • One primary replica and one secondary replica

  • The secondary replica is configured as non-readable (Readable Secondary option set to No)

An example client connection string that corresponds to this database mirroring-like configuration using ADO.NET or SQL Server Native Client is below:

Data Source=ReplicaServer1;Failover Partner=ReplicaServer2;Initial Catalog=AvailabilityDatabase;

For more information on client connectivity, see:

Network Latency in Hybrid IT

You should expect high network latency between your on-premise network and Azure and deploy your HADR solution accordingly. When deploying AlwaysOn Availability Groups, you should use asynchronous commit instead of synchronous commit for the cross-premise synchronization mode. When deploying database mirroring servers both on-premise and in Azure, use the high performance mode instead of the high safety mode.

Virtual disks with no read or write caching

A general recommendation for running SQL Server in Azure VMs is not to use the operating system disk (C) or the temporary disk (D). The former has a limit of 127 GB and is configured with both read and write caching by default, which can negatively impact server performance. The latter holds data that do not persist beyond a VM reboot. Therefore, the general recommendation is to attach a separate data disk with no read or write caching and use it for database files. Attaching separate disks also enable you to customize disk capacity up to 1 terabyte.

If you need to store database files on the operating system disk, it is recommended that you disable write caching when you provision the VM. You cannot disable read caching on the operating system disk.

Geo-replication not supported for data and log files on separate disks

Geo-replication in Azure disks does not support the data file and log file of the same database to be stored on separate disks. GRS replicates changes on each disk independently and asynchronously. This mechanism guarantees the write order within a single disk on the geo-replicated copy, but not across geo-replicated copies of multiple disks. If you configure a database to store its data file and its log file on separate disks, the recovered disks after a disaster may contain a more up-to-date copy of the data file than the log file, which breaks the write-ahead log in SQL Server and the ACID properties of transactions. If you do not have the option to disable geo-replication on the storage account, you should keep all data and log files for a given database on the same disk. If you must use more than one disk due to the size of the database, you need to deploy one of the disaster recovery solutions listed above to ensure data redundancy.

See Also

Did you find this helpful?
(1500 characters remaining)
Thank you for your feedback

Community Additions

ADD
Show:
© 2014 Microsoft. All rights reserved.