
Designing Cloud Solutions for Disaster Recovery Using Active Geo-Replication

Updated: February 24, 2015

There are several ways to take advantage of Active Geo-Replication when designing your service for business continuity. The right choice depends mainly on the application pattern you want to optimize for. Other factors include ease of failover management, the service level agreement, and traffic latency and cost. The following are three design options to consider:

Option 1: Active-passive compute with coupled failover

This option is best suited for applications with a single active deployment per geographic region. The application requires colocation of the compute tier and the data tier, for example because of a chatty interface between the business logic and the database. It may also be critical to avoid additional traffic cost between the two tiers. For geo-redundancy, the database is geo-replicated to another region and the compute tier is also deployed there. Connections to the SQL database are pre-configured for each location.
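The pre-configured connections can be pictured as a fixed mapping from each location to its co-located database. The following is a minimal sketch under stated assumptions, not a prescribed implementation: the region keys, server addresses, database name, and the `connection_for` helper are all hypothetical placeholders.

```python
# Hypothetical locations and server addresses; substitute your own values.
# Each compute deployment only ever talks to the database in its own location,
# so the mapping never changes at failover time.
CONNECTIONS = {
    "primary": "Server=tcp:primary-server.example.net,1433;Database=appdb",
    "secondary": "Server=tcp:secondary-server.example.net,1433;Database=appdb",
}


def connection_for(deployment_location):
    """Return the statically configured connection string for a compute deployment."""
    try:
        return CONNECTIONS[deployment_location]
    except KeyError:
        raise ValueError(
            "no connection configured for location %r" % deployment_location
        ) from None


print(connection_for("primary"))
```

Because the mapping is static, failover in this design changes where user traffic goes and which database is writable, but not the connection settings of either deployment.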

The following figure illustrates the service configuration before the failure. In this case all user traffic is directed to the active compute deployment in the primary location.

[Figure: Active-passive compute with coupled failover, before the failure]

The following figure illustrates the service configuration after a failure. As shown in the figure, the compute deployment and the backend database remain co-located, but are now in the secondary location.

[Figure: Active-passive compute with coupled failover, after the failure]

This strategy has the following advantages and tradeoffs:

Advantages:

  • The SQL connection can be statically pre-configured in each region.

Tradeoffs:

  • The entire service (all tiers) is treated as a single failure domain. This means that a failure of any part of the service results in the failover of the entire service.

  • The failover can result in the loss of recently committed data, within the RPO listed below.

SLA calculation:

  • RPO <= 5 seconds

  • RTO = DNS record change + database state change + application verification test
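As a worked example of the RTO formula above, the following sketch adds up the three components. The durations are made-up placeholders chosen only to show the arithmetic, not measured or documented values, and the `Step` type and `coupled_failover_rto` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    seconds: float  # placeholder duration, not a measured value


def coupled_failover_rto(steps):
    """Sum the step durations, treating the steps as strictly sequential."""
    return sum(step.seconds for step in steps)


# The three terms of the RTO formula for this option.
OPTION_1_STEPS = [
    Step("DNS record change", 300.0),
    Step("database state change", 30.0),
    Step("application verification test", 120.0),
]

rto = coupled_failover_rto(OPTION_1_STEPS)
print("Estimated RTO: %.0f s" % rto)  # prints "Estimated RTO: 450 s"
```

With these placeholder numbers the DNS change dominates the total, which is why a design that avoids the DNS change can reduce downtime.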

Option 2: Active-active compute with decoupled failover

This option is best suited for applications designed for active compute deployments in multiple locations to load balance end-user traffic within a region. This type of design is optimal when a predictable end-user experience is the priority.

In this design the compute deployments in different locations communicate with the database in a single primary location. The database is geo-replicated between the primary and a secondary location.

The following figure illustrates this service configuration before the failure:

[Figure: Active-active compute with decoupled failover, before the failure]

The following figure shows the service configuration after the failure. As shown in the figure, the database failover does not change the location of the active compute deployments. The underlying assumption in this scenario is that the database backend failure is isolated and that the service deployment in the same datacenter is not affected.

[Figure: Active-active compute with decoupled failover, after the failure]

Advantages:

  • RTO does not include the DNS change, which results in lower downtime compared to the other two options.

Tradeoffs:

  • After failover, the traffic between the service deployments in the primary region and the database, which is now in the secondary region, has higher latency and incurs additional cost. This affects all service deployments that depend on this database.

  • The failover involves dynamic change of the SQL connection in all service locations.

SLA calculation:

  • RPO <= 5 seconds

  • RTO = SQL connection change + database state change + application verification test
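The dynamic SQL connection change can be sketched as every compute deployment resolving the database server through one shared record instead of a fixed setting. This is an illustrative sketch only: `ActiveDatabaseDirectory`, `ComputeDeployment`, the location names, and the server addresses are hypothetical, and a real service would likely keep this record in an external configuration store rather than in process memory.

```python
import threading


class ActiveDatabaseDirectory:
    """Hypothetical shared record of which server currently hosts the active database."""

    def __init__(self, active_server):
        self._lock = threading.Lock()
        self._active_server = active_server

    def active_server(self):
        with self._lock:
            return self._active_server

    def fail_over_to(self, new_server):
        # Called once, after the database state change has completed.
        with self._lock:
            self._active_server = new_server


class ComputeDeployment:
    """A compute deployment that looks up the database server on every use."""

    def __init__(self, location, directory):
        self.location = location
        self._directory = directory

    def connection_string(self):
        server = self._directory.active_server()
        return "Server=tcp:%s,1433;Database=appdb" % server


directory = ActiveDatabaseDirectory("primary-server.example.net")
deployments = [
    ComputeDeployment(location, directory)
    for location in ("location-a", "location-b")
]

# Database failover: the compute deployments stay where they are,
# but all of them now resolve to the secondary server.
directory.fail_over_to("secondary-server.example.net")
for deployment in deployments:
    print(deployment.location, deployment.connection_string())
```

The point of the indirection is that a single update repoints every service location, which keeps the "SQL connection change" term of the RTO small.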

Option 3: Active-passive compute with decoupled failover

This option is best suited for applications that are extremely sensitive to data loss and would consider a database failover only as a last resort.

The compute tier is deployed in both the primary and secondary locations and co-located with the data tier. The databases are geo-replicated between the same primary and secondary locations. This is similar to option 1, except that it is optimized for the case in which compute tier failures are the most common.

The following figure illustrates this service configuration before the failure. It involves one active deployment of the service in the primary region and one standby deployment in the secondary region.

[Figure: Active-passive compute with decoupled failover, before the failure]

The following figure shows the service configuration after the failure. As mentioned earlier, the underlying assumption in this scenario is that the data tier is likely to be unaffected by the failure, so only the service itself needs to fail over.

[Figure: Active-passive compute with decoupled failover, after the failure]

Advantages:

  • RPO is zero in all cases where the data tier is not impacted and remains accessible.

  • RTO is lower as the failover does not involve the database state change.

Tradeoffs:

  • After failover, the traffic between the service in the secondary region and the database in the primary region has higher latency and incurs additional charges for data traffic between regions.

  • The failover involves dynamic change of the SQL connection and the DNS change.

SLA calculation:

  • RPO = zero

  • RTO = SQL connection change + DNS record change + application verification test
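The decision behind this option, failing over compute alone and touching the database only as a last resort, can be sketched as a small planning function. This is an illustration under stated assumptions: the `plan_failover` name and the two health flags are hypothetical, and the handling of a database outage is an assumption, since the text above covers only the compute-failure case in detail.

```python
def plan_failover(compute_healthy, database_reachable):
    """Return the ordered failover steps for the given health state."""
    steps = []
    if not database_reachable:
        # Assumption: the database is failed over only when it cannot be reached.
        steps.append("database state change (last resort, may lose recent data)")
    if not compute_healthy or not database_reachable:
        # Whichever compute deployment serves users must point at the live database.
        steps.append("SQL connection change")
    if not compute_healthy:
        steps.append("DNS record change")
    if steps:
        steps.append("application verification test")
    return steps


# Compute tier fails, data tier unaffected: no database state change, so RPO stays zero.
for step in plan_failover(compute_healthy=False, database_reachable=True):
    print(step)
```

For the compute-only failure, the returned steps match the three terms of the RTO formula above.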

© 2015 Microsoft