Business Continuity in SQL Azure
This article describes the business continuity capabilities provided by SQL Azure. Business continuity issues can belong to one of the following main categories:
-
Failure of individual servers, devices or network connectivity
-
Corruption, unwanted modification or deletion of data
-
Widespread loss of data center facilities
The purpose of creating database backups is to enable you to recover from data loss caused by such issues. Backing up and restoring data is different in SQL Azure than an on-premises SQL Server and must work with the available resources and tools. The following sections explain how SQL Azure addresses these categories to help provide business continuity:
-
How to help protect your database from failure of individual servers and devices
-
How to help protect your database from unwanted deletions or modifications
-
How to help protect your database from widespread loss of data center facilities
How to help protect your database from failure of individual servers and devices
By storing your data in SQL Azure you are taking advantage of many fault tolerance and secure infrastructure capabilities that you would otherwise have to design, acquire, implement and manage. This section covers the things SQL Azure does for you without any additional expense.
World-Class Data Centers in Multiple Geographic Locations
A basic requirement for business continuity is the availability of a well-managed data center infrastructure in different geographic locations. If individual data centers are not properly managed, the most robust application designs may become unsuccessful and not effective. With SQL Azure, customers take advantage of the extensive experience Microsoft has in designing, managing and securing world-class data centers in diverse locations around the world. These are the same data centers that run many of the world’s largest online services.
Infrastructure Redundancy
SQL Azure mitigates outages due to failures of individual devices, such as hard drives, network interface adapters, or even entire servers. Data durability and fault tolerance is enhanced by maintaining multiple copies of all data in different physical nodes located across fully independent physical sub-systems such as server racks and network routers. At any one time, SQL Azure keeps three replicas of data running – one primary replica and two secondary replicas. SQL Azure uses a quorum based commit scheme where data is written to the primary and one secondary replica before we consider the transaction committed. If the hardware fails on the primary replica, the SQL Azure detects the failure and fails over to the secondary replica. In case of a physical loss of the replica, SQL Azure creates a new replica automatically. Therefore, there are at least two physical transactionally consistent copies of your data in the data center. The following diagram illustrates how SQL Azure keeps three replicas in the physical server racks in the data center.
In addition to the redundant replicas, SQL Azure maintains internal copies of your data for the last 14 days for all your databases within the data center. These copies provide a safe guard against simultaneous or catastrophic hardware and system failures but are not available to customers. We recommend that you implement your own backup and restore solutions as described in the following sections.
Important Notes
-
The goal is that failures within the data center will not result in data loss, but they will result in temporary disconnects and transaction failures.
-
Your application should be resilient to the temporary disconnects. We recommend that you implement the retry logic in your application to prevent connection losses. For more information on how to handle the connection-loss errors, see Connection Management in SQL Azure article in the TechNet Wiki.
-
We do not provide an SLA for the typical recovery point objective (RPO) and the recovery time objective (RTO) currently. For more information on Windows Azure Service Level Agreements, see Service Level Agreements.
How to help protect your database from unwanted deletions or modifications
Users or applications can make unwanted changes to data. This may need a revert operation. For example, a user might modify some data that belongs to the wrong customer, and so on. The ability to restore application data in the event of a corruption or unwanted modification or deletion is a fundamental requirement for software applications. We recommend that you implement the following approaches for implementing backups for SQL Azure databases.
Database Copies
To make a separate copy of your SQL Azure database, run a Transact-SQL command to create a database copy on a periodic basis and then manage these database copies. Execute the Transact-SQL CREATE DATABASE statement with the AS COPY OF clause to create an independent copy of your database in SQL Azure. Note that a copy operation may take a long period of time, potentially hours, depending on the workload within the data center, how much workload is currently being executed against the original database being copied, and the size of the original database being copied.
The following code example demonstrates how to execute a database copy by using the Transact-SQLCREATE DATABASE (SQL Azure Database) statement:
CREATE DATABASE destination_database_name AS COPY OF [source_server_name].source_database_name
You can create a copy of your database in a different SQL Azure server but the server must be in the same data center. A copy operation might take a long time but the final copy of your destination database will be transactionally consistent with the original database when the copy process completes. For example, if you start a copy process at 14:00 in the 24-hour time notation, SQL Azure will immediately create the new copy based on the system backup taken nearest 14:00. Note that this wouldn’t be longer than 5 minutes prior to 14:00. SQL Azure will replicate transactions committed to the source database to the destination database until the destination database catches up to the source database. Once the destination database catches up, the copy operation is considered complete and the replication link between the source and the destination is severed leaving the new destination database isolated from any future changes to the source database. At this point, the new destination database becomes accessible. As an example, let’s assume that a database copy process takes several hours to complete. The database copy process will mirror the source database at the point in time when the copy operation completes. Therefore, when thinking about the restore as a mechanism to recover from any user error at a point in time, you must know that if the user error occurs while the copy operation is in progress, the error gets replicated to the destination database as well.
Besides enabling a simpler recovery process, such as maintaining the original connection string, copying the database to a different server doesn’t provide you with any additional disaster protection. The SQL Azure server is a logical server and while the source database and the destination database may be grouped under the same logical server, the two databases aren’t necessarily physically located on the same machine. Furthermore, when you consider the two secondary replicas for both the source and the destination databases, the likelihood of physical co-location of the 6 different copies becomes remote. In summary, copying to a separate logical server does not provide you any stronger protection.
After the database copy begins, you can query the sys.databases (SQL Azure Database) and sys.dm_database_copies (SQL Azure Database) views on the master database of the destination server to retrieve more information about the copying progress.
SELECT [databases].[name], [copies].* FROM sys.dm_database_copies copies JOIN sys.databases databases ON copies.database_id = databases.database_id
The frequency at which you choose to copy your database can vary and depends on business needs. To recover from user or application errors, we recommend that you create a daily copy and maintain two or three running copies on a rotating basis by dropping the oldest copy every day after a fresh copy completes.
Note that although we recommend daily copies, you can copy your database more frequently. We recommend that you perform a database copy operation once per hour maximum because each database copy process can finish at different times but each is transactionally consistent with the source database at the point in time when the copy completes. If you start two database copies 5 minutes apart, they may both finish almost at the same time. This creates an identical database copy and you would be billed for the same or similar copy of the data twice.
For more information on the database copy, see Copying Databases in SQL Azure, How to: Back Up Your Database (SQL Azure), and How to: Copy Your Database to a Different Server (SQL Azure).
Database Copy Restore
You can implement restore for your database and its copy on the same server by renaming them. For example, consider a database named “Database1” and a copy “Database1_copy_02_01_2012”. The following Transact-SQL script demonstrates how to swap the database names in the same server when a source database and its copy are called as “Database1” and “Database1_copy_02_01_2012” respectively. After the script runs successfully, new database traffic is directed to the copy rather than to the original database.
ALTER DATABASE Database1 MODIFY NAME = Database1_OLD GO WAITFOR DELAY '00:00:30' GO ALTER DATABASE Database1_copy_02_01_2012 MODIFY NAME = Database1 GO
While the service runs the rename process, the existing connections to the original database are killed. Therefore, it’s important to ensure that your application is resilient to connection-losses to SQL Azure and will be able to re-establish the connection when such an event occurs. The Transact-SQL script includes WAITFOR DELAY statement between the two ALTER statements. This helps ensure that the new connections established during the recovery period do not accidentally connect to the old database before renaming is done. If you need to copy your database to a different SQL Azure server, renaming will not work. In such cases, modify your application to point to the database copy on the separate server.
Important Notes
-
Database copies might take a long time and are variable in duration.
-
Your bill will include charges for each database copy. But you are charged for the new database copy only after the copy process completes successfully. Each database copy is billed at the same rate as the source database.
-
You are responsible for managing the copies and dropping them when appropriate.
-
Database usage is calculated daily, so if your database copy is active only for one day, you will be charged a prorated fee for one day for your copy database.
-
Restoring to a different logical server requires modifications in connection strings.
-
Restoring to a different logical server doesn’t guarantee a better data-loss protection and improved system performance.
-
While a copy process is in progress, all changes to the original database are replicated to the copy. Therefore, deciding which database copy to restore is an important issue.
-
The SQL Azure Database November 2011 release introduced Federations in SQL Azure (SQL Azure Database). At this point you cannot copy a database that contains federations by using the database copy operation. Conversely, creating a federation fails if a database copy operation is active in the database. Database copy cannot be performed on federation members either.
-
The recovery time objective (RTO) should be equal to “time to recognize the error + time to rename the database + time interval between the two ALTER statements.
How to help protect your database from widespread loss of data center facilities
To help protect against any data center loss in the event of a disaster, you need to create offsite storage of database backups outside of the data center, in which your database application is deployed. To achieve that, we recommend that you use both the database copy described in the previous section and the SQL Azure Import/Export Service. The database copy maintains the transactional consistency during export. The SQL Azure Import/Export Service copies the object definitions from your source SQL Azure database to a DAC export file (BACPAC), and then bulk copies the data from the user tables to BACPAC. You should determine the specific destination of BACPAC and the specific recovery procedure based on your application’s service level agreement and other business requirements.
For more information on how to use the SQL Azure Import/Export service, see SQL DAC Examples, and DAC Import/Export Hosted as a Service. For more information on how to use Windows Azure Management Portal to import/export data-tier application, see How to: Export a Data-tier Application (SQL Azure), and How to: Import a Data-tier Application (SQL Azure).
To help protect your data against the loss of a data center, you can implement the following approaches:
-
Export the BACPAC file to a blob using storage account in a different data center.
-
Export the BACPAC file to a blob using storage account in the same data center and rely on Windows Azure Storage geo-replication to copy the BACPAC file to the separate data center.
-
Import the BACPAC file to your on-premises SQL Server.
The following sections describe these approaches and also the advantages and disadvantages of each.
Export the BACPAC file to a blob using storage account in a different data center
With this approach, you have to create a storage account in a data center in the same region but separate from the data center that you have your SQL Azure account. For example, if your application and database run in the North Europe data center, we recommend that you maintain a copy of your database or BACPAC in another data center in the same region, such as West Europe. For more protection, you can also maintain a copy of your database or BACPAC in another data center in a different region, for example South Central US.
Once the DAC export operation is complete, you can immediately use the DAC import operation to recreate the database and data on another SQL Azure server in the same or different data center. Then, you can delete the used Windows Azure Blobs to reduce costs. Importing into SQL Azure at the targeted data center validates the operation and reduces the overall duration of recovery. In this case, you will get charged for the database. Alternatively, you can defer the import operation until the actual failover completes. In this case, Microsoft charges you for the Blob storage. Note that you would still need to perform periodic imports into the database in each backup data center to validate the operation and test the disaster recovery procedure. Note that you also will be charged for the bandwidth cost of copying the BACPAC to one or more data centers.
The following diagram illustrates an enhanced data protection. The diagram demonstrates performing a database copy in the data center A, and then exporting it to the Windows Azure Storage in the data center B. Next, the diagram shows importing the BACPAC to recreate the database in data center B and copying the BACPAC to the data center C. Finally, it shows importing the BACPAC to recreate the database in data center C.
If the data center A fails, the recovery point objective (RPO) or potential data loss in your application will be determined by how often you perform export. For example, if you export data once a day, your data loss will be 24 hours of data. The recovery time objective (RTO) of the database itself will be minimal if you perform an immediate database import. If you defer the import, the RTO will be determined by the size of the BACPAC and the time it takes to import into the new database.
Export the BACPAC file to a blob using storage account in the same data center
With this approach, you need to have a storage account in the same data center where you have your SQL Azure account. This recovery solution relies on the automatic geo-replication of Windows Azure storage objects, such as Windows Azure Blobs and Tables, to another data center in the same region, at no additional cost. For example, if your storage account is in the North Europe data center, your Windows Azure storage objects are automatically replicated to the second data center in the same region, such as West Europe.
With this approach, you do not have an option of immediately importing the BACPAC to the new database in the second data center. This is because the decision to failover is made by Windows Azure. Therefore, the geo-replicated BACPAC can be accessible only after Windows Azure executes the failover operation. You will be billed for the usage of the Windows Azure storage account but there is no bandwidth charge because the BACPAC copy is done within the same data center and the cost of geo-replication is included in the storage cost. It is important to note that the Import/Export service produces the BACPAC using the Block Blobs, which helps to preserve its integrity by replicating the entire BACPAC.
The following diagram illustrates copying a database in data center A, and then exporting it to the Windows Azure storage in the same data center. Next, the diagram shows that Windows Azure performs geo-replication of Windows Azure storage objects from the data center A to data center B automatically. Finally, the diagram illustrates importing the BACPAC to recreate the database in the data center B.
With this approach, the typical recovery point objective (RPO) should be equal to “archiving interval time x 2”. If the failure occurs right after the archive is created, the Blob data may be lost. In that case, only the previous archive will be available. The typical recovery time objective (RTO) should be equal to “24 hours + time to import”. Although the second option can take longer than then first option, it provides reduction in storage costs.
Warning |
|---|
| Windows Azure Blobs and Tables are geo-replicated between two data centers 100s of miles apart from each other on the same continent, to provide additional data durability in the case of a major disaster, at no additional cost. For more information, see Introducing Geo-replication for Windows Azure Storage blog post. With this first release of geo-replication, we do not provide an SLA for how long it might take to asynchronously geo-replicate the data, though transactions are typically geo-replicated within a few minutes after they have been committed in the primary location and the estimated time that the data will be accessible to customers after a disaster is 24 hours. |
Import the BACPAC file to an on-premise SQL Server
In addition, you can also download the BACPAC from your storage account to a local client machine and then import the archive into a local SQL Server 2012, SQL Server 2008 R2, SQL Server 2008, or SQL Server 2005 instance by using the Import/Export functionality provided by provided by SQL Server Management Studio available in SQL Server 2012. Importing the archive to a local instance of SQL Server will help you ensure that you maintain a local copy of your data without needing an internet connection or the availability of Windows Azure data centers. To import an archive file to a local instance of SQL Server, download the SQL Server 2012 product here and install the Management Tools. For more information, see Install SQL Server 2012 in the SQL Server Books Online. After the Microsoft SQL Server 2012 Management Tools is installed, download the archive export file from your storage account to your local client. Once the download is complete, connect to the target database by using SQL Server Management Studio. In the Object Explorer, right-click on the server and select Import Data-tier Application… and then follow the steps to import the archive export file.
Important Notes
-
These options require a Windows Azure Storage account.
-
Your application should manage failover events.
-
Your bill will include charges for each database copy.
-
Your bill will include charges for offline archives stored in your Windows Azure Storage account.
-
You are responsible for managing the database copies and dropping them when appropriate.
-
You might need an on-premise SQL Server instance.
-
After failover, you might lose some data.
-
The SQL Azure Database November 2011 release introduced Federations in SQL Azure (SQL Azure Database). To implement a recovery strategy for your federated data, you can either write your own custom export scripts using the bulk copy utility (BCP.exe) or System.Data.SqlClient.SqlBulkCopy class or use the SQL Azure Migration Wizard on Codeplex currently.
Warning The SQL Azure Migration Wizard tool is built by the community and not supported.
Data Sync Service and Disaster Recovery
Microsoft SQL Azure Data Sync service provides data synchronization capabilities for SQL Azure databases. The service currently has two main capabilities:
-
Synchronization of data between on-premises SQL Server databases and SQL Azure databases, allowing on-premises and cloud-based applications to utilize the same data.
-
Synchronization of data between two or more SQL Azure databases; the databases can be in the same data center, different data centers or different regions. This capability provides a scale-out mechanism for applications via multiple copies of the data. It also enables multiple application and database instances to be deployed around the world close to end-users with all the databases being kept in synchronization. Note that SQL Azure Data Sync is often used with Windows Azure Traffic Manager.
When using SQL Azure Data Sync, you can configure to synchronize data in a fixed interval. During the synchronization process, only the changed data is transmitted after the first initial synchronization; such as only the rows that have been inserted, updated, or deleted since the previous synchronization. SQL Azure Data Sync keeps multiple copies of your database up-to-date and these copies can be stored in the cloud or on-premises.
We recommend that you assess the following design considerations of the SQL Azure Data Sync service before using it as part of your disaster recovery strategy:
-
Data Sync does not synchronize transactions or preserve transaction boundaries. Data Sync applies net-changes in batches one table at a time, with each batch being a transaction. A small number of changes would be applied in one batch, but a large number of changes may be applied in multiple batches. If, for example, both an
Orderrow and the correspondingOrderDetailsrow are inserted in one transaction on one database, it is possible that theOrderrow and theOrderDetailsrow might get inserted in different transactions of the other database. In the event of a failure during synchronization, such as a network error, some transactions may not be applied until the next synchronization. If there is a failover before the transactions have been applied in the following synchronization, the data could be in an inconsistent state for the application; such as theOrderrow is present, but theOrderDetailsrow is not. -
The SQL Azure Data Sync service has some limitations regarding the database schemas supported. For example, SQL Azure Data Sync is unable to synchronize any table that does not have a Primary Key. In addition, SQL Azure Data Sync synchronizes only data but not stored procedures and triggers. For more information, see Data Sync FAQ.
-
SQL Azure Data Sync does not keep multiple versions of a database available to allow failover or restore to a particular time.
If these considerations are acceptable for your scenario and applications, SQL Azure Data Sync may be an option for your disaster recovery strategy.
If you use the Data Sync service as a disaster recovery solution, we recommend that you build application logic to validate the operational capacity of the application after failover.
For more information about SQL Azure Data Sync, see SQL Azure Data Sync documentation and Data Sync Best Practices topic in the MSDN library.
Warning |
|---|
| SQL Azure Data Sync is currently available only as a Preview and is meant only for product feedback for future releases and should not be used in production environments. |
See Also