About Data Deduplication Backup/Restore API

12/12/2011

[This documentation is preliminary and is subject to change.]

The Data Deduplication Backup/Restore API programming model enables backup applications to support backup and restore of volumes enabled for Data Deduplication. Applications can use the interfaces described in the Data Deduplication Backup/Restore API Reference to optimize backup and restore of file data.

Backup

There are two different ways to back up files from a volume that is enabled for Data Deduplication: nonoptimized backup and optimized backup.

Nonoptimized Backup

In nonoptimized backup, the Data Deduplication-optimized files are copied as normal files to the backup store. The files are opened without the FILE_FLAG_OPEN_REPARSE_POINT flag. In this case, the optimized files are transparently "rehydrated" in memory by Data Deduplication during the copy operation and stored in the backup store as normal files. Restore from such a backup store is a normal file copy operation; applications do not need to use the Data Deduplication Backup/Restore API for nonoptimized backup.

Optimized Backup

Optimized backup is performed by copying Data Deduplication-optimized files as reparse points (by opening the files with the FILE_FLAG_OPEN_REPARSE_POINT flag) and copying the full set of Data Deduplication store container files. In many cases, such as a full volume backup (either block-level or file-level), an optimized backup is both smaller and faster. It is smaller because the total size of the optimized files (reparse points), the nonoptimized files (files that are not in-policy), and the Data Deduplication store container files is significantly smaller than the full logical size of the volume. It is faster because the smaller backup requires less I/O and because the optimized files are not "rehydrated" during the file copy operations. A selective backup that covers most of the volume may also benefit from the optimized approach, depending on how the logical size of the selected files compares to the physical size of the selected files plus the size of the Data Deduplication container files.
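The size comparison for a selective backup can be sketched in a few lines of portable C++. This is only an illustration: the BackupSizeEstimate structure and both function names are hypothetical, not part of the Data Deduplication Backup/Restore API, and the sketch assumes the application has already measured the three sizes itself.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical inputs; the application is assumed to have gathered these sizes.
struct BackupSizeEstimate {
    std::uint64_t logicalBytesSelected;   // full (rehydrated) size of the selected files
    std::uint64_t physicalBytesSelected;  // on-disk size of the selected files
                                          // (reparse points plus nonoptimized files)
    std::uint64_t containerBytes;         // size of the Data Deduplication container files
};

// Bytes an optimized backup would copy: the selected files as stored on disk
// plus the full set of container files.
inline std::uint64_t optimizedBackupBytes(const BackupSizeEstimate& e) {
    // Guard against unsigned wrap-around when summing very large sizes.
    assert(e.containerBytes <=
           std::numeric_limits<std::uint64_t>::max() - e.physicalBytesSelected);
    return e.physicalBytesSelected + e.containerBytes;
}

// True when the optimized backup is expected to be smaller than copying the
// selected files at their full logical size (a nonoptimized backup).
inline bool preferOptimizedBackup(const BackupSizeEstimate& e) {
    return optimizedBackupBytes(e) < e.logicalBytesSelected;
}
```

With this sketch, a selection whose logical size is well above the physical size plus the container size favors optimized backup, while a small selection on a volume with large container files does not.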

Restore from Nonoptimized Backup

Restore from a nonoptimized backup store is no different from restore from any normal backup source. The main caveat is that the data in the backup store is normally much larger than the space used on the original volume, because the backup store holds the files at their full logical size, without the space savings from Data Deduplication. A full volume restore from such a backup store will usually not fit on the original volume or on a volume of equivalent size.

Restore from Optimized Backup

Restore from an optimized backup can be done by using full-volume restore techniques or by using one of the file-level selective restore approaches described in the following sections.

Full Volume Restore from Optimized Backup

File-level full volume restore can and should be performed in an optimized manner by essentially reversing the procedure described in the Optimized Backup section. The complete set of Data Deduplication metadata and container files is restored first, then the complete set of Data Deduplication reparse points, and finally all non-deduplicated files. Restoring volumes in this manner gives a faster recovery time than a full-volume restore from a nonoptimized backup store, because the amount of copied data is much smaller. Also, the restored data in an optimized full volume restore is known to fit on a volume of the original size. Restoring to the original volume or an equivalently sized volume from a nonoptimized backup will likely reach a disk-full condition, because the files are written back at their full logical (non-deduplicated) size.
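The restore order for a file-level full volume restore can be sketched as a stable sort over the backup set. This is a minimal, portable C++ illustration, not a definitive implementation: ItemKind, RestoreItem, restorePhase, and orderForFullVolumeRestore are hypothetical names that are not part of the Data Deduplication Backup/Restore API, and the sketch assumes the application has already classified each item in its backup set.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification of the items in an optimized backup set.
enum class ItemKind {
    DedupStoreFile,     // Data Deduplication metadata and container files
    DedupReparsePoint,  // optimized files, backed up as reparse points
    RegularFile         // non-deduplicated files
};

struct RestoreItem {
    std::string path;
    ItemKind kind;
};

// Maps each kind to its restore phase: store files, then reparse points,
// then non-deduplicated files.
inline int restorePhase(ItemKind kind) {
    switch (kind) {
        case ItemKind::DedupStoreFile:    return 0;
        case ItemKind::DedupReparsePoint: return 1;
        case ItemKind::RegularFile:       return 2;
    }
    assert(false && "unknown ItemKind");
    return 2;
}

// Reorders the items by restore phase. A stable sort keeps the original
// relative order of items within the same phase.
inline void orderForFullVolumeRestore(std::vector<RestoreItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const RestoreItem& a, const RestoreItem& b) {
                         return restorePhase(a.kind) < restorePhase(b.kind);
                     });
}
```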

Block-level restore from an optimized backup is automatically an optimized restore because the restore occurs underneath Data Deduplication, which works at the file level.

It is recommended to always perform full volume restore to a freshly formatted volume of equivalent or greater size than the original volume. The target volume can be the original reformatted volume or a freshly created volume.

Note that full volume restore procedures do not need to use the Data Deduplication Backup/Restore API.

Selective Restore from Optimized Backup

Selective restore from an optimized backup can be done by using one of the following approaches:

- Partial volume restore to a newly formatted volume
- Restore from a Data Deduplication backup store
- Selective file restore using the Data Deduplication Backup/Restore API

Partial Volume Restore to Newly Formatted Volume

This approach is very similar to the file-level full volume restore described above, except that only a subset of the Data Deduplication reparse points is restored to a freshly formatted volume. Note that this approach requires that the entire set of Data Deduplication metadata and container files be copied to the new volume. It is appropriate only when most of the volume is being restored and some portions of the volume are not required on the restored volume. The application can decide whether the technique is worthwhile by comparing the size of the Data Deduplication store files plus the physical size of all the restored files to the total logical size of the target restore file set.

Restore from a Data Deduplication Backup Store

If the backup store is located on a volume enabled with Data Deduplication, both the backup store and the original volume are optimized. In this case, the backup application can simply copy the files as normal files back to the target volume. The files will be rehydrated in memory from the backup store, so the application will be subjected to increased I/O latency while reading the backup store. If the file on the target volume is missing, or has been fully recalled from the Data Deduplication store, there will be no increased latency on the target volume. However, to be sure, it is good practice to first delete the file, if it exists, from the target volume before restoring the file with this approach.
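The delete-then-copy practice described above can be sketched as follows. This is a portable C++ illustration under stated assumptions, not the way any particular backup product works: restoreAsNormalFile is a hypothetical helper, and a real backup application would use its own copy path. The point of the sketch is only the order of operations: confirm that the backup file can be read, delete any existing target, then copy the data through ordinary reads, which is where rehydration from the backup store would take place.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: restores `backupFile` to `target` as a normal file.
// Any existing target is deleted first, then the data is copied using
// ordinary stream reads and writes. Returns true on success.
inline bool restoreAsNormalFile(const std::string& backupFile,
                                const std::string& target) {
    // The target is deleted below, so it must not be the backup file itself.
    assert(backupFile != target);

    std::ifstream in(backupFile, std::ios::binary);
    if (!in) {
        return false;  // cannot read the backup; leave the target untouched
    }

    // Delete the existing file, if any. The result is ignored because a
    // failure here usually just means the file did not exist.
    std::remove(target.c_str());

    std::ofstream out(target, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }

    char buffer[64 * 1024];
    while (in) {
        in.read(buffer, sizeof(buffer));
        out.write(buffer, in.gcount());  // write only the bytes actually read
    }
    out.flush();

    // Success means the source was read to the end and every write succeeded.
    return in.eof() && out.good();
}
```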

Selective File Restore Using Data Deduplication Backup/Restore API

Applications that need to perform single or multi-file restore from an optimized backup may use the Data Deduplication Backup/Restore API. If one of the above approaches cannot be used—for example, if the target volume cannot be reformatted—and the backup store is not an NTFS volume enabled for Data Deduplication, this approach may be the only alternative.

Because the Data Deduplication on-disk schema is not publicly documented, Data Deduplication drives the restore process by relying on the application to perform required read I/O requests on its behalf. In response to each read request serviced by the application against Data Deduplication files in the backup store, Data Deduplication will interpret its own metadata and data, which may result in further read requests in some cases.

The backup application provides read access to Data Deduplication files in the backup store by using a COM callback mechanism. The overall process of restoring a file from an optimized backup store proceeds as follows.

The application must ensure that the following prerequisites are in place:
- Data Deduplication must be installed on the machine holding the target volume.
- The target volume must be formatted with NTFS, but does not need to be enabled for Data Deduplication.

1. The application copies the reparse point for the target file from the backup store.

   Note Because the restore process described here has multiple steps, the application may choose to restore the target file to a temporary location in order to prevent end-user access attempts while the file is being restored.
2. The application initiates a file restore operation using IDedupBackupSupport::RestoreFile. The application specifies the full path to the reparse point file copied in the previous step. The application also provides Data Deduplication with an instance of IDedupReadFileCallback that is ready to accept incoming calls from the Data Deduplication restore engine.
3. The application receives one or more calls from Data Deduplication on each of the following methods of the callback interface. The callbacks are serviced by the same application thread that called IDedupBackupSupport::RestoreFile.
   - IDedupReadFileCallback::ReadBackupFile – Data Deduplication calls this method to request a read from a Data Deduplication metadata or container file located in the application's backup store. The application issues the actual I/O to satisfy the read request by whatever means necessary. The application does not need to understand the file format or parse it in any way. It only needs to be able to read the specified ranges of the file from the backup media.
   - IDedupReadFileCallback::OrderContainersRestore – this method gives the application the ability to influence the order of the pending reads to the multiple container files that are required to retrieve the target file data. Implementation of this method by the application is optional. Providing an implementation may be important in order to achieve acceptable restore performance from sequential access backup media such as tape, or to otherwise improve restore performance. If the application does not provide an ordered set of container extents, Data Deduplication will generate one extent per container file (full file extents) in an arbitrary order.
   - IDedupReadFileCallback::PreviewContainerRead – this method provides the application with a preview of the sequence of reads that are pending for a given container file extent reported by OrderContainersRestore (or generated by Data Deduplication, if the application does not implement OrderContainersRestore). The application may use this per-container extent read plan to increase the efficiency of the pending reads through read-ahead, caching, or other mechanisms.
4. Using the callback methods described above, Data Deduplication reads metadata and target file data in an iterative manner, according to a read plan optionally influenced by the application, and writes the file data back to the target file data stream. In this manner, Data Deduplication reconstructs the target file piece by piece until the full file is restored at the target location in its original, non-deduplicated form.
5. Data Deduplication removes the reparse point and metadata that were copied by the application in step 1.

   Note If the file was restored to a temporary location, the application should rename the file to its target destination.
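Two of the application-side tasks in the process above, servicing a range read and choosing a container order for sequential media, can be sketched in portable C++. This is a hedged illustration only. It does not implement the COM interface IDedupReadFileCallback and does not reproduce its method signatures; readBackupRange, ContainerExtent, and orderForSequentialMedia are hypothetical helpers that an application's callback implementation might delegate to, under the assumptions that the backup store is reachable as ordinary files and that the application knows where each container sits on its media.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper for servicing a read request: reads up to `size` bytes
// from `path`, starting at `offset`, into `buffer`. Returns the number of
// bytes actually read (0 if the file cannot be opened or the offset is at or
// past the end of the file). The file contents are never interpreted.
inline std::size_t readBackupRange(const std::string& path,
                                   std::uint64_t offset,
                                   std::size_t size,
                                   char* buffer) {
    assert(buffer != nullptr);
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return 0;
    }
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    if (!in) {
        return 0;
    }
    in.read(buffer, static_cast<std::streamsize>(size));
    return static_cast<std::size_t>(in.gcount());
}

// Hypothetical record of where a container file sits on sequential media.
struct ContainerExtent {
    std::string containerPath;
    std::uint64_t mediaPosition;  // assumed position on the media, such as a tape offset
};

// Orders the extents by media position so that they can be read in a single
// forward pass over sequential-access media.
inline void orderForSequentialMedia(std::vector<ContainerExtent>& extents) {
    std::sort(extents.begin(), extents.end(),
              [](const ContainerExtent& a, const ContainerExtent& b) {
                  return a.mediaPosition < b.mediaPosition;
              });
}
```

Opening the file on every request keeps the sketch short. An application that receives many reads against the same container file would more likely keep the file open, or use the read preview it receives to cache or read ahead.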

Build date: 12/12/2011