A Programmer's Perspective on NTFS 2000 Part 2: Encryption, Sparseness, and Reparse Points
 

Dino Esposito

May 2000

Summary: This, the second of two articles on Microsoft Windows 2000 NTFS, focuses on encryption, sparseness, and reparse points. Together with multiple data streams and hard links, discussed in part one, these features create a secure, manageable, and flexible file system. (16 printed pages)


Download Ntfs5pt2src.exe.

Contents

Introduction
The Properties Dialog Box
The Encrypted File System
Programmatic Encryption
The Cryptographic Service Provider
Sparse Files
Reparse Points
An Object Model for NTFS

Introduction

A serious file system for the enterprise can no longer do without a number of features that increase security, manageability, and flexibility, thus contributing to a more powerful overall system. To achieve this goal, the preferred Microsoft® Windows NT® file system (NTFS) has been improved in Windows® 2000 with the addition of several new features. In a previous article published in the March/April 2000 issue of MSDN News, I covered multiple data streams and hard links in detail, even though multiple data streams are not a feature specific to NTFS 2000.

In this article, I'll guide you through other unique aspects of the NTFS 2000 file system: the compression/encryption engine, sparse streams, and applications of reparse points such as mounting volumes.

The Properties Dialog Box

The documentation located in the MSDN Library summarizes all the technical information you will need on the various aspects of NTFS 2000. However, features like encryption, sparse streams, and mounted volumes still have an air of mystery surrounding them that a quick reference to the associated API functions does not adequately clarify. The goal of this article is to dispel that mystery. Let's begin with a look at the new Properties dialog box for all files hosted on an NTFS volume.

Figure 1. The Properties dialog box for an NTFS 2000 file

Figure 1 shows two new buttons that require some explanation. The Change button has really nothing to do with NTFS features and simply allows you to add your preferred application to the internal list of applications registered to work with that class of files. Basically, with the Change button you can define the items that the shell will prompt within the Open with submenu in the file's context menu.

Of much more interest from an NTFS standpoint is the Advanced button. As Figure 1 shows, the Advanced Attributes dialog box invoked from the Properties dialog box has a frame dedicated to encryption and compression settings. Although the user interface shows a couple of check boxes, in reality compression and encryption are two mutually exclusive settings for an NTFS file. My guess is that Microsoft used check boxes instead of option buttons because there are really three choices: compression, encryption, or neither. In any case, selecting one check box automatically clears the other if it was selected. In Windows 2000, therefore, you have two options for improving data storage: encryption or compression.

Compression is not new to Windows 2000; it has existed since version 3.51 of Windows NT. Compression is supported on NTFS volumes only, and on an individual file basis. To enable compression you must use CreateFile() and DeviceIoControl() together. The former opens the file while the latter gets or sets the compression state. GetFileAttributes() can be used to read the compression flag with a single call. Once you obtain a handle to the file through CreateFile(), you can call DeviceIoControl() as follows:

unsigned short uCompType = COMPRESSION_FORMAT_DEFAULT;
DWORD dwReturnedBytes=0;
DeviceIoControl(hFile, FSCTL_SET_COMPRESSION, 
   (LPVOID) &uCompType, sizeof(unsigned short), 
   NULL, 0, &dwReturnedBytes, NULL);

The third parameter points to the type of compression required. To uncompress a file, set it to COMPRESSION_FORMAT_NONE. The compression state is encoded as a 16-bit (unsigned short) value. The following code shows how to read the compression state through the file attributes:

DWORD dwAttrib = GetFileAttributes(szFileName);
if (dwAttrib & FILE_ATTRIBUTE_COMPRESSED) 
{…}

You can also turn the compression attribute on for a directory, meaning all the files in that directory will inherit the compression attribute and be automatically compressed on creation. For files, compression and decompression take place immediately and synchronously. For directories, the setting applies only to newly created files and subfolders. To obtain the size of a compressed file, use GetCompressedFileSize() instead of the usual GetFileSize().

As mentioned earlier, in Windows 2000 compression and encryption are two mutually exclusive features. Encryption, though, is a Windows 2000-specific characteristic.

The Encrypted File System

With a little imagination, you can consider encryption as a particular type of compression. A compressed file cannot be encrypted as is, but needs to be uncompressed first. Like compression, the encryption takes place on a per-file basis and leverages the CryptoAPI library to camouflage the content of a file. However, don't assume an encrypted file takes up significantly more space on disk than the corresponding compressed file. Given the particular algorithm that CryptoAPI utilizes, an encrypted file also ends up being compressed to some extent.

Once you mark a certain file or folder as encrypted (see Figure 1), Windows 2000 uses the Encrypting File System (EFS) service to govern all access to that file or folder. In Figure 2 you will see the first manifestation of this.

Figure 2. Encrypting a file from the shell

The dialog box shown in Figure 2 appears when you select the Encrypt contents to secure data check box in the Advanced Attributes dialog box (see Figure 1). You can decide to encrypt the single file or the entire folder. Notice that encrypting the parent folder only means that any new file you create in or copy to the folder will automatically be encrypted. Directories, in fact, are not themselves encrypted. Rather, they're simply marked in such a way that EFS knows it has to use the current cryptographic driver and encrypt any new file that appears in the folder.

If you encrypt an empty or newly created directory, you can be sure that all its content will be protected against unwelcome users. However, if you encrypt a nonempty folder, you should take care to explicitly set the encryption attribute for all the existing files and subfolders.

Figure 3 shows the dialog box you are prompted with when you attempt to remove the encryption attribute from a folder.

Figure 3. Attempting to decrypt a folder

All the files and folders that have been encrypted are still visible to any user who can access the system. However, a user other than the one who set the protection mechanism cannot see the actual content of the files. Windows 2000, in fact, returns an Access Denied error message when an attempt is made to read or write the content of a protected file. To hide any encrypted file or folder from view, you should use the standard Windows NT methods to control access and restrict user rights. In other words, encryption applies only to the content of a file or folder and is in no way involved with the standard security subsystem. You can think of EFS as an additional, but somewhat parallel, mechanism to increase security.

For the user who set it, the encryption layer is completely transparent. You work with encrypted files as usual and the system in the background will encrypt and decrypt them as needed. In particular, when you read a file EFS will make available the decrypted content, and when you need to save it the same EFS is responsible for encrypting the data. EFS works at the file-system level, detects the current user, and applies the encryption mechanism as appropriate. Given this, accessing a protected file through a Microsoft Win32® API function like CreateFile() or an MS-DOS® tool makes no difference at all.

Programmatic Encryption

Encryption is based on a couple of API functions: EncryptFile() and DecryptFile(). These functions take advantage of the CryptoAPI library to do their job. EncryptFile() takes only a single argument: the name of the file or folder to encrypt. DecryptFile(), on the other hand, requires a second numeric argument (a DWORD), which is currently described as reserved and must be set to 0. You should use these functions to explicitly set or reset the encryption attribute on an individual basis, for files and folders alike. Both EncryptFile() and DecryptFile() require exclusive access to the file being processed. In the event another process is already using the file, both functions will fail. If a file you're trying to encrypt is compressed, EncryptFile() automatically decompresses it before scrambling data. As mentioned earlier, any new file created in or copied into a protected directory is automatically encrypted. This also applies to files that you copy from a different location and that lack any protection flag.

When working in a nonencrypted folder, you can create an encrypted file in two ways: Either use CreateFile() specifying the FILE_ATTRIBUTE_ENCRYPTED attribute or create/copy the file as usual and then put EncryptFile() to work on it.

To detect whether a given file or folder is encrypted, you can still use GetFileAttributes(), as is the case with compressed files. The only difference is that now you should check attributes against a constant named FILE_ATTRIBUTE_ENCRYPTED. However, EFS gives you another chance: a new Windows 2000-specific function called FileEncryptionStatus():

BOOL FileEncryptionStatus(
  LPCTSTR lpFileName,
  LPDWORD lpStatus   
);

Its usage is quite straightforward. Just indicate the file name and pass in a DWORD buffer that the function will fill with a value. That value denotes whether the file is already encrypted or encryptable and, if not, why it isn't. FileEncryptionStatus(), as you can see, gives more information than GetFileAttributes(), which only reports whether the encryption attribute is currently set.
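To make the status value readable, a small lookup helper is enough. The following is a minimal sketch, not part of the article's download: DescribeEncryptionStatus() is a hypothetical name, and the fallback #define values are assumed to mirror the FILE_* status constants in the Windows SDK header winbase.h so that the helper also compiles without the SDK. On Windows, include windows.h first and the SDK definitions take precedence.

```c
#include <stddef.h>

/* Fallback values assumed to mirror the status constants in winbase.h;
   they are only used when the Windows SDK headers are not included. */
#ifndef FILE_ENCRYPTABLE
#define FILE_ENCRYPTABLE         0
#define FILE_IS_ENCRYPTED        1
#define FILE_SYSTEM_ATTR         2
#define FILE_ROOT_DIR            3
#define FILE_SYSTEM_DIR          4
#define FILE_UNKNOWN             5
#define FILE_SYSTEM_NOT_SUPPORT  6
#define FILE_USER_DISALLOWED     7
#define FILE_READ_ONLY           8
#define FILE_DIR_DISALLOWED      9
#endif

/* Hypothetical helper: maps the value that FileEncryptionStatus() stores
   in *lpStatus to a short human-readable description. */
const char *DescribeEncryptionStatus(unsigned long dwStatus)
{
    switch (dwStatus) {
    case FILE_ENCRYPTABLE:        return "can be encrypted";
    case FILE_IS_ENCRYPTED:       return "already encrypted";
    case FILE_SYSTEM_ATTR:        return "system file";
    case FILE_ROOT_DIR:           return "root directory";
    case FILE_SYSTEM_DIR:         return "system directory";
    case FILE_UNKNOWN:            return "status unknown";
    case FILE_SYSTEM_NOT_SUPPORT: return "file system does not support encryption";
    case FILE_USER_DISALLOWED:    return "user not allowed to encrypt";
    case FILE_READ_ONLY:          return "read-only file";
    case FILE_DIR_DISALLOWED:     return "encryption disabled for this directory";
    default:                      return "unrecognized status value";
    }
}
```

A caller would pass the DWORD filled in by FileEncryptionStatus() straight to the helper, for example to build the text of an error message.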

All the files can be encrypted, with the exception of system files and system and root directories. However, the NTFS encryption capability can be turned on and off through a function called EncryptionDisable(). Through this means, you can programmatically stop the automatic encryption/decryption mechanism for folders marked as encrypted:

BOOL EncryptionDisable(
    LPCWSTR DirPath,   
    BOOL Disable      
);

Of course, EncryptionDisable() applies only to folders. Once you've called the function with the Disable parameter set to TRUE, both EncryptFile() (for clear files) and DecryptFile() (for encrypted files) return with an error for all the files in the specified directory. Instead of calling EncryptionDisable(), you can obtain the same effect by writing a couple of lines in the desktop.ini file of the folder:

[Encryption]
Disable=1

This will disable the encryption. Any future attempt to apply encryption in the folder, therefore, is destined to fail, as shown in Figure 4.

Figure 4. An unsuccessful attempt to encrypt on a disabled directory

Of course, if the directory already contains a desktop.ini file, adding the two lines to it will suffice. By assigning the Disable entry the value of 0 you reenable the encryption machine.

The Cryptographic Service Provider

The Windows 2000 EFS is built on top of the CryptoAPI library and, as such, is highly dependent on a module called Cryptographic Service Provider (CSP). CSP is a sort of server that actually does the dirty job of encoding and decoding data. The Windows 2000 CSP is implemented in rsabase.dll. CSP works in conjunction with a database of key containers that stores all the private and public keys for users accessing that computer. (EFS provides file encryption through a public-key system.) There are functions to access and modify the content of such a database. You can, for example, add a new user, or remove an existing one, and change the key for a given user as well. To find out all users who can access a given encrypted file, use the QueryUsersOnEncryptedFile() function. Through a relatively complex web of esoteric data structures, it leads you to the display information for each user. Look at the following code snippet:

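PENCRYPTION_CERTIFICATE_HASH_LIST p = NULL;
DWORD dw = QueryUsersOnEncryptedFile(wszFile, &p);
// The function returns ERROR_SUCCESS on success and allocates the list
if (dw == ERROR_SUCCESS && p != NULL)
{
   if (p->nCert_Hash > 0)
      MessageBoxW(0, p->pUsers[0]->lpDisplayInformation,
         L"Users", MB_OK);
   FreeEncryptionCertificateHashList(p);   // release the list
}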

The ENCRYPTION_CERTIFICATE_HASH_LIST structure returns a list with all the users admitted to that file. The field nCert_Hash returns the number of users, namely, the length of the list. The member pUsers points to an array of pointers to structures called ENCRYPTION_CERTIFICATE_HASH. Such a structure has a member called lpDisplayInformation, which holds the display string associated with the subject of the certificate. Each user is identified by a certificate. From Control Panel, under Local Security Policy, you can view all the details about the certificates and run a wizard to export them to a file. From this same location you can run the Recovery Agent Wizard, a tool that enables users with Administrator privileges to generate recovery keys capable of decrypting files. This is a security backdoor that should ensure you never lose your data.

EFS provides the capability of encrypting and decrypting files transparently to the user and at the file-system level. This same pattern could be applied at the application level and, say, buried in a handful of Microsoft Foundation Classes (MFC) document/view classes. This is demonstrated in an article of mine, "Supporting CryptoApi in Real-World Applications," that originally appeared in the June 1997 issue of MIND (now part of MSDN Magazine).

Sparse Files

At least two categories of applications—those that manage handmade databases and those that do image processing—often run into a nasty problem: They end up working with files whose disk content is not entirely significant, and more often than not is in large part set/settable to zero. Files that contain a lot of zeros are usually called sparse files. A typical scenario that is likely to produce sparse files is when you manage lists of records on disk or when you have to work with variable-length and highly volatile information. The unpleasant side effect of this is that you work with files that take up much more space than actually needed, but compacting them regularly might be expensive and time-consuming.

The compression feature supported beginning with Windows NT 3.51 only partially addresses this problem. A compression algorithm could easily reduce the zeros to a very few bytes, but the price to pay for this is the extra processing time needed for compressing and decompressing.

The Windows 2000 NTFS recognizes sparse files as a file-system feature and provides built-in facilities to cope with them. Basically, NTFS 2000 accepts files whose content is not written as a whole from a single starting to a single ending point. Once a file has been marked as a sparse file, NTFS allows you to invalidate a certain block of data that physically belongs to the file. Such a space is then made available to other applications and becomes part of the unused disk space. The system, however, keeps internal track of such "holes" in the file's body on disk.

To enable sparseness on a file or, better yet, on a stream, you should again use the DeviceIoControl() function, as shown here:

DWORD dwReturnedBytes=0;
DeviceIoControl(hFile, 
   FSCTL_SET_SPARSE, NULL, 0, 
   NULL, 0, &dwReturnedBytes, 
   NULL);

For this code snippet to work, make sure the _WIN32_WINNT macro is set to a value of at least 0x0500 before the Windows headers are included:

#define _WIN32_WINNT   0x0500

In fact, the winioctl.h header file that contains the definitions for all the flags that apply to DeviceIoControl() includes FSCTL_SET_SPARSE only if the target platform is at least Windows 2000; the version number of Windows 2000 is 5.0.

To check whether a file or a stream is sparse you can use GetFileAttributes() and check against the FILE_ATTRIBUTE_SPARSE_FILE flag:

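DWORD dwAttrib = GetFileAttributes(m_szFile);
// GetFileAttributes() returns 0xFFFFFFFF on failure, which would pass the bit test
return (dwAttrib != 0xFFFFFFFF) &&
   (dwAttrib & FILE_ATTRIBUTE_SPARSE_FILE) != 0;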

Don't forget, however, that you can obtain the same information via the function GetFileInformationByHandle(). This routine reports the same attributes as GetFileAttributes() but requires a file handle instead of a file name. You can use GetFileInformationByHandle() to find out about compression and encryption as well.
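Whichever call supplies the attribute DWORD, decoding it comes down to a few bit tests. The sketch below gathers them in one place. NTFS_FLAGS and DecodeNtfsAttributes() are hypothetical names introduced for illustration, and the fallback #define values are assumed to mirror the FILE_ATTRIBUTE_* constants of the Windows SDK so that the code also builds without it.

```c
#include <stddef.h>

/* Fallback values assumed to mirror the Windows SDK attribute constants;
   they are only used when the SDK headers are not included. */
#ifndef FILE_ATTRIBUTE_COMPRESSED
#define FILE_ATTRIBUTE_SPARSE_FILE    0x00000200
#define FILE_ATTRIBUTE_REPARSE_POINT  0x00000400
#define FILE_ATTRIBUTE_COMPRESSED     0x00000800
#define FILE_ATTRIBUTE_ENCRYPTED      0x00004000
#endif

/* Hypothetical result structure: one 0/1 field per NTFS feature. */
typedef struct {
    int compressed;
    int encrypted;
    int sparse;
    int reparsePoint;
} NTFS_FLAGS;

/* Decodes an attribute mask as returned by GetFileAttributes() or found in
   the dwFileAttributes member filled by GetFileInformationByHandle().
   Returns 1 on success, 0 if the mask is the 0xFFFFFFFF failure value
   or pFlags is NULL. */
int DecodeNtfsAttributes(unsigned long dwAttrib, NTFS_FLAGS *pFlags)
{
    if (pFlags == NULL || dwAttrib == 0xFFFFFFFFUL)
        return 0;

    pFlags->compressed   = (dwAttrib & FILE_ATTRIBUTE_COMPRESSED) != 0;
    pFlags->encrypted    = (dwAttrib & FILE_ATTRIBUTE_ENCRYPTED) != 0;
    pFlags->sparse       = (dwAttrib & FILE_ATTRIBUTE_SPARSE_FILE) != 0;
    pFlags->reparsePoint = (dwAttrib & FILE_ATTRIBUTE_REPARSE_POINT) != 0;
    return 1;
}
```

Rejecting the failure value up front matters because 0xFFFFFFFF has every bit set and would otherwise report all four features as present.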

By far, though, the most intriguing aspect of sparse files is the possibility of declaring unused blocks of data within the body of the file. To do so, you call DeviceIoControl() and specify the flag of FSCTL_SET_ZERO_DATA. In this case, the function also needs to know the range of bytes to be cleared:

FILE_ZERO_DATA_INFORMATION fzdi;
fzdi.FileOffset.QuadPart = lFrom;
fzdi.BeyondFinalZero.QuadPart = lTo;
DeviceIoControl(hFile, FSCTL_SET_ZERO_DATA, 
   (LPVOID) &fzdi, sizeof(fzdi), NULL, 0, 
   &dwReturnedBytes, NULL);

The data structure that holds the information about the range of bytes to zero looks like this:

typedef struct _FILE_ZERO_DATA_INFORMATION {
  LARGE_INTEGER FileOffset;
  LARGE_INTEGER BeyondFinalZero;
} FILE_ZERO_DATA_INFORMATION;

Using DeviceIoControl() with the FSCTL_SET_ZERO_DATA flag is not the same as writing zeros to the file through WriteFile() or a similar API function. In the latter case, Windows actually writes zeroed bytes to disk. With DeviceIoControl(), it frees the clusters involved instead. This results in a substantial difference between the logical and the physical size of the sparse file. Figure 5 depicts an example of this.

Figure 5. Logical and physical size of a sparsed file

The blocks in green represent usable data. The blocks in white are areas of the sparse file that have been freed through DeviceIoControl(). However, if you ask GetFileSize() for the size of the file, what you obtain is the logical size, which counts the freed areas as well. To obtain the amount of disk space the file really takes up, use GetCompressedFileSize() instead. Taking this concept to the extreme, you can have a file that officially occupies several gigabytes of disk space although its real content doesn't cover more than a few bytes. To demonstrate this, Jeff Richter and Felipe Cabrera, in their article "A File System for the 21st Century," Microsoft Systems Journal, November 1998, suggested testing the effect of a piece of code like this:

HANDLE h = CreateFile("A huge file.txt", 
   GENERIC_WRITE, 0, NULL, 
   CREATE_ALWAYS, 0, NULL);
DWORD dw;
DeviceIoControl(h, FSCTL_SET_SPARSE, NULL, 
   0, NULL, 0, &dw, NULL);
LONG lDist = 8;   // 32 GB!!!
SetFilePointer(h, 0, &lDist, FILE_BEGIN);
SetEndOfFile(h);
CloseHandle(h);
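The 32-GB figure in the comment follows from the way SetFilePointer() combines its two distance arguments: the value passed through the third parameter supplies the upper 32 bits of a 64-bit offset, and the second parameter supplies the lower 32 bits. The arithmetic can be checked with a small portable sketch; MakeFileOffset() is a hypothetical helper, not a Win32 function.

```c
#include <stdint.h>

/* Hypothetical helper: combines the high and low 32-bit halves that
   SetFilePointer() takes separately into a single 64-bit byte offset. */
uint64_t MakeFileOffset(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}
```

With high set to 8 and low set to 0, the offset is 8 x 4 GB = 34,359,738,368 bytes, that is, 32 GB.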

In Figure 6 you can observe the software miracle of a 32-GB file on a 2-GB disk!

Figure 6. Believe it or not, but a 2-GB hard disk can store a 32-GB file.

The trick—and, incidentally, the role of sparse streams—is revealed through the Properties dialog box as seen in Figure 7.

Figure 7. The "size on disk" item shows a much more reasonable 8 KB.

In a nutshell, NTFS 2000 sparse files allow you to work with files of any size, as long as you free the unneeded portions. The ranges of bytes you zero and free through DeviceIoControl() still remain part of the file's body, but the system doesn't allocate any disk space for them. If you want to read only the nonzeroed parts of the file, you have to skip the zeroed ranges through code. This process speeds up considerably if you use DeviceIoControl() with another flag: FSCTL_QUERY_ALLOCATED_RANGES. The input buffer argument of DeviceIoControl() points to a FILE_ALLOCATED_RANGE_BUFFER structure that specifies the range of bytes to check. Through the output buffer the system returns an array of the same structures, each of which contains the initial offset and length of an allocated range of bytes within the file. There's no way to know in advance how many items this array could contain and, hence, how much memory you must allocate:

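FILE_ALLOCATED_RANGE_BUFFER farb;
FILE_ALLOCATED_RANGE_BUFFER *prgfarb;
DWORD dwMaxSize = MAX_BUFFER;   // application-defined buffer size
DWORD dwBytesReturned = 0;
farb.FileOffset.QuadPart = 0;
farb.Length.QuadPart = nFileSize;
prgfarb = (FILE_ALLOCATED_RANGE_BUFFER *) malloc(dwMaxSize);
// If the buffer is too small, the call fails with ERROR_MORE_DATA
DeviceIoControl(hFile,
   FSCTL_QUERY_ALLOCATED_RANGES,   
   &farb, sizeof(farb), prgfarb,
   dwMaxSize, &dwBytesReturned, NULL);
// dwBytesReturned / sizeof(FILE_ALLOCATED_RANGE_BUFFER) entries were filled in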

You can use the returned array to walk through all the allocated ranges of bytes and manually compact the file when needed.
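As a minimal sketch of such a walk, the function below adds up the bytes that fall in the holes between allocated ranges. ALLOCATED_RANGE is a portable stand-in for FILE_ALLOCATED_RANGE_BUFFER (whose two members are LARGE_INTEGER values in the SDK), CountHoleBytes() is a hypothetical name, and the code assumes the ranges arrive sorted by offset and do not overlap.

```c
#include <stdint.h>
#include <stddef.h>

/* Portable stand-in for FILE_ALLOCATED_RANGE_BUFFER, an assumption made
   so that the sketch builds without the Windows SDK. */
typedef struct {
    int64_t FileOffset;   /* start of an allocated range, in bytes */
    int64_t Length;       /* length of that range, in bytes */
} ALLOCATED_RANGE;

/* Hypothetical helper: returns how many bytes of [0, fileSize) are not
   covered by any allocated range. Assumes ranges are sorted by offset
   and non-overlapping. */
int64_t CountHoleBytes(const ALLOCATED_RANGE *ranges, size_t count,
                       int64_t fileSize)
{
    int64_t holes = 0;
    int64_t cursor = 0;   /* first byte not yet accounted for */
    size_t i;

    for (i = 0; i < count; ++i) {
        if (ranges[i].FileOffset > cursor)
            holes += ranges[i].FileOffset - cursor;   /* gap before range */
        cursor = ranges[i].FileOffset + ranges[i].Length;
    }
    if (fileSize > cursor)
        holes += fileSize - cursor;                   /* gap after last range */
    return holes;
}
```

For a 2-MB file with two 64-KB allocated ranges, one at offset 0 and one at offset 1 MB, the function reports 1,966,080 bytes of holes. The same loop can drive a reader that copies only the allocated ranges when compacting the file.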

As mentioned earlier, DeviceIoControl() with FSCTL_SET_SPARSE converts a normal file or stream into a sparse stream. This is an irreversible operation. Once a stream has been converted to a sparse stream you can only keep or delete it. Setting the sparse attribute doesn't automatically zero all the blank areas the file may already have. To accomplish this, you should scan the file, locate blank areas, and call DeviceIoControl() with FSCTL_SET_ZERO_DATA on each of them. Finally, bear in mind that if you set to zero two consecutive blank blocks through separate calls to DeviceIoControl(), Windows 2000 will leave them as is and will not even attempt to fuse them together.

Reparse Points

Technically speaking, a reparse point is a block of user-defined data associated with a file or a directory. There are no special rules about its internal format. What matters is that the installing application and the specified file-system filter can recognize and understand it. When the NTFS file system is about to open a file or a folder with a reparse point, it reads the unique reparse point tag name and passes the raw data (up to 16 KB) it contains to the file system filter registered to process that tag. What the file system filter does at this point is related to the particular goals it was installed for. The following URL:

http://www.microsoft.com/hwdev/ntifskit/

is the home page for the Windows 2000 Installable File System (IFS) Kit. IFS is a developer's kit for writing file-system filter drivers for both Windows NT 4.0 and Windows 2000.

A file system filter is a driver layer placed on top of a file-system driver. It allows you to spy on the file-system activity, intercepting I/O requests and responses. The filter can do anything—leave data unchanged, modify, redirect, or perform extra tasks. A reparse point is a block of data you want such a filter to retrieve and process when someone is accessing a given file or folder. As mentioned a moment ago, a reparse point is identified by an ID whose uniqueness is ensured by Microsoft provided that you request it through the IFS official channel.

You can get, set, or delete a reparse point using the now-familiar interface of DeviceIoControl(). The REPARSE_GUID_DATA_BUFFER structure is used to set or get the reparse point custom data. If you want to access a file bypassing any possible filter, use CreateFile() with the FILE_FLAG_OPEN_REPARSE_POINT flag set. In this way, you are allowed to access the raw content of the file, regardless of any possible change or redirection the filter is expected to do. In most cases, setting reparse points is significant only if you have a custom file-system filter installed.

In Windows 2000, several services have been implemented through reparse points. Among others, there are the Removable Storage Management (RSM), Native Structured Storage (NSS), and the previously mentioned Encrypting File System (EFS). RSM, in particular, is a service that, according to fixed criteria, automatically migrates data from local volumes to removable media such as tapes. In doing so, RSM deletes the body of the file but leaves the file name entry in place. In addition, it adds a reparse point on the fly with all the information needed to retrieve the content later on when someone attempts to read or modify the file. NSS, instead, is the mechanism that enables document properties (Author, Title, Company, etc.) to be associated with all files resident on an NTFS volume. The filter will save and read these properties to and from separate streams within the same file.

Mounting volumes is another NTFS 2000 feature made possible by reparse points. A mount volume is a directory on an NTFS partition that is used as a gateway to access another volume. In other words, a volume mount point is a directory that hosts the specified volume. The mount volume NTFS service ensures that the external mounted volume and the NTFS "copy" are always kept synchronized. This sort of mirroring effect can be obtained via a couple of simple calls to the NTFS API. Of course, the mounting volume must be an NTFS directory, but it can be placed at any level in the volume directory tree. The mounted volume (that is, the volume you want to mirror through an NTFS path) must be a root directory. The mounted volume can have any file system, not necessarily NTFS 2000.

The following code demonstrates the two-step procedure necessary to mount a volume. First, you should determine a unique name for the volume. A volume is usually identified by a label and drive letter. However, this information might change across system restarts, for example, if you add or remove drives. Before you mount a volume, therefore, you should obtain a unique volume name. This is the task of GetVolumeNameForVolumeMountPoint():

TCHAR szUniVolName[1024];
GetVolumeNameForVolumeMountPoint(
   szVolToMount, szUniVolName, 1024);

The function returns a string in a very particular format:

\\?\Volume{GUID}\

GUID is a globally unique identifier that from now on identifies the new volume. Such a string is the perfect input for SetVolumeMountPoint(), which finalizes the procedure:

SetVolumeMountPoint(szMountVol, szUniVolName);
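Before handing the string to SetVolumeMountPoint(), a caller may want to sanity-check that it has the expected shape. The following is a minimal sketch under a few assumptions: IsVolumeGuidPath() is a hypothetical helper, it works on narrow (char) strings rather than TCHAR, and it assumes the GUID part uses the usual 8-4-4-4-12 hexadecimal layout.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if path looks like
   \\?\Volume{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}\ and 0 otherwise. */
int IsVolumeGuidPath(const char *path)
{
    static const char prefix[] = "\\\\?\\Volume{";
    const size_t prefixLen = sizeof(prefix) - 1;
    const size_t guidLen = 36;   /* 32 hex digits plus 4 hyphens */
    const char *guid;
    size_t i;

    /* Total length: prefix, GUID, closing brace, trailing backslash. */
    if (path == NULL || strlen(path) != prefixLen + guidLen + 2)
        return 0;
    if (strncmp(path, prefix, prefixLen) != 0)
        return 0;

    guid = path + prefixLen;
    for (i = 0; i < guidLen; ++i) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (guid[i] != '-')
                return 0;
        } else if (!isxdigit((unsigned char)guid[i])) {
            return 0;
        }
    }
    return guid[guidLen] == '}' && guid[guidLen + 1] == '\\';
}
```

The check is purely syntactic: it says nothing about whether the volume actually exists, which only the mount call itself can confirm.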

Once set, a mount volume survives system restarts and remains in place until you delete it through DeleteVolumeMountPoint(). This function takes a single argument: the name of the mount volume, that is, the NTFS folder path where an external volume has been mounted. Figure 8 shows the effect of mounting a volume. I created a new DiskE folder on my drive C, which had previously been formatted as NTFS.

Figure 8. Mounting disk E:\ onto C:\DiskE\

Two straightforward lines of script code:

set ntfs = CreateObject("NTFSOM.FileObject")
ntfs.MountVolume "C:\DiskE\", "e:\"

finalized the job the figure illustrates. Now disk E (formatted as FAT) is linked to the DiskE folder on an NTFS volume.

Notice that the mount point cannot contain any subdirectory or files for the operation to be successful. If you attempt to mount on an already mounted point, the existing mount is silently dismissed. All the folder names you use with the mount volume API functions must terminate with a backslash.
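Because a missing trailing backslash is an easy mistake to make, it can help to normalize folder names before calling the mount volume functions. The sketch below shows one way to do it. EnsureTrailingBackslash() is a hypothetical helper that works on narrow (char) strings and never writes beyond the buffer size it is given.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: appends a backslash to path if it does not already
   end with one. cap is the size of the buffer in characters, including
   the terminating null. Returns 1 on success, 0 if the path is empty,
   the arguments are invalid, or there is no room for the extra character. */
int EnsureTrailingBackslash(char *path, size_t cap)
{
    size_t len;

    if (path == NULL || cap == 0)
        return 0;

    len = strlen(path);
    if (len == 0 || len >= cap)
        return 0;
    if (path[len - 1] == '\\')
        return 1;               /* already terminated correctly */
    if (len + 2 > cap)
        return 0;               /* no room for '\\' plus the null */

    path[len] = '\\';
    path[len + 1] = '\0';
    return 1;
}
```

Applied to a buffer holding C:\DiskE, the helper turns it into C:\DiskE\, while a path such as e:\ is left untouched.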

An Object Model for NTFS

In the last code snippet, I mentioned an NTFSOM COM object you can use to accomplish some NTFS-related tasks. Such an object model doesn't come with any product, but is simply the companion code for this article. The typical file system object model for scriptwriters is Scripting.FileSystemObject (FSO), which comes as a part of any Microsoft Visual Basic® Scripting Edition (VBScript) and Microsoft JScript® installation. FSO, however, doesn't provide specific support for many NTFS-specific features. You can work with multiple data streams and read the compression attribute, but no more than this.

The NTFSOM object is far from being complete. Nevertheless, it exposes through scripting most of the features I've covered so far. Table 1 summarizes methods and properties.

Table 1. NTFSOM properties and methods

Name                Description
Compressed          Read/write property that toggles the compression status for the currently opened file or folder. A nonzero value enables, 0 disables.
CompressedSize      Read-only property that returns the physical size of the currently opened file or folder. Applies both to compressed and sparse files.
Encrypted           Read/write Boolean property that toggles the encryption status for the currently opened file or folder. A nonzero value enables, 0 disables.
EncryptionStatus    Read-only property that returns the encryption status for the current file or folder.
Size                Read-only property that returns the logical size of the currently opened file or folder.
Sparse              Read-only Boolean property that checks the sparseness flag for the currently opened file or folder.
DeleteMountVolume   Method that deletes the specified mount volume. Its only argument must be the name of the NTFS mounting volume.
MakeSparse          Method that makes sparse the currently opened file. Takes no argument.
MountVolume         Method that mounts a volume. Takes two arguments: the NTFS mounting path and the volume you want to mount.
SetFile             Method that takes a single argument: the name of the file you want to open. A call to this method is mandatory before using any of the properties.
Zero                Method that sets to zero the specified range of bytes in a sparse file.

Setting either Compressed or Encrypted to 1 (or any other nonzero value) automatically triggers the corresponding engine, compression or encryption, to work on the file.

© 2009 Microsoft Corporation. All rights reserved.