Click to Rate and Give Feedback
Related Articles
Here the author introduces SQL Server Data Services, which exposes its functionality over standard Web service interfaces.

By David Robinson (July 2008)
Here the author answers questions regarding the Entity Framework and provides an understanding of how and why it was developed.

By Elisa Flasko (July 2008)
Here we present techniques for programmatic and declarative data binding and display with Windows Presentation Foundation.

By Josh Smith (July 2008)
Systems that handle failure without losing data are elusive. Learn how to achieve systems that are both scalable and robust.

By Udi Dahan (July 2008)
More ...
Articles by this Author
In this month’s installment of .NET Matters, columnist Stephen Toub answers reader questions concerning asynchronous I/O .

By Stephen Toub (July 2008)
This month Stephen Toub discusses asynchronous stream processing.

By Stephen Toub (March 2008)
This month Stephen Toub explains how to make the most of dual processors when running encryption and compression tasks.

By Stephen Toub (February 2008)
The author creates a managed wrapper to use the new IFileOperations interface in Windows Vista from managed code.

By Stephen Toub (December 2007)
Find out how to use finalizers as a way to warn developers who use your custom types when they are garbage collected without having been disposed of correctly.

By Stephen Toub (November 2007)
This month Stephen Toub discusses deadlocks that can occur when synchronizing threads.

By Stephen Toub (October 2007)
Stephen Toub and Shawn Farkas discuss creating an adapter that takes the functionality of RNGCryptoServiceProvider and adapts it to the interface of Random.

By Stephen Toub and Shawn Farkas (September 2007)
Stephen Toub gets nostalgic as he prepares to leave MSDN Magazine.

By Stephen Toub (August 2007)
More ...
Popular Articles
Here the author introduces SQL Server Data Services, which exposes its functionality over standard Web service interfaces.

By David Robinson (July 2008)
The .NET Compact Framework 3.5 provides a subset of Windows Communication Foundation (WCF) functionality that you can harness to communicate between Windows Mobile devices and desktop PCs. We'll show you how.

By Andrew Arnott (Launch 2008)
Here the author uses Document Information Panels in the Microsoft 2007 Office system to manipulate metadata from Office docs for better discovery and management.

By Ashish Ghoda (April 2008)
Learn how to create a workflow that uses InfoPath forms and other office documents for passing data to targeted activities and for use in Office documents.

By Rick Spiewak (June 2008)
More ...
Read the Blog
There are many things called threat modeling. Rather than argue about which is "the one true way," a good practice is to consider your needs and what your skills, abilities, and schedules are, and then work with a method that's best for you. In the July 2008 issue of MSDN Magazine, ...
Read more!
Want to develop games for Xbox Live? Want to get paid for it, too? Click on over to the XNA Team Blog to learn more about their initial rollout of the XNA Creators Club for XNA Game Studio. ...
Read more!
The Microsoft Entity Data Model (EDM), based on Dr. Peter Chen's Entity Relationship (ER) model, is the driving force behind the ADO.NET Entity Framework. The EDM is also the feature that most significantly differentiates the Entity Framework from other ORM-style technologies in the marketplace. In the July 2008 issue of MSDN ...
Read more!
System.IO.File is a handy helper class for reading and writing data, but its methods support only synchronous operation. Is there an easy way to provide File’s functionality for asynchronous file I/O? In the July 2008 issue of MSDN Magazine, Stephen Toub walks through several ...
Read more!
Remember .NET Terrarium, the interactive game meant to introduce .NET development techniques? Well, the Windows SDK team has released the source code for .NET Terrarium 2.0 on CodePlex. You can read more about this release on the Windows SDK blog and at Microsoft ...
Read more!
The Enumerable class plays an important role in every LINQ query you create. Because the Enumerable class's extension methods can process many other classes—including Array and List—you can use methods of the Enumerable class not only to create LINQ queries, but also to manipulate the behavior of arrays and other data structures. In the July 2008 issue of MSDN ...
Read more!
More ...
.NET Matters
Iterating NTFS Streams
Stephen Toub

Code download available at: NETMatters0601.exe (131 KB)
Browse the Code Online

Q I read in your December 2005 column how to enumerate files using interop to access the file management functions in Kernel32.dll (see .NET Matters: BigInteger, GetFiles, and More). I don't need to enumerate files, but rather I need to enumerate the NTFS Alternate Data Streams within a particular file. Do you know any way I can accomplish this?
Q I read in your December 2005 column how to enumerate files using interop to access the file management functions in Kernel32.dll (see .NET Matters: BigInteger, GetFiles, and More). I don't need to enumerate files, but rather I need to enumerate the NTFS Alternate Data Streams within a particular file. Do you know any way I can accomplish this?

A First, nothing in the Microsoft® .NET Framework provides this functionality. If you want it, plain and simple you'll need to do some sort of interop, either directly or using a third-party library.
A First, nothing in the Microsoft® .NET Framework provides this functionality. If you want it, plain and simple you'll need to do some sort of interop, either directly or using a third-party library.
If you're using Windows Server 2003 or later, Kernel32.dll exposes counterparts to FindFirstFile and FindNextFile that provide the exact functionality you're looking for. FindFirstStreamW and FindNextStreamW allow you to find and enumerate all of the Alternate Data Streams within a particular file, retrieving information about each, including its name and its length. The code for using these functions from managed code is very similar to that which I showed in my December column, and is shown in Figure 1.
You simply call FindFirstStreamW, passing to it the full path to the target file. The second parameter to FindFirstStreamW dictates the level of detail you want in the returned data; currently, there is only one level (FindStreamInfoStandard), which has a numerical value of 0. The third parameter to the function is a pointer to a WIN32_FIND_STREAM_DATA structure (technically, what the third parameter points to is dictated by the value of the second parameter detailing the information level, but as there's currently only one level, for all intents and purposes this is a WIN32_FIND_STREAM_DATA). I've declared that structure's managed counterpart as a class, and in the interop signature I've marked it to be marshaled as a pointer to a struct. The last parameter is reserved for future use and should be 0.
If a valid handle is returned from FindFirstStreamW, the WIN32_FIND_STREAM_DATA instance contains information about the stream found, and its cStreamName value can be yielded back to the caller as the first stream name available. FindNextStreamW accepts the handle returned from FindFirstStreamW and fills the supplied WIN32_FIND_STREAM_DATA with information about the next stream available, if it exists. FindNextStreamW returns true if another stream is available, or false if not.
As a result, I continually call FindNextStreamW and yield the resulting stream name until FindNextStreamW returns false. When that happens, I double check the last error value to make sure that the iteration stopped because FindNextStreamW ran out of streams, and not for some unexpected reason.
Unfortunately, if you're using Windows® XP or Windows 2000 Server, these functions aren't available to you, but there are a couple of alternatives. The first solution involves an undocumented function currently exported from Kernel32.dll, NTQueryInformationFile. However, undocumented functions are undocumented for a reason, and they can be changed or even removed at any time in the future. It's best not to use them. If you do want to use this function, search the Web and you'll find plenty of references and sample source code. But do so at your own risk.
Another solution, and one which I've demonstrated in Figure 2, relies on two functions exported from Kernel32.dll, and these are documented. As their names imply, BackupRead and BackupSeek are part of the Win32® API for backup support:
BOOL BackupRead(HANDLE hFile, LPBYTE lpBuffer,
                DWORD nNumberOfBytesToRead,
                LPDWORD lpNumberOfBytesRead, BOOL bAbort,
                BOOL bProcessSecurity, LPVOID* lpContext);

BOOL BackupSeek(HANDLE hFile, DWORD dwLowBytesToSeek, 
                DWORD dwHighBytesToSeek, LPDWORD lpdwLowByteSeeked, 
                LPDWORD lpdwHighByteSeeked, LPVOID* lpContext);
                
The idea behind BackupRead is that it can be used to read data from a file into a buffer, which can then be written to the backup storage medium. However, BackupRead is also very handy for finding out information about each of the Alternate Data Streams that make up the target file. It processes all of the data in the file as a series of discrete byte streams (each Alternate Data Stream is one of these byte streams), and each of the streams is preceded by a WIN32_STREAM_ID structure. Thus, in order to enumerate all of the streams, you simply need to read through all of these WIN32_STREAM_ID structures from the beginning of each stream (this is where BackupSeek becomes very handy, as it can be used to jump from stream to stream without having to read through all of the data in the file).
To begin, you first need to create a managed counterpart for the unmanaged WIN32_STREAM_ID structure:
typedef struct _WIN32_STREAM_ID 
{
    DWORD dwStreamId;  
    DWORD dwStreamAttributes;  
    LARGE_INTEGER Size;  
    DWORD dwStreamNameSize;  
    WCHAR cStreamName[ANYSIZE_ARRAY];
} 
WIN32_STREAM_ID;
For the most part, this is like any other structure you'd marshal through P/Invoke. However, there are a few complications. First and foremost, WIN32_STREAM_ID is a variable-sized structure. Its last member, cStreamName, is an array with length ANYSIZE_ARRAY. While ANYSIZE_ARRAY is defined to be 1, cStreamName is just the address of the rest of the data in the structure after the previous four fields, which means that if the structure is allocated to be larger than sizeof (WIN32_STREAM_ID) bytes, that extra space will in effect be part of the cStreamName array. The previous field, dwStreamNameSize, specifies exactly how long the array is.
While this is great for Win32 development, it wreaks havoc on a marshaler that needs to copy this data from unmanaged memory to managed memory as part of the interop call to BackupRead. How does the marshaler know how big the WIN32_STREAM_ID structure actually is, given that it's variable sized? It doesn't.
The second problem has to do with packing and alignment. Ignoring cStreamName for a moment, consider the following possibility for your managed WIN32_STREAM_ID counterpart:
[StructLayout(LayoutKind.Sequential)]
public struct Win32StreamID
{
  public int dwStreamId;
  public int dwStreamAttributes;
  public long Size;
  public int dwStreamNameSize;
}
An Int32 is 4 bytes in size and an Int64 is 8 bytes. Thus, you would expect this struct to be 20 bytes. However, if you run the following code, you'll find that both values are 24, not 20:
int size1 = Marshal.SizeOf(typeof(Win32StreamID));
int size2 = sizeof(Win32StreamID); // in an unsafe context
The issue is that the compiler wants to make sure that the values within these structures are always aligned on the proper boundary. Four-byte values should be at addresses divisible by 4, 8-byte values should be at boundaries divisible by 8, and so on. Now imagine what would happen if you were to create an array of Win32StreamID structures. All of the fields in the first instance of the array would be properly aligned. For example, since the Size field follows two 32-bit integers, it would be 8 bytes from the start of the array, perfect for an 8-byte value. However, if the structure were 20-bytes in size, the second instance in the array would not have all of its members properly aligned. The integer values would all be fine, but the long value would be 28 bytes from the start of the array, a value not evenly divisible by 8. To fix this, the compiler pads the structure to a size of 24, such that all of the fields will always be properly aligned (assuming the array itself is).
If the compiler's doing the right thing, you might be wondering why I'm concerned about this. You'll see why if you look at the code in Figure 2. In order to get around the first marshaling issue I described, I do in fact leave the cStreamName out of the Win32StreamID structure. I use BackupRead to read in enough bytes to fill my Win32StreamID structure, and then I examine the structure's dwStreamNameSize field. Now that I know how long the name is, I can use BackupRead again to read in the string's value from the file. That's all well and dandy, but if Marshal.SizeOf returns 24 for my Win32StreamID structure instead of 20, I'll be attempting to read too much data.
To avoid this, I need to make sure that the size of Win32StreamID is in fact 20 and not 24. This can be accomplished in two different ways using fields on the StructLayoutAttribute that adorns the structure. The first is to use the Size field, which dictates to the runtime exactly how big the structure should be:
[StructLayout(LayoutKind.Sequential, Size = 20)]
The second option is to use the Pack field. Pack indicates the packing size that should be used when the LayoutKind.Sequential value is specified and controls the alignment of the fields within the structure. The default packing size for a managed structure is 8. If I change that to 4, I get the 20-byte structure I'm looking for (and as I'm not actually using this in an array, I don't lose efficiency or stability that might result from such a packing change):
[StructLayout(LayoutKind.Sequential, Pack=4)]
public struct Win32StreamID
{
  public StreamType dwStreamId;
  public int dwStreamAttributes;
  public long Size;
  public int dwStreamNameSize;
  // WCHAR cStreamName[1]; 
}
With this code in place, I can now enumerate all of the streams in a file, as shown here:
static void Main(string[] args) {
    foreach (string path in args) {
        Console.WriteLine(path + ":");
        foreach (StreamInfo stream in 
            FileStreamSearcher.GetStreams(new FileInfo(path)))
        {
            Console.WriteLine("\t{0}\t{1}\t{2}",
               stream.Name != null ? stream.Name : "(unnamed)",
               stream.Type, stream.Size);
        }
     }
}
You'll notice that this version of FileStreamSearcher returns more information than the version that uses FindFirstStreamW and FindNextStreamW. BackupRead can provide data on more than just the primary stream and Alternate Data Streams, also operating on streams containing security information, reparse data, and more. If you only want to see the Alternate Data Streams, you can filter based on the StreamInfo's Type property, which will be StreamType.AlternateData for Alternate Data Streams.
To test this code, you can create a file that has Alternate Data Streams using the echo command at the command prompt:
> echo ".NET Matters" > C:\test.txt
> echo "MSDN Magazine" > C:\test.txt:magStream
> StreamEnumerator.exe C:\test.txt
test.txt:
        (unnamed)               SecurityData    164
        (unnamed)               Data            17
        :magStream:$DATA        AlternateData   18
> type C:\test.txt
".NET Matters"
> more < C:\test.txt:magStream
"MSDN Magazine"
So, now you're able to retrieve the names of all Alternate Data Streams stored in a file. Great. But what if you want to actually manipulate the data in one of those streams? Unfortunately, if you attempt to pass a path for an Alternate Data Stream to one of the FileStream constructors, a NotSupportedException will be thrown: "The given path's format is not supported."
To get around this, you can bypass FileStream's path canonicalization checks by directly accessing the CreateFile function exposed from kernel32.dll (see Figure 3). I've used a P/Invoke for the CreateFile function to open and retrieve a SafeFileHandle for the specified path, without performing any of the managed permission checks on the path, so it can include Alternate Data Stream identifiers. This SafeFileHandle is then used to create a new managed FileStream, providing the required access. With that in place, it's easy to manipulate the contents of an Alternate Data Stream using the System.IO namespace's functionality. The following example reads and prints out the contents of the C:\test.txt:magStream created in the previous example:
string path = @"C:\test.txt:magStream";
using (StreamReader reader = new StreamReader(CreateFileStream(
    path, FileAccess.Read, FileMode.Open, FileShare.Read)))
{
    Console.WriteLine(reader.ReadToEnd());
}

Send your questions and comments to  netqa@microsoft.com.


Stephen Toub is the Technical Editor for MSDN Magazine.

Page view tracker