[Editor's Update - 8/27/2004: This content is out-of-date. Microsoft will deliver a Windows storage subsystem, code-named "WinFS," after the "Longhorn" release. The new storage system provides advanced data organization and management capabilities and will be in beta testing when the "Longhorn" client becomes available. For more information, please refer to Microsoft Announces 2006 Target Date for Broad Availability Of Windows "Longhorn" Client Operating System.]

Code Name WinFS

Revolutionary File Storage System Lets Users Search and Manage Files Based on Content

Richard Grimes

This article was based on a pre-PDC build of Microsoft Windows Code Name "Longhorn" and all information contained herein is subject to change.

Note: This document was developed prior to the product's release to manufacturing and, as such, we cannot guarantee that any details included herein will be exactly the same as those found in the shipping product. The information represents the product at the time this document was printed and should be used for planning purposes only. Information subject to change at any time without prior notice.

SUMMARY

One of the monumental problems organizations face today is aggregating information that's stored in disparate formats. Knowledge workers have long wanted to be able to search for content independent of format. The next version of the Windows operating system, code-named "Longhorn," boasts a new storage subsystem that makes that task easier. That subsystem, code-named "WinFS," allows the user to perform searches based on the metadata of the stored item, regardless of what type of file it is or which application created it. This article covers the basic architecture of WinFS and explains how to use the WinFS managed API.

Contents

Basic Architecture
WinFS Storage
WinFS Types and .NET Types
The WinFS Managed API
Containing Items in Folders
Searching for Items
Notifications
Synchronization
WinFS and Win32
Conclusion

Every year, as new hard disks get bigger and faster, applications catch up by producing more data. Hard disks are commonly used to store personal information: correspondence, personal contacts, and work documents. These items are currently treated as separate entities, yet they are interrelated on some level: e-mail comes from the people in your personal contacts list, influences the work that you should be doing, and hence determines the documents that you'll create. When you have a large number of items, it is important to have a flexible and efficient mechanism to search for particular items based on their properties and content. Up until now, storage mechanisms like Outlook®, the Windows® Contact List, or the Windows Media® Player media library have used unrelated technologies both for storing items and for searching for them. Well, those days are over, thanks to Microsoft® Windows code-named "Longhorn."

The Longhorn generic storage subsystem, code-named "WinFS," can store data of all types and use one mechanism to access it. This is a huge change in an OS that has relied on multiple, incompatible stores for data such as the registry, event log messages, contact information, and e-mail, or has simply used multiple flat files for data such as images and audio. Since WinFS is implemented using database technologies, you get an efficient and flexible mechanism to query for data items, the ability to replicate data to stores on other machines, and facilities to handle backups and restores.

In this article I will outline the basic architecture of WinFS and describe how data is stored, how it is retrieved, and the services it provides for the operating system. Like much of Longhorn, WinFS has its own managed API, which will be my focus.

Basic Architecture

The perennial problem with data is that you always have too much of it and it is always too difficult to find what you want. This problem will get far worse in the near future when typical machines will have hard disks measured in terabytes. The bigger the hard disk, the more data you will want to store, and the more difficult it will be to find a particular item. Finding a piece of data depends on identifying the specific item, which means that the item has to be categorized.

For example, imagine that you have several thousand digital photographs. Typically, your camera and desktop software will catalog this data by naming the photos in sequence and perhaps placing them in a folder according to the date on which they were taken. Searching for a particular photo in this scheme involves remembering the date and photo number. If you do not know either of these things, you will have to resort to looking through thumbnails of the photos. Of course, if you have the patience, you could name each photograph as you download it to your desktop. A coarse-grained system, like the file and folder system in Win32®, is not flexible enough to allow you to store all the descriptive data you need.

Desktop software may fill this gap, allowing you to build a catalog that stores other descriptive properties like camera settings, light condition, and a general description of each picture. Such software, of course, is an implementation of a database, but the important point is that it acts as a supplement to the file system. You could argue that these descriptions of your photos—metadata—are actually part of the photograph's object. Digital photographs hold other information within the image file's EXIF (Exchangeable Image File Format) header. This header contains information obtained when the photo was taken. To include this metadata in a search would require a query engine to extract this information from each candidate photo and compare it with the query condition. A sophisticated photograph organizer could extract the EXIF information when the photograph is transferred to the desktop and include it as additional metadata in the database.

This is where WinFS fits in. Everything that is stored in the WinFS store is an item and each item has metadata properties that are described by a schema. Items that follow the schema are stored in the WinFS store as serialized .NET objects and are accessed through T-SQL views that give access to the items' properties. Finding a photo stored on your disk involves querying the store against the metadata properties for an Image item, the information obtained from the EXIF header, as well as file information like the creation date and the actual bits that make up the image.

Figure 1 Files in WinFS

Figure 1 shows the conceptual architecture of items in WinFS. Each item exists in a containment folder; these are used to control the lifetime of items. Folders are items themselves and each folder must be contained within another folder. The only exception to this containment rule is the root containment folder. Items are stored in a volume, which is the largest autonomous container of items. On each machine, the WinFS service acts upon these volumes to store, delete, modify, and locate items.

WinFS Storage

WinFS provides T-SQL views of the Item tables, with one view per item type. To enforce the security of the store and the consistency of the data, the data can only be read through the views and can only be modified using stored procedures. For example, the following line will return the requested information for all tracks in the Media Library on your machine:

SELECT Artist, Title, Album FROM Audio.[Track!V1]

Some objects contain references to other objects. For example, a Track object has a member called Authors, which is a collection of Author objects, each of which has a DisplayName property, which gives the name of an artist who authored the track. It is possible to construct a query for all of the tracks that are attributed to a particular artist, which you can cross-reference to your personal address book to see if you have the cell phone number of any artist for which you have a track. (This assumes, of course, that you have a celebrity-filled address book!)

Not everyone is a database developer and able to construct complicated select queries, so WinFS provides several higher-level APIs to make inserting, deleting, and querying for items simpler.

WinFS Types and .NET Types

When you access WinFS items through the managed API, you do so through .NET objects. Each WinFS item type has a corresponding .NET Framework class which has System.Storage.Item in its hierarchy. The Item type has an ItemID_Key property and a GUID, which provides the identity for the WinFS item. When you obtain a WinFS object, the API returns the corresponding .NET type initialized with the metadata of the WinFS object. If you obtain two Item instances that have the same ItemID_Key values, then they refer to the same item in the store, even though you will have two different .NET references. When comparing two WinFS objects, it is therefore important to compare their ItemID_Key values rather than to use ReferenceEquals.
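The difference between store identity and reference identity can be illustrated with a minimal sketch. It does not use the WinFS API: StoreItem, ItemIdKey, and IdentityDemo are hypothetical stand-ins, and the key is assumed to be a Guid.

```csharp
using System;

// Hypothetical stand-in for a WinFS item class; not part of System.Storage.
public sealed class StoreItem
{
    // Models the ItemID_Key property described above (assumed here to be a Guid).
    public Guid ItemIdKey { get; }
    public string DisplayName { get; set; }

    public StoreItem(Guid itemIdKey, string displayName)
    {
        ItemIdKey = itemIdKey;
        DisplayName = displayName;
    }

    // Two objects denote the same stored item when their keys match.
    public static bool SameStoredItem(StoreItem a, StoreItem b)
    {
        if (a == null || b == null) return false;
        return a.ItemIdKey == b.ItemIdKey;
    }
}

public static class IdentityDemo
{
    // Simulates obtaining the same stored item twice: two objects, one key.
    // Returns { reference comparison, key comparison }.
    public static bool[] Compare()
    {
        Guid key = Guid.NewGuid();
        StoreItem first = new StoreItem(key, "Richard Grimes");
        StoreItem second = new StoreItem(key, "Richard Grimes");
        return new[]
        {
            object.ReferenceEquals(first, second),   // different .NET objects
            StoreItem.SameStoredItem(first, second)  // same item in the store
        };
    }
}
```

Calling IdentityDemo.Compare() yields false for the reference comparison and true for the key comparison, which is why the key is the value to test.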

In WinFS parlance, a WinFS type is an item that is made up of elements that are either items, nested elements, or scalar types—or a collection of any of these. An item is the unit of consistency and coherence, so when an item is copied or deleted, all the elements within the item are also copied or deleted. A nested element can only be part of a single item. This behavior is different from the way it works in .NET. For example, a System.Storage.Contacts.Person object has a property called PrimaryName that is a nested element of type FullName. If you create two Person objects with the same PrimaryName (for example, a father and son who have the same name) then you should create two FullName objects, one for each Person object. If you use the same FullName object for both Person objects you will get an exception when you try to save the Person objects to the store because although they are valid .NET objects, they are not valid WinFS objects.
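The single-owner rule for nested elements can be modeled in a few lines. This is only a sketch under stated assumptions: ToyFullName, ToyPerson, and ToyStore are hypothetical stand-ins rather than the System.Storage.Contacts types, and the exception type that WinFS actually throws is not specified in this article, so InvalidOperationException is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for FullName and Person.
public sealed class ToyFullName
{
    public string GivenName { get; set; }
    public string Surname { get; set; }
}

public sealed class ToyPerson
{
    public ToyFullName PrimaryName { get; set; }
}

// Toy store that enforces "a nested element belongs to exactly one item".
public sealed class ToyStore
{
    // Maps each saved nested element (by reference) to the item that owns it.
    private readonly Dictionary<ToyFullName, ToyPerson> owners =
        new Dictionary<ToyFullName, ToyPerson>();

    public void Save(ToyPerson person)
    {
        ToyFullName name = person.PrimaryName;
        if (name == null) return;

        ToyPerson owner;
        if (owners.TryGetValue(name, out owner) && !object.ReferenceEquals(owner, person))
        {
            throw new InvalidOperationException(
                "Nested element is already owned by another item.");
        }
        owners[name] = person;
    }
}

public static class NestedElementDemo
{
    // Saves a father and son; returns true if both saves succeed.
    public static bool TrySave(bool shareNameObject)
    {
        ToyStore store = new ToyStore();
        ToyFullName fatherName = new ToyFullName { GivenName = "John", Surname = "Smith" };
        ToyFullName sonName = shareNameObject
            ? fatherName
            : new ToyFullName { GivenName = "John", Surname = "Smith" };

        ToyPerson father = new ToyPerson { PrimaryName = fatherName };
        ToyPerson son = new ToyPerson { PrimaryName = sonName };
        try
        {
            store.Save(father);
            store.Save(son);
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }
}
```

NestedElementDemo.TrySave(true) fails on the second save because both people share one name object, while TrySave(false) succeeds because each person owns a separate object with equal values.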

The WinFS Managed API

Figure 2 shows the general architecture of WinFS. The WinFS store can be accessed through several APIs, and in this release an app can use ADO.NET with the SQL Server™ Provider, OLE DB with native code, a COM-based API, and a managed API. In general, developers will use the APIs rather than T-SQL, but will still have access through T-SQL because there are a number of things that you can do in T-SQL that you cannot do with the other APIs.

Developers who are concentrating on .NET will use the WinFS API found in the System.Storage assembly, which contains the general WinFS API. Developers will also use the System.Storage.Schema assembly, which holds the API for the standard Longhorn types (Track, Image, Folder, Message, and so on). In addition to the common language runtime (CLR) types defined for the WinFS Items, the System.Storage.Schema assembly contains the data model mapping for the items. This mapping indicates to WinFS the data source used to access the item and the mapping from the properties of the managed type to the fields in the data store table. Deep in the bowels of the WinFS API, this mapping information is used to construct the T-SQL queries to access the item in the store.

Figure 2 WinFS Architecture

All WinFS API code has to run under an ItemContext. An ItemContext means many things. First, the WinFS API must create a connection to the store; this connection remains active while the ItemContext is active. Second, since the ItemContext represents a data connection, it provides transactional facilities. If you like, you can make several changes to the store as an atomic unit. Finally, an ItemContext describes the scope of the work that you will do by defining which items your searches will act upon.

The first action that you need to take is to open an ItemContext with the static Open method. The parameterless version of Open will open the context for all items, but you can limit the scope of the context with the other overloads. WinFS uses a naming scheme that allows you to identify both the volume and a backslash-delimited path to a folder.

The ItemContext provides access to the data connection's transactional facilities through BeginTransaction and EndTransaction. BeginTransaction creates a new transaction and returns a Transaction object. Once you have done this, all the changes you make to the store are performed under that transaction until you call EndTransaction, at which point all of your changes to the store are treated as a single atomic action.

It is also worth pointing out that when you have finished accessing the store you should call Close on the ItemContext to release the data connection. If you have made changes to the store, you should also call Update to ensure that the changes have been persisted to the store. For this prerelease version of WinFS, once you have created a Folder object, you should save it before attempting to insert any items into the folder. To do this you can call Folder.Save, which is merely a wrapper around ItemContext.Save.

Containing Items in Folders

The broadest level of object categorization can be achieved through folders. WinFS folders are used to contain items; however, unlike the concept of a file system folder in Win32, a single item can exist in more than one WinFS folder. If you want to create your own folder hierarchy, start by adding your folder to an existing one. To do this, you create a new Folder object by passing the container folder object and the new folder name as constructor parameters. The containing folder can be any existing folder, and you can traverse the entire hierarchy by obtaining the root folder (just call Folder.GetRootFolder) and then iterating all the folder objects it contains. Typically, you will not store items in the top-level folder, but instead use a folder further down the hierarchy.

The root folder of each volume has four folders by default: System Data, Volume, Schema, and Store Information. The folders you will be creating should exist under SystemData. To do this, you could search for and then open the SystemData folder and add your folder under it. It is easier, though, to use the WinFS API to provide access to the folder that is allocated specifically for your items. The static method UserDataFolder.FindMyUserDataFolder will return a folder under SystemData specifically for the current user and some item types will have their own standard folder. For example, the UserDataFolder.FindMyPersonalContactsFolder method will return a folder called Personal Contacts.

Once you have access to the appropriate folder, you can populate it with items (including other folders) using the Folder.AddMember method. Removing items from a folder is just as simple—you call the Folder.RemoveMember method and pass either the item or the item's name. When you place an item in a folder, you are using what is known as a holding link. The folder object is just another item that has a holding link for each item it contains. A holding link is special because it determines the lifetime of the item it refers to, and each holding link on an item acts as a reference count. A holding link can only be applied to an item, and this highlights another difference between items and nested elements. When all the holding links on an item are deleted, so is the item. WinFS folders are not constrained to the type of items they contain. This means that if you have a folder called "Summer Vacation 2003," it can hold all the photos you took on your vacation, the e-mail messages confirming your booking, and even a WAV file of "Holiday" by Madonna.
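The reference-counting behavior of holding links can be sketched with a toy model. ToyVolume is hypothetical and identifies folders and items by name only; the method names AddMember and RemoveMember merely echo the Folder methods mentioned above and are not the WinFS implementation.

```csharp
using System.Collections.Generic;

// Toy model of holding links; ToyVolume is hypothetical, not the WinFS API.
public sealed class ToyVolume
{
    // Folder name -> names of the items that the folder holds.
    private readonly Dictionary<string, HashSet<string>> folders =
        new Dictionary<string, HashSet<string>>();

    // Item name -> number of holding links that currently refer to it.
    private readonly Dictionary<string, int> holdingLinks =
        new Dictionary<string, int>();

    // Adds a holding link from the folder to the item.
    public void AddMember(string folder, string item)
    {
        HashSet<string> members;
        if (!folders.TryGetValue(folder, out members))
        {
            members = new HashSet<string>();
            folders[folder] = members;
        }
        if (members.Add(item))
        {
            int count;
            holdingLinks.TryGetValue(item, out count); // count is 0 if absent
            holdingLinks[item] = count + 1;
        }
    }

    // Removes the holding link; the item is deleted when no links remain.
    public void RemoveMember(string folder, string item)
    {
        HashSet<string> members;
        if (folders.TryGetValue(folder, out members) && members.Remove(item))
        {
            int remaining = holdingLinks[item] - 1;
            if (remaining == 0)
                holdingLinks.Remove(item);
            else
                holdingLinks[item] = remaining;
        }
    }

    // An item exists for as long as at least one holding link refers to it.
    public bool Exists(string item)
    {
        return holdingLinks.ContainsKey(item);
    }
}
```

If a photo is added to two folders and removed from one, Exists still returns true; removing it from the second folder drops the last holding link and the item disappears from the model.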

Accessing items in folders is straightforward: the folder item has a property called Members, which is actually a collection of Link items. Each Link is the holding link and its SourceItem property provides access to the appropriate item. In addition, the Folder class has members that let you retrieve members that have a specific name or are of a particular type.

Searching for Items

Once you have a context and have determined the folder where the items are located, you next need to access the items. The simplest way to do this is to iterate through all the items in the folder and check each one individually, as shown in Figure 3. The FindResult object contains the results of the query performed by GetAllMembers, which should include all of the items in the folder. WinFS then creates a CLR object of type Item for each of these items and puts them in the FindResult object. Of course, GetAllMembers is a rather blunt tool, but you can refine this process by using GetAllMembersOfType to return only the Person items in the folder:

using (FindResult result = myDataFolder.GetAllMembersOfType(typeof(Person)))
{
    foreach (Person person in result)
    {
        if (person.DisplayName == "Richard Grimes")
        {
            // use person here
            break;
        }
    }
}

Figure 3 Searching for the Richard Grimes Person

using (ItemContext ctx = ItemContext.Open())
{
    Folder myDataFolder = UserDataFolder.FindMyPersonalContactsFolder(ctx);
    using (FindResult result = myDataFolder.GetAllMembers())
    {
        foreach (object o in result)
        {
            Person person = o as Person;
            if (person == null) continue;
            if (person.DisplayName == "Richard Grimes")
            {
                // use person here
                break;
            }
        }
    }
}

The search can be refined further through a filter passed to one of the Find methods:

Person person = (Person)myDataFolder.GetOneMemberOfType(
    typeof(Person), "DisplayName='Richard Grimes'");

This method does all of the work in the previous code. It searches a specific folder for objects of a specific type that also have the property values specified in the filter string. The filter string is written in a new query language called WinFS OPath. You can provide a filter to check the properties of the type passed to the method, so in this case I am performing a check on the Person.DisplayName property. During the search, WinFS will extract the DisplayName property of each Person object it finds and compare the value with the literal string I provide.

WinFS OPath is very flexible, allowing you to provide multiple criteria appended with the && or || operators. In addition, you can even use the like operator:

FindResult result = myDataFolder.GetAllMembersOfType(
    typeof(Person), "DisplayName like 'R% Grimes'");

This will return all the Person items whose DisplayName starts with "R" and ends with " Grimes", that is, a first name that starts with "R" followed by the surname "Grimes".
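The matching rule behind such a pattern can be sketched outside the store. The helper below is hypothetical and assumes that the % wildcard behaves as it does in T-SQL LIKE, matching any run of characters; it handles only that one wildcard and compares case-sensitively, which may differ from how WinFS OPath evaluates the filter.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; it is not part of WinFS and only models the '%' wildcard.
public static class LikePattern
{
    // Converts a pattern that uses '%' as "any run of characters"
    // into a regular expression anchored at both ends.
    public static Regex ToRegex(string pattern)
    {
        string[] literalParts = pattern.Split('%');
        for (int i = 0; i < literalParts.Length; i++)
        {
            // Escape each literal piece so that it matches itself exactly.
            literalParts[i] = Regex.Escape(literalParts[i]);
        }
        return new Regex(
            "^" + string.Join(".*", literalParts) + "$",
            RegexOptions.Singleline);
    }

    // Returns true if the whole value matches the pattern.
    public static bool IsMatch(string value, string pattern)
    {
        return ToRegex(pattern).IsMatch(value);
    }
}
```

With the pattern from the query above, LikePattern.IsMatch("Richard Grimes", "R% Grimes") is true, while "Richard Edwards" and "Sarah Grimes" do not match.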

Some properties are collections, so WinFS OPath provides a syntax that allows you to check the properties on the items in the collection. For example, the Person class has a property called PersonalNames, which is a collection of FullName objects (allowing for the fact that some people may use more than one name). If I want to search for all people that have the last name Grimes as one or more of their personal names, I can use:

FindResult result = Person.FindAll(
    ctx, "PersonalNames[Surname='Grimes']");

Here, the square brackets indicate that PersonalNames is a collection and that for each item in the collection the Surname property should be compared with the literal string. In this example I have used the static method, Person.FindAll, which will find all the Person items in the given context. All item classes will have an overloaded FindAll inherited from the Item class. In addition, all item classes will have a FindOne method that will return a single typed item from the context that satisfies the filter.
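For readers who think in terms of in-memory collections, the collection filter reads as "at least one element of PersonalNames has the given Surname." The sketch below expresses that meaning with LINQ over hypothetical stand-in classes (NameEntry, Contact, CollectionFilter); it runs against an ordinary list, not the WinFS store, and is not how WinFS evaluates OPath.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for FullName and Person.
public sealed class NameEntry
{
    public string GivenName { get; set; }
    public string Surname { get; set; }
}

public sealed class Contact
{
    public string DisplayName { get; set; }
    public List<NameEntry> PersonalNames { get; } = new List<NameEntry>();
}

public static class CollectionFilter
{
    // In-memory analogue of the filter PersonalNames[Surname='Grimes']:
    // keep a contact if any one of its personal names has the surname.
    public static List<Contact> WithSurname(IEnumerable<Contact> contacts, string surname)
    {
        return contacts
            .Where(c => c.PersonalNames.Any(n => n.Surname == surname))
            .ToList();
    }
}
```

A contact with two personal names is returned as soon as one of them has the requested surname, which mirrors the "one or more of their personal names" wording above.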

Item classes can perform searches for items based on the item's ID, or can search for items that have a specific category. Categories are essentially known names attached to an item. The Item class has a property called Categories, which is a collection of all the categories attached to the item, and each category is a CategoryRef object. Categories have a Key property, which is the name of the category, and also a Type property, which is a GUID identifying the category. You can create your own categories, or you can use the predefined categories provided as static members of the GeneralCategories type. To add a category to an item, you have to create a new CategoryRef object and add this to the Categories property. You cannot simply add one of the members of the GeneralCategories to an item. The code in Figure 4 will fail at run time because in the store, members of Categories are nested elements, which are owned by a single item and cannot be shared between items. A nested element has an ItemKey property which has a unique value for each nested element. However, in Figure 4, the WinFS code will attempt to add a nested element with the same ItemKey to two items. The solution in Figure 5 is straightforward: a new CategoryRef object is added to the Categories property and this is initialized with a value from the GeneralCategories class. Since a new CategoryRef object is created, it will have a unique ItemKey.

Figure 4 Incorrect Way to Add a Category

Person person = Person.CreatePersonalContact(ctx);
person.DisplayName = "Richard Edwards";
SmtpEmailAddress email;
email = new SmtpEmailAddress("dai@parkfarm.com");
email.Categories.Add(GeneralCategories.Home);
person.PersonalEmailAddresses.Add(email);
person.Save();

person = Person.CreatePersonalContact(ctx);
person.DisplayName = "Roberto Marchini";
email = new SmtpEmailAddress("rob@robshouse.com");
email.Categories.Add(GeneralCategories.Home);
person.PersonalEmailAddresses.Add(email);
person.Save();

Figure 5 Correct Way to Add a Category

Person person = Person.CreatePersonalContact(ctx);
person.DisplayName = "Richard Edwards";
SmtpEmailAddress email;
email = new SmtpEmailAddress("dai@parkfarm.com");
email.Categories.Add(new CategoryRef(GeneralCategories.Home));
person.PersonalEmailAddresses.Add(email);
person.Save();

person = Person.CreatePersonalContact(ctx);
person.DisplayName = "Roberto Marchini";
email = new SmtpEmailAddress("rob@robshouse.com");
email.Categories.Add(new CategoryRef(GeneralCategories.Home));
person.PersonalEmailAddresses.Add(email);
person.Save();

If you know that your items have categories, you can use FindByAllCategories, FindByAnyCategory, or FindByCategory to find the items that have all of the specified categories, any one of the specified categories, or one specific category.
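The difference between the "all" and "any" searches comes down to set logic, sketched below over plain strings. CategoryMatch is a hypothetical helper that runs in memory; it does not call FindByAllCategories or FindByAnyCategory, and the category names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that models the two matching rules over category names.
public static class CategoryMatch
{
    // True if the item carries every one of the wanted categories.
    public static bool HasAll(ICollection<string> itemCategories, IEnumerable<string> wanted)
    {
        return wanted.All(c => itemCategories.Contains(c));
    }

    // True if the item carries at least one of the wanted categories.
    public static bool HasAny(ICollection<string> itemCategories, IEnumerable<string> wanted)
    {
        return wanted.Any(c => itemCategories.Contains(c));
    }
}
```

An item tagged "Home" and "Work" satisfies an any-match on "Home" and "Mobile" but not an all-match on the same pair. In this sketch an empty wanted list matches everything under HasAll and nothing under HasAny; whether WinFS treats that edge case the same way is not stated here.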

Notifications

When you use WinFS, it is all too easy to forget that the items you are accessing are actually items in a data store. WinFS notifications make this apparent. As the name suggests, WinFS notifications are a mechanism that allows your code to be informed when an item changes. WinFS uses the .NET events model to subscribe to and handle notifications. Subscriptions survive until the application instance quits or until the application unsubscribes. An ItemWatcher object can be created to monitor a particular item, and this object has an event called ItemChanged:

public event ItemChangedEventHandler ItemChanged;

This delegate has an ItemChangedEventArgs parameter. The ItemChangedEventArgs.Details is a collection of ItemChangedDetail objects that indicates the type of change that has occurred on the item. So, one application can access an item from the store and add an ItemChanged handler for that item. Then, when another application makes a change to the item, the event in the first application is triggered.

The listener application only needs to add a delegate to the ItemWatcher.ItemChanged event, and the application will be notified whenever that item has changed. That change may be from code in the same process, in another process, or on another machine. If the change occurs from another machine, then it manifests itself on your machine due to synchronization.
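The subscribe, notify, and unsubscribe cycle follows the ordinary .NET event pattern, which the sketch below reproduces with stand-in types. ToyItemWatcher, ToyItemChangedEventArgs, and SimulateChange are hypothetical; in WinFS the store raises the event, whereas here the caller triggers it by hand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event data; the real ItemChangedEventArgs carries a Details collection.
public sealed class ToyItemChangedEventArgs : EventArgs
{
    public string ChangedProperty { get; }

    public ToyItemChangedEventArgs(string changedProperty)
    {
        ChangedProperty = changedProperty;
    }
}

// Hypothetical stand-in for ItemWatcher.
public sealed class ToyItemWatcher
{
    public event EventHandler<ToyItemChangedEventArgs> ItemChanged;

    // Stands in for a change made by any process or machine.
    public void SimulateChange(string propertyName)
    {
        EventHandler<ToyItemChangedEventArgs> handler = ItemChanged;
        if (handler != null)
        {
            handler(this, new ToyItemChangedEventArgs(propertyName));
        }
    }
}

public static class NotificationDemo
{
    // Returns the names of the changes that the subscriber was told about.
    public static List<string> Run()
    {
        List<string> seen = new List<string>();
        ToyItemWatcher watcher = new ToyItemWatcher();
        EventHandler<ToyItemChangedEventArgs> onChanged =
            (sender, e) => seen.Add(e.ChangedProperty);

        watcher.ItemChanged += onChanged;      // subscribe
        watcher.SimulateChange("DisplayName"); // delivered to the handler
        watcher.ItemChanged -= onChanged;      // unsubscribe
        watcher.SimulateChange("Title");       // no longer delivered
        return seen;
    }
}
```

NotificationDemo.Run() returns a list containing only "DisplayName", because the second change happens after the handler has been removed.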

Synchronization

As I mentioned, WinFS is a data storage service, so it seems natural for it to support replication to other stores on other machines; in WinFS this is called synchronization. To implement synchronization, WinFS first has to determine what data has changed and then determine what data needs to be replicated. In WinFS, elements are the smallest parts of an item's schema that are tracked. If one element in an item changes, WinFS knows that the item has changed. Items are the smallest unit of consistency in WinFS and so they are the smallest unit of data that can be replicated to another machine: an item is replicated in its entirety or not at all.
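The split between element-level change tracking and item-level replication can be modeled with a small in-memory sketch. TrackedItem and ToySync are hypothetical, elements are reduced to string name and value pairs, and the "replica" is just a dictionary; none of this reflects the actual WinFS synchronization implementation.

```csharp
using System.Collections.Generic;

// Hypothetical item whose elements are tracked individually.
public sealed class TrackedItem
{
    private readonly Dictionary<string, string> elements =
        new Dictionary<string, string>();

    public string Id { get; }
    public bool Changed { get; private set; }

    public TrackedItem(string id)
    {
        Id = id;
    }

    // Changing any single element marks the whole item as changed.
    public void SetElement(string name, string value)
    {
        string old;
        if (elements.TryGetValue(name, out old) && old == value) return;
        elements[name] = value;
        Changed = true;
    }

    // Copies every element, changed or not.
    public IDictionary<string, string> Snapshot()
    {
        return new Dictionary<string, string>(elements);
    }

    public void MarkSynchronized()
    {
        Changed = false;
    }
}

public static class ToySync
{
    // Sends each changed item to the replica in its entirety.
    // Returns the number of items that were sent.
    public static int Send(
        IEnumerable<TrackedItem> local,
        IDictionary<string, IDictionary<string, string>> replica)
    {
        int sent = 0;
        foreach (TrackedItem item in local)
        {
            if (!item.Changed) continue;
            replica[item.Id] = item.Snapshot(); // whole item, never a single element
            item.MarkSynchronized();
            sent++;
        }
        return sent;
    }
}
```

After one element of an item is edited, Send transfers that item again with all of its elements, and items without changes are skipped.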

The user has to identify the machines that will take part in the synchronization. To do this, the user creates a community folder that is used by all the machines and associates it with the actual folder of items on the local machine. Applications can initiate synchronization through the WinFS API, or it may occur as a scheduled task. In both cases, the application that initiated the action provides a synchronization profile to WinFS. This profile has information about the folder that is being synchronized, the direction (send-only, receive-only, or send and receive), and information about what items should be synchronized. In addition, the profile contains information about how to handle conflicts. The conflict policy can invoke an action as simple as rejecting the replication, which a piece of code called a conflict resolver can handle automatically, or it can solicit user intervention.

WinFS and Win32

WinFS is an exciting and feature-rich facility, and developers writing new applications for Longhorn should take full advantage of it. Of course, most applications that are moved over to Longhorn will have some legacy Win32 code and so this raises the question of how the two will cooperate.

The first point to make is that WinFS will not supplant NTFS—machines running Longhorn will still have NTFS-based drives. WinFS currently only applies to folders within the Documents and Settings folders. All other folders still reside under the control of NTFS. In particular, %systemroot% and Program Files are NTFS folders, which means that your application executable files will not actually live in the WinFS store.

Of course, a Win32-based application can still read and write files in the folders under Documents and Settings. The Win32 file APIs have been implemented in Longhorn to take advantage of the items in the WinFS store. All of the items in Documents and Settings that were migrated from Win32 are file-backed, which means that there will be a file representation of the item as well as a WinFS item. Known file types will each have an associated metadata handler that has the responsibility to read the file and extract the metadata to store as properties in WinFS. For example, a metadata handler associated with Microsoft Word will load a Word file and then read the SummaryInformation OLE Property set and use the information in order to initialize a Document WinFS type in the WinFS store.

This duality of file-backed items and metadata properties in the store means that Win32 can continue to access items in Documents and Settings and WinFS can use properties as part of a query. These properties are treated as read-only to WinFS. When a Win32-based application changes a file, the metadata handler updates the store with the new property values. What this means in practice is that when you copy a Word document to your My Documents folder on a machine running WinFS, information about the file will be put into the WinFS store so that applications can perform searches based on the file's properties. However, the file will still be accessible by Win32, so Word will be able to load and edit the file. When Win32 makes changes to the file, WinFS will read the updated properties and write them into the store. So when you add a page to your Word document and save the file, WinFS will update the Document.DocPageCount property so that searches based on this property will work as expected.

Conclusion

WinFS, the new Longhorn file storage system, is the basic building block of all applications on Longhorn, and I have only just scratched the surface of what you can do with it in this article. WinFS provides you with all of the features you need to have persistent items that can be categorized and searched for based on properties of the item type. Your job is to use these facilities to create the next generation of Windows-based applications.

Richard Grimes speaks and writes about the .NET Framework. He is the author of Programming with Managed Extensions for Microsoft Visual C++ .NET (Microsoft Press, 2002) and Developing Applications with Visual Studio .NET (Addison-Wesley, 2002). His book on developing applications for Longhorn will be available nearer the release of Longhorn. He can be contacted at longhorn.dev@grimes.demon.co.uk.