November 2015

Volume 30 Number 12

Windows 10 - Accelerate File Operations with the Search Indexer

By Adam Wilson | November 2015

The search indexer has been a part of Windows for many releases now, powering everything from the library views in File Explorer to the IE address bar, as well as providing search functionality for the Start menu and Outlook. With Windows 10, the power of the indexer has gone from being limited to desktop machines, to being available to all Universal Windows Platform (UWP) apps. While this also lets Cortana run better searches, the most exciting part of this advancement is that it greatly improves how apps interact with the file system.

The indexer lets apps do more interesting operations such as sorting and grouping files and tracking how the file system is changing. Most of the indexer APIs are available to UWP apps through the Windows.Storage and Windows.Storage.Search namespaces. Apps are already using the indexer to enable great experiences for their users. In this article, I’ll walk through how you can use the indexer to track changes on the file system and render views quickly, and I’ll offer some basic tips on how to improve an app’s queries.

Accessing Files and Metadata Quickly

Most user devices contain hundreds or thousands of media files that include a user’s most cherished pictures and favorite songs. Apps that can quickly iterate through the files on the device and offer stimulating interactions with the files are among the most-loved apps on any platform. The UWP provides a series of classes that can be used to access files on any device, regardless of the form factor.

The Windows.Storage namespace includes the basic classes for accessing files and folders, as well as the base operations that most apps do with them. But if your app needs to access a lot of files or metadata, these classes alone won’t provide the performance characteristics that users demand.

For example, calling StorageFolder.GetFilesAsync is a recipe for disaster if you don’t control the folder you’re enumerating. Users can put billions of files in a single directory, but attempting to create StorageFile objects for each of them will cause an app to run out of memory very quickly. Even in less extreme cases the call will still return very slowly because the system has to create thousands of file handles and marshal them back into the app container. To help apps avoid this pitfall, the system provides the StorageFileQueryResult and StorageFolderQueryResult classes.

StorageFileQueryResult is the go-to class whenever you’re writing an app to handle more than a trivial number of files. Not only does it provide a neat way to enumerate and modify the results of a complex search query, but because the API treats an enumeration request as a query for “*,” it also works for more mundane cases.

Using the indexer where available is the first step to speeding up your app. Now, coming from the indexer program manager, that sounds like a self-serving plea to keep me employed, but there’s a logical reason I say that. The StorageFile and StorageFolder objects were designed with the indexer in mind. The properties cached in the object can be retrieved quickly from the indexer. If you aren’t using the indexer, the system has to look up values from the disk and registry, which is I/O-intensive and causes performance issues for both the app and the system.

To make sure the indexer is going to be used, create a QueryOptions object and set the QueryOptions.IndexerOption property to either only use the indexer:

QueryOptions options = new QueryOptions();
options.IndexerOption = IndexerOption.OnlyUseIndexer;

or, use the indexer when it’s available:

options.IndexerOption = IndexerOption.UseIndexerWhenAvailable;

The recommendation is to use IndexerOption.UseIndexerWhenAvailable in cases where a slow file operation won’t lock up your app or cripple the UX. This will attempt to use the indexer to enumerate files, but fall back to the much slower disk operations, if needed. IndexerOption.OnlyUseIndexer is best used when returning no results is better than waiting for a slow file operation. The system will return zero results if the indexer is disabled, but it will return quickly, so the app stays responsive to the user.

There are times that creating a QueryOptions object seems a little excessive for just a quick enumeration and, in those cases, it might make sense to not worry if the indexer is present. For cases where you control the contents of the folder, calling StorageFolder.GetItemsAsync makes sense. It’s an easy line of code to write and any perf issues will be hidden in cases where there are only a few files in the directory.

Another way to speed up file enumeration is to not create unnecessary StorageFile or StorageFolder objects. Even when using the indexer, opening a StorageFile requires the system to create a file handle, gather some property data, and marshal it into the app’s process. This IPC comes with inherent delays, which can be avoided in many cases by simply not creating the objects in the first place.

An important thing to note: a StorageFileQueryResult object backed by the indexer doesn’t create any StorageFiles internally. They’re created on demand when requested through GetFilesAsync. Until that time, the system only keeps a list of the files in memory, which is comparatively lightweight.

The recommended way to enumerate a large number of files is to use the batching functionality on GetFilesAsync to page in groups of files as they’re needed. This way, your app can do background processing on the files while it’s waiting for the next set to be created. The code in Figure 1 shows how it’s done in a simple example.

Figure 1 GetFilesAsync

uint index = 0, stepSize = 10;
// Fetch the first page before entering the loop
IReadOnlyList<StorageFile> files = await queryResult.GetFilesAsync(index, stepSize);
index += stepSize;
while (files.Count != 0)
{
  // Start fetching the next page while the current one is processed
  var fileTask = queryResult.GetFilesAsync(index, stepSize).AsTask();
  foreach (StorageFile file in files)
  {
    // Do the background processing here
  }
  files = await fileTask;
  index += stepSize;
}

This is the same coding pattern that has been used by a number of apps already on Windows. By varying the step size they’re able to pull out the right number of items to have a responsive first view in the app, while quickly prepping the rest of the files in the background.
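Because the step size is the knob that apps vary, the loop in Figure 1 can be factored into a small reusable helper. The following is only a sketch: PagedEnumerator, ForEachPagedAsync and the getPage delegate are hypothetical names introduced here, not part of the Windows APIs. The helper is written against a delegate so the same overlap of "fetch the next page while processing the current one" can be exercised without a real query.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (not a Windows API): generic form of the Figure 1 loop.
public static class PagedEnumerator
{
    // getPage(startIndex, maxItems) returns one page; an empty page ends the loop.
    // Returns the number of items handed to the process callback.
    public static async Task<int> ForEachPagedAsync<T>(
        Func<uint, uint, Task<IReadOnlyList<T>>> getPage,
        uint stepSize,
        Action<T> process)
    {
        uint index = 0;
        int processed = 0;

        // Fetch the first page before entering the loop
        IReadOnlyList<T> page = await getPage(index, stepSize);
        index += stepSize;

        while (page.Count != 0)
        {
            // Kick off the next page request, then work on the current page
            Task<IReadOnlyList<T>> nextPage = getPage(index, stepSize);
            foreach (T item in page)
            {
                process(item);
                processed++;
            }
            page = await nextPage;
            index += stepSize;
        }
        return processed;
    }
}
```

In a UWP app, the delegate would typically wrap the query, for example (start, count) => queryResult.GetFilesAsync(start, count).AsTask(), with a small step size for the first view and a larger one for background work.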

Property prefetching is another easy way to speed up your app. Property prefetching lets your app notify the system that it’s interested in a given set of file properties. The system will grab those properties from the indexer while it’s enumerating a set of files and cache them in the StorageFile object. This provides an easy performance boost versus just gathering the properties piecemeal when the files are returned.

Set the property prefetch values in the QueryOptions object. A few common scenarios are supported through the PropertyPrefetchOptions enumeration, but apps are also able to customize the properties requested to any values supported by Windows. The code to do this is simple:

QueryOptions options = new QueryOptions();
options.SetPropertyPrefetch(PropertyPrefetchOptions.ImageProperties,
  new String[] { });

In this case, the app uses the image properties and doesn’t need any other properties. When the system is enumerating the results of the query, it will cache the image properties in memory so they’re quickly available later.

One last note is that the property has to be stored in the index for the prefetching to offer a performance gain; otherwise, the system will still have to access the file to find the value, which is comparatively very slow. The Microsoft Windows Dev Center page for the property system (bit.ly/1LuovhT) has all the information about the properties available in the Windows indexer. Just look for isColumn = true in the property description, which indicates the property is available to be prefetched.

Bringing all of these improvements together lets your code run much faster. For a simple example, I wrote an app that retrieves all the pictures on my computer along with their vertical height. This is the first step that a photo-viewing app would have to take in order to display the user’s photo collection.

I did three runs to try out different styles of file enumeration and to show the differences between them. The first test used naïve code with the indexer enabled as shown in Figure 2. The second test used the code shown in Figure 3, which does property prefetching and pages in files. And the third is using property prefetching and paging files in, but with the indexer disabled. This code is the same as in Figure 3, but with one line changed as noted in the comments.

Figure 2 Naïve Code Enumerating a Library

StorageFolder folder = KnownFolders.PicturesLibrary;
QueryOptions options = new QueryOptions(
  CommonFileQuery.OrderByDate, new String[] { ".jpg", ".jpeg", ".png" });          
options.IndexerOption = IndexerOption.OnlyUseIndexer;
StorageFileQueryResult queryResult = folder.CreateFileQueryWithOptions(options);
Stopwatch watch = Stopwatch.StartNew();          
IReadOnlyList<StorageFile> files = await queryResult.GetFilesAsync();
foreach (StorageFile file in files)
{                
  IDictionary<string, object> size =
    await file.Properties.RetrievePropertiesAsync(
    new String[] { "System.Image.VerticalSize" });
  var sizeVal = size["System.Image.VerticalSize"];
}           
watch.Stop();
Debug.WriteLine("Time to run the slow way: " + watch.ElapsedMilliseconds + " ms");

Figure 3 Optimized Code for Enumerating a Library

StorageFolder folder = KnownFolders.PicturesLibrary;
QueryOptions options = new QueryOptions(
  CommonFileQuery.OrderByDate, new String[] { ".jpg", ".jpeg", ".png" });
// Change to DoNotUseIndexer for trial 3
options.IndexerOption = IndexerOption.OnlyUseIndexer;
options.SetPropertyPrefetch(PropertyPrefetchOptions.None, new String[] { "System.Image.VerticalSize" });
StorageFileQueryResult queryResult = folder.CreateFileQueryWithOptions(options);
Stopwatch watch = Stopwatch.StartNew();
uint index = 0, stepSize = 10;
IReadOnlyList<StorageFile> files = await queryResult.GetFilesAsync(index, stepSize);
index += stepSize;
// Note that I'm paging in the files as described
while (files.Count != 0)
{
  var fileTask = queryResult.GetFilesAsync(index, stepSize).AsTask();
  foreach (StorageFile file in files)
  {
    // Put the value into memory to make sure that the system really fetches the property
    IDictionary<string,object> size =
      await file.Properties.RetrievePropertiesAsync(
      new String[] { "System.Image.VerticalSize" });
    var sizeVal = size["System.Image.VerticalSize"];
  }
  files = await fileTask;
  index += stepSize;
}
watch.Stop();
Debug.WriteLine("Time to run: " + watch.ElapsedMilliseconds + " ms");

Taking a look at the results with and without the prefetch, the performance difference is really clear, as shown in Figure 4.

Figure 4 Results with and Without Prefetch

Test Case (2,600 Images on a Desktop Machine) | Average Run Time over 10 Samples
Naïve code + indexer | 9,318ms
All optimizations + indexer | 5,796ms
Optimizations + no indexer | 20,248ms (48,420ms cold)

Applying the simple optimizations outlined here cut the average run time of the naïve code from 9,318ms to 5,796ms, roughly a 1.6x speedup, while the same optimized code took about 3.5 times as long with the indexer disabled. The patterns are battle-tested, as well. Before releasing any version of Windows we work with the app teams to make sure Photos, Groove Music and others are working as well as possible. That is where these patterns came from; they were cribbed directly from the code of the first UWP apps and can be applied directly to your apps.

Tracking Changes in the File System

As shown here, enumerating all the files in a location is a resource-intensive process. Most of the time, your users aren’t even going to be interested in older files. They want the picture that they just took, the song that was just downloaded, or the most recently edited document. To help bring the most recent files to the top, your app can track changes to the file system and find the most recently created or modified files easily.

There are two methods for change tracking, depending on whether your app is in the background or foreground. When an app is in the foreground, it can use the ContentsChanged event from a StorageFileQueryResult object to be notified of changes under a given query. When an app is in the background, it can register for the StorageLibraryContentChangedTrigger in order to be notified when something changed. Both of these are poke notifications: they let an app know that something changed, but they don’t include information about the files that have changed.

To find which files have been modified or created recently, the system provides the System.Search.GatherTime property. The property is set for all files in an indexed location and tracks the last time that the indexer noticed a modification to the file. This property will be constantly updated based off the system clock, though, so the usual disclaimers involving time zone switches, daylight saving and users manually changing the system time still apply to trusting this value in your app.

Registering for a change tracking event in the foreground is easy. Once you’ve created a StorageFileQueryResult object covering the scope that your app is interested in, simply register for the ContentsChanged event, as shown here:

StorageFileQueryResult resultSet = photos.CreateFileQueryWithOptions(option);
resultSet.ContentsChanged += resultSet_ContentsChanged;

The event is going to be fired any time that something in the result set changes. The app can then find the file or files that changed recently.
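For reference, here is a minimal sketch of what the handler side can look like. PhotoChangeWatcher, LoadFirstPageAsync and Stop are hypothetical names introduced for illustration; the event itself is typed as TypedEventHandler<IStorageQueryResultBase, object>, which is why the handler takes those two parameters. The page size of 10 is an arbitrary assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Search;

// Hypothetical wrapper class; only the Windows.Storage types are real APIs.
public sealed class PhotoChangeWatcher
{
    private readonly StorageFileQueryResult resultSet;

    public PhotoChangeWatcher(StorageFolder photos, QueryOptions options)
    {
        // Hold on to the query object for as long as notifications are wanted
        resultSet = photos.CreateFileQueryWithOptions(options);
        resultSet.ContentsChanged += OnContentsChanged;
    }

    // Initial enumeration of the first page of results
    public async Task<IReadOnlyList<StorageFile>> LoadFirstPageAsync(uint pageSize)
    {
        return await resultSet.GetFilesAsync(0, pageSize);
    }

    // The notification carries no details about what changed, so re-query here
    private async void OnContentsChanged(IStorageQueryResultBase sender, object args)
    {
        IReadOnlyList<StorageFile> files = await resultSet.GetFilesAsync(0, 10);
        Debug.WriteLine("Result set changed; first page now has " + files.Count + " files");
    }

    // Unsubscribe when the view that needed the notifications goes away
    public void Stop()
    {
        resultSet.ContentsChanged -= OnContentsChanged;
    }
}
```

The handler may not run on the UI thread, so an app that updates its view from here would normally marshal that work back to the UI dispatcher.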

Tracking changes from the background is slightly more involved. Apps are able to register to be notified when a file changes under a library on the device. There’s no support for more complex queries or scopes, which means that apps are responsible for doing a bit of work to make sure that the change is something in which they’re really interested.

As an interesting aside, the reason apps can only register for the library change notifications and not based on file type is due to how the indexer is designed. Filtering queries based on the location of a file on disk is much faster than querying for a match based on the file type, and filtering by file type dragged down the performance of devices in our initial tests. I will cover more performance tips later, but this is an important one to remember: Filtering query results by file location is extremely fast compared to other types of filtering operations.

I’ve outlined the steps to register for a background task with code samples in a blog post (bit.ly/1iPUVIo), but let’s walk through a couple of the more interesting steps here. The first thing an app must do is create the background trigger:

StorageLibrary library =
  await StorageLibrary.GetLibraryAsync(KnownLibraryId.Pictures);
StorageLibraryContentChangedTrigger trigger =
  StorageLibraryContentChangedTrigger.Create(library);

The trigger can also be created from a collection of libraries if the app is going to be interested in tracking multiple locations. In this case, the app is only going to be looking at the pictures library, which is one of the most common scenarios for apps. You need to be sure the app has the correct capabilities to be able to access the library it’s trying to change track; otherwise, the system will return an access denied exception when the app tries to create the StorageLibrary object.

On Windows mobile devices, this is especially powerful as the system guarantees new pictures from the device are going to be written under the pictures library location. The system does this, no matter what the user chooses in the settings page, by changing which folders are included as part of the library.

The app must register the background task using the BackgroundExecutionManager and have a background task built into the app. The background task could be activated while the app is in the foreground, so any code must be aware of potential race conditions on file or registry access.
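The registration step can be sketched roughly as follows, assuming the usual UWP pattern of requesting access through BackgroundExecutionManager and registering through BackgroundTaskBuilder. The task name, the entry point string and the class names are hypothetical, the background task also has to be declared in the app manifest, and newer SDKs add further BackgroundAccessStatus values beyond the Denied check shown here.

```csharp
using System;
using System.Threading.Tasks;
using Windows.ApplicationModel.Background;
using Windows.Storage;

public static class LibraryChangeRegistration
{
    // Hypothetical names chosen for this sketch
    private const string TaskName = "PicturesLibraryChanged";
    private const string TaskEntryPoint = "Tasks.LibraryChangedTask";

    // Returns the registration, or null if background access was denied
    public static async Task<IBackgroundTaskRegistration> RegisterAsync()
    {
        // Reuse an existing registration instead of registering twice
        foreach (IBackgroundTaskRegistration existing in
                 BackgroundTaskRegistration.AllTasks.Values)
        {
            if (existing.Name == TaskName)
            {
                return existing;
            }
        }

        BackgroundAccessStatus status =
            await BackgroundExecutionManager.RequestAccessAsync();
        if (status == BackgroundAccessStatus.Denied)
        {
            return null;
        }

        StorageLibrary library =
            await StorageLibrary.GetLibraryAsync(KnownLibraryId.Pictures);
        StorageLibraryContentChangedTrigger trigger =
            StorageLibraryContentChangedTrigger.Create(library);

        var builder = new BackgroundTaskBuilder
        {
            Name = TaskName,
            TaskEntryPoint = TaskEntryPoint
        };
        builder.SetTrigger(trigger);
        return builder.Register();
    }
}

// Assumed to live in a Windows Runtime component matching TaskEntryPoint
namespace Tasks
{
    public sealed class LibraryChangedTask : IBackgroundTask
    {
        public void Run(IBackgroundTaskInstance taskInstance)
        {
            // The deferral keeps the task alive if asynchronous work is added
            BackgroundTaskDeferral deferral = taskInstance.GetDeferral();
            try
            {
                // Query for recently changed files and filter them here
            }
            finally
            {
                deferral.Complete();
            }
        }
    }
}
```

Checking for an existing registration first avoids stacking up duplicate tasks each time the app launches.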

Once the registration is done, your app is going to be called every time there’s a change under the library for which it’s registered. This might include files that your app isn’t interested in or can’t process. In that case, applying a restrictive filter as soon as the background task is triggered is the best way to make sure there’s no wasted background processing.

Finding the most recently modified or added files is as easy as a single query against the indexer. Simply request all the files with a gather time falling in the range in which the app is interested. The same sorting and grouping features available for other queries can be used here, as well, if desired. Be aware that the indexer uses Zulu time internally, so make sure to convert all the time strings to Zulu before using them. Here’s how a query can be built:

QueryOptions options = new QueryOptions();
DateTimeOffset lastSearchTime = DateTimeOffset.UtcNow.AddHours(-1);
// Format the UTC timestamp in the Zulu notation used by the indexer
string timeFilter = "System.Search.GatherTime:>=" +
  lastSearchTime.ToString("yyyy\\-MM\\-dd\\THH\\:mm\\:ss\\Z");
options.ApplicationSearchFilter += timeFilter;

In this case the app is going to get all the results from the past hour. In some apps it makes more sense to save the time of the last query and use it in the query instead, but any DateTimeOffset will work. Once the app has the list of files back it can enumerate them as discussed earlier or use the list to track which files are new to it.
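One detail worth spelling out: the format string only appends a literal Z and does not itself change the time zone, so a saved DateTimeOffset that carries a local offset needs to be converted to UTC first. A minimal sketch of a helper that does both steps follows; GatherTimeFilter and Since are hypothetical names, and the invariant culture is used so the separators don’t vary with the user’s locale.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not a Windows API) for building the gather-time filter.
public static class GatherTimeFilter
{
    // Returns an AQS clause matching items gathered at or after the given time
    public static string Since(DateTimeOffset since)
    {
        // Convert to UTC first; the 'Z' in the format is only a literal suffix
        string zulu = since.ToUniversalTime().ToString(
            "yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
        return "System.Search.GatherTime:>=" + zulu;
    }
}
```

For example, a timestamp of 14:30 at UTC+2 on November 1, 2015 yields System.Search.GatherTime:>=2015-11-01T12:30:00Z, which can then be appended to QueryOptions.ApplicationSearchFilter.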

Combining the two methods of change tracking with the gather time lets UWP apps track changes to the file system and react to changes on the disk with ease. These may be relatively new APIs in the history of Windows, but they’ve been used in the Photos, Groove Music, OneDrive, Cortana, and Movies & TV apps built into Windows 10. You can feel confident including them in your app knowing that they’re powering these great experiences.

General Best Practices

There are a few things that any app using the indexer should be aware of to avoid bugs and make sure the app is as fast as possible. They include avoiding any unnecessarily complex queries in performance-critical parts of the app, using property enumerations correctly and being aware of indexing delays.

How a query is designed can have a significant impact on its performance. When the indexer is running queries against its backing database, some queries are faster because of how the information is laid out on the disk. Filtering based off file location is always fast as the indexer is able to quickly eliminate massive parts of the index from the query. This saves processor and I/O time because there are fewer strings that need to be retrieved and compared while searching for matches to query terms.

The indexer is powerful enough to handle regular expressions, but some forms of them are notorious for causing slowdowns. The worst thing that can be included in an indexer query is a suffix search. This is a query for all terms ending with a given value. An example would be the query “*tion,” which looks for all documents containing words ending with “tion.” Because the index is sorted by the first letter in each token, there’s no fast way to find terms matching this query. It has to decode every token in the entire index and compare it to the search term, which is extremely slow.

Enumerations can accelerate queries but have unexpected behavior in international builds. Anyone who has built a search system knows how much faster doing comparisons based off an enumeration is than doing string comparisons. And this is true in the indexer, as well. To make it easy for your app, the property system provides a number of enumerations to filter results to fewer items before starting the costly string comparisons. A common example of this is to use the System.Kind filter to restrict the result to just file kinds that an app can handle, such as music or documents.

There’s a common error of which anyone using enumerations must be aware. If your user is going to be looking for only music files, then in an en-US version of Windows adding System.Kind:=music to the query is going to work perfectly to restrict the search results and speed up a query. This will also work in some other languages (possibly even enough to pass internationalization testing), but it will fail where the system isn’t able to recognize “music” as an English term and instead parses it in the local language.

The correct way to use an enumeration such as System.Kind is to clearly denote that the app is intending to use the value as an enumeration, not as a search term. This is done by using the enumeration#value syntax. For example, the correct way to filter to just music results would be to write System.Kind:=System.Kind#Music. This will work in all the languages that Windows ships and will filter the results to just files the system recognizes as music files.

Correctly escaping Advanced Query Syntax (AQS) can help make sure your users don’t hit hard-to-reproduce query issues. AQS has a number of features that let users include quotes or parentheses to affect how the query is processed. This means that apps have to be careful to escape any query terms that might include these characters. For example, searching for Document(8).docx is going to result in a parsing error and incorrect results being returned. Instead the app should escape the term as Document%288%29.docx. This will return items in the index that match the search term, rather than having the system try to parse the parentheses as a part of the query.
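As a rough illustration of that escaping step, the sketch below replaces parentheses with the %28 and %29 forms shown above. AqsEscaper and EscapeTerm are hypothetical names, and encoding the double quote as %22 is an assumption that simply extends the same hex pattern; check the AQS documentation for the full set of characters your queries need to handle.

```csharp
using System.Text;

// Hypothetical helper (not a Windows API) for escaping user-supplied terms.
public static class AqsEscaper
{
    // Replaces characters that AQS would otherwise treat as query syntax
    public static string EscapeTerm(string term)
    {
        var escaped = new StringBuilder(term.Length);
        foreach (char c in term)
        {
            switch (c)
            {
                case '(': escaped.Append("%28"); break;
                case ')': escaped.Append("%29"); break;
                // Assumption: quotes follow the same percent-hex pattern
                case '"': escaped.Append("%22"); break;
                default: escaped.Append(c); break;
            }
        }
        return escaped.ToString();
    }
}
```

Escaping only the user-supplied term, and not the surrounding query, keeps operators that the app adds on purpose intact.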

For the most in-depth coverage of all the different features of AQS and how to make sure your queries are correct, you can check out the documentation at bit.ly/1Fhacfl. It has a lot of great information, including more details about the tips mentioned here.

A note about indexing delays: Indexing isn’t instant, which means that items appearing in the index or notifications based on the indexer are going to be delayed slightly from when the file is written. Under normal system load, the delay is going to be on the order of 100ms, which is faster than most apps can query the file system, so it won’t be noticeable. There are cases where a user might be migrating thousands of files around their machine and the indexer is falling noticeably behind.

In these cases, there are two things apps are recommended to do: First, they should hold a query open over the file system locations in which the app is most interested. Typically, this is done by creating a StorageFileQueryResult object over the file system locations the app is going to be searching. When the indexer sees that an app has an open query, it’s going to prioritize indexing in those scopes over all other scopes. But please make sure not to do this for a larger scope than needed. The indexer is going to stop respecting system backoff and user-active notifications to process the changes as fast as it can, so users might notice an impact on system performance while this is happening.

The other recommendation is to warn the users that the system is catching up with the file operations. Some apps such as Cortana show a message at the top of their UI, whereas others will stop doing complex queries and show a simple version of the experience. It’s up to you to determine what’s best for your app experience.

Wrapping Up

This has been a quick tour of the features that are available for consumers of the indexer and Windows Storage APIs in Windows 10. For more information about how to use queries to pass context on app activation or code samples for the background trigger, check out the team’s blog at bit.ly/1iPUVIo. We’re constantly working with developers to make sure that the searching APIs are great to use. We would love to hear your feedback about what’s working and what you’d like to see added to the surface.


Adam Wilson is a program manager on the Windows Developer Ecosystem and Platform team. He works on the Windows indexer and push notification reliability. Previously he worked on the storage APIs for Windows Phone 8.1. Reach him at adwilso@microsoft.com.

Thanks to the following technical expert for reviewing this article: Sami Khoury
Sami Khoury is an engineer on the Windows Developer Ecosystem and Platform team. He leads the development of the Windows Indexer.