
How to: Iterate File Directories with PLINQ

This example shows two simple ways to parallelize operations on file directories. The first query uses the Directory.GetFiles method to populate an array with the names of the files in a directory and all its subdirectories. This method does not return until the entire array is populated, so it can introduce latency at the beginning of the operation. However, after the array is populated, PLINQ can process it in parallel very quickly, because an array is indexable and can be divided into ranges up front.

The second query uses the static Directory.EnumerateFiles method with SearchOption.AllDirectories, which begins returning results immediately instead of waiting until the whole tree has been walked. This approach can be faster when you iterate over large directory trees, although how its processing time compares with the first example depends on many factors.

Caution

These examples are intended to demonstrate usage, and might not run faster than the equivalent sequential LINQ to Objects query. For more information about speedup, see Understanding Speedup in PLINQ.
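
One way to check this for your own directory tree is to time the sequential query against the PLINQ one. The following is a minimal sketch, not part of the original examples: SpeedupComparison, IsTextOrHtml, SequentialTotal, ParallelTotal, and TimeMilliseconds are hypothetical names, and the queries sum character counts instead of printing each file so that both versions return a value you can compare.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class SpeedupComparison
{
    // Same filter as the examples in this topic: keep only .txt and .htm files.
    static bool IsTextOrHtml(string file)
    {
        string extension = Path.GetExtension(file);
        return extension == ".txt" || extension == ".htm";
    }

    // Sequential LINQ to Objects: total number of characters in the matching files.
    public static long SequentialTotal(string[] files)
    {
        return files.Where(file => IsTextOrHtml(file))
                    .Sum(file => (long)File.ReadAllText(file).Length);
    }

    // The same query run with PLINQ; only the AsParallel() call differs.
    public static long ParallelTotal(string[] files)
    {
        return files.AsParallel()
                    .Where(file => IsTextOrHtml(file))
                    .Sum(file => (long)File.ReadAllText(file).Length);
    }

    // Runs a query once and returns its elapsed time in milliseconds.
    public static long TimeMilliseconds(Func<long> query, out long result)
    {
        var sw = Stopwatch.StartNew();
        result = query();
        return sw.ElapsedMilliseconds;
    }
}
```

Pass both methods the same array from Directory.GetFiles and compare the timings over several runs; the first run may be slower simply because the file system cache is cold.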

The following example shows how to iterate over file directories in simple scenarios when you have access to all directories in the tree, the file sizes are not very large, and the access times are not significant. This approach involves a period of latency at the beginning while the array of file names is being constructed.


using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Declare FileResult and FileIteration_1 inside a class of your own.
struct FileResult
{
    public string Text;
    public string FileName;
}

// Use Directory.GetFiles to get the source sequence of file names.
public static void FileIteration_1(string path)
{
    var sw = Stopwatch.StartNew();
    int count = 0;
    string[] files;
    try
    {
        // Blocks until the names of every file in the tree are in the array.
        files = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);
    }
    catch (UnauthorizedAccessException)
    {
        Console.WriteLine("You do not have permission to access one or more folders in this directory tree.");
        return;
    }
    catch (DirectoryNotFoundException)
    {
        // GetFiles throws DirectoryNotFoundException (not FileNotFoundException) for a missing path.
        Console.WriteLine("The specified directory {0} was not found.", path);
        return;
    }

    // The query runs in parallel over the array; only matching files are read.
    var fileContents = from file in files.AsParallel()
                       let extension = Path.GetExtension(file)
                       where extension == ".txt" || extension == ".htm"
                       let text = File.ReadAllText(file) // Or ReadAllBytes, ReadAllLines, etc.
                       select new FileResult { Text = text, FileName = file };

    try
    {
        // Exceptions thrown inside the query surface here wrapped in an AggregateException.
        foreach (var item in fileContents)
        {
            Console.WriteLine(Path.GetFileName(item.FileName) + ":" + item.Text.Length);
            count++;
        }
    }
    catch (AggregateException ae)
    {
        // Handle rethrows any inner exception for which the predicate returns false.
        ae.Handle(ex =>
        {
            if (ex is UnauthorizedAccessException)
            {
                Console.WriteLine(ex.Message);
                return true;
            }
            return false;
        });
    }

    Console.WriteLine("FileIteration_1 processed {0} files in {1} milliseconds", count, sw.ElapsedMilliseconds);
}

The following example shows how to iterate over file directories in simple scenarios when you have access to all directories in the tree, the file sizes are not very large, and the access times are not significant. This approach begins producing results faster than the previous example.


using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Use Directory.EnumerateFiles to get the source sequence of file names.
// Declare FileIteration_2 inside a class of your own.
public static void FileIteration_2(string path)
{
    var count = 0;
    var sw = Stopwatch.StartNew();

    // EnumerateFiles returns a lazy sequence, so PLINQ can start processing
    // file names before the whole tree has been walked.
    var fileNames = Directory.EnumerateFiles(path, "*.*", SearchOption.AllDirectories);

    var fileContents = from file in fileNames.AsParallel() // Use AsOrdered to preserve source ordering
                       let extension = Path.GetExtension(file)
                       where extension == ".txt" || extension == ".htm"
                       let text = File.ReadAllText(file) // Or ReadAllBytes, ReadAllLines, etc.
                       select new { Text = text, FileName = file };
    try
    {
        foreach (var item in fileContents)
        {
            Console.WriteLine(Path.GetFileName(item.FileName) + ":" + item.Text.Length);
            count++;
        }
    }
    catch (AggregateException ae)
    {
        // An UnauthorizedAccessException raised while the tree is being walked is
        // reported here, but the enumeration does not resume afterward.
        ae.Handle(ex =>
        {
            if (ex is UnauthorizedAccessException)
            {
                Console.WriteLine(ex.Message);
                return true;
            }
            return false;
        });
    }

    Console.WriteLine("FileIteration_2 processed {0} files in {1} milliseconds", count, sw.ElapsedMilliseconds);
}
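
The query in FileIteration_2 notes that AsOrdered preserves source ordering. As a sketch of that variant, the following applies AsOrdered so that results arrive in the order in which Directory.EnumerateFiles produced the names. OrderedFileIteration and ReadInSourceOrder are hypothetical names, and a C# 7 value tuple replaces the anonymous type so that the results can be returned from the method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class OrderedFileIteration
{
    // Same filter and projection as FileIteration_2, but AsOrdered() makes PLINQ
    // return results in source order even though the files are read in parallel.
    public static List<(string FileName, int Length)> ReadInSourceOrder(string path)
    {
        var fileContents = from file in Directory.EnumerateFiles(path, "*.*", SearchOption.AllDirectories)
                                                 .AsParallel()
                                                 .AsOrdered()
                           let extension = Path.GetExtension(file)
                           where extension == ".txt" || extension == ".htm"
                           let text = File.ReadAllText(file)
                           select (FileName: file, Length: text.Length);

        // ToList on an ordered ParallelQuery keeps the source order.
        return fileContents.ToList();
    }
}
```

Preserving order adds some overhead, so it is worth using AsOrdered only when the output order actually matters.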

When using GetFiles, make sure that you have sufficient permissions on all directories in the tree. Otherwise an exception is thrown and no results are returned. When using EnumerateFiles in a PLINQ query, it is difficult to handle I/O exceptions gracefully in a way that lets you continue iterating, because an exception thrown while the directory tree is being walked ends the enumeration. If your code must handle I/O or unauthorized access exceptions, consider the approach described in How to: Iterate File Directories with the Parallel Class.
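
Another option, sketched here under stated assumptions rather than taken from that topic, is to write your own iterator that lists one directory at a time, skips any directory it cannot read, and feeds the resulting sequence to AsParallel. SafeFileIteration and SafeEnumerateFiles are hypothetical names, not .NET APIs. Each directory's entries are read with Directory.GetFiles and Directory.GetDirectories (without SearchOption.AllDirectories) inside a try block, so a failure affects only that directory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SafeFileIteration
{
    // Hypothetical helper: walks the tree with an explicit stack and reports,
    // then skips, any directory that throws UnauthorizedAccessException or IOException.
    public static IEnumerable<string> SafeEnumerateFiles(string root, Action<string, Exception> onError)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subdirs;
            try
            {
                // Read this one directory eagerly so that its errors surface inside the try.
                files = Directory.GetFiles(dir);
                subdirs = Directory.GetDirectories(dir);
            }
            catch (Exception ex) when (ex is UnauthorizedAccessException || ex is IOException)
            {
                onError(dir, ex); // DirectoryNotFoundException is also an IOException.
                continue;
            }

            foreach (string subdir in subdirs)
                pending.Push(subdir);

            // yield return sits outside the try/catch, as C# requires.
            foreach (string file in files)
                yield return file;
        }
    }
}
```

You would then write the query as `from file in SafeFileIteration.SafeEnumerateFiles(path, handler).AsParallel() ...`. Under PLINQ, the onError callback may be invoked on a worker thread rather than the thread that started the query. Exceptions from File.ReadAllText for individual files are not covered by this helper and still surface as an AggregateException.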

If I/O latency is an issue, for example with file I/O over a network, consider using one of the asynchronous I/O techniques described in TPL and Traditional .NET Framework Asynchronous Programming and in this blog post.
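
As one illustration of asynchronous I/O (a swapped-in technique, not necessarily the one described in the resources referenced above), the sketch below reads each file with StreamReader.ReadToEndAsync on a FileStream opened with useAsync: true, and uses a SemaphoreSlim to limit how many reads are in flight. AsyncFileReading, ReadLengthsAsync, and the default limit of 8 are assumptions chosen for illustration; the code needs async/await (.NET Framework 4.5 or later) and C# 7 value tuples.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncFileReading
{
    // Hypothetical helper: reads every file asynchronously and returns
    // a map from file name to text length, with at most maxConcurrentReads reads at once.
    public static async Task<Dictionary<string, int>> ReadLengthsAsync(IEnumerable<string> files, int maxConcurrentReads = 8)
    {
        using (var throttle = new SemaphoreSlim(maxConcurrentReads))
        {
            var tasks = files.Select(async file =>
            {
                await throttle.WaitAsync().ConfigureAwait(false);
                try
                {
                    // useAsync: true asks for asynchronous I/O from the operating system where supported.
                    using (var stream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, useAsync: true))
                    using (var reader = new StreamReader(stream))
                    {
                        string text = await reader.ReadToEndAsync().ConfigureAwait(false);
                        return (FileName: file, Length: text.Length);
                    }
                }
                finally
                {
                    throttle.Release();
                }
            }).ToList(); // ToList starts all the tasks now.

            // Every task has released the semaphore before it is disposed.
            var results = await Task.WhenAll(tasks).ConfigureAwait(false);
            return results.ToDictionary(r => r.FileName, r => r.Length);
        }
    }
}
```

Unlike the PLINQ queries, which block a worker thread for each read, the intent here is that no thread waits while a read is pending on the disk or network, although how fully that holds depends on the platform.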

© 2014 Microsoft