
How to: Query for the Largest File or Files in a Directory Tree (LINQ)

This example shows five queries related to file size in bytes:

  • How to retrieve the size in bytes of the largest file.

  • How to retrieve the size in bytes of the smallest file.

  • How to retrieve the FileInfo object for the largest or smallest file from one or more folders under a specified root folder.

  • How to retrieve a sequence such as the 10 largest files.

  • How to order files into groups based on their file size in bytes, ignoring files that are less than a specified size.

The following example contains five separate queries that show how to query and group files, depending on their file size in bytes. You can easily modify these examples to base the query on some other property of the FileInfo object.

using System;
using System.Collections.Generic;
using System.Linq;

class QueryBySize
{
    static void Main(string[] args)
    {
        QueryFilesBySize();
        Console.WriteLine("Press any key to exit");
        Console.ReadKey();
    }

    private static void QueryFilesBySize()
    {
        string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";

        // Take a snapshot of the file system.
        System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);

        // This method assumes that the application has discovery permissions 
        // for all folders under the specified path.
        IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);

        //Return the size of the largest file 
        long maxSize =
            (from file in fileList
             let len = GetFileLength(file)
             select len)
             .Max();

        Console.WriteLine("The length of the largest file under {0} is {1}",
            startFolder, maxSize);

        // Return the FileInfo object for the largest file
        // by sorting in descending order and selecting the first element.
        // First() throws InvalidOperationException if no file passes the filter.
        System.IO.FileInfo largestFile =
            (from file in fileList
             let len = GetFileLength(file)
             where len > 0
             orderby len descending
             select file)
            .First();

        Console.WriteLine("The largest file under {0} is {1} with a length of {2} bytes",
                            startFolder, largestFile.FullName, largestFile.Length);

        //Return the FileInfo of the smallest file
        System.IO.FileInfo smallestFile =
            (from file in fileList
             let len = GetFileLength(file)
             where len > 0
             orderby len ascending 
             select file).First();

        Console.WriteLine("The smallest file under {0} is {1} with a length of {2} bytes",
                            startFolder, smallestFile.FullName, smallestFile.Length);

        //Return the FileInfos for the 10 largest files 
        // queryTenLargest is an IEnumerable<System.IO.FileInfo> 
        var queryTenLargest =
            (from file in fileList
             let len = GetFileLength(file)
             orderby len descending 
             select file).Take(10);

        Console.WriteLine("The 10 largest files under {0} are:", startFolder);

        foreach (var v in queryTenLargest)
        {
            Console.WriteLine("{0}: {1} bytes", v.FullName, v.Length);
        }


        // Group the files according to their size, leaving out 
        // files that are less than 200000 bytes.  
        var querySizeGroups =
            from file in fileList
            let len = GetFileLength(file)
            where len > 0
            group file by (len / 100000) into fileGroup
            where fileGroup.Key >= 2
            orderby fileGroup.Key descending 
            select fileGroup;


        foreach (var filegroup in querySizeGroups)
        {
            Console.WriteLine(filegroup.Key.ToString() + "00000");
            foreach (var item in filegroup)
            {
                Console.WriteLine("\t{0}: {1}", item.Name, item.Length);
            }
        }
    }

    // This method is used to swallow the possible exception 
    // that can be raised when accessing the FileInfo.Length property. 
    // In this particular case, it is safe to swallow the exception. 
    static long GetFileLength(System.IO.FileInfo fi)
    {
        long retval;
        try
        {
            retval = fi.Length;
        }
        catch (System.IO.FileNotFoundException)
        {
            // If a file is no longer present, 
            // just add zero bytes to the total.
            retval = 0;
        }
        return retval;
    }

}
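
The grouping query relies on integer division: dividing a length by 100000 yields a bucket number, so every file from 200000 to 299999 bytes lands in bucket 2, and requiring a key of at least 2 drops everything under 200000 bytes. The following is a minimal sketch of that arithmetic on plain long values, so it runs without touching the file system. The SizeBucketSketch class and its GroupBySizeBucket method are hypothetical names introduced for illustration; they are not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SizeBucketSketch
{
    // Hypothetical helper (not in the original example): groups lengths
    // into 100000-byte buckets and keeps only buckets 2 and above,
    // largest bucket first.
    public static List<IGrouping<long, long>> GroupBySizeBucket(IEnumerable<long> lengths)
    {
        return lengths
            .Where(len => len > 0)
            .GroupBy(len => len / 100000)   // integer division: 250000 / 100000 == 2
            .Where(g => g.Key >= 2)         // drops lengths below 200000 bytes
            .OrderByDescending(g => g.Key)
            .ToList();
    }
}
```

Calling GroupBySizeBucket with the lengths 150000, 200000, 250000, and 1234567 would produce two groups: key 12 holding 1234567, then key 2 holding 200000 and 250000. The value 150000 falls in bucket 1 and is filtered out.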

To return one or more complete FileInfo objects, the query must first examine every element in the data source and sort the elements by the value of their Length property. It can then return the single element or the sequence of elements with the greatest lengths. Use First to return the first element in a sequence. Use Take to return the first n elements. Specify a descending sort order to put the largest elements at the start of the sequence.
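
The same sort-then-select pattern can also be written in method syntax. The sketch below is one possible form, assuming an in-memory stand-in for FileInfo so that it runs without touching the disk. The FileEntry class and the LargestFilesSketch methods are hypothetical names introduced here. It also shows that when only the size is needed, Min (or Max) can be used without sorting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FileInfo, used only for this sketch.
class FileEntry
{
    public string Name;
    public long Length;
    public FileEntry(string name, long length) { Name = name; Length = length; }
}

static class LargestFilesSketch
{
    // Descending order puts the largest entry first.
    // FirstOrDefault returns null for an empty sequence,
    // whereas First would throw InvalidOperationException.
    public static FileEntry Largest(IEnumerable<FileEntry> files)
    {
        return files.OrderByDescending(f => f.Length).FirstOrDefault();
    }

    // Take(n) returns at most n elements from the start of the sorted sequence.
    public static List<FileEntry> LargestN(IEnumerable<FileEntry> files, int n)
    {
        return files.OrderByDescending(f => f.Length).Take(n).ToList();
    }

    // Only the size is needed, so no sort is required.
    // Min throws InvalidOperationException if the sequence is empty.
    public static long SmallestSize(IEnumerable<FileEntry> files)
    {
        return files.Min(f => f.Length);
    }
}
```

FirstOrDefault is a design choice for the sketch, not something the original example does: it trades an exception on an empty folder for a null check at the call site.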

The query calls a separate method to obtain the file size in bytes so that it can consume the exception that is raised if a file was deleted on another thread after the FileInfo object was created in the call to GetFiles. Even though the FileInfo object has already been created, the exception can occur because a FileInfo object tries to refresh its Length property with the most current size in bytes the first time the property is accessed. Putting this operation in a try-catch block outside the query follows the rule of avoiding operations in queries that can cause side effects. In general, take great care when consuming exceptions to make sure that the application is not left in an unknown state.
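
The stale-file situation can be reproduced in isolation. The following sketch assumes a FileInfo constructed directly from a path (the constructor does not read the file size), deletes the file, and then reads Length through a helper with the same shape as GetFileLength. The StaleFileInfoSketch class and its method names are hypothetical. Whether FileInfo objects returned by GetFiles defer the size lookup in the same way may depend on the runtime version, so treat this as an illustration of the failure mode rather than of GetFiles itself.

```csharp
using System;
using System.IO;

static class StaleFileInfoSketch
{
    // Same shape as GetFileLength in the example: a missing file counts as 0 bytes.
    public static long SafeLength(FileInfo fi)
    {
        try
        {
            return fi.Length;
        }
        catch (FileNotFoundException)
        {
            return 0;
        }
    }

    // Creates a temporary file, wraps it in a FileInfo, deletes the file,
    // and only then asks for the length.
    public static long LengthAfterDelete()
    {
        string path = Path.GetTempFileName();   // creates an empty file on disk
        File.WriteAllText(path, "hello");
        FileInfo fi = new FileInfo(path);       // does not read the size yet
        File.Delete(path);                      // simulates deletion by another thread
        return SafeLength(fi);                  // Length throws; the helper returns 0
    }
}
```

Catching only FileNotFoundException keeps the helper narrow: other failures, such as an access error, still surface to the caller instead of being silently counted as zero bytes.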

To compile and run the example:

  • Create a Visual Studio project that targets the .NET Framework version 3.5. By default, the project has a reference to System.Core.dll and a using directive (C#) or Imports statement (Visual Basic) for the System.Linq namespace.

  • Copy this code into your project.

  • Press F5 to compile and run the program.

  • Press any key to exit the console window.

For intensive query operations over the contents of multiple types of documents and files, consider using the Windows Desktop Search engine.

© 2014 Microsoft