How to: Query for the Largest File or Files in a Directory Tree (LINQ)

This example shows five queries related to file size in bytes:

  • How to retrieve the size in bytes of the largest file.

  • How to retrieve the size in bytes of the smallest file.

  • How to retrieve the FileInfo object largest or smallest file from one or more folders under a specified root folder.

  • How to retrieve a sequence such as the 10 largest files.

  • How to order files into groups based on their file size in bytes, ignoring files that are less than a specified size.

Example

The following example contains five separate queries that show how to query and group files, depending on their file size in bytes. You can easily modify these examples to base the query on some other property of the FileInfo object.

Module QueryBySize
    Sub Main()

        ' Change the drive\path if necessary 
        Dim root As String = "C:\Program Files\Microsoft Visual Studio 9.0" 

        'Take a snapshot of the folder contents 
        Dim dir As New System.IO.DirectoryInfo(root)
        Dim fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories)

        ' Return the size of the largest file 
        Dim maxSize = Aggregate aFile In fileList Into Max(GetFileLength(aFile))

        'Dim maxSize = fileLengths.Max
        Console.WriteLine("The length of the largest file under {0} is {1}", _
                          root, maxSize)

        ' Return the FileInfo object of the largest file 
        ' by sorting and selecting from the beginning of the list 
        Dim filesByLengDesc = From file In fileList _
                              Let filelength = GetFileLength(file) _
                              Where filelength > 0 _
                              Order By filelength Descending _
                              Select file

        Dim longestFile = filesByLengDesc.First

        Console.WriteLine("The largest file under {0} is {1} with a length of {2} bytes", _
                          root, longestFile.FullName, longestFile.Length)

        Dim smallestFile = filesByLengDesc.Last

        Console.WriteLine("The smallest file under {0} is {1} with a length of {2} bytes", _
                                root, smallestFile.FullName, smallestFile.Length)

        ' Return the FileInfos for the 10 largest files 
        ' Based on a previous query, but nothing is executed 
        ' until the For Each statement below. 
        Dim tenLargest = From file In filesByLengDesc Take 10

        Console.WriteLine("The 10 largest files under {0} are:", root)

        For Each fi As System.IO.FileInfo In tenLargest
            Console.WriteLine("{0}: {1} bytes", fi.FullName, fi.Length)
        Next 

        ' Group files according to their size, 
        ' leaving out the ones under 200K 
        Dim sizeGroups = From file As System.IO.FileInfo In fileList _
                         Where file.Length > 0 _
                         Let groupLength = file.Length / 100000 _
                         Group file By groupLength Into fileGroup = Group _
                         Where groupLength >= 2 _
                         Order By groupLength Descending

        For Each group In sizeGroups
            Console.WriteLine(group.groupLength + "00000")

            For Each item As System.IO.FileInfo In group.fileGroup
                Console.WriteLine("   {0}: {1}", item.Name, item.Length)
            Next 
        Next 

        ' Keep the console window open in debug mode
        Console.WriteLine("Press any key to exit.")
        Console.ReadKey()

    End Sub 

    ' This method is used to catch the possible exception 
    ' that can be raised when accessing the FileInfo.Length property. 
    ' In this particular case, it is safe to ignore the exception. 
    Function GetFileLength(ByVal fi As System.IO.FileInfo) As Long 
        Dim retval As Long 
        Try
            retval = fi.Length
        Catch ex As FileNotFoundException
            ' If a file is no longer present, 
            ' just return zero bytes. 
            retval = 0
        End Try 

        Return retval
    End Function 
End Module
class QueryBySize
{
    static void Main(string[] args)
    {
        QueryFilesBySize();
        Console.WriteLine("Press any key to exit");
        Console.ReadKey();
    }

    private static void QueryFilesBySize()
    {
        string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";

        // Take a snapshot of the file system.
        System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);

        // This method assumes that the application has discovery permissions 
        // for all folders under the specified path.
        IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);

        //Return the size of the largest file 
        long maxSize =
            (from file in fileList
             let len = GetFileLength(file)
             select len)
             .Max();

        Console.WriteLine("The length of the largest file under {0} is {1}",
            startFolder, maxSize);

        // Return the FileInfo object for the largest file 
        // by sorting and selecting from beginning of list
        System.IO.FileInfo longestFile =
            (from file in fileList
             let len = GetFileLength(file)
             where len > 0
             orderby len descending 
             select file)
            .First();

        Console.WriteLine("The largest file under {0} is {1} with a length of {2} bytes",
                            startFolder, longestFile.FullName, longestFile.Length);

        //Return the FileInfo of the smallest file
        System.IO.FileInfo smallestFile =
            (from file in fileList
             let len = GetFileLength(file)
             where len > 0
             orderby len ascending 
             select file).First();

        Console.WriteLine("The smallest file under {0} is {1} with a length of {2} bytes",
                            startFolder, smallestFile.FullName, smallestFile.Length);

        //Return the FileInfos for the 10 largest files 
        // queryTenLargest is an IEnumerable<System.IO.FileInfo> 
        var queryTenLargest =
            (from file in fileList
             let len = GetFileLength(file)
             orderby len descending 
             select file).Take(10);

        Console.WriteLine("The 10 largest files under {0} are:", startFolder);

        foreach (var v in queryTenLargest)
        {
            Console.WriteLine("{0}: {1} bytes", v.FullName, v.Length);
        }


        // Group the files according to their size, leaving out 
        // files that are less than 200000 bytes.  
        var querySizeGroups =
            from file in fileList
            let len = GetFileLength(file)
            where len > 0
            group file by (len / 100000) into fileGroup
            where fileGroup.Key >= 2
            orderby fileGroup.Key descending 
            select fileGroup;


        foreach (var filegroup in querySizeGroups)
        {
            Console.WriteLine(filegroup.Key.ToString() + "00000");
            foreach (var item in filegroup)
            {
                Console.WriteLine("\t{0}: {1}", item.Name, item.Length);
            }
        }
    }

    // This method is used to swallow the possible exception 
    // that can be raised when accessing the FileInfo.Length property. 
    // In this particular case, it is safe to swallow the exception. 
    static long GetFileLength(System.IO.FileInfo fi)
    {
        long retval;
        try
        {
            retval = fi.Length;
        }
        catch (System.IO.FileNotFoundException)
        {
            // If a file is no longer present, 
            // just add zero bytes to the total.
            retval = 0;
        }
        return retval;
    }

}

To return one or more complete FileInfo objects, the query first must examine each one in the data source, and then sort them by the value of their Length property. Then it can return the single one or the sequence with the greatest lengths. Use First to return the first element in a list. Use Take<TSource> to return the first n number of elements. Specify a descending sort order to put the smallest elements at the start of the list.

The query calls out to a separate method to obtain the file size in bytes in order to consume the possible exception that will be raised in the case where a file was deleted on another thread in the time period since the FileInfo object was created in the call to GetFiles. Even through the FileInfo object has already been created, the exception can occur because a FileInfo object will try to refresh its Length property by using the most current size in bytes the first time the property is accessed. By putting this operation in a try-catch block outside the query, we follow the rule of avoiding operations in queries that can cause side-effects. In general, great care must be taken when consuming exceptions, to make sure that an application is not left in an unknown state.

Compiling the Code

  • Create a Visual Studio project that targets the .NET Framework version 3.5. The project has a reference to System.Core.dll and a using directive (C#) or Imports statement (Visual Basic) for the System.Linq namespace by default.

  • Copy this code into your project.

  • Press F5 to compile and run the program.

  • Press any key to exit the console window.

Robust Programming

For intensive query operations over the contents of multiple types of documents and files, consider using the Windows Desktop Search engine.

See Also

Concepts

LINQ to Objects

LINQ and File Directories