How to: Group Files by Extension (LINQ)

This example shows how LINQ can be used to perform advanced grouping and sorting operations on lists of files or folders. It also shows how to page output in the console window by using the Skip<TSource> and Take<TSource> methods.

Example

The following query shows how to group the contents of a specified directory tree by file name extension. The Visual Basic version is listed first, followed by the C# version.

Module GroupByExtension
    Public Sub Main()

        ' Root folder to query, along with all subfolders. 
        Dim startFolder As String = "C:\program files\Microsoft Visual Studio 9.0\VB\" 

        ' Used in WriteLine() to skip over startfolder in output lines. 
        Dim rootLength As Integer = startFolder.Length

        'Take a snapshot of the folder contents 
        Dim dir As New System.IO.DirectoryInfo(startFolder)
        Dim fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories)

        ' Create the query. 
        Dim queryGroupByExt = From file In fileList _
                          Group By ext = file.Extension.ToLower() Into fileGroup = Group _
                          Order By ext _
                          Select fileGroup

        ' Execute the query. By storing the result we can 
        ' page the display with good performance. 
        Dim groupByExtList = queryGroupByExt.ToList()

        ' Display one group at a time. If the number of  
        ' entries is greater than the number of lines 
        ' in the console window, then page the output. 
        PageOutput(groupByExtList, rootLength)

    End Sub 

    ' Pages console display for large query results. No more than one group per page. 
    ' This sub specifically works with group queries of FileInfo objects 
    ' but can be modified for any type. 
    Sub PageOutput(ByVal groupQuery, ByVal charsToSkip)

        ' "3" = 1 line for extension key + 1 for "Press any key" + 1 for input cursor. 
        Dim numLines As Integer = Console.WindowHeight - 3
        ' Flag to indicate whether there are more results to display 
        Dim goAgain As Boolean = True 

        For Each fg As IEnumerable(Of System.IO.FileInfo) In groupQuery
            ' Start a new extension at the top of a page. 
            Dim currentLine As Integer = 0

            Do While (currentLine < fg.Count())
                Console.Clear()
                Console.WriteLine(fg(0).Extension)

                ' Get the next page of results: 
                ' up to numLines file names, starting at currentLine. 
                Dim resultPage = From file In fg _
                                Skip currentLine Take numLines

                ' Execute the query. Trim the display output. 
                For Each line In resultPage
                    Console.WriteLine(vbTab & line.FullName.Substring(charsToSkip))
                Next 

                ' Advance the current position
                currentLine = numLines + currentLine

                ' Give the user a chance to break out of the loop
                Console.WriteLine("Press any key for next page or the 'End' key to exit.")
                Dim key As ConsoleKey = Console.ReadKey().Key
                If key = ConsoleKey.End Then
                    goAgain = False 
                    Exit For 
                End If 
            Loop 
        Next 
    End Sub 
End Module

using System;
using System.Collections.Generic;
using System.Linq;

class GroupByExtension
{
    // This query will sort all the files under the specified folder 
    //  and subfolder into groups keyed by the file extension. 
    private static void Main()
    {
        // Root folder to query, along with all subfolders. 
        string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\Common7";

        // Used in WriteLine to trim output lines. 
        int trimLength = startFolder.Length;

        // Take a snapshot of the file system.
        System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);

        // This method assumes that the application has discovery permissions 
        // for all folders under the specified path.
        IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);

        // Create the query. 
        var queryGroupByExt =
            from file in fileList
            group file by file.Extension.ToLower() into fileGroup
            orderby fileGroup.Key
            select fileGroup;

        // Display one group at a time. If the number of  
        // entries is greater than the number of lines 
        // in the console window, then page the output.
        PageOutput(trimLength, queryGroupByExt);
    }

    // This method specifically handles group queries of FileInfo objects with string keys. 
    // It can be modified to work for any long listing of data. Note that explicit typing 
    // must be used in method signatures. The groupByExtList parameter is a query that produces 
    // groups of FileInfo objects with string keys. 
    private static void PageOutput(int rootLength,
                                    IEnumerable<System.Linq.IGrouping<string, System.IO.FileInfo>> groupByExtList)
    {
        // Flag to break out of paging loop. 
        bool goAgain = true;

        // "3" = 1 line for extension + 1 for "Press any key" + 1 for input cursor.
        int numLines = Console.WindowHeight - 3;

        // Iterate through the outer collection of groups. 
        foreach (var filegroup in groupByExtList)
        {
            // Start a new extension at the top of a page. 
            int currentLine = 0;

            // Output only as many lines of the current group as will fit in the window. 
            do
            {
                Console.Clear();
                Console.WriteLine(filegroup.Key == String.Empty ? "[none]" : filegroup.Key);

                // Get 'numLines' number of items starting at number 'currentLine'. 
                var resultPage = filegroup.Skip(currentLine).Take(numLines);

                //Execute the resultPage query 
                foreach (var f in resultPage)
                {
                    Console.WriteLine("\t{0}", f.FullName.Substring(rootLength));
                }

                // Increment the line counter.
                currentLine += numLines;

                // Give the user a chance to escape.
                Console.WriteLine("Press any key to continue or the 'End' key to break...");
                ConsoleKey key = Console.ReadKey().Key;
                if (key == ConsoleKey.End)
                {
                    goAgain = false;
                    break;
                }
            } while (currentLine < filegroup.Count());

            if (!goAgain)
                break;
        }
    }
}
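
The grouping step can also be tried without touching the file system. The following is a minimal sketch, not part of the original sample: it uses method syntax on a made-up list of file names, and the class name GroupByExtensionSketch and the helper GroupByExtension are hypothetical names chosen for illustration. It uses ToLowerInvariant instead of ToLower so that the keys do not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class GroupByExtensionSketch
{
    // Hypothetical helper: groups path strings by lower-cased extension
    // and sorts the groups by that extension.
    public static List<IGrouping<string, string>> GroupByExtension(IEnumerable<string> paths)
    {
        return paths
            .GroupBy(p => Path.GetExtension(p).ToLowerInvariant())
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .ToList();
    }

    static void Main()
    {
        // Made-up file names, used only for illustration.
        var paths = new[] { "readme.TXT", "a.cs", "notes.txt", "Makefile", "b.CS" };

        foreach (var g in GroupByExtension(paths))
        {
            // Files without an extension end up under the empty-string key.
            Console.WriteLine(g.Key == string.Empty ? "[none]" : g.Key);
            foreach (var p in g)
            {
                Console.WriteLine("\t{0}", p);
            }
        }
    }
}
```

Within each group the file names keep the order in which they appeared in the input, because GroupBy preserves source order.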

The output from this program can be long, depending on the contents of the local file system and the value of startFolder. To make all results viewable, the example pages through them one console screen at a time. The same technique can be applied to Windows and Web applications. Because the code pages the items within each group, nested loops are required: an outer loop over the groups and an inner loop over the pages of each group. Additional logic tracks the current position in the list and lets the user stop paging and exit the program. In this particular case, the paging query runs against in-memory results of the original query rather than reading the file system again. In other contexts, such as LINQ to SQL, such caching is not required.
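
The paging logic can be looked at separately from the file handling. The following is a minimal sketch under stated assumptions, not the sample's own code: it pages an in-memory list of integers with Skip and Take, and the names PagingSketch and GetPage are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PagingSketch
{
    // Hypothetical helper: returns the items on the zero-based page 'pageIndex'.
    // The last page may hold fewer than 'pageSize' items; a page past the end is empty.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageIndex, int pageSize)
    {
        return source.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }

    static void Main()
    {
        // Cache the results once so that paging does not re-run the source query.
        List<int> items = Enumerable.Range(1, 10).ToList();
        int pageSize = 4;

        // Round up so that a partial last page is still shown.
        int pageCount = (items.Count + pageSize - 1) / pageSize;

        for (int pageIndex = 0; pageIndex < pageCount; pageIndex++)
        {
            List<int> page = GetPage(items, pageIndex, pageSize);
            Console.WriteLine(string.Join(", ", page.Select(i => i.ToString()).ToArray()));
        }
    }
}
```

With ten items and a page size of four, the loop prints three lines, the last of which contains only 9 and 10.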

Compiling the Code

  • Create a Visual Studio project that targets the .NET Framework version 3.5. By default, the project has a reference to System.Core.dll and a using directive (C#) or Imports statement (Visual Basic) for the System.Linq namespace. In C# projects, add a using directive for the System.IO namespace.

  • Copy this code into your project.

  • Press F5 to compile and run the program.

  • Press any key to exit the console window.

Robust Programming

For intensive query operations over the contents of multiple types of documents and files, consider using the Windows Desktop Search engine.

See Also

Concepts

LINQ to Objects

LINQ and File Directories