How to: Split a File Into Many Files by Using Groups (LINQ)

This example shows one way to merge the contents of two files and then create a set of new files that organize the data in a new way: one file for each first letter of the last names.

To create the data files

  1. Copy these names into a text file that is named names1.txt and save it in your solution folder:

    Bankov, Peter
    Holm, Michael
    Garcia, Hugo
    Potra, Cristina
    Noriega, Fabricio
    Aw, Kam Foo
    Beebe, Ann
    Toyoshima, Tim
    Guy, Wey Yuan
    Garcia, Debra
    
  2. Copy these names into a text file that is named names2.txt and save it in your solution folder. Note that the two files have some names in common.

    Liu, Jinghao
    Bankov, Peter
    Holm, Michael
    Garcia, Hugo
    Beebe, Ann
    Gilchrist, Beth
    Myrcha, Jacek
    Giakoumakis, Leo
    McLin, Nkenge
    El Yassir, Mehdi
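
As an alternative to copying the names by hand, the two data files can be generated from code. The following is a minimal sketch, not part of the original sample: the class name CreateDataFiles and the helper WriteDataFiles are hypothetical, and the current working directory is assumed to stand in for the solution folder.

```csharp
using System;
using System.IO;

class CreateDataFiles
{
    // Hypothetical helper: writes names1.txt and names2.txt into the given folder.
    public static void WriteDataFiles(string folder)
    {
        // The names are copied from the two lists above.
        string[] names1 =
        {
            "Bankov, Peter", "Holm, Michael", "Garcia, Hugo", "Potra, Cristina",
            "Noriega, Fabricio", "Aw, Kam Foo", "Beebe, Ann", "Toyoshima, Tim",
            "Guy, Wey Yuan", "Garcia, Debra"
        };
        string[] names2 =
        {
            "Liu, Jinghao", "Bankov, Peter", "Holm, Michael", "Garcia, Hugo",
            "Beebe, Ann", "Gilchrist, Beth", "Myrcha, Jacek", "Giakoumakis, Leo",
            "McLin, Nkenge", "El Yassir, Mehdi"
        };

        // One name per line, matching what File.ReadAllLines expects later.
        File.WriteAllLines(Path.Combine(folder, "names1.txt"), names1);
        File.WriteAllLines(Path.Combine(folder, "names2.txt"), names2);
    }

    static void Main()
    {
        // Assumption: the working directory is where the data files should live.
        string folder = Directory.GetCurrentDirectory();
        WriteDataFiles(folder);
        Console.WriteLine("Data files written to " + folder);
    }
}
```

If the files are created this way, the relative paths in the example below may need to be adjusted to point at the folder that was actually used.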
    

Example

Class SplitWithGroups

    Shared Sub Main()

        Dim fileA As String() = System.IO.File.ReadAllLines("../../../names1.txt")
        Dim fileB As String() = System.IO.File.ReadAllLines("../../../names2.txt")

        ' Concatenate and remove duplicate names based on 
        ' the default string comparer. 
        Dim mergeQuery As IEnumerable(Of String) = fileA.Union(fileB)

        ' Group the names by the first letter in the last name 
        Dim groupQuery = From name In mergeQuery _
                     Let n = name.Split(New Char() {","c}) _
                     Order By n(0) _
                     Group By groupKey = n(0)(0) _
                     Into groupName = Group

        ' Create a new file for each group that was created. 
        ' Note that nested For Each loops are required to access 
        ' individual items within each group. 
        For Each gGroup In groupQuery
            Dim fileName As String = "../../../testFile_" & gGroup.groupKey & ".txt"
            ' The Using block closes the file even if a write fails.
            Using sw As New System.IO.StreamWriter(fileName)
                Console.WriteLine(gGroup.groupKey)
                For Each item In gGroup.groupName
                    Console.WriteLine("   " & item.name)
                    sw.WriteLine(item.name)
                Next
            End Using
        Next 

        ' Keep console window open in debug mode.
        Console.WriteLine("Files have been written. Press any key to exit.")
        Console.ReadKey()

    End Sub 
End Class 
' Console Output: 
' A 
'    Aw, Kam Foo 
' B 
'    Bankov, Peter 
'    Beebe, Ann 
' E 
'    El Yassir, Mehdi 
' G 
'    Garcia, Hugo 
'    Garcia, Debra 
'    Giakoumakis, Leo 
'    Gilchrist, Beth 
'    Guy, Wey Yuan 
' H 
'    Holm, Michael 
' L 
'    Liu, Jinghao 
' M 
'    McLin, Nkenge 
'    Myrcha, Jacek 
' N 
'    Noriega, Fabricio 
' P 
'    Potra, Cristina 
' T 
'    Toyoshima, Tim

class SplitWithGroups
{
    static void Main()
    {
        string[] fileA = System.IO.File.ReadAllLines(@"../../../names1.txt");
        string[] fileB = System.IO.File.ReadAllLines(@"../../../names2.txt");

        // Concatenate and remove duplicate names based on 
        // default string comparer 
        var mergeQuery = fileA.Union(fileB);

        // Group the names by the first letter in the last name. 
        var groupQuery = from name in mergeQuery
                         let n = name.Split(',')
                         group name by n[0][0] into g
                         orderby g.Key
                         select g;

        // Create a new file for each group that was created. 
        // Note that nested foreach loops are required to access 
        // individual items within each group. 
        foreach (var g in groupQuery)
        {
            // Create the new file name. 
            string fileName = @"../../../testFile_" + g.Key + ".txt";

            // Output to display.
            Console.WriteLine(g.Key);

            // Write file. 
            using (System.IO.StreamWriter sw = new System.IO.StreamWriter(fileName))
            {
                foreach (var item in g)
                {
                    sw.WriteLine(item);
                    // Output to console for example purposes.
                    Console.WriteLine("   {0}", item);
                }
            }
        }
        // Keep console window open in debug mode.
        Console.WriteLine("Files have been written. Press any key to exit");
        Console.ReadKey();
    }
}
/* Output: 
    A
       Aw, Kam Foo
    B
       Bankov, Peter
       Beebe, Ann
    E
       El Yassir, Mehdi
    G
       Garcia, Hugo
       Guy, Wey Yuan
       Garcia, Debra
       Gilchrist, Beth
       Giakoumakis, Leo
    H
       Holm, Michael
    L
       Liu, Jinghao
    M
       Myrcha, Jacek
       McLin, Nkenge
    N
       Noriega, Fabricio
    P
       Potra, Cristina
    T
       Toyoshima, Tim
 */
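
The two listings do not print names in the same order within a group. The Visual Basic query sorts by last name before grouping, whereas the C# query only sorts the groups by key, so its names appear in the order in which they were read. The following is a hedged sketch of one way to sort inside each group in C# as well. The class name OrderWithinGroups and the helper GroupSorted are hypothetical, and the sketch works on an in-memory array rather than on the data files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class OrderWithinGroups
{
    // Hypothetical helper: sorts by last name first, then groups by its first letter.
    public static List<IGrouping<char, string>> GroupSorted(IEnumerable<string> names)
    {
        var groupQuery = from name in names
                         let lastName = name.Split(',')[0]
                         orderby lastName            // stable sort, so ties keep input order
                         group name by lastName[0] into g
                         orderby g.Key
                         select g;
        return groupQuery.ToList();
    }

    static void Main()
    {
        // A few of the merged names, in the order the C# sample reads them.
        string[] merged =
        {
            "Garcia, Hugo", "Guy, Wey Yuan", "Garcia, Debra",
            "Gilchrist, Beth", "Giakoumakis, Leo"
        };

        foreach (var g in GroupSorted(merged))
        {
            Console.WriteLine(g.Key);
            foreach (var item in g)
            {
                Console.WriteLine("   {0}", item);
            }
        }
    }
}
/* Output:
    G
       Garcia, Hugo
       Garcia, Debra
       Giakoumakis, Leo
       Gilchrist, Beth
       Guy, Wey Yuan
 */
```

Because the sort key is only the last name, the two Garcia entries keep their original relative order, which matches the Visual Basic output shown earlier.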

The program writes a separate file for each group in the same folder as the data files.
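
The relative paths in the example depend on where the executable runs. As a rough sketch of the same write step with an explicit output folder, the following code builds each file name with Path.Combine. The class name WriteGroupsToFolder, the helper WriteGroupFiles, and the temporary folder name SplitWithGroupsDemo are all hypothetical and are not part of the original sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class WriteGroupsToFolder
{
    // Hypothetical helper: writes one testFile_<letter>.txt per group
    // and returns the paths of the files it created.
    public static List<string> WriteGroupFiles(IEnumerable<string> names, string outputFolder)
    {
        Directory.CreateDirectory(outputFolder);
        var written = new List<string>();

        // Group by the first letter of the last name, as in the example above.
        var groups = names.GroupBy(name => name.Split(',')[0][0])
                          .OrderBy(grp => grp.Key);

        foreach (var g in groups)
        {
            string fileName = Path.Combine(outputFolder, "testFile_" + g.Key + ".txt");
            File.WriteAllLines(fileName, g.ToArray());
            written.Add(fileName);
        }
        return written;
    }

    static void Main()
    {
        string[] names = { "Bankov, Peter", "Beebe, Ann", "Holm, Michael" };

        // Assumption: a folder under the temp directory is an acceptable target.
        string outputFolder = Path.Combine(Path.GetTempPath(), "SplitWithGroupsDemo");

        foreach (string path in WriteGroupFiles(names, outputFolder))
        {
            Console.WriteLine("{0}: {1} name(s)",
                Path.GetFileName(path), File.ReadAllLines(path).Length);
        }
    }
}
```

File.WriteAllLines replaces the explicit StreamWriter here only to keep the sketch short; the using block in the example is equally valid.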

Compiling the Code

  • Create a Visual Studio project that targets the .NET Framework version 3.5. By default, the project has a reference to System.Core.dll and a using directive (C#) or Imports statement (Visual Basic) for the System.Linq namespace. In C# projects, add a using directive for the System.IO namespace.

  • Copy this code into your project.

  • Press F5 to compile and run the program.

  • Press any key to exit the console window.

See Also

Concepts

LINQ and Strings

LINQ and File Directories