Export (0) Print
Expand All

How to: Perform Custom Join Operations (C# Programming Guide)

This example shows how to perform join operations that are not possible with the join clause. In a query expression, the join clause is limited to, and optimized for, equijoins, which are by far the most common type of join operation. When performing an equijoin, you will probably always get the best performance by using the join clause.

However, there some cases where the join clause cannot be used:

  • When the join is predicated on an expression of inequality (a non-equijoin).

  • When the join is predicated on more than one expression of equality or inequality.

  • When you have to introduce a temporary range variable for the right side (inner) sequence before the join operation.

To perform joins that are not equijoins, you can use multiple from clauses to introduce each data source independently. You then apply a predicate expression in a where clause to the range variable for each source. The expression can also have the form of a method call.

NoteNote:

Do not confuse this kind of custom join operation with the use of multiple from clauses to access inner collections. For more information, see join clause (C# Reference).

The first method in the following example shows a simple cross join. Cross joins must be used with caution because they can produce very large result sets. However, they can be useful in some scenarios for creating source sequences against which additional queries are run.

The second method produces a sequence of all the products whose category ID is listed in the category list on the left side. Note the use of the let clause and the Contains method to create a temporary array. It would also be possible to create the array before the query and eliminate the first from clause.

     class CustomJoins
     {

         #region Data

         class Product
         {
             public string Name { get; set; }
             public int CategoryID { get; set; }
         }

         class Category
         {
             public string Name { get; set; }
             public int ID { get; set; }
         }

         // Specify the first data source.
         List<Category> categories = new List<Category>()
 { 
     new Category(){Name="Beverages", ID=001},
     new Category(){ Name="Condiments", ID=002},
     new Category(){ Name="Vegetables", ID=003},         
 };

         // Specify the second data source.
         List<Product> products = new List<Product>()
{
   new Product{Name="Tea",  CategoryID=001},
   new Product{Name="Mustard", CategoryID=002},
   new Product{Name="Pickles", CategoryID=002},
   new Product{Name="Carrots", CategoryID=003},
   new Product{Name="Bok Choy", CategoryID=003},
   new Product{Name="Peaches", CategoryID=005},
   new Product{Name="Melons", CategoryID=005},
   new Product{Name="Ice Cream", CategoryID=007},
   new Product{Name="Mackerel", CategoryID=012},
 };
         #endregion

         static void Main()
         {
             CustomJoins app = new CustomJoins();
             app.CrossJoin();
             app.NonEquijoin();

             Console.WriteLine("Press any key to exit.");
             Console.ReadKey();
         }

         void CrossJoin()
         {
             var crossJoinQuery =
                 from c in categories
                 from p in products
                 select new { c.ID, p.Name };

             Console.WriteLine("Cross Join Query:");
             foreach (var v in crossJoinQuery)
             {
                 Console.WriteLine("{0,-5}{1}", v.ID, v.Name);
             }
         }

         void NonEquijoin()
         {
             var nonEquijoinQuery =
                 from p in products
                 let catIds = from c in categories
                              select c.ID
                 where catIds.Contains(p.CategoryID) == true 
                 select new { Product = p.Name, CategoryID = p.CategoryID };

             Console.WriteLine("Non-equijoin query:");
             foreach (var v in nonEquijoinQuery)
             {
                 Console.WriteLine("{0,-5}{1}", v.CategoryID, v.Product);
             }
         }
     }
     /* Output:
 Cross Join Query:
 1    Tea
 1    Mustard
 1    Pickles
 1    Carrots
 1    Bok Choy
 1    Peaches
 1    Melons
 1    Ice Cream
 1    Mackerel
 2    Tea
 2    Mustard
 2    Pickles
 2    Carrots
 2    Bok Choy
 2    Peaches
 2    Melons
 2    Ice Cream
 2    Mackerel
 3    Tea
 3    Mustard
 3    Pickles
 3    Carrots
 3    Bok Choy
 3    Peaches
 3    Melons
 3    Ice Cream
 3    Mackerel
 Non-equijoin query:
 1    Tea
 2    Mustard
 2    Pickles
 3    Carrots
 3    Bok Choy
 Press any key to exit.
      */

In the following example, the query must join two sequences based on matching keys that, in the case of the inner (right side) sequence cannot be obtained prior to the join clause itself. If this join were performed with a join clause, then the Split method will have to be called for each element. The use of multiple from clauses enables the query to avoid the overhead of the repeated method call. However, since join is optimized, in this particular case it might still be faster than using multiple from clauses. The results will vary depending primarily on how expensive the method call is.

class MergeTwoCSVFiles : StudentClass
{
    static void Main()
    {
        string[] names = System.IO.File.ReadAllLines(@"../../../names.csv");
        string[] scores = System.IO.File.ReadAllLines(@"../../../scores.csv");

        // Merge the data sources using a named type 
        // var could be used instead of an explicit type
        IEnumerable<Student> queryNamesScores =
            from name in names
            let x = name.Split(',')
            from score in scores
            let s = score.Split(',')
            where x[2] == s[0]
            select new Student()
            {
                FirstName = x[0],
                LastName = x[1],
                ID = Convert.ToInt32(x[2]),
                ExamScores = (from scoreAsText in s.Skip(1)
                              select Convert.ToInt32(scoreAsText)).
                              ToList()
            };

        // Optional. Store the newly created student objects in memory 
        // for faster access in future queries
        List<Student> students = queryNamesScores.ToList();

        foreach (var student in students)
        {
            Console.WriteLine("The average score of {0} {1} is {2}.",
                student.FirstName, student.LastName, student.ExamScores.Average());
        }

        //Keep console window open in debug mode
        Console.WriteLine("Press any key to exit.");
        Console.ReadKey();
    }
}
/* Output: 
    The average score of Adams Terry is 85.25.
    The average score of Fakhouri Fadi is 92.25.
    The average score of Feng Hanying is 88.
    The average score of Garcia Cesar is 88.25.
    The average score of Garcia Debra is 67.
    The average score of Garcia Hugo is 85.75.
    The average score of Mortensen Sven is 84.5.
    The average score of O'Donnell Claire is 72.25.
    The average score of Omelchenko Svetlana is 82.5.
    The average score of Tucker Lance is 81.75.
    The average score of Tucker Michael is 92.
    The average score of Zabokritski Eugene is 83.
 */

This example is also found in the article How to: Populate Object Collections from Multiple Sources (LINQ). To compile and run this example, follow the instructions in that article.

  • Create a Visual Studio project that targets the .NET Framework version 3.5. By default, the project has a reference to System.Core.dll and a using directive for the System.Linq namespace.

  • Copy the code into your project.

  • Press F5 to compile and run the program.

  • Press any key to exit the console window.

Community Additions

ADD
Show:
© 2014 Microsoft