March 2010

Volume 25 Number 03

Data Points - Precompiling LINQ Queries

By Julie Lerman | March 2010

When using LINQ to SQL or LINQ to Entities in your applications, it’s important to consider precompiling any query you create and execute repeatedly. I often get caught up in completing a particular task and neglect to leverage precompiled queries until I’m fairly far along in the development process. This is much like “exception handling disease,” where devs try to shoehorn exception handling into an app after the fact.

However, even after you’ve implemented this important performance-enhancing technique, there’s a good chance you’re losing its benefit. You may notice that the promised performance gain isn’t being realized, but the reason (and fix) might escape you.

In this column, I’ll first explain how to precompile queries and then focus on the problem of losing the precompilation benefits in Web applications, services and other scenarios. You’ll learn how to ensure that you’re getting the performance benefit across postbacks, short-lived service operations and other code where critical instances are going out of scope.

Precompiling Queries

The process of transforming your LINQ query into the relevant store query (for example, the T-SQL executed by the database) is relatively expensive within the larger umbrella of query execution. Figure 1 shows the involved process that transforms a LINQ to Entities query into a store query.


Figure 1 Transforming a LINQ Query into a Relevant Store Query

The Entity Framework team’s blog post, “Exploring the Performance of the ADO.NET Entity Framework - Part 1” (blogs.msdn.com/adonet/archive/2008/02/04/exploring-the-performance-of-the-ado-net-entity-framework-part-1.aspx), breaks down the process and shows the relative time taken by each step. Note that this post was based on the Microsoft .NET Framework 3.5 SP1 version of Entity Framework, and the time-per-step distribution has likely shifted in the new version. Nevertheless, precompilation is still an expensive part of the query execution process.

By precompiling your query, Entity Framework and LINQ to SQL can reuse the store query and skip the redundant process of figuring it out each time. For example, if your app frequently retrieves different customers from the data store, you might have a query such as this:

Context.Customers.Where(c=>c.CustomerID==_custID)

When nothing but the _custID parameter changes from one execution to the next, why waste the effort of transposing this to a SQL command over and over again?

LINQ to SQL and Entity Framework both enable query precompilation; however, because of some differences between the processes in the two frameworks, they each have their own CompiledQuery class. LINQ to SQL uses System.Data.LINQ.CompiledQuery while Entity Framework uses System.Data. Objects.CompiledQuery. Both forms of CompiledQuery allow you to pass in parameters, and both require that you pass in the current DataContext or ObjectContext being used. Essentially, from the coding perspective, they’re equal.

The CompiledQuery.Compile method returns a delegate in the form of a Func that can, in turn, be invoked on demand.

Here is a simple query being compiled by Entity Framework’s CompiledQuery class, which is static and therefore doesn’t require instantiation:

C#

var _custByID = CompiledQuery.Compile<SalesEntities, int, Customer>
   ((ctx, id) =>ctx.Customers.Where(c=> c.ContactID == id).Single());

VB

Dim _custByID= CompiledQuery.Compile(Of SalesEntities, Integer, Customer) 
   (Function(ctx As ObjectContext, id As Integer) 
    ctx.Customers.Where(Function(c) c.CustomerID = custID).Single)

You can use either LINQ methods or LINQ operators in the query expression. These queries are built with LINQ methods and lambdas.

The syntax is a little more confusing than your typical generic method, so I’ll break it down. Again, the goal of the Compile method is to create a Func (delegate) that can be invoked at a later time, as shown here:

C#

CompiledQuery.Compile<SalesEntities, int, Customer>

VB

CompiledQuery.Compile(Of SalesEntities, Integer, Customer)

Because it’s generic, the method must be told what types are being passed in as arguments, as well as what type will be returned when the delegate is invoked. Minimally, you must pass in some type of ObjectContext,  or DataContext for LINQ to SQL. You can specify a System.Data.Objects.ObjectContext or something that derives from it. In my case, I’m explicitly using the derived class SalesEntities that is associated with my Entity Data Model.

You can also define multiple arguments, which must come directly after the context. In my example, I’m telling Compile that the resulting precompiled query should also take an int/Integer parameter. The last type describes what the query will be returning, in my case a Customer object:

C#

((ctx, id) =>ctx.Customers.Where(c => c.ContactID == id).Single())

VB

Function(ctx As ObjectContext, id As Integer) 
   ctx.Customers.Where(Function(c) c.CustomerID = custID).Single

The result of the previous compile method is the following delegate:

C#

private System.Func<SalesEntities, int, Customer> _custByID

Private _custByID As System.Func(Of SalesEntities, Integer, Customer)

Once the query has been compiled, you simply invoke it whenever you want to execute that query, passing in the ObjectContext or DataContext instance and any other required parameters. Here I have an instance named _commonContext and a variable named _custID:

Customer cust  = _custByID.Invoke(_commonContext, _custID);

The first time the delegate is invoked, the query is translated to the store query and that translation is cached for reuse on subsequent calls to Invoke. LINQ can skip the task of compiling the query and go right to execution.

Ensuring that Your Precompiled Query Is Truly Being Used

There’s a not-so-obvious, and not widely known, problem with precompiled queries. Many developers presume that the query is cached in the application process and will stick around. I certainly made this assumption, because there was nothing I found to indicate otherwise—except for some unimpressive performance numbers. However, when the object where you instantiated the compiled query goes out of scope, you also lose the precompiled query. It will need to be precompiled again for each use, so you completely lose the benefit of the precompiling. In fact, you’re paying a higher price than you would if you were simply executing a LINQ query, due to some of the extra effort the CLR must make with respect to the delegate.

Rico Mariani digs into the cost of using the delegate in his blog post, “Performance Quiz #13—Linq to SQL compiled query cost—solution” (blogs.msdn.com/ricom/archive/2008/01/14/performance-quiz-13-linq-to-sql-compiled-query-cost-solution.aspx). The discussion in the comments is equally enlightening.

I’ve seen blog reports about LINQ to Entities’ “terrible performance” in Web apps “even with precompiled queries.” The reason is that every time a page posts back, you’re getting a newly instantiated context and re-precompiling the query. The precompiled query is never getting reused. You’ll have the same problem anywhere you have short-lived contexts. This could happen in an obvious place, such as a Web or Windows Communication Foundation (WCF) service, or even in something less obvious, such as a repository that will instantiate a new context on the fly if an instance hasn’t been provided.

You can avoid the loss of the delegate by using a static (Shared, in VB) variable to retain the query across processes, and then invoking it using whatever context is currently available.

Here’s a pattern I’ve successfully used with Web applications, WCF services and repositories, where the ObjectContext goes out of scope frequently and I want the delegate to be available throughout the application process. You need to declare a static delegate in the constructor of the class where you’ll be invoking queries. Here I’m declaring a delegate that matches the compiled query I previously created:

C#

static Func<ObjectContext, int, Customer> _custByID;

VB

Shared _custByID As Func(Of ObjectContext, Integer, Customer)

There are a few possible places to compile the query. You can do it in a class constructor or just prior to invoking it. Here is a method that is designed to perform a query and return a Customer object:

public static Customer GetCustomer( int ID)
    {
      //test for an existing instance of the compiled query
      if (_custByID == null)
      {
        _custByID = CompiledQuery.Compile<SalesEntities, int, Customer>
         ((ctx, id) => ctx.Customers.Where(c => c.CustomerID == id).Single());
      }
      return _custByID.Invoke(_context, ID);
    }

The method will use the compiled query. First it will compile the query on the fly, but only if necessary, which I determine by testing to see if the query has been instantiated yet. If you are compiling in your class constructor, you’ll need to perform the same test to be sure you’re only using resources to compile when necessary.

Because the delegate, _custByID, is static, it will remain in memory even when its containing class goes out of scope. Therefore, as long as the application process itself is in scope, the delegate will be available; it won’t be null, and the compilation step will be skipped.

Precompiled Queries and Projections

There are some other speed bumps to be aware of that are much more discoverable. The first revolves around projections, but isn’t specific to the 
problem of unwittingly recompiling your precompiled query. When you project columns in a query, rather than returning specific types, you’ll always get an anonymous type as a result.

When defining the query, specifying its return type is impossible because there’s no way to say “type of anonymous type.” You’ll have the same problem if you want to have the query inside of a method that returns the results, because you can’t specify what will be returned by the method. Developers using LINQ hit this latter limitation frequently.

If you focus on the fact that an anonymous type is an on-the-fly type that isn’t meant to be reused, these limitations, while frustrating, make sense. Anonymous types aren’t meant to be passed around from method to method.

What you‘ll need to do for your precompiled query is define a type that matches the projection. Note that in Entity Framework you must use a class, not a struct, as LINQ to Entities won’t allow you to project into a type that doesn’t have a constructor. LINQ to SQL does allow structs to be the target of a projection. So, for Entity Framework you can only use a class, but for LINQ to SQL you can use either a class or a struct to avoid the limitations surrounding anonymous types.

Precompiled Queries and LINQ to SQL Prefetching

Another potential problem with precompiled queries involves prefetching, or eager loading, but the problem only arises with LINQ to SQL. In Entity Framework, you use the Include method to eager load, which results in a single query being executed in the database. Because Include can be part of a query, such as context.Customer.Include(“Orders”),  it’s not an issue here. However, with LINQ to SQL, the eager loading is defined within the DataContext, not the query itself.

DataContext.LoadOptions has a LoadWith method that lets you specify what related data should get eager loaded along with a particular entity.

You can define LoadWith to load Orders with any Customer that is queried:

Context.LoadOptions.LoadWith<Customer>(c => c.Orders)

Then you can add a rule that says to load details with any orders that are loaded:

Context.LoadOptions.LoadWith<Customer>(c => c.Orders)
Context.LoadOptions.LoadWith<Order>(o =>o.Details)

You can define the LoadOptions directly against your DataContext instance or create a DataLoadOptions class, define LoadWith rules in this object, and then attach it to your context:

DataLoadOptions _myLoadOptions = new DataLoadOptions();
_myLoadOptions.LoadWith<Customer>(c => c.Orders)
Context.LoadOptions= myLoadOptions

There are caveats with the general use of LoadOptions and the DataLoadOptions class. For example, if you define and then attach DataLoadOptions, once a query has been executed against the DataContext, you can’t attach a new set of DataLoadOptions. There’s a lot more you can learn about the various load options and their caveats, but let’s take a look at a basic pattern for applying some LoadOptions to a precompiled query.

The key to the pattern is that you can predefine the DataLoadOptions without associating them with a particular context.

In the class declarations where you declare the static Func variables to contain the precompiled queries, declare a new DataLoadOptions variable. It’s critical to make this variable static so it, too, remains available along with the delegates:

static DataLoadOptions Load_Customer_Orders_Details = new DataLoadOptions();

Then in the method that compiles and invokes the query, you can define the LoadOptions along with the delegate (see Figure 2). This method is valid in the .NET Framework 3.5 and .NET Framework 4.

Figure 2 Defining LoadOptions Along with a Delegate

public Customer GetRandomCustomerWithOrders()
    {
      if (Load_Customer_Orders_Details == null)
      {
        Load_Customer_Orders_Details = new DataLoadOptions();
        Load_Customer_Orders_Details.LoadWith<Customer>(c => c.Orders);
        Load_Customer_Orders_Details.LoadWith<Order>(o => o.Details);
      }
      if (_context.LoadOptions == null)
      {
        _context.LoadOptions = Load_Customer_Orders_Details;
      }
      if (_CustWithOrders == null)
      {
        _CustWithOrders = CompiledQuery.Compile<DataClasses1DataContext, Customer>
               (ctx => ctx.Customers.Where(c => c.Orders.Any()).FirstOrDefault());
      }
      return _CustWithOrders.Invoke(_context);
    }

Because the DataLoadOptions are static, they’re defined only when necessary. Based on the logic of your class, the DataContext may or may not be new. If it’s a context that’s being reused, then it will have the previously assigned LoadOptions. Otherwise you’ll need to assign them. Now you’ll be able to invoke this query repeatedly and still get the benefit of LINQ to SQL’s prefetching capabilities.

Keep Precompiling at the Top of Your Checklist

In the scope of LINQ query execution, query compilation is an expensive part of the process. Any time you’re adding LINQ query logic to your LINQ to SQL- or Entity Framework-based applications, you should consider precompiling the queries and reusing them. But don’t assume you’re finished there. As you’ve seen, there are scenarios where you may not be benefiting from the precompiled query. Use some type of profiler, such as SQL Profiler or one of the profiling tools from Hibernating Rhinos, which include L2SProf (l2sprof.com/) and EFProf (efprof.com). You may need to leverage some of the patterns shown here to ensure that you’re getting the edge that precompiled queries promise.

Danny Simmons, from the Microsoft Entity Framework team, explains how to control merge options when precompiling queries—and lists some additional gotchas to watch out for—in his blog post at blogs.msdn.com/dsimmons/archive/2010/01/12/ef-merge-options-and-compiled-queries.aspx.


 

Julie Lerman  is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other Microsoft .NET topics at user groups and conferences around the world. Lerman blogs at thedatafarm.com/blog and is the author of the highly acclaimed book, “Programming Entity Framework” (O’Reilly Media, 2009). You can follow her on Twitter at Twitter.com/julielermanvt.

Thanks to the following technical expert for reviewing this article:  Danny Simmons