Introducing
LINQ to Relational Data
With the combined launch of Visual Studio 2008, SQL Server 2008,
and Windows Server 2008, Microsoft is introducing five implementations of .NET
Language Integrated Query (LINQ).
Of these five implementations, two specifically target access to
relational databases: LINQ to SQL and LINQ to Entities. This white paper
introduces these two technologies and the scenarios in which each can best be
used.
Microsoft Language Integrated Query (LINQ) offers developers a
new way to query data using strongly-typed queries and strongly-typed results,
common across a number of disparate data types including relational databases,
.NET objects, and XML. By using strongly-typed queries and results, LINQ
improves developer productivity with the benefits of IntelliSense and compile-time
error checking.
LINQ to SQL, released with the Visual Studio 2008, is designed
to provide strongly-typed LINQ access for rapidly developed applications across
the Microsoft SQL Server family of databases.
LINQ to Entities, to be released in an update to Visual Studio
2008 in the first half of 2008, is designed to provide strongly-typed LINQ
access for applications requiring a more flexible Object Relational mapping,
across Microsoft SQL Server and third-party databases.
LINQ to SQL is an object-relational mapping (ORM) framework that
allows the direct 1-1 mapping of a Microsoft SQL Server database to .NET
classes, and query of the resulting objects using LINQ. More specifically, LINQ
to SQL has been developed to target a rapid development scenario against
Microsoft SQL Server where the database closely resembles the application
object model and the primary concern is increased developer productivity.
Figures 1 & 2 combined with the code snippet below
demonstrate a simple LINQ to SQL scenario. Figure 1 shows the LINQ to SQL
mapping, and Figure 2 shows the associated database diagram, using the
Northwind database.
.jpg)
Figure 1. Database diagram for a subset of the Northwind
database.
.jpg)
Figure 2. LINQ to SQL mapping diagram for a simple scenario
using a subset of the Northwind database and the associated database diagram.
Notice the use of an intermediary table to map the many-to-many relationship
between Employees and Territories.
This code snippet shows a simple LINQ query against the Northwind database to retrieve all customers whose address is in London.
NorthwindDataContext db = new NorthwindDataContext();
var customers = from c in db.Customers
where c.City == "London"
select c;
Listing 1. Simple LINQ to SQL query
LINQ to SQL has been architected with simplicity and developer
productivity in mind. APIs have been designed to “just work” for common
application scenarios. Examples of this design include the ability to replace
unfriendly database naming conventions with friendly names, to map SQL schema
objects directly to classes in the application, to implicitly load data that
has been requested but has not previously been loaded into memory, and the use
of common naming conventions and partial methods to provide custom business or
update logic.
The ability to replace unfriendly database naming conventions
with more developer friendly names provides benefit that is pretty self
explanatory, making it easier to understand the structure and meaning of data
as the application is developed. These changes can be done using the LINQ to
SQL designer in Visual Studio 2008, by double clicking on the entity object
name, as seen in Figure 3 below.
.jpg)
.jpg)
Figure 3. Changing the name of an entity object in LINQ to SQL.
As mentioned above, LINQ to SQL targets scenarios where the
database closely resembles the application object model that is desired. Given
this target scenario, the mapping between the SQL schema and classes in the
application is a direct 1-1 mapping, meaning that a table or view being
accessed from the database maps to a single class in the application, and a
column being accessed maps to a property on the associated class.
By default, LINQ to SQL enables deferred loading. This means that
if, for example, a query retrieves Customer data, it does not automatically
pull the associated Order information into memory. When the data is requested
however, LINQ to SQL uses its implicit loading capability to retrieve the data
that has been requested but has not previously been loaded into memory. In
Listing 1 we queried the database for all customers with an address based in
London. While iterating through the resulting list of customers we would like
to access the list of Orders that each customer has placed. Looking back to the
original query in Listing 1, we notice that we did not retrieve any information
about Orders; we simply retrieved information included in the Customer object.
In Listing 2, we continue from the previous query to iterate through the customers
that were returned and print out the total number Orders associated with each
customer.
foreach (Customer c in customers)
{
Console.WriteLine(c.CompanyName + " " +
c.Orders.Count);
}
Listing 2. Implicit loading of Orders data that has been
requested but not previously loaded into memory.
In the above example, a separate query is executed to retrieve
the Orders for each Customer. If it is known in advance that we need to
retrieve the orders for all customers, we can use LoadOptions to request that
the associated Orders be retrieved along with the Customers, in a single
request.
Beyond the simplicity of the query experience, LINQ to SQL offers
additional features to improve the developer experience with regards to
application logic. Partial methods, a new language feature of C# and VB in
Visual Studio 2008, allow one part of a partial class to define and call
methods which are invoked, if implemented in another part of the class. If the
method has not been implemented; the entire method call is optimized away
during compilation. By using common naming conventions in conjunction with
these new partial methods and partial classes, introduced in Visual Studio
2005, LINQ to SQL allows application developers to provide custom business logic
when using generated code. Using partial classes allows developers the
flexibility to add methods, non-persistent members, etc., to the generated LINQ
to SQL object classes. These partial methods can add logic for insert, update,
and delete by simply implementing the associated partial method. Similarly,
developers can use the same concepts to implement partial methods that hook up
eventing in the most common scenarios, for example OnValidate, OnStatusChanging
or OnStatusChanged. Example 3 shows the use of partial methods to implement
custom validation on the Customer Phone property as it is changed.
public partial class Customer
{
partial void OnPhoneChanging(string value)
{
Regex phoneNum = new
Regex(@"^[2-9]\d{2}-\d{3}-\d{4}$");
if (phoneNum.IsMatch(value) == false)
throw new Exception("Please enter a valid Phone
Number");
}
}
Listing 3. Use of partial methods to implement custom
validation logic.
Once developers can leverage the query capabilities of LINQ and
implement business logic in partial methods, the argument for using persistent
objects becomes quite compelling. To round the story out, LINQ to SQL allows
people to model inheritance hierarchies in their classes and map them to the
database. Inheritance, an important feature of object-oriented programming,
does not translate directly into the relational database; therefore, the
ability to map inheritance from the application into the database is very
important. LINQ to SQL supports one of the most common database inheritance
mappings, Table per Hierarchy (TPH), where multiple classes in a hierarchy are
mapped to a single table, view, stored procedure, or table valued function
using a discriminator column to determine the specific type of each
row/instance. In Figure 4, a single Product table in the database, maps the
inheritance of DiscontinuedProduct from Product. This two-class hierarchy is
mapped directly to the existing Northwind database using the discriminator
column “Discontinued”.
.jpg)
Figure 4. Inheritance in LINQ to SQL
Notice that the properties of the inheritance (right-click on the
relationship arrow) specify the discriminatory property and the associated
discriminator values, while the properties of DiscontinuedProduct specify the
base class from which it inherits.
LINQ to SQL has been developed with a minimally intrusive object
model, allowing developers to choose whether or not to make use of generated
code. Rather, they can create their own classes, which do not need to be
derived from any specific base class. This means that you can create classes
that inherit from no base class or from a custom base class.
As with any application framework, developers must also have the
ability to optimize the solution to best fit their scenario. LINQ to SQL offers
a number of opportunities to optimize, including using load options to control
database trips and compiled queries to amortize the overhead inherent in SQL
generation. By default, LINQ to SQL enables Object Tracking, which controls the
automatic change tracking and identity management of objects retrieved from the
database. In some scenarios, specifically where you are accessing the data in a
read-only manner, you may wish to disable Object Tracking as a performance
optimization. The options for specialization of your application behavior is
not local to just the API, the LINQ to SQL Designer also allows additional
functionality for you to expose stored procedures and/or table valued functions
as strongly typed methods on the generated DataContext, and map inserts,
updates, and deletes to stored procedures if you choose not to use dynamic SQL.
Developers who are concerned about query performance can leverage
the compiled queries capabilities which offer an opportunity to optimize query
performance. In many applications you might have code that repeatedly executes
the same query, possibly with different argument values. By default, LINQ to
SQL parses the language expression each time to build the corresponding SQL
statement, regardless of whether that expression has been seen previously.
Compiled queries, like that seen in Listing 4, allow LINQ to SQL to avoid
reparsing the expression and regenerating the SQL statement for each repeated
query.
var customers = CompiledQuery.Compile(
(NorthwindDataContext context,
string filterCountry) =>
from c in context.Customers
where c.Orders.Count > 5
select c);
NorthwindDataContext db = new NorthwindDataContext();
Console.WriteLine("Customers in the USA: ");
foreach (var row in customers(db, "USA"))
{
Console.WriteLine(row.CompanyName);
}
Console.WriteLine();
Console.WriteLine("Customers in Spain: ");
foreach (var row in customers(db, "Spain"))
{
Console.WriteLine(row.CompanyName);
}
Listing 4. An example of a simple compiled query, executed
twice with varying parameters.
The primary scenario for using LINQ to SQL is when building
applications with a rapid development cycle and a simple one-to-one object to
relational mapping against the Microsoft SQL Server family of databases. In
other words, when building an application whose object model is structured very
similarly to the existing database structure, or when a database for the
application does not yet exist and there is no predisposition against creating
a database schema that mirrors the object model; you can use LINQ to SQL to map
a subset of tables directly to classes, with the required columns from each
table represented as properties on the corresponding class. Usually in these
scenarios, the database has not and/or will not be heavily normalized.
| I want to | LINQ to SQL is applicable |
| Use an ORM solution and my database is 1:1 with my object model | .gif) |
| Use an ORM solution with inheritance hierarchies that are
stored in a single table | .gif) |
| Use my own plain CLR classes instead of using generated classes
or deriving from a base class or implementing an interface | .gif) |
| Leverage LINQ as the way I
write queries | .gif) |
| Use an ORM but I want
something that is very performant and where I can optimize performance
through stored procedures and compiled queries | .gif) |
Table 1. LINQ to SQL Scenarios
LINQ to Entities, like LINQ to SQL is a LINQ implementation
providing access to relational data, but with some key differences.
LINQ to Entities is, specifically, a part of the ADO.NET Entity
Framework which allows LINQ query capabilities. The Entity Framework is the
evolution of ADO.NET that allows developers to program in terms of the standard
ADO.NET abstraction or in terms of persistent objects (ORM) and is built upon
the standard ADO.NET Provider model which allows access to third party
databases. The Entity Framework introduces a new set of services around the
Entity Data Model (EDM) (a medium for defining domain models for an
application). This set of services includes the following:
·
Domain Modeling Capabilities and the ability to define
conceptual, data store agnostic models
·
Object Relational Mapping Capabilities (full CRUD and state
management scenarios)
·
Database independent Query Capabilities using LINQ or Entity SQL
More than a simple Object Relational Mapping (ORM) tool, the
ADO.NET Entity Framework and LINQ to Entities allow developers to work against
a conceptual model with a very flexible mapping and the ability to accommodate
a high degree of divergence from the underlying data store. For further
discussion of the Entity Framework and EDM, please see the Data Platform
Development Center (http://msdn.microsoft.com/data).
Figure 5 below shows a simple LINQ to Entities EDM diagram, using
the same subset of the Northwind database seen in Figure 1.
.jpg)
Figure 5. LINQ to Entities mapping diagram corresponding to the
Northwind database subset in Figure 1. Notice the directly mapped many-to-many
relationship between Employees and Territories without an intermediary table.
In this figure you can see that the object model is not mapped
directly, one-to-one, to the database, rather we have bypassed the intermediary
table used to represent the many-to-many relationship between Employees and
Territories. Although you can map many-to-many relationships in both LINQ to
SQL and LINQ to Entities, LINQ to Entities allows a direct mapping of
many-to-many relationships with no intermediary class, while LINQ to SQL
requires that an intermediary class map one-to-many to each of the classes that
are party to the many-to-many relationship.
Listing 5 below shows the same simple LINQ query used in Listing
1 for LINQ to SQL, used against the Entity Data Model shown in Figure 5.
using (NorthwindEntities nw = new NorthwindEntities())
{
var cusotmers = from c in nw.Customers
where c.City == "London"
select c;
}
Listing 5. Simple LINQ to Entities query
The similarities between the two LINQ implementations for this
simple query highlight the benefit of LINQ creating a query language that
remains consistent across data stores.
Microsoft designed the ADO.NET Entity Framework, and in turn LINQ
to Entities, to enable flexible and more complex mappings, ideal in the
scenario where it is not possible or ideal for the object model to match the
database schema. The mapping flexibility available with the ADO.NET Entity
Framework allows the database and application(s) to evolve separately and makes
development against highly normalized databases simpler and easier to
understand. When a change is made in the database schema, the application is
insulated from the change by the Entity Framework, and there is no requirement
to rewrite portions of the application, but rather the mapping files can simply
be updated to accommodate the database change. In Figures 6 & 7, a change
is made to the database splitting the Employees table to separate personal
information that is only accessible to HR and information that is available in
the Employee Address Book. If existing applications are not going to make use
of these changes we simply need to update the mapping to account for this
change.
.jpg)
Figure 6. Database Model for the new modified Northwind database
subset. In this modified database, we have split the Employees table into two
tables one containing Employee information that is only available to HR and
another containing information that is available to all employees in the
Employee Address Book. We have also used a different inheritance mapping, Table
per Subclass, which we will discuss later.
.jpg)
Figure 7. Entity Data Model Diagram for the modified Northwind
database subset shown in Figure 6.
Listing 6 shows a query against a single entity, mapped to multiple tables in the
database. The same query directly against the database would require the
knowledge that Employee information is split between two tables and a join of
those two tables in the query.
using (NorthwindModEntities nw = new
NorthwindModEntities()) {
var q = from e in nw.Employees
select e;
foreach (Employees emp in q)
{
Console.WriteLine(emp.FirstName + ", " +
emp.LastName + ", Hire Date: " + emp.HireDate.ToString());
}
}
Listing 6. Querying a single Entity mapped to two tables in
the database.
As discussed earlier in this article, LINQ to SQL allows one of
the most common inheritance scenarios to be mapped, Table per Hierarchy. LINQ
to Entities and the ADO.NET Entity Framework also allow the mapping of two
other types of inheritance. In addition to Table per Hierarchy (as supported by
LINQ to SQL), the Entity Framework supports:
·
The mapping of Table per Concrete Type, a separate table for each
class or type in the hierarchy.
·
The mapping of Table per Subclass, a hybrid approach using a
shared table for information about the base type and separate tables for
information about the derived types seen in Figure 8.
.jpg)
Figure 8. Mapping Table per Subclass inheritance with the Entity
Data Model.
Similar to LINQ to SQL, LINQ to Entities uses partial classes and
partial methods to allow update and business logic to be easily added to
generated code. Using partial classes and partial methods allows developers the
flexibility to add methods, non-persistent members, etc., to the generated
Entity Framework object classes, and for the addition of custom logic for
insert, update, and delete by simply implementing the associated partial
method. Similarly, developers can use the same concepts to implement partial
methods that hook up eventing in the most common scenarios, for example
OnStatusChanging or OnStatusChanged. Listing 7 shows the use of partial methods
to implement custom validation on the Customer Phone property as it is changed.
public partial class Customers
{
partial void OnPhoneChanging(string value)
{
Regex phoneNum = new
Regex(@"^[2-9]\d{2}-\d{3}-\d{4}$");
if (phoneNum.IsMatch(value) == false)
throw new Exception("Please enter a valid
Phone Number");
}
}
Listing 7. Use of partial methods to implement custom
validation logic.
Due to the explicit nature of LINQ to Entities, developers also
have the ability to optimize the solution to best fit their scenario. In Listing
2 for LINQ to SQL, when we queried for Customer data, Order information was not
automatically pulled into memory, but rather was only pulled into memory only
when the Order information was accessed. In LINQ to Entities, the developer has
full control over the number of database round trips by explicitly specifying
when to load such information from the database. Navigating to associated
information that has not yet been retrieved from the database will not cause an
additional database trip, but rather will only return the information if it was
explicitly requested with the first query or with a new query, as seen in Listing
8.
using (NorthwindModEntities nw = new NorthwindModEntities())
{
var q = from cus in nw.Customers
where cus.City == "London"
select cus;
// Loop through customers and print out orders since January 1,
1998
foreach (Customers customer in q)
{
Console.WriteLine(customer.CompanyName + ": " );
// Note line below to explicitly load all orders for each customer
customer.Orders.Load();
foreach (Orders order in customer.Orders.Where(o =>
o.OrderDate > new DateTime(1998, 1, 1)))
{
Console.WriteLine("\t{0},{1}",
order.OrderID, order.OrderDate);
}
}
Console.ReadLine();
}
Listing 8. Explicit loading of information to control number
of database roundtrips.
Like LINQ to SQL, LINQ to Entities enables Object Tracking by
default. In some scenarios, specifically where you are accessing the data in a
read-only manner, you may wish to disable Object Tracking as a performance
optimization.
LINQ to Entities also provides the ability to expose stored
procedures as strongly typed methods on the generated ObjectContext, and map
inserts, updates, and deletes to stored procedures if you choose not to use
dynamic SQL.
The primary scenario targeted by LINQ to Entities is a flexible
and more complex mapping scenario, often seen in the enterprise, where the
application is accessing data stored in Microsoft SQL Server or other-third
party databases.
In other words, the database in these scenarios contains a
physical data structure that could be significantly different from what you
expect your object model to look like. Often in these scenarios, the database
is not owned or controlled by the application developer(s), but rather owned by
a DBA or other third party, possibly preventing application developers from
making any changes to the database and requiring them to adapt quickly to
database changes that they may not have been aware of.
| I want to | LINQ to Entities is applicable |
| Write applications that can target different database engines
in addition to Microsoft SQL Server | .gif) |
| Define domain models for my application and use these as the
basis for my persistence layer. | .gif) |
| Use an ORM solution where my classes may be 1:1 with the
database or may have a very different structure from the database schema | .gif) |
| Use an ORM solution with
inheritance hierarchies that may have alternative storage schemes (single
table for the hierarchy, single table for each class, single table for all
data related to specific type) | .gif) |
| Leverage LINQ as the way I
write queries and have the query work in a database vendor agnostic manner. | .gif) |
| Use an ORM but I want
something that is very performant and where I can optimize performance
through stored procedures and compiled queries | .gif) |
Table 2. LINQ to Entities Scenarios