Click to Rate and Give Feedback
Related Articles

This article describes features introduced in the GDR for Visual Studio Team Server Database Edition.

Jamie Laflen and Barclay Hill

MSDN Magazine March 2009

...

Read more!

Here Charles Petzold explains several techniques for improving the performance of ItemsControls.

Charles Petzold

MSDN Magazine March 2009

...

Read more!

Ken Getz shows how the CollectionChanged event lets you reflect changes to your underlying data source in your bound data controls.

Ken Getz

MSDN Magazine December 2008

...

Read more!

Writing your JavaScript code in C# and having it converted, Eric Lippert’s programming blog, and reading up on T-SQL 2008.

Scott Mitchell

MSDN Magazine January 2009

...

Read more!

New spatial data support in SQL Server 2008 opens the door to mapping and querying geometric and geographic data, allowing you to build exciting new applications.

Bob Beauchemin

MSDN Magazine February 2009

...

Read more!

Also by this Author

With the Entity Framework, developers are given more flexibility by being able to design around a conceptual model rather than a relational data model. To get started, you need to know how to design an Entity Data Model. John Papa walks you through that process.

John Papa

MSDN Magazine February 2008

...

Read more!

ADO.NET Data Services and Silverlight make a powerful combination, but to make them work well together, there are a few things you need to understand. Here, John Papa explains.

John Papa

MSDN Magazine April 2009

...

Read more!

Silverlight 2 applications are restricted to running inside a browser. However, Silverlight 3 applications can run inside the browser or out. Here we build a social networking app as a standalone Silverlight 3 application.

John Papa

MSDN Magazine June 2009

...

Read more!

LINQ to Objects and LINQ to Entities have an arsenal of Standard Query Operators that operate on sequences to perform a wide variety of operations.

John Papa

MSDN Magazine March 2008

...

Read more!

The Entity Framework is a hot topic, but how do you know when to use EntityClient, Object Services, Entity SQL, or LINQ? Find out here.

John Papa

MSDN Magazine May 2008

...

Read more!

Popular Articles

Paul DiLascia

MSDN Magazine August 2002

...

Read more!

One-time passwords offer solutions to dictionary attacks, phishing, interception, and lots of other security breaches. Here's how it all works.

Dan Griffin

MSDN Magazine May 2008

...

Read more!

When incorporating the ASP.NET DataGrid control into your Web apps, common operations such as paging, sorting, editing, and deleting data require more effort than you might like to expend. But all that is about to change. The GridView control--the successor to the DataGrid-- extends the DataGrid's functionality it in a number of ways. First, it fully supports data source components and can automatically handle data operations, such as paging, sorting, and editing, as long as its bound data source object supports these capabilities. In addition, ...

Read more!

Writing a Web application with ASP.NET is unbelievably easy. So many developers don't take the time to structure their applications for great performance. In this article, the author presents 10 tips for writing high-performance Web apps. The discussion is not limited to ASP.NET applications because they are just one subset of Web applications.

Rob Howard

MSDN Magazine January 2005

...

Read more!

Jeff Prosise explains when it's better to use UpdatePanel and when it's better to use asynchronous calls to WebMethods or page methods instead.

Jeff Prosise

MSDN Magazine June 2007

...

Read more!

Data Points
Common Table Expressions
John Papa

Code download available at: DataPoints2007_10.exe (150 KB)
Browse the Code Online
Many projects that developers work on involve writing complex SQL statements that deviate from basic SELECT/FROM/WHERE types of statements. One such scenario involves writing Transact-SQL (T-SQL) queries that use derived tables (also known as inline views) inside a FROM clause. This common practice allows a developer to grab a rowset and immediately join that rowset to other tables, views, and user-defined functions in a SELECT statement. Another option is to use a view instead of a derived table. Both of these options have their advantages and disadvantages.
When working with SQL Server™ 2005, I prefer a third option of using Common Table Expressions (CTEs). CTEs can help improve the readability (and thus the maintainability) of the code without compromising performance. They also make writing recursive code in T-SQL significantly easier than it was in previous versions of SQL Server.
In this month's column, I am focusing on using CTEs to address common development scenarios. I will begin by describing how CTEs work and what scenarios they can be used to address. Then I will discuss the advantages of using CTEs versus using traditional T-SQL constructs, such as derived tables, views, and custom procedures. Throughout the column, I will give examples and explain how and where they can be used. I will also demonstrate how CTEs handle recursive logic and define how a recursive CTE operates. Note that all the code I discuss in this column is available for download from the MSDN® Magazine Web site and uses the Northwind and the AdventureWorks databases that come with SQL Server 2005.

Views, Derived Tables, and CTEs
CTEs can be useful when queries need to select from a set of data that does not exist as a table within the database. For example, you might want to write a query against a set of aggregated data that calculates values based on customers and their orders. The aggregated data might join the Customers, Orders, and Order Details tables together and calculate the sum and average values of the orders. And you may want to query against that aggregated rowset. One solution is to create a view that gathers the aggregated data first and then write a query against that view. Another option is to query against the aggregated data using a derived table. You can do this by moving the SQL statement into the FROM clause and querying against it.
Views are generally used to break down large queries so they can be queried against in a much more readable manner. For instance, a view can represent a SELECT statement that joins 10 tables together, selects dozens of columns, and filters out rows based on an involved set of logic. The view can then be queried against by other SELECT statements throughout the database. This abstraction provides easier access to the rowset being represented by the view, and there's no need to duplicate the data or store it in a temp table.
The view can also be reused throughout the database, assuming permissions allow this. In Figure 1, for example, a view is created and then used by another T-SQL statement. However, in situations where you want to gather data and only use it a single time, views may not be the best solution. Since a view is a database object that exists and is available throughout the database to all batches, creating a view that is only used in a single T-SQL batch is overkill.
CREATE VIEW vwMyView AS
SELECT
    EmployeeID, COUNT(*) AS NumOrders,  MAX(OrderDate) AS MaxDate
FROM Orders
GROUP BY EmployeeID
GO

SELECT 
    e.EmployeeID, oe.NumOrders, oe.MaxDate, e.ReportsTo AS ManagerID, 
    om.NumOrders, om.MaxDate
FROM 
    Employees AS e
    INNER JOIN vwMyView AS oe ON e.EmployeeID = oe.EmployeeID
    INNER JOIN vwMyView AS om ON e.ReportsTo = om.EmployeeID

Another option is to create a derived table (also known as an inline view). A derived table can be created by simply moving a SELECT statement inside of a FROM clause surrounded by parenthesis. It can then be queried against or joined to just like a table or view. The code in Figure 2 solves the same query that Figure 1 solves, but instead of using a view it uses derived tables. While derived tables are only accessible within the statement in which they exist, tables generally make a query more difficult to read and thus to maintain. This problem can be multiplied if you want to use the derived table multiple times within the same batch, as you must then copy and paste the derived table to reuse it.
SELECT 
    e.EmployeeID, oe.NumOrders, oe.MaxDate,  e.ReportsTo AS ManagerID, 
    om.NumOrders, om.MaxDate
FROM 
    Employees AS e
    INNER JOIN 
        (SELECT EmployeeID, COUNT(*), MAX(OrderDate)
         FROM Orders
         GROUP BY EmployeeID) AS oe(EmployeeID, NumOrders, MaxDate)
        ON e.EmployeeID = oe.EmployeeID
    LEFT JOIN 
        (SELECT EmployeeID, COUNT(*), MAX(OrderDate)
         FROM Orders
         GROUP BY EmployeeID) AS om(EmployeeID, NumOrders, MaxDate)
        ON e.ReportsTo = om.EmployeeID

A CTE is a nice fit for this type of scenario since it makes the T-SQL much more readable (like a view), yet it can be used more than once in a query that immediately follows in the same batch. Of course, it is not available beyond that scope. Additionally, the CTE is a language-level construct—meaning that SQL Server does not internally create temp or virtual tables. The CTE's underlying query will be called each time it is referenced in the immediately following query.
Therefore, this same scenario can also be written using a CTE, as shown in Figure 3. The EmpOrdersCTE gathers the aggregated data and is then used by the query that immediately follows the CTE. Using the CTE, the code in Figure 3 makes the query very readable (like a view), yet does not create a system object to store the metadata.
;WITH EmpOrdersCTE (EmployeeID, NumOrders, MaxDate) AS
(
  SELECT EmployeeID, COUNT(*), MAX(OrderDate)
  FROM Orders
  GROUP BY EmployeeID
)

SELECT 
    e.EmployeeID,  oe.NumOrders, oe.MaxDate,
    e.ReportsTo AS ManagerID,  om.NumOrders, om.MaxDate
FROM 
    Employees AS e
    INNER JOIN EmpOrdersCTE oe ON e.EmployeeID = oe.EmployeeID
    LEFT JOIN EmpOrdersCTE om ON e.ReportsTo = om.EmployeeID


Structure of a CTE
Now I'll demonstrate how to construct a CTE using a simple CTE as an example. A CTE begins with the WITH keyword. However, if the CTE is not the first statement in the batch, you must precede the WITH keyword with a semicolon. As a best practice, I prefer to prefix all of my CTEs with a semicolon—I find this consistent approach easier than having to remember whether I need a semicolon or not.
The WITH keyword is followed by the name of the CTE, which is then followed by an optional list of column aliases. The column aliases correspond to the columns that are returned by the SELECT statement inside the CTE. Following the optional column aliases is the AS keyword—this is required. And the AS keyword is then followed by the query expression that defines the CTE enclosed within parenthesis.
Take a look at this example:
;WITH myCTE (CustID, Co) AS
(
  SELECT CustomerID, CompanyName FROM Customers
)
SELECT CustID, Co FROM myCTE
The CustomerID and CompanyName columns are aliased with CustID and Co. Then immediately following the CTE is a SELECT statement that references the CTE using its column aliases.

Understanding CTEs
Before you design a CTE, you must understand how it works and what rules it follows. This section describes where CTEs can be used as well as what can and cannot be used inside a CTE. For starters, CTEs can be created and used inside of a T-SQL batch, a user-defined function, a stored procedure, a trigger, or a view.
A CTE can only be referenced by the statement that immediately follows the CTE. This means that if you want to use a CTE, you must write the query that refers to the CTE immediately after the CTE in the T-SQL batch. For example, the following batch will produce an error:
;WITH myCTE (CustID, Co) AS
(
  SELECT CustomerID, CompanyName FROM Customers
)
SELECT CompanyName FROM Customers WHERE CustomerID = 'ALFKI'
SELECT CustID, Co FROM myCTE
In this code, myCTE is only available for the first query that follows it. When the second query refers to myCTE, the CTE is not in scope and an exception is thrown (Invalid object name 'myCTE').
Also note that since a CTE is intended to be referred to by another query, which may then reprocess the data anyway, a CTE's query cannot contain statements such as ORDER and COMPUTE. However, complex statements such as FOR XML can still be used to define and operate on a CTE. For example, you could query a CTE and return its results using the FOR XML clause, as shown here.
;WITH myCTE AS
(
  SELECT c.CustomerID, c.CompanyName, o.OrderID, o.OrderDate
  FROM Customers c 
  INNER JOIN Orders o ON c.CustomerID = o.CustomerID
)

SELECT CustomerID, CompanyName, OrderID, OrderDate 
FROM myCTE FOR XML AUTO
Once a CTE is defined, it can be referenced multiple times by the first query that follows it. This is especially useful when a query needs to refer to the CTE more than once. The code sample in Figure 3 demonstrates how the EmpOrdersCTE is referred to twice by the query so it can grab the emplo yees and their supervisors. This is very useful when you need to reference the same rowset more than once; it is simpler to refer to the CTE twice than it is to duplicate the query.
A CTE does not have to be used by a SELECT statement; it is available for use by any statement that refers to the rowset that the CTE generates. This means a CTE can be followed by a SELECT, INSERT, UPDATE, or DELETE statement that uses the CTE. You can also use forward only and snapshot cursors on queries that use CTEs.
Furthermore, a CTE can be followed by another CTE. This technique can be used to build a CTE off of another CTE when you want to gather intermediate results into a rowset. When creating a CTE that builds off another CTE, you separate the CTE definitions with a comma.
The example shown in Figure 4 defines the EmpOrdersCTE, which gathers a list of employees and the total number of orders for each employee. The second CTE, named MinMaxOrdersCTE, queries the first EmpOrdersCTE and performs the aggregate functions on the rowset to determine the average, minimum, and maximum number of orders for the employees.
;WITH 
EmpOrdersCTE (EmployeeID, NumOrders)
AS
(
  SELECT EmployeeID, COUNT(*)
  FROM Orders
  GROUP BY EmployeeID
),
MinMaxOrdersCTE (Mn, Mx, Diff)
AS
(
  SELECT MIN(NumOrders), MAX(NumOrders), AVG(NumOrders)
  FROM EmpOrdersCTE
)
SELECT Mn, Mx,    Diff
FROM MinMaxOrdersCTE

Multiple CTEs can be defined after the WITH keyword by separating them with a comma. Each CTE can be referenced by the CTE that follows it, building off of each other. The data manipulation language (DML) statement that follows the CTE definitions can also refer to any of the CTEs defined within the WITH clause.

Rules of Recursion
CTEs can also be used to implement recursive algorithms. Recursive logic is useful when you need to write an algorithm that calls itself—this is often done to traverse a nested set of data. Writing recursive logic can be complex, especially in a language like T-SQL. However, this is one of the specific problems that CTEs were designed to address. The basic formula for creating a recursive CTE is as follows:
  1. Create the query that returns the top level (this is the anchor member).
  2. Write a recursive query (this is the recursive member).
  3. UNION the first query with the recursive query.
  4. Make sure you have a case where no rows will be returned (this is your termination check).
Your recursive CTE will look something like this:
;WITH myRecursiveCTE(col1, col2, ... coln) AS
( 
  -- Anchor Member Query
  UNION ALL
  -- Recursive Member Query that references myRecursiveCTE
)
When writing a custom recursion procedure that does not involve CTEs, you must include an explicit termination clause. The termination clause is responsible for making sure that the recursive algorithm eventually terminates and bubbles back up the recursive call stack. Without this clause, your code could end up in an infinite loop.
CTEs, however, have two aspects that help with handling termination clauses. The first is an implicit termination clause that occurs when the recursive member returns zero records. When this happens, the recursive member query does not invoke the CTE recursively but instead bubbles back up the call stack. The other is that you can explicitly set the MAXRECURSION level.
The MAXRECURSION level can be set explicitly in a batch that contains the CTE or through a server-side setting (the server-wide setting defaults to 100 unless you change it). This setting limits the number of times a CTE can recursively call itself. If the limit is reached, an exception is thrown. The syntax for setting the MAXRECURSION level is to use the OPTION clause in your SELECT statement that follows the CTE, like so:
-- DEFINE YOUR CTE HERE
SELECT * FROM EmpCTE
OPTION (MAXRECURSION 7)
There are some other rules you should also keep in mind when designing recursive CTEs. A recursive CTE must contain both an anchor member and a recursive member. Both members must have the same number of columns and the columns belonging to both members must have matching datatypes. The recursive member can only refer to the CTE once and the members cannot use the following clauses or keywords:
  • SELECT DISTINCT
  • GROUP BY
  • HAVING
  • TOP
  • LEFT/RIGHT OUTER JOIN

Recursion Excursion
In terms of data and rowsets, recursion is used to resolve a problem when you need to perform the same logic repeatedly against the same set of data, under different terms. For example, say you are asked to find all of the salespeople and who they work for and return the data in hierarchical order. Figure 5 demonstrates a solution using a CTE that uses recursion to gather the list of employees that work for the VP of Sales.
;WITH EmpCTE(EmployeeID, EmployeeFirstName,
 EmployeeLastName, MgrID, SalesLevel)
AS
( 
  -- Anchor Member 
  SELECT EmployeeID, FirstName, LastName, ReportsTo, 0
  FROM Employees
  WHERE EmployeeID = 2 -- Start with the VP of Sales

  UNION ALL
  
  -- Recursive Member 
  SELECT 
    e.EmployeeID, e.FirstName, e.LastName, e.ReportsTo, m.SalesLevel+1
  FROM 
    Employees AS e
    INNER JOIN EmpCTE m ON e.ReportsTo = m.EmployeeID
)

 -- Using the CTE
SELECT EmployeeID, EmployeeFirstName, EmployeeLastName, 
       MgrID, SalesLevel  
FROM EmpCTE

Except for a few additional aspects, the recursive CTE shown in Figure 5 looks very much like a standard CTE. Where a standard CTE contains a query that defines a rowset, a recursive CTE defines two query definitions. The first query definition, the anchor member, defines a query that will be executed when the CTE is called. The second query definition, the recursive member, defines a query that returns the same columns and datatypes as the anchor member. The recursive member also retrieves the values that will then be used to call back into the CTE recursively. The results of the queries are pulled together using a UNION statement.
The EmpCTE in Figure 5 shows an anchor member that grabs the employee record for the VP of Sales (EmployeeID = 2). The last column of the anchor member's query returns a value of 0, which represents the 0th level of the hierarchy, which is the top. The recursive member's query grabs the list of employees that report to the previous employee. This is accomplished by joining the Employees table to the EmpCTE.
The same columns are retrieved from the recursive member, but the SalesLevel column is calculated by taking the current employee's supervisor, grabbing the supervisor's SalesLevel, and incrementing it by 1. The expression m.SalesLevel+1 assigns a SalesLevel of 1 to all people who report directly to the VP of Sales (retrieved from our anchor member). Any employees that report to those people then have a SalesLevel of 2. The SalesLevel continues to increment in this way for each subsequent level of the sales organization's hierarchy.

Wrapping Up
CTEs provide a way to make writing T-SQL much more readable when compared to scenarios that use complex derived tables within a query or referencing a view whose definition is external to the T-SQL batch. CTEs also provide a much improved tool to address the struggles involved in using recursive algorithms. Whether you are using non-recursive or recursive CTEs, you'll find CTEs can help you address many common development scenarios and improve readability without sacrificing performance.

Send your questions and comments for John to mmdata@microsoft.com.


John Papa is a Senior .NET Consultant with ASPSOFT (aspsoft.com) and a baseball fanatic who spends most of his summer nights rooting for the Yankees with his family and his faithful dog, Kadi. John, a C# MVP, has authored several books on ADO, XML, and SQL Server. He can often be found speaking at industry conferences, such as VSLive, or blogging at codebetter.com/blogs/john.papa.

Page view tracker