Execution Plans for GROUPING SETS, ROLLUP, and CUBE
This topic describes the query execution plans for queries that have multiple groupings in a single GROUP BY clause. The plans that are described are generated by the following types of GROUP BY statements:
GROUP BY...WITH ROLLUP
GROUP BY...WITH CUBE
GROUP BY..., GROUPING SETS (...), ...
GROUP BY..., CUBE(...), ...
GROUP BY..., ROLLUP(...), ...
When a GROUP BY clause is present, complex aggregates are decomposed to simple aggregates early in optimization, regardless of the kind of GROUP BY clause. The following Transact-SQL statements provide a simplified illustration of the transformation that occurs internally. In the following example, the first SELECT statement is transformed into the second:
SELECT a, AVG(x) FROM t GROUP BY a WITH ROLLUP; SELECT a, SUM(x)/COUNT(x) FROM t GROUP BY a WITH ROLLUP;
Scalar aggregates cannot be expressed by using multiple groupings syntax. Multiple groupings syntax can be used to specify a single grouping set. The following statements are the same:
SELECT a, b, c, SUM(x) FROM v53 GROUP BY GROUPING SETS ((a, b, c));
SELECT a, b, c, SUM(x) FROM v53 GROUP BY a, b, c;
The execution plans for the following statements will be the same as the execution plans for GROUP BY, except scalar aggregates cannot be expressed by using multiple groupings syntax.
The following statement, a scalar aggregate, returns one row that has the value NULL:
SELECT SUM(x) FROM v53 WHERE a = a + 1 ;
The following statements return no rows:
SELECT SUM(x) FROM v53 WHERE a = a + 1 GROUP BY GROUPING SETS (()); SELECT SUM(x) FROM v53 WHERE a = a + 1 GROUP BY(x-x);
In versions of SQL Server before SQL Server 2008, the ROLLUP operator is indistinguishable from regular stream aggregate operators in Showplan. Starting with SQL Server 2008, RollupInfo is added under the Stream Aggregate node in Showplan as a distinguishing element. <RollupInfo> appears in the XML Showplan and in the properties window, but not when you rest the pointer over the operator. A ROLLUP operator has an element called <RollupInfo>, for example:
<ROLLUPLevel Number="0" />
<ROLLUPLevel Number="2" />
<ROLLUPLevel Number="3" />
The <RollupInfo> element will be present in the XML Showplan whenever the stream aggregate operator is a ROLLUP, that is, it outputs multiple groupings. The Showplan text is unchanged.
Sample Query and Execution Plan for ROLLUP
The following sample query, Q1, returns aggregate sales for the Adventure Works Cycles for the year 2004. The sales are grouped by region and then by day. To generate an actual execution plan, in Query Editor in SQL Server Management Studio, run the following query. The plan is described in the section "Actual Execution Plan for Query Q1" that follows.
USE AdventureWorksDW; GO WITH v1 AS ( SELECT t.SalesTerritoryRegion theregion ,d.WeekNumberOfYear theweek ,d.DayNumberOfYear theday ,f.SalesAmount sales FROM dbo.FactInternetSales AS f INNER JOIN dbo.DimTime AS d ON f.OrderDateKey = d.TimeKey INNER JOIN dbo.DimSalesTerritory AS t ON f.SalesTerritoryKey = t.SalesTerritoryKey WHERE d.calendaryear = 2006 ) SELECT theregion, theweek, theday, SUM(sales) AS sumsales FROM v1 GROUP BY GROUPING SETS ((), (theregion), (theregion, theweek, theday)) OPTION (MAXDOP 1); GO
Actual Execution Plan for Query Q1
Query Q1 can be satisfied with a single ROLLUP operation. However, GROUPING SETS queries frequently require multiple ROLLUP operations. (The results of these operations are combined by UNION.)
The graphical execution plan for Q1 has two Stream Aggregate operators in a row. The Stream Aggregate on the left is a ROLLUP operator and has the <RollupInfo> element. The information in the <RollupInfo> element completely describes the behavior of the ROLLUP operation. There are three grouping elements in the Stream Aggregate that are listed in the GroupBy element; therefore, HighestLevel = 3. There are four possible ROLLUPLevels: 0 through HighestLevel. In this example, the highest is 3. The zero-level ROLLUP is the base grouping, (theregion, theweek, theday). This is what would be output by a typical Stream Aggregate in this position in the plan. Each higher level (level L) is a grouping on the prefix of the GROUP BY list of size HighestLevel-L. Level 1 represents (theregion, theweek) and is not required in this case. Levels 2 and 3 here indicate output groupings (theregion) and (), respectively.
Single ROLLUP Plans
The previous execution plan for query Q1 is typical for a single ROLLUP operation:
There is an input query that corresponds to the FROM and WHERE clauses.
There is a Stream Aggregate computing GROUP BY a, b, c.
This aggregate can be local or global and partitioned parallel.
This aggregate might be preceded by a sort.
This aggregate might be matched to an indexed view.
Query processor then filters and computes scalars for the HAVING clause and SELECT clause <select> list.
You can obtain a single ROLLUP plan in the following ways:
Use the ROLLUP keyword:
SELECT a, b, c, SUM(x) FROM v53 GROUP BY ROLLUP(a, b, c); SELECT a, b, c, SUM(x) FROM v53 GROUP BY a, b, c WITH ROLLUP;
Use any other multiple groupings syntax to specify a ROLLUP operation or part of a ROLLUP operation, with two or more groupings. SQL Server analyzes the groupings to see whether they are covered by a single ROLLUP operation, that is, they form a chain of containing sets.
SELECT a, b, c, SUM(x) FROM v53 GROUP BY GROUPING SETS ((), (a), (a, b, c));
A single dimension cube is a ROLLUP operation:
SELECT a, SUM(x) FROM v53 GROUP BY a WITH CUBE;
Aggregate First and Then ROLLUP
In SQL Server, a ROLLUP operator is always implemented as a separate operator, in addition to an ordinary GROUP BY. This is not strictly necessary. Consider the following code example.
SELECT MAX(x) z1, COUNT(x) z2 FROM v53 GROUP BY a, b WITH ROLLUP;
This query is processed in two steps. Notice that a max of multiple base aggregation max is a max, and a sum of multiple base aggregation count is a count:
'base aggregation': compute y1 = max(x), y2 = count(x) Rollup: compute z1 = max(y1), z2 = sum(y2)
The benefit of always splitting off the base aggregation is that it can be matched to an indexed view, parallelized, or pushed to follow a join.
Multiple ROLLUP Plans
If the GROUPING SETS are not covered by a ROLLUP operator, SQL Server will build an almost minimal set of ROLLUP operations that, when the set is combined with UNION ALL, cover the GROUPING SETS. The following query requires three ROLLUP operations to cover GROUPING SETS:
SELECT a, b, c, SUM(x) FROM v53 GROUP BYGROUPING SETS ((), (a), (b), (c), (a, b), (a, c), (b,c));
The following is one possible combination of ROLLUP operations:
ROLLUP 1 covers (), (a), (a, b) ROLLUP 2 covers (c), (a, c) ROLLUP 3 covers (b), (b, c)
Single ROLLUP Plan
As described earlier in the section "Actual Execution Plan for Query Q1," a single ROLLUP plan has two aggregate operators in sequence. The stream or hash aggregate on the right is the base aggregation. The second stream aggregate on the left computes the ROLLUP operator.
This section provides sample code that you can run to generate sample actual execution plans for the CUBE operator.
Sample Query and Execution Plan for CUBE
To generate an actual execution plan, in Query Editor in Management Studio, run the following query, Q2.
USE AdventureWorksDW2008R2; GO SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey ,SUM (SalesAmount) FROM dbo.FactInternetSales GROUP BY CUBE (ProductKey, OrderDateKey, DueDateKey, ShipDateKey); GO
Actual Execution Plan for Query Q2
The top line of the Actual Execution Plan contains the input. The input is sorted and passed to the base aggregation. This is spooled and read by each branch. Each branch computes a ROLLUP operation that is preceded by an optional sort.
This query contains one more column than query Q3 that follows. It is more cost-effective to spool the base aggregate instead of rereading the input on each branch. Queries that have more branches will favor the base aggregate plans.
Sample and Query Execution Plan for CUBE Without Base Aggregation or Spool
To generate an actual execution plan, in Query Editor in Management Studio, run the following query, Q3.
USE AdventureWorksDW2008R2; GO SELECT ProductKey, OrderDateKey, DueDateKey, SUM (SalesAmount) FROM dbo.FactInternetSales GROUP BY CUBE (ProductKey, OrderDateKey, DueDateKey); GO
Actual Execution Plan for Query Q3
This query creates a different style of multiple groupings plan that is basically the UNION ALL solution. In this plan, there is no base aggregation or spool. The grouping sets are divided into several single ROLLUP plans that are concatenated.
Sample and Query Execution Plan for Naïve CUBE
To generate an actual execution plan, in Query Editor in Management Studio, run the following query, Q4.
USE AdventureWorksDW2008R2; GO SELECT ProductKey, OrderDateKey, DueDateKey, SUM (DISTINCT SalesAmount) FROM dbo.FactInternetSales GROUP BY CUBE (ProductKey, OrderDateKey, DueDateKey); GO
Actual Execution Plan for Query Q4
This query generates a naïve CUBE Actual Execution Plan that has multiple groupings and any distinct aggregates.
Each branch computes a single grouping. There is no base aggregation. This plan is not very efficient. If there are multiple aggregates in the SELECT <select> list, some of which are distinct, each branch can have a join and spool. This is similar to a single grouping over distinct aggregates.