Query Processing Enhancements on Partitioned Tables and Indexes

SQL Server 2008 improves query processing performance on partitioned tables for many parallel plans, changes the way parallel and serial plans are represented, and enhances the partitioning information provided in both compile-time and run-time execution plans. This topic describes these improvements, provides guidance on how to interpret the query execution plans of partitioned tables and indexes, and provides best practices for improving query performance on partitioned objects.

Note

Partitioned tables and indexes are supported only in the SQL Server Enterprise, Developer, and Evaluation editions.

New Partition-Aware Seek Operation

In SQL Server 2008, the internal representation of a partitioned table is changed so that the table appears to the query processor to be a multicolumn index with PartitionID as the leading column. PartitionID is a hidden computed column used internally to represent the ID of the partition containing a specific row. For example, assume the table T, defined as T(a, b, c), is partitioned on column a, and has a clustered index on column b. In SQL Server 2008, this partitioned table is treated internally as a nonpartitioned table with the schema T(PartitionID, a, b, c) and a clustered index on the composite key (PartitionID, b). This allows the query optimizer to perform seek operations based on PartitionID on any partitioned table or index.

Partition elimination is now done in this seek operation.

In addition, the query optimizer is extended so that a seek or scan operation with one condition can be done on PartitionID (as the logical leading column) and possibly other index key columns, and then a second-level seek, with a different condition, can be done on one or more additional columns, for each distinct value that meets the qualification for the first-level seek operation. That is, this operation, called a skip scan, allows the query optimizer to perform a seek or scan operation based on one condition to determine the partitions to be accessed and a second-level index seek operation within that operator to return rows from these partitions that meet a different condition. For example, consider the following query.

SELECT * FROM T WHERE a < 10 and b = 2;

For this example, assume that table T, defined as T(a, b, c), is partitioned on column a, and has a clustered index on column b. The partition boundaries for table T are defined by the following partition function:

CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (3, 7, 10);

To solve the query, the query processor performs a first-level seek operation to find every partition that contains rows that meet the condition T.a < 10. This identifies the partitions to be accessed. Within each partition identified, the processor then performs a second-level seek into the clustered index on column b to find the rows that meet the condition T.b = 2 and T.a < 10.

The following illustration is a logical representation of the skip scan operation. It shows table T with data in columns a and b. The partitions are numbered 1 through 4 with the partition boundaries shown by dashed vertical lines. A first-level seek operation to the partitions (not shown in the illustration) has determined that partitions 1, 2, and 3 meet the seek condition implied by the partitioning defined for the table and the predicate on column a. That is, T.a < 10. The path traversed by the second-level seek portion of the skip scan operation is illustrated by the curved line. Essentially, the skip scan operation seeks into each of these partitions for rows that meet the condition b = 2. The total cost of the skip scan operation is the same as that of three separate index seeks.

Shows the skip scan operation.

Displaying Partitioning Information in Query Execution Plans

The execution plans of queries on partitioned tables and indexes can be examined by using the Transact-SQL SET statements SET SHOWPLAN_XML or SET STATISTICS XML, or by using the graphical execution plan output in SQL Server Management Studio. For example, you can display the compile-time execution plan by clicking Display Estimated Execution Plan on the Query Editor toolbar and the run-time plan by clicking Include Actual Execution Plan.

Using these tools, you can ascertain the following information:

  • The operations such as scans, seeks, inserts, updates, merges, and deletes that access partitioned tables or indexes.

  • The partitions accessed by the query. For example, the total count of partitions accessed and the ranges of contiguous partitions that are accessed are available in run-time execution plans.

  • When the skip scan operation is used in a seek or scan operation to retrieve data from one or more partitions.

For more information about displaying execution plans, see Execution Plan How-to Topics.

Partition Information Enhancements

SQL Server 2008 provides enhanced partitioning information for both compile-time and run-time execution plans. Execution plans now provide the following information:

  • An optional Partitioned attribute that indicates that an operator, such as a seek, scan, insert, update, merge, or delete, is performed on a partitioned table.

  • A new SeekPredicateNew element with a SeekKeys subelement that includes PartitionID as the leading index key column and filter conditions that specify range seeks on PartitionID. The presence of two SeekKeys subelements indicates that a skip scan operation on PartitionID is used.

  • Summary information that provides a total count of the partitions accessed. This information is available only in run-time plans.

To demonstrate how this information is displayed in both the graphical execution plan output and the XML Showplan output, consider the following query on the partitioned table fact_sales. This query updates data in two partitions.

UPDATE fact_sales

SET quantity = quantity * 2

WHERE date_id BETWEEN 20080802 AND 20080902;

The following illustration shows the properties of the Clustered Index Seek operator in the compile-time execution plan for this query. To view the definition of the fact_sales table and the partition definition, see "Example" in this topic.

Partition information in the Showplan output.

Partitioned Attribute

When an operator such as an Index Seek is executed on a partitioned table or index, the Partitioned attribute appears in the compile-time and run-time plan and is set to True (1). The attribute does not display when it is set to False (0).

The Partitioned attribute can appear in the following physical and logical operators:

  • Table Scan

  • Index Scan

  • Index Seek

  • Insert

  • Update

  • Delete

  • Merge

As shown in the previous illustration, this attribute is displayed in the properties of the operator in which it is defined. In the XML Showplan output, this attribute appears as Partitioned="1" in the RelOp node of the operator in which it is defined.

New Seek Predicate

In XML Showplan output, the SeekPredicateNew element appears in the operator in which it is defined. It can contain up to two occurrences of the SeekKeys sub-element. The first SeekKeys item specifies the first-level seek operation at the partition ID level of the logical index. That is, this seek determines the partitions that must be accessed to satisfy the conditions of the query. The second SeekKeys item specifies the second-level seek portion of the skip scan operation that occurs within each partition identified in the first-level seek.

Partition Summary Information

In run-time execution plans, partition summary information provides a count of the partitions accessed and the identity of the actual partitions accessed. You can use this information to verify that the correct partitions are accessed in the query and that all other partitions are eliminated from consideration.

The following information is provided: Actual Partition Count, and Partitions Accessed.

Actual Partition Count is the total number of partitions accessed by the query.

Partitions Accessed, in XML Showplan output, is the partition summary information that appears in the new RuntimePartitionSummary element in RelOp node of the operator in which it is defined. The following example shows the contents of the RuntimePartitionSummary element, indicating that two total partitions are accessed (partitions 2 and 3).

<RunTimePartitionSummary>

    <PartitionsAccessed PartitionCount="2" >

        <PartitionRange Start="2" End="3" />

    </PartitionsAccessed>

</RunTimePartitionSummary>

Displaying Partition Information by Using Other Showplan Methods

The Showplan methods SHOWPLAN_ALL, SHOWPLAN_TEXT and STATISTICS PROFILE do not report the partition information described in this topic, with the following exception. As part of the SEEK predicate, the partitions to be accessed are identified by a range predicate on the computed column representing the partition ID. The following example shows the SEEK predicate for a Clustered Index Seek operator. Partitions 2 and 3 are accessed, and the seek operator filters on the rows that meet the condition date_id BETWEEN 20080802 AND 20080902.

|--Clustered Index Seek(OBJECT:([db_sales_test].[dbo].[fact_sales].[ci]),

        SEEK:([PtnId1000] >= (2) AND [PtnId1000] <= (3)

                AND [db_sales_test].[dbo].[fact_sales].[date_id] >= (20080802)

                AND [db_sales_test].[dbo].[fact_sales].[date_id] <= (20080902))

                ORDERED FORWARD)

Interpreting Execution Plans for Partitioned Heaps

In SQL Server 2008, a partitioned heap is treated as a logical index on the partition ID. Partition elimination on a partitioned heap is represented in an execution plan as a Table Scan operator with a SEEK predicate on partition ID. The following example shows the Showplan information provided:

|-- Table Scan (OBJECT: ([db].[dbo].[T]), SEEK: ([PtnId1001]=[Expr1011]) ORDERED FORWARD)

Interpreting Execution Plans for Collocated Joins

Join collocation can occur when two tables are partitioned using the same or equivalent partitioning function and the partitioning columns from both sides of the join are specified in the join condition of the query. The query optimizer can generate a plan where the partitions of each table that have equal partition IDs are joined separately. Collocated joins can be faster than non-collocated joins because they can require less memory and processing time. The optimizer chooses a non-collocated plan or a collocated plan based on cost estimates.

In a collocated plan, the Nested Loops join reads one or more joined table or index partitions from the inner side. The numbers within the Constant Scan operators represent the partition numbers.

When parallel plans for collocated joins are generated for partitioned tables or indexes, a Parallelism operator appears between the Constant Scan and the Nested Loops join operators. In this case, multiple threads on the outer side of the join each read and work on a different partition.

The following illustration demonstrates a parallel query plan for a collocated join.

Co-located Join Execution Plan

Parallel Query Execution Strategy for Partitioned Objects

The query processor uses a parallel execution strategy for queries that select from partitioned objects. As part of the execution strategy, the query processor determines the table partitions required for the query and the proportion of threads to allocate to each partition. In most cases, the query processor allocates an equal or almost equal number of threads to each partition, and then executes the query in parallel across the partitions. The following paragraphs explain thread allocation in greater detail.

If the number of threads is less than the number of partitions, the query processor assigns each thread to a different partition, initially leaving one or more partitions without an assigned thread. When a thread finishes executing on a partition, the query processor assigns it to the next partition until each partition has been assigned a single thread. This is the only case in which the query processor reallocates threads to other partitions.

Shows thread reassigned after it finishes

If the number of threads is equal to the number of partitions, the query processor assigns one thread to each partition. When a thread finishes, it is not reallocated to another partition.

Shows one thread allocated to each partition

If the number of threads is greater than the number of partitions, the query processor allocates an equal number of threads to each partition. If the number of threads is not an exact multiple of the number of partitions, the query processor allocates one additional thread to some partitions in order to use all of the available threads. Note that if there is only one partition, all threads will be assigned to that partition. In the diagram below, there are four partitions and 14 threads. Each partition has 3 threads assigned, and two partitions have an additional thread, for a total of 14 thread assignments. When a thread finishes, it is not reassigned to another partition.

Shows multiple threads allocated to the partitions

Although the above examples suggest a straightforward way to allocate threads, the actual strategy is more complex and accounts for other variables that occur during query execution. For example, if the table is partitioned and has a clustered index on column A and a query has the predicate clause WHERE A IN (13, 17, 25), the query processor will allocate one or more threads to each of these three seek values (A=13, A=17, and A=25) instead of each table partition. It is only necessary to execute the query in the partitions that contain these values, and if all of these seek predicates happen to be in the same table partition, all of the threads will be assigned to the same table partition.

To take another example, suppose that the table has four partitions on column A with boundary points (10, 20, 30), an index on column B, and the query has a predicate clause WHERE B IN (50, 100, 150). Because the table partitions are based on the values of A, the values of B can occur in any of the table partitions. Thus, the query processor will seek for each of the three values of B (50, 100, 150) in each of the four table partitions. The query processor will assign threads proportionately so that it can execute each of these 12 query scans in parallel.

Table partitions based on column A

Seeks for column B in each table partition

Table Partition 1: A < 10

B=50, B=100, B=150

Table Partition 2: A >= 10 AND A < 20

B=50, B=100, B=150

Table Partition 3: A >= 20 AND A < 30

B=50, B=100, B=150

Table Partition 4: A >= 30

B=50, B=100, B=150

Best Practices

To improve the performance of queries that access a large amount of data from large partitioned tables and indexes, we recommend the following best practices:

  • Stripe each partition across many disks.

  • When possible, use a server with enough main memory to fit frequently accessed partitions or all partitions in memory to reduce I/O cost.

  • If the data you query will not fit in memory, compress the tables and indexes. This will reduce I/O cost.

  • Use a server with fast processors and as many processor cores as you can afford, to take advantage of parallel query processing capability.

  • Ensure the server has sufficient I/O controller bandwidth.

  • Create a clustered index on every large partitioned table to take advantage of B-tree scanning optimizations.

  • Follow the best practice recommendations in the white paper, "Loading Bulk Data into a Partitioned Table," when bulk loading data into partitioned tables.

Example

The following example creates a test database containing a single table with seven partitions. Use the tools described previously when executing the queries in this example to view partitioning information for both compile-time and run-time plans.

Note

This example inserts more than 1 million rows into the table. Running this example may take several minutes depending on your hardware. Before executing this example, verify that you have more than 1.5 GB of disk space available.

USE master;
GO
IF DB_ID (N'db_sales_test') IS NOT NULL
    DROP DATABASE db_sales_test;
GO
CREATE DATABASE db_sales_test;
GO
USE db_sales_test;
GO
CREATE PARTITION FUNCTION [pf_range_fact](int) AS RANGE RIGHT FOR VALUES 
(20080801, 20080901, 20081001, 20081101, 20081201, 20090101);
GO
CREATE PARTITION SCHEME [ps_fact_sales] AS PARTITION [pf_range_fact] 
ALL TO ([PRIMARY]);
GO
CREATE TABLE fact_sales(date_id int, product_id int, store_id int, 
    quantity int, unit_price numeric(7,2), other_data char(1000))
ON ps_fact_sales(date_id);
GO
CREATE CLUSTERED INDEX ci ON fact_sales(date_id);
GO
PRINT 'Loading...';
SET NOCOUNT ON;
DECLARE @i int;
SET @i = 1;
WHILE (@i<1000000)
BEGIN
    INSERT INTO fact_sales VALUES(20080800 + (@i%30) + 1, @i%10000, @i%200, RAND() * 25, (@i%3) + 1, '');
    SET @i += 1;
END;
GO
DECLARE @i int;
SET @i = 1;
WHILE (@i<10000)
BEGIN
    INSERT INTO fact_sales VALUES(20080900 + (@i%30) + 1, @i%10000, @i%200, RAND() * 25, (@i%3) + 1, '');
    SET @i += 1;
END;
PRINT 'Done.';
GO
-- Two-partition query.
SET STATISTICS XML ON;
GO
SELECT date_id, SUM(quantity*unit_price) AS total_price
FROM fact_sales
WHERE date_id BETWEEN 20080802 AND 20080902
GROUP BY date_id ;
GO
SET STATISTICS XML OFF;
GO
-- Single-partition query.
SET STATISTICS XML ON;
GO
SELECT date_id, SUM(quantity*unit_price) AS total_price
FROM fact_sales
WHERE date_id BETWEEN 20080801 AND 20080831
GROUP BY date_id;
GO
SET STATISTICS XML OFF;
GO