October 2014

Volume 29 Number 10


SQL Server : Implement Large, Fast, Updatable Tables for Responsive Real-Time Reporting

Don Mackenzie | October 2014

Satisfying the needs of business users is an ongoing challenge for system designers. This article builds on the October MSDN Magazine print article, “Use Updatable Tables for Responsive Real-Time Reporting,” that describes a combination of table partitioning and columnstore indexing to support a large table you can query rapidly for reporting and update concurrently. The example is 500 million rows with 25 columns. This companion article provides code and step-by-step instructions for implementing such a system.

It’s advisable to read the other article (msdn.magazine.com/magazine/dn802606) first to understand the overall strategy and reasoning. It includes cautions and references to other sources and some discussion of new features in SQL Server 2014. This article provides code you can easily modify for your own implementation. Both articles and the code here use my Revenue table as an example. Much of the code can be reused directly if you replace “Revenue” with your own table name.

The example application includes a partition function, a partition scheme, three tables, two indexes and two stored procedures. The option to merge partitions adds another partition function, partition scheme, table and stored procedures. There are several moving parts and the implementation can be somewhat fragile. Proceed carefully and thoroughly consider processing, backup and recovery procedures.

Assemble the Elements

Any partitioned table requires a partition function and a partition scheme that refers to the function. Then you can create the table that refers to the scheme. I’ll create a partition function, a partition scheme and the Revenue table. I’ll also create two other tables—RevenueIn and RevenueOut—that I’ll use to refresh data in the main Revenue table.

I use a stored procedure, SP_RevenueIn2Revenue, when refreshing data in the Revenue table. Another stored procedure, SP_RevenueNewPartitions, is a utility to add new partitions. This affects the partition function, partition scheme and all three tables.

In my application, I refresh the Revenue table with an hour’s worth of data every five or 10 minutes, so I need hourly partitions near the current time. I prefer daily partitions further back in the past and monthly partitions even further back. I seldom update those, so I need to merge hourly partitions into larger time windows. To do that, I have another partition function, partition scheme and table (called RevenueMerge). There’s a stored procedure, SP_RevenueMerge, that uses the RevenueMerge and RevenueOut tables to accomplish merges.

Partitioning

My Revenue table is partitioned by time, so my partition function is simply a list of boundary values (datetimes) where one partition ends and another begins. Partition functions don’t really look like functions. Instead of listing hundreds of boundaries, I create a simple function with one boundary and then use a stored procedure to add new boundaries as I need them.

This creates the partition function:

CREATE PARTITION FUNCTION RevenuePartitionFunction(SMALLDATETIME)
 AS RANGE RIGHT FOR VALUES ('2012-01-01');

The date value I’ve chosen here is far enough in the past that there will never be preceding data. Partitioned tables are easiest to manage when there’s an empty partition at the left and right edges. I chose the date so the left partition will always be empty. I’ll divide the right partition later, before adding data. Range Right is usually more convenient for date ranges than Range Left. Steffen Krause discusses date ranges in his blog post at bit.ly/YWjs4C.

The Partition Scheme comes next. The scheme maps the partitions to File Groups. SQL Server table partitioning lets you place different partitions in different file groups. In this example, I might place old data on a lower performing type of storage because it isn’t accessed as often as the more recent data.

However, the techniques described here assume everything is placed in one file group (the PRIMARY file group in my example). If multiple file groups are important to your application, you’ll need to make adjustments to the code presented here.

This begins our partition scheme:

CREATE PARTITION SCHEME RevenuePartitionScheme
AS PARTITION RevenuePartitionFunction
ALL TO ([PRIMARY]);

More Partitions

Because we have hundreds of partitions (with hourly partitions for at least a few days in the past and in the future), I’ll use the stored procedure in Figure 1 to create new partitions. I execute this once to create monthly partitions, again to create daily partitions and a third time to create hourly partitions, like this:

EXEC SP_RevenueNewPartitions '2013-JAN-01', '2014-JUN-01', 'MONTH';

EXEC SP_RevenueNewPartitions '2014-JUN-01', '2014-JUL-10', 'DAY';

EXEC SP_RevenueNewPartitions '2014-JUL-10', '2014-AUG-31', 'HOUR';

I use the same procedure to create new hourly partitions before I need them. Not all applications need the same size partitions, so use the procedure to build partitions to fit your application. I have another large table, used to compare Mondays to Mondays and Tuesdays to Tuesdays, where I use weekly partitions. This procedure includes code to create these weekly partitions as well as monthly, daily and hourly partitions.

So I’ll use the procedure in Figure 1 to create new partitions before and/or after creating tables. The only rule is partitions being split must be empty (in all three tables). I handle this by building new partitions for future time periods well before the clock gets there.

Figure 1 Add New Partitions

CREATE PROCEDURE SP_RevenueNewPartitions @startTime SMALLDATETIME,
  @endTime SMALLDATETIME, @size VARCHAR(10) AS
BEGIN
  DECLARE @msg VARCHAR(500)
  IF @size != 'MONTH' AND @size != 'WEEK' AND @size != 'DAY' AND @size != 'HOUR'
  BEGIN
    RAISERROR('@size Must be MONTH or WEEK or DAY or HOUR', 11, 1);
  END
  ELSE
  BEGIN
    DECLARE @time SMALLDATETIME = @startTime;
    IF @size = 'WEEK' 
    BEGIN
      -- Force our dates to be the beginning of a SQL Server week.
      IF DATEPART(WEEKDAY, @time) != 1
      BEGIN
        SET @time = DATEADD(DAY, 8 - DATEPART(WEEKDAY, @time), @time);
      END
    END
    WHILE @time < @endTime
    BEGIN
      IF @size = 'WEEK'
      AND DATEPART(DAYOFYEAR, @time) < 8
      AND DATEPART(DAYOFYEAR, @time) != 1
      BEGIN
        -- We need to start a new partition on Jan. 1
        -- for week 1, which is a short week.
        DECLARE @Jan1 SMALLDATETIME =
          DATEADD(DAY, 1 - DATEPART(DAYOFYEAR, @time), @time);
        IF NOT EXISTS (
          SELECT *
          FROM sys.partition_range_values prv
          JOIN sys.partition_functions pf
            ON pf.function_id = prv.function_id
            AND pf.name = 'RevenuePartitionFunction'
          WHERE prv.value = @Jan1)
        BEGIN
          SET @msg = 'Adding partition boundary '
            + CONVERT(VARCHAR(16), @Jan1, 120)
          RAISERROR(@msg, 0, 1)  WITH NOWAIT;
          ALTER PARTITION SCHEME RevenuePartitionScheme NEXT USED [PRIMARY];
          ALTER PARTITION FUNCTION RevenuePartitionFunction()
            SPLIT RANGE(@Jan1);
        END
      END
      IF NOT EXISTS (
        SELECT *
        FROM sys.partition_range_values prv
        JOIN sys.partition_functions pf
          ON pf.function_id = prv.function_id
          AND pf.name = 'RevenuePartitionFunction'
        WHERE prv.value = @time)
      BEGIN
        SET @msg = 'Adding partition boundary '
          + CONVERT(VARCHAR(16), @time, 120)
        RAISERROR(@msg, 0, 1)  WITH NOWAIT;
        ALTER PARTITION SCHEME RevenuePartitionScheme NEXT USED [PRIMARY];
        ALTER PARTITION FUNCTION RevenuePartitionFunction() SPLIT RANGE(@time);
      END
      IF (@size = 'MONTH')
        SET @time = DATEADD(MONTH, 1, @time);
      ELSE IF (@size = 'WEEK')
        SET @time = DATEADD(WEEK, 1, @time);
      ELSE IF (@size = 'DAY')
        SET @time = DATEADD(DAY, 1, @time);
      ELSE IF (@size = 'HOUR')
        SET @time = DATEADD(HOUR, 1, @time);
    END;
  END;
END;

Turn the Tables

Create the Revenue Table after the Partition Function and Partition Scheme are in place. The table can be created before or after SP_RevenueNewPartitions is used to create more partitions. This will be the main large table containing active data used by queries:

CREATE TABLE Revenue(startTime SMALLDATETIME NOT NULL, adId BIGINT, ...)
 ON RevenuePartitionScheme(startTime);

You’ll have your own column definitions. Because queries only retrieve data for the columns they reference, there’s little penalty for denormalizing data and including extra columns in your table definition to avoid JOINs in queries. My Revenue table has 25 columns. With columnstore indexing, lots of columns have little impact on query performance.

Note the clause, ON RevenuePartitionScheme(startTime). This attaches the table to the partition scheme and places the data on the file group specified in the partition scheme.

Supporting Tables

So far, the Revenue table is an ordinary partitioned table. I use two supporting tables in the background to keep the main Revenue table always available. I use the RevenueIn table to pre-stage data before placing it in the Revenue table. The RevenueOut table is a place to quickly dump obsolete data from the main Revenue table. The CREATE statements for the three tables—Revenue, RevenueIn and RevenueOut—must match exactly (except for the table name):

CREATE TABLE RevenueIn(startTime SMALLDATETIME NOT NULL, adId BIGINT, ...)
 ON RevenuePartitionScheme(startTime);

CREATE TABLE RevenueOut(startTime SMALLDATETIME NOT NULL, adId BIGINT, ...)
 ON RevenuePartitionScheme(startTime);

Don’t worry about storage space. The supporting tables are almost always empty. When they do have data, it’s only a small amount (one partition) and it’s temporary. These tables aren’t used for queries. The querying application only uses the main Revenue table.

Tables that match in schema, partition boundaries and file group work with partition switching in SQL Server. Think of our partitioned table as an organized collection of smaller tables, each stored separately on disk. With the ALTER TABLE ... SWITCH PARTITION ... statement, SQL Server simply reorganizes the relationships of these smaller tables to the outwardly visible tables without physically copying any data or rebuilding indexes that match. In my application, the switch takes less than 5 ms.

Columnstore Indexes

For maximum performance, I want to columnstore index all the columns in the Revenue table. To do so, edit this statement to include all the columns from the CREATE TABLE statement:

CREATE NONCLUSTERED COLUMNSTORE INDEX Revenue_NDXColumnStore
ON Revenue(startTime, adId, ...);

Under the covers, SQL Server creates a separate storage blob for each column with all column data. Because the table is partitioned, this is done partition-by-partition. I’ll end up with many small pieces of storage, one for each partition/column combination. If I have 25 columns in my table and 200 partitions, that short statement creates 5,000 storage areas. SQL Server uses various compression techniques to minimize disk space and the time required to read from disk for queries.

I’ll do the same for the RevenueIn table (but not for the RevenueOut table):

CREATE NONCLUSTERED COLUMNSTORE INDEX RevenueIn_NDXColumnStore
ON RevenueIn(startTime, adId, ...);

It’s important these indexes are exactly the same. When performing a SWITCH statement to move partitions, if indexes on the incoming and destination partitions match exactly, the indexes are switched along with the data. Otherwise, they’re rebuilt in the destination, disabling the table for a significant amount of time.

Update Data

My application rebuilds a full hour’s data from another source, much like you would rebuild part of an OLAP cube. The application then executes these SQL Server commands:

ALTER INDEX RevenueIn_NDXColumnStore ON RevenueIn DISABLE;
DELETE FROM RevenueIn;
INSERT INTO RevenueIn .....;
EXEC SP_RevenueIn2Revenue @startTime;

Normally, the RevenueIn table’s index is disabled and the table is empty. Just to be safe (in case some other process didn’t finish), I disable the index and delete anything in the table. Disabling the index allows the DELETE and INSERT statements to function. Unfortunately, a TRUNCATE statement is never allowed on a table with a columnstore index, even if that index is disabled. This takes little time in normal circumstances with an empty table.

Then I insert new data into the RevenueIn table. I can use any technique to insert the data. C# SqlBulkCopy is useful for moving a lot of data. I’ll recompute and insert into RevenueIn exactly one partition’s data. In my case, the data source is a Hadoop cluster that constantly receives new log records (at rates of 20,000 per second). It has all the data I need, but it’s too slow for user queries. That's why I have my Revenue table. I gather an hour’s data from Hadoop, summarize it and insert it into RevenueIn.

If you need to do an UPDATE instead of rebuilding all the data for an INSERT, you could copy one partition’s data from Revenue to RevenueIn and UPDATE RevenueIn. When I have one partition’s data safely in the RevenueIn table, a stored procedure does the rest of the work (see Figure 2).

Figure 2 Move Data into Revenue Table

CREATE PROCEDURE SP_RevenueIn2Revenue @startTime SMALLDATETIME AS
 BEGIN
  TRUNCATE TABLE RevenueOut;
  ALTER INDEX RevenueIn_NDXColumnStore ON RevenueIn REBUILD;
  DECLARE @partitionId INT
  SET @partitionId = $Partition.RevenuePartitionFunction(@startTime);
  BEGIN TRANSACTION
    ALTER TABLE Revenue SWITCH PARTITION @partitionId
      TO RevenueOut PARTITION @partitionId;
    ALTER TABLE RevenueIn SWITCH PARTITION @partitionId
      TO Revenue PARTITION @partitionId;
  COMMIT;
  UPDATE STATISTICS Revenue WITH RESAMPLE;
  TRUNCATE TABLE RevenueOut;
  ALTER INDEX RevenueIn_NDXColumnStore ON RevenueIn DISABLE;
 END;

The slow operations of DELETE and INSERT (and REBUILD inside SP_RevenueIn2Revenue) are all accomplished without touching the Revenue table, which remains fully available for queries. The TRANSACTION with the two ALTER TABLE ... SWITCH PARTITION statements takes 5 ms in my environment.

Remember that I didn’t build an index on the RevenueOut table. When I switch a partition from Revenue to RevenueOut, the index on that partition on the Revenue table is simply dropped because there’s no matching index on the destination table. Also, because there’s no index on RevenueOut, a TRUNCATE statement works to discard the old data.

I’ve experienced cases when SQL Server made poor choices in its optimizer for queries referencing the new partition. The UPDATE STATISTICS statement ensures the optimizer has the best information available, including the newly inserted data. My application repeats this process every five to 10 minutes, so user reports (from the Revenue table) are never far behind real time. Figure 3 reminds you how this process works.

Refresh Partition Data
Figure 3 Refresh Partition Data

Merge Partitions

Not all applications will benefit from merging partitions, but I believe my Revenue table will. Columnstore indexes achieve some of their performance using compression and fast access techniques like dictionaries and run length encoding. They work better with larger datasets. Because each partition gets its own storage, larger partitions lead to larger pieces of storage and potentially better performance.

Also, a query that selects and aggregates a month of data may require an extra step of selecting data from many small partitions before aggregation. This is especially true if I keep daily or hourly partitions of past data. I extended the data update technique to support merging partitions with minimal downtime (a few seconds). To support merging, I create a new partition function, a new partition scheme and a new table with a columnstore index:

CREATE PARTITION FUNCTION RevenueMergePartitionFunction(SMALLDATETIME)
AS RANGE RIGHT FOR VALUES ('2012-01-01');
CREATE PARTITION SCHEME RevenueMergePartitionScheme
AS PARTITION RevenuePartitionFunction
ALL TO ([PRIMARY]);
CREATE TABLE RevenueMerge(startTime SMALLDATETIME NOT NULL, adId BIGINT, ...)
 ON RevenueMergePartitionScheme(startTime);
CREATE NONCLUSTERED COLUMNSTORE INDEX RevenueMerge_NDXColumnStore
ON RevenueMerge(startTime, adId, ...);

Again, the table schema and index must precisely match the Revenue table. These database objects are all used by a rather long stored procedure. The procedure contains some error-checking and RAISERROR statements to indicate its progress.

The procedure is called like this:

EXEC SP_RevenueMerge '2014-APR-01', '2014-MAY-01';

This will merge all the partitions between the two dates into one partition. The size of the current partitions doesn’t matter, but the dates must be boundaries of current partitions. The procedure won’t break up partitions to merge parts of them.

The merge process works like this: First, the procedure modifies the RevenueMerge partition function and scheme to match the desired new large partition. Then it copies the new partition’s data from Revenue to RevenueMerge and creates the new columnstore index for this new partition. Copying and creating the index will take the most time. With the data prestaged, the procedure now has to lock the Revenue table. The old partitions are moved from Revenue to RevenueOut and RevenueOut is truncated. All the “old” partitions in Revenue, RevenueIn and RevenueOut are now empty.

The old partition boundaries are removed. This proceeds quickly because there’s no data to move or reorganize or index. It’s not as quick as the SWITCH statements during data import, but it should be over in a few seconds before application users notice. With the main partition boundaries now matching the RevenueMerge boundaries, a SWITCH statement brings in the copied data as a new (larger) partition with its index.

Figure 4 Merge Procedure

CREATE PROCEDURE SP_RevenueMerge
  @partitionStart SMALLDATETIME, @partitionEnd SMALLDATETIME AS
BEGIN
  DECLARE @msg VARCHAR(MAX);
  SET @msg = CONVERT(VARCHAR(20), GETDATE(), 14) + ' BEGIN SP_RevenueMerge' 
  RAISERROR (@msg, 0, 1) WITH NOWAIT;
  -- Verify the start and end dates are on partition boundaries.
  IF @partitionStart >= @partitionEnd
  OR NOT EXISTS (
    SELECT *
    FROM sys.partition_range_values prv
    JOIN sys.partition_functions pf
      ON pf.function_id = prv.function_id
      AND pf.name = 'RevenuePartitionFunction'
    WHERE prv.value = @partitionStart
    )
  OR NOT EXISTS (
    SELECT *
    FROM sys.partition_range_values prv
    JOIN sys.partition_functions pf
      ON pf.function_id = prv.function_id
      AND pf.name = 'RevenuePartitionFunction'
    WHERE prv.value = @partitionEnd
    )
  BEGIN
    SET @msg =
      'Bad Parameters. @partitionStart and @partitionEnd must be existing partition boundaries.';
    RAISERROR (@msg, 11, 1) WITH NOWAIT;
  END
  ELSE
  BEGIN
    -- Empty the Merge table and disable its index.
    ALTER INDEX RevenueMerge_NDXColumnStore ON RevenueMerge DISABLE;
    -- Table should be empty so DELETE is fast, but it's OK if it takes time.
    -- Hide the DELETE DML from the parser because parser
    -- doesn't know index is disabled and will complain.
    EXEC('DELETE FROM RevenueMerge');
    -- RevenueMerge is empty, unindexed and ready for us.
    -- Add new boundaries to Merge function, if needed.
    IF NOT EXISTS (
      SELECT *
      FROM sys.partition_range_values prv
      JOIN sys.partition_functions pf
        ON pf.function_id = prv.function_id
        AND pf.name = 'RevenueMergePartitionFunction'
      WHERE prv.value = @partitionStart)
    BEGIN
      ALTER PARTITION SCHEME RevenueMergePartitionScheme NEXT USED [PRIMARY];
      ALTER PARTITION FUNCTION RevenueMergePartitionFunction()
        SPLIT RANGE(@partitionStart);
    END;
    IF NOT EXISTS (
      SELECT *
      FROM sys.partition_range_values prv
      JOIN sys.partition_functions pf
        ON pf.function_id = prv.function_id
        AND pf.name = 'RevenueMergePartitionFunction'
      WHERE prv.value = @partitionEnd)
    BEGIN
      ALTER PARTITION SCHEME RevenueMergePartitionScheme NEXT USED [PRIMARY];
      ALTER PARTITION FUNCTION RevenueMergePartitionFunction()
        SPLIT RANGE(@partitionEnd);
    END; 
    -- Cursor (static) to list old partition boundaries on RevenueMerge table.
    DECLARE CursorOldMergeBoundaries CURSOR STATIC FOR
    SELECT CAST(rv.value AS SMALLDATETIME)
    FROM sys.partition_functions pf
    JOIN sys.partition_range_values rv
      ON rv.function_id = pf.function_id
    WHERE pf.name = 'RevenueMergePartitionFunction'
    ORDER BY value;
    DECLARE @oldMergeBoundary SMALLDATETIME;
    -- Remove old Merge boundaries that don't match new boundaries.
    OPEN CursorOldMergeBoundaries;
    FETCH NEXT FROM CursorOldMergeBoundaries INTO @oldMergeBoundary;
    WHILE @@FETCH_STATUS = 0
    BEGIN
      IF @oldMergeBoundary != @partitionStart
      AND @oldMergeBoundary != @partitionEnd
      BEGIN
        ALTER PARTITION FUNCTION RevenueMergePartitionFunction()
          MERGE RANGE (@oldMergeBoundary);
      END;     
      FETCH NEXT FROM CursorOldMergeBoundaries INTO @oldMergeBoundary;
    END;
    CLOSE CursorOldMergeBoundaries;
    DEALLOCATE CursorOldMergeBoundaries;
    -- Merge table now has just the new partition we need and is empty.
    -- Now copy the data from Revenue into the new partition in RevenueMerge.
    DECLARE @sql AS VARCHAR(MAX);
    SET @sql =
      'INSERT INTO RevenueMerge SELECT * FROM Revenue WHERE startTime >= '''
      + CAST(@partitionStart AS VARCHAR) + ''' AND startTime < '''
      + CAST(@partitionEnd AS VARCHAR) + '''';
    EXEC(@sql);
    -- Build the columnstore index.
    ALTER INDEX RevenueMerge_NDXColumnStore ON RevenueMerge REBUILD;
    -- RevenueMerge now has the new partition with a column store index.
    -- We will need to merge partitions (by removing boundaries)
    -- in the Revenue table.
    DECLARE CursorRemoveValues CURSOR STATIC FOR
    SELECT CAST(rv.value AS SMALLDATETIME)
    FROM sys.partition_functions pf
    JOIN sys.partition_range_values rv
      ON rv.function_id = pf.function_id
    WHERE pf.name = 'RevenuePartitionFunction'
    AND value >= @partitionStart
    AND value <= @partitionEnd
    ORDER BY value;
    DECLARE @oldBoundary SMALLDATETIME;
    DECLARE @partitionId INT;
    OPEN CursorRemoveValues;
    FETCH NEXT FROM CursorRemoveValues INTO @oldBoundary;
    SET @msg = CONVERT(VARCHAR(20), GETDATE(), 14)
      + ' BEGIN TRANSACTION. Locking Revenue table.' 
    RAISERROR (@msg, 0, 1) WITH NOWAIT;
    BEGIN TRANSACTION
      -- Move all the old data from Revenue to RevenueOut.
      -- Revenue and RevenueOut (and RevenueIn)
      -- all have the same partition layout.
      WHILE @@FETCH_STATUS = 0 AND @oldBoundary < @partitionEnd
      BEGIN
        -- Move one old partition's data from Revenue to RevenueOut.
        -- Find the current PartitionId for the date to be moved.
        SET @partitionId = $Partition.RevenuePartitionFunction(@oldBoundary);
        SET @msg = CONVERT(VARCHAR(20), GETDATE(), 14)
          + ' MOVING ' + CONVERT(VARCHAR(16), @oldBoundary, 20)
          + ' Partition ' + CONVERT(VARCHAR(3), @partitionId);  
        RAISERROR (@msg, 0, 1) WITH NOWAIT;
        -- Move the data from Revenue to RevenueOut.
        ALTER TABLE Revenue SWITCH PARTITION @partitionId
          TO RevenueOut PARTITION @partitionId;
        FETCH NEXT FROM CursorRemoveValues INTO @oldBoundary;
      END;
      -- Discard the old data in RevenueOut
      SET @msg = CONVERT(VARCHAR(20), GETDATE(), 14) + ' TRUNCATING RevenueOut';  
      RAISERROR (@msg, 0, 1) WITH NOWAIT;
      TRUNCATE TABLE RevenueOut;
      -- Start again and remove the old boundaries.
      -- We leave the boundaries at @partitionStart and @partitionEnd in place.
      SET @msg = CONVERT(VARCHAR(20), GETDATE(), 14)
        + ' Removing partition boundaries';  
      RAISERROR (@msg, 0, 1) WITH NOWAIT;
      FETCH FIRST FROM CursorRemoveValues INTO @oldBoundary; 
      WHILE @@FETCH_STATUS = 0
      BEGIN
        IF @oldBoundary <> @partitionStart AND @oldBoundary <> @partitionEnd
        BEGIN
          -- Remove the partition boundary. It's empty, so this should be fast.
          SET @msg = CONVERT(VARCHAR(20), GETDATE(), 14)
            + ' REMOVING Boundary ' + CONVERT(VARCHAR(16), @oldBoundary, 20);  
          RAISERROR (@msg, 0, 1) WITH NOWAIT;
          ALTER PARTITION FUNCTION RevenuePartitionFunction()
            MERGE RANGE (@oldBoundary);
        END
        FETCH NEXT FROM CursorRemoveValues INTO @oldBoundary;
      END;
      -- Move the new (larger) partition's data from
      -- RevenueMerge partition #2 to Revenue.
      SET @partitionId = $Partition.RevenuePartitionFunction(@partitionStart);
      SET @msg = CONVERT(VARCHAR(20), GETDATE(), 14)
        + ' Switching into partition ' + CONVERT(VARCHAR(3), @partitionId); 
      RAISERROR (@msg, 0, 1) WITH NOWAIT;
      ALTER TABLE RevenueMerge SWITCH PARTITION 2
        TO Revenue PARTITION @partitionId;
    COMMIT;
    SET @msg = CONVERT(VARCHAR(20), GETDATE(), 14)
      + ' END TRANSACTION. Revenue table available.' 
    RAISERROR (@msg, 0, 1) WITH NOWAIT;
    CLOSE CursorRemoveValues;
    DEALLOCATE CursorRemoveValues;
    UPDATE STATISTICS Revenue WITH RESAMPLE;
 
    SET @msg = CONVERT(VARCHAR(20), GETDATE(), 14) + ' END SP_RevenueMerge' 
    RAISERROR (@msg, 0, 1) WITH NOWAIT;
  END;
END;
GO

You may notice some EXEC statements in the procedure. I don’t entirely approve of this, but SQL Server will refuse to run the procedure with DELETE or INSERT statements on a table with an enabled columnstore index, even if that index is disabled during the procedure. The EXEC hides this so the procedure will run. You can use this procedure to merge hours into days, days into weeks, hours into months or whatever fits your application.

List Partitions

I want to leave you with a handy table function to list partitions. Keeping track of hundreds of partitions can be tedious, especially during development work. A list of partitions and the rows of data within them can help. The function in Figure 5 does that nicely and you can use it in a SELECT statement with a WHERE clause to see only a few partitions.

Figure 5  Function to List Partitions With Row Counts

CREATE FUNCTION ListPartitions(@tableName VARCHAR(200)) 
RETURNS TABLE
AS
  -- LIST PARTITIONS FOR A TABLE
  RETURN SELECT TOP 15000 @tableName AS [Table],
    p.partition_number AS [p#],
    p.rows,
    CASE boundary_value_on_right
      WHEN 1 THEN ' < ' ELSE ' <= ' END as comparison,
      rv.value
  FROM sys.partitions p WITH (NOLOCK)
  INNER JOIN sys.indexes i
    ON i.object_id = OBJECT_ID(@tableName)
    AND i.index_id = p.index_id
    AND i.index_id < 2
  INNER JOIN sys.partition_schemes ps
    ON ps.data_space_id = i.data_space_id
  INNER JOIN sys.partition_functions f
    ON f.function_id = ps.function_id
  LEFT OUTER JOIN sys.partition_range_values rv
    ON f.function_id = rv.function_id
    AND p.partition_number = rv.boundary_id
  WHERE p.object_id = OBJECT_ID(@tableName)
  ORDER BY p.partition_number;
GO

This produces output like Figure 6 with partition numbers, row counts and boundary values.

Figure 6 Sample Partition List

Table p# rows comparison value
Revenue 16 12579561  < 2/1/14 0:00
Revenue 17 14173521  < 3/1/14 0:00
Revenue 18 16528775  < 4/1/14 0:00
Revenue 19 24057  < 4/1/14 1:00
Revenue 20 24156  < 4/1/14 2:00
Revenue 21 23625  < 4/1/14 3:00
Revenue 22 23074  < 4/1/14 4:00
Revenue 23 21483  < 4/1/14 5:00

 

Wrapping Up

The solution described here has many moving parts, but I’ve taken great effort to minimize them and make them look and behave similarly. The most significant benefit is on the query side. There’s exactly one table visible to queries. All the partitioning and indexing and data movement is hidden from the queries and the applications they support.


Don Mackenzie is director of software architecture at Cox Digital Solutions, the Internet advertising arm of Cox Media Group and Cox Enterprises. He enjoys applying new technologies to business applications. Reach him at don@coxds.com.

Thanks to the following Microsoft technical expert for reviewing this article: Remus Rusanu