Columnstore Indexes Described
The SQL Server in-memory columnstore index stores and manages data by using column-based data storage and column-based query processing. Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size.
Note
|
|---|
|
We view the clustered columnstore index as the standard for storing large data warehousing fact tables, and expect it will be used in most data warehousing scenarios. Since the clustered columnstore index is updateable, your workload can perform a large number of insert, update, and delete operations. |
A columnstore index is a technology for storing, retrieving and managing data by using a columnar data format, called a columnstore. SQL Server supports both clustered and nonclustered columnstore indexes. Both use the same in-memory columnstore technology, but they do have differences in purpose and in features they support.
Benefits
Columnstore indexes work well for mostly read-only queries that perform analysis on large data sets. Often, these are queries for data warehousing workloads. Columnstore indexes give high performance gains for queries that use full table scans, and are not well-suited for queries that seek into the data, searching for a particular value.
Columnstore Index benefits:
-
Columns often have similar data, which results in high compression rates.
-
High compression rates improve query performance by using a smaller in-memory footprint. In turn, query performance can improve because SQL Server can perform more query and data operations in-memory.
-
A new query execution mechanism called batch-mode execution has been added to SQL Server that reduces CPU usage by a large amount. Batch-mode execution is closely integrated with, and optimized around, the columnstore storage format. Batch-mode execution is sometimes known as vector-based or vectorized execution.
-
Queries often select only a few columns from a table, which reduces total I/O from the physical media.
Columnstore Versions
SQL Server 2012, SQL Server 2012 Parallel Data Warehouse, and SQL Server 2014 all use columnstore indexes to accelerate common data warehouse queries. SQL Server 2012 introduced two new features: a nonclustered columnstore index and a vector-based query execution capability that processes data in units called "batches." SQL Server 2014 has the features of SQL Server 2012 plus updateable clustered columnstore indexes.
Key Characteristics
|
Applies to: SQL Server 2014 through SQL Server 2014. |
In SQL Server, a clustered columnstore index:
-
Is available in Enterprise, Developer, and Evaluation editions.
-
Is updateable.
-
Is the primary storage method for the entire table.
-
Has no key columns. All columns are included columns.
-
Is the only index on the table. It cannot be combined with any other indexes.
-
Can be configured to use columnstore or columnstore archival compression.
-
Does not physically store columns in a sorted order. Instead, it stores data to improve compression and performance.
|
Applies to: SQL Server 2012 through SQL Server 2014. |
In SQL Server, a nonclustered columnstore index:
-
Can index a subset of columns in the clustered index or heap. For example, it can index the frequently used columns.
-
Requires extra storage to store a copy of the columns in the index.
-
Is updated by rebuilding the index or switching partitions in and out. It is not updateable by using the DML operations such as insert, update, and delete.
-
Can be combined with other indexes on the table.
-
Can be configured to use columnstore or columnstore archival compression.
-
Does not physically store columns in a sorted order. Instead, it stores data to improve compression and performance. Pre-sorting the data before creating the columnstore index is not required, but can improve columnstore compression.
Key Concepts and Terms
The following key terms and concepts are associated with columnstore indexes.
Loading Data into a Nonclustered Columnstore Index
To load data into a nonclustered columnstore index, first load data into a traditional rowstore table stored as a heap or clustered index, and then create the nonclustered columnstore index with CREATE COLUMNSTORE INDEX (Transact-SQL).
A table with a nonclustered columnstore index is read-only until the index is dropped or disabled. To update the table and the nonclustered columnstore index you can switch partitions in and out. You can also disable the index, update the table, and then rebuild the index.
For more information see Using Nonclustered Columnstore Indexes
Loading Data into a Clustered Columnstore Index
As the diagram suggests, to load data into a clustered columnstore index, SQL Server:
-
Inserts maximum-size rowgroups directly into the columnstore. As the data is loaded, SQL Server assigns the data rows in a first-come first-serve order into an open rowgroup.
-
For each rowgroup, after it reaches the maximum size, SQL Server:
-
Marks the rowgroup as CLOSED.
-
Bypasses the deltastore.
-
Compresses each column segment with the rowgroup with columnstore compression.
-
Physically stores each compressed column segment into the columnstore.
-
-
Inserts the remaining rows into the columnstore or the deltastore as follows:
-
If the number of rows meets the minimum rows per rowgroup requirement, the rows are added to the columnstore.
-
If the number of rows is less than the minimum rows per rowgroup, the rows are added to the deltastore.
-
For more information about deltastore tasks and processes, see Using Clustered Columnstore Indexes
Plan for enough memory to create columnstore indexes in parallel
Creating a columnstore index is by default a parallel operation unless memory is constrained. Creating the index in parallel requires more memory than creating the index serially. When there is ample memory, creating a columnstore index takes on the order of 1.5 times as long as building a B-tree on the same columns.
The memory required for creating a columnstore index depends on the number of columns, the number of string columns, the degree of parallelism (DOP), and the characteristics of the data. For example, if your table has fewer than one million rows, SQL Server will use only one thread to create the columnstore index.
If your table has more than one million rows, but SQL Server cannot get a large enough memory grant to create the index using MAXDOP, SQL Server will automatically decrease MAXDOP as needed to fit into the available memory grant. In some cases, DOP must be decreased to one in order to build the index under constrained memory.
Nonclustered Columnstore Indexes
For common tasks, see Using Nonclustered Columnstore Indexes.
Clustered Columnstore Indexes
For common tasks, see Using Clustered Columnstore Indexes.
Metadata
All of the columns in a columnstore index are stored in the metadata as included columns. The columnstore index does not have key columns.
Note