
Determining When to Create Statistics
The query optimizer already creates statistics in the following ways:
-
The query optimizer creates statistics for indexes on tables or views when the index is created. These statistics are created on the key columns of the index. If the index is a filtered index, the query optimizer creates filtered statistics on the same subset of rows specified for the filtered index. For more information about filtered indexes, see Filtered Index Design Guidelines and CREATE INDEX (Transact-SQL).
-
The query optimizer creates statistics for single columns in query predicates when AUTO_CREATE_STATISTICS is on.
For most queries, these two methods for creating statistics ensure a high-quality query plan; in a few cases, you can improve query plans by creating additional statistics with the CREATE STATISTICS statement. These additional statistics can capture statistical correlations that the query optimizer does not account for when it creates statistics for indexes or single columns. Your application might have additional statistical correlations in the table data that, if calculated into a statistics object, could enable the query optimizer to improve query plans. For example, filtered statistics on a subset of data rows or multicolumn statistics on query predicate columns might improve the query plan.
When creating statistics with the CREATE STATISTICS statement, we recommend keeping the AUTO_CREATE_STATISTICS option on so that the query optimizer continues to routinely create single-column statistics for query predicate columns. For more information about query predicates, see Search Condition (Transact-SQL).
Consider creating statistics with the CREATE STATISTICS statement when any of the following applies:
-
The Database Engine Tuning Advisor suggests creating statistics.
-
The query predicate contains multiple correlated columns that are not already in the same index.
-
The query selects from a subset of data.
-
The query has missing statistics.
Database Engine Tuning Advisor Suggests Creating Statistics
The Database Engine Tuning Advisor is a a tool that analyzes the performance effects of workloads on one or more databases. It offers recommendations for improving performance (such as suggesting indexes to create) and might suggest using CREATE STATISTICS to create query optimization statistics. You should follow this recommendation. For more information about the Database Engine Tuning Advisor, see Tuning the Physical Database Design.
Query Predicate Contains Multiple Correlated Columns
When a query predicate contains multiple columns that have cross-column relationships and dependencies, statistics on the multiple columns might improve the query plan. Statistics on multiple columns contain cross-column correlation statistics, called densities, that are not available in single-column statistics. Densities can improve cardinality estimates when query results depend on data relationships among multiple columns.
If the columns are already in the same index, the multicolumn statistics object already exists and it is not necessary to create it manually. If the columns are not already in the same index, you can create multicolumn statistics by creating an index on the columns or by using the CREATE STATISTICS statement. It requires more system resources to maintain an index than a statistics object. If the application does not require the multicolumn index, you can economize on system resources by creating the statistics object without creating the index.
When creating multicolumn statistics, the order of the columns in the statistics object definition affects the effectiveness of densities for making cardinality estimates. The statistics object stores densities for each prefix of key columns in the statistics object definition. For more information about densities, see DBCC SHOW_STATISTICS (Transact-SQL).
To create densities that are useful for cardinality estimates, the columns in the query predicate must match one of the prefixes of columns in the statistics object definition. For example, the following creates a multicolumn statistics object on the columns LastName, MiddleName, and FirstName.
USE AdventureWorks;
GO
IF EXISTS (SELECT name FROM sys.stats
WHERE name = 'LastFirst'
AND object_ID = OBJECT_ID ('Person.Contact'))
DROP STATISTICS Person.Contact.LastFirst;
GO
CREATE STATISTICS LastFirst ON Person.Contact (LastName, MiddleName, FirstName);
GO
In this example, the statistics object LastFirst has densities for the following column prefixes: (LastName), (LastName, MiddleName), and (LastName, MiddleName, FirstName). The density is not available for (LastName, FirstName). If the query uses LastName and FirstName without using MiddleName, the density is not available for cardinality estimates.
Query Selects from a Subset of Data
When the query optimizer creates statistics for single columns and indexes, it creates the statistics for the values in all rows. When queries select from a subset of rows, and that subset of rows has a unique data distribution, filtered statistics can improve query plans. You can create filtered statistics by using the CREATE STATISTICS statement with the WHERE clause to define the filter predicate expression.
For example, using AdventureWorks, each product in the Production.Product table belongs to one of four categories in the Production.ProductCategory table: Bikes, Components, Clothing, and Accessories. Each of the categories has a different data distribution for weight: bike weights range from 13.77 to 30.0, component weights range from 2.12 to 1050.00 with some NULL values, clothing weights are all NULL, and accessory weights are also NULL.
Using Bikes as an example, filtered statistics on all bike weights will provide more accurate statistics to the query optimizer and can improve the query plan quality compared with full-table statistics or nonexistent statistics on the Weight column. The bike weight column is a good candidate for filtered statistics but not necessarily a good candidate for a filtered index if the number of weight lookups is relatively small. The performance gain for lookups that a filtered index provides might not outweigh the additional maintenance and storage cost for adding a filtered index to the database.
The following statement creates the BikeWeights filtered statistics on all of the subcategories for Bikes. The filtered predicate expression defines bikes by enumerating all of the bike subcategories with the comparison Production.ProductSubcategoryID IN (1,2,3). The predicate cannot use the Bikes category name because it is stored in the Production.ProductCategory table, and all columns in the filter expression must be in the same table.
USE AdventureWorks;
GO
IF EXISTS ( SELECT name FROM sys.stats
WHERE name = 'BikeWeights'
AND object_ID = OBJECT_ID ('Production.Product'))
DROP STATISTICS Production.Product.BikeWeights;
GO
CREATE STATISTICS BikeWeights
ON Production.Product (Weight)
WHERE ProductSubcategoryID IN (1,2,3);
GO
The query optimizer can use the BikeWeights filtered statistics to improve the query plan for the following query that selects all of the bikes that weigh more than 25.
SELECT P.Weight AS Weight, S.Name AS BikeName
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS S
ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE P.ProductSubcategoryID IN (1,2,3) AND P.Weight > 25
ORDER BY P.Weight;
GO
Query Has Missing Statistics
If an error or other event prevents the query optimizer from creating statistics, the query optimizer creates the query plan without using statistics. The query optimizer marks the statistics as missing and attempts to regenerate the statistics the next time the query is executed.
Missing statistics are indicated as warnings (table name in red text) when the execution plan of a query is graphically displayed using SQL Server Management Studio. For more information, see Displaying Graphical Execution Plans (SQL Server Management Studio). Additionally, monitoring the Missing Column Statistics event class by using SQL Server Profiler indicates when statistics are missing. For more information, see Errors and Warnings Event Category (Database Engine).
If statistics are missing, perform the following steps:
-
Verify that AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are on.
-
Verify that the database is not read-only. If the database is read-only, the query optimizer cannot save statistics.
-
Create the missing statistics by using the CREATE STATISTICS statement.