Filtered Statistics

Article
01/08/2009

Filtered statistics are optimized statistics, especially suited to improve the accuracy of statistics for queries that select from well-defined subsets of data. Filtered statistics use a filter predicate to select the subset of data that is included in the statistics.

Filtered statistics can improve query plan quality for the following types of queries:

Queries that use a filtered index.
The SQL Server Database Engine automatically creates and maintains filtered statistics for filtered indexes. These filtered statistics are more accurate than full-table statistics because they cover only the rows in the filtered index.
Queries on subsets of data in nonindexed columns.
If a column is not defined in any index and it is referenced in query predicates, the Database Engine might create full-table statistics on the column to improve the query plans. If the nonindexed column contains subsets of data that have unique data distributions, filtered statistics on a subset of rows can improve the accuracy of the statistics that the query optimizer uses for the query plan.
Queries that select from correlated columns.
Filtered statistics can improve the query plan when queries select from a nonindexed column containing subsets of data that correlate to values in another column. If the number of data lookups in the nonindexed column is small, a filtered index might not provide significant performance improvements for data lookups. However, filtered statistics on subsets of data in the nonindexed column can improve the query plan quality.

Design Considerations

In order to design effective filtered statistics, it is important to understand what queries your application uses and how they relate to subsets of data. The following design considerations give examples of when to create filtered statistics.

Filtered Indexes

Because the Database Engine automatically creates and updates filtered statistics on the rows in filtered indexes, it is important to understand how to design filtered indexes. Filtered indexes and their filtered statistics are both defined with the same filter predicate expression that is based on simple comparison operators. For more information on designing and creating filtered indexes, see Filtered Index Design Guidelines and CREATE INDEX (Transact-SQL).

Nonindexed Columns

Filtered statistics can improve the accuracy of statistics on a subset of rows that does not have a filtered index. You can create filtered statistics by using the CREATE STATISTICS command with the WHERE clause to define the filter predicate expression. For more information, see CREATE STATISTICS (Transact-SQL).

As mentioned in the introduction, the Database Engine might create automatically generate full-table statistics on a nonindexed column to improve a query plan. The query optimizer does not automatically generate filtered statistics. If the column contains subsets of data with different data distributions, you can create filtered statistics for the query optimizer to use instead of autogenerated full table statistics.

You can use the following query to determine if the query optimizer has autogenerated statistics for a column. It queries the dynamic management views sys.stats and sys.stats_columns to return the database object name, the column name, and the statistics name for all of the columns that have autogenerated statistics. When the query optimizer autogenerates statistics on nonindexed columns, the statistics name starts with _WA.

USE AdventureWorks;
GO
SELECT OBJECT_NAME(s.object_id) AS object_name,
    COL_NAME(sc.object_id, sc.column_id) AS column_name,
    s.name AS statistics_name
FROM sys.stats AS s Join sys.stats_columns AS sc
    ON s.stats_id = sc.stats_id
WHERE s.name like '_WA%'
ORDER BY s.name;
GO

Correlated Columns

Filtered statistics can improve the query plan when a query selects from a nonindexed column that has a subset of data that correlates to values in another column. For example, using AdventureWorks, each product in the Production.Product table belongs to one of four categories in the Production.ProductCategory table: Bikes, Components, Clothing, and Accessories. Each of the categories has a different data distribution for weight: bike weights range from 13.77 to 30.0, component weights range from 2.12 to 1050.00 with some NULL values, clothing weights are all NULL, and accessories weights are also NULL.

USE AdventureWorks;
GO
IF EXISTS ( SELECT name FROM sys.stats
    WHERE name = 'BikeWeights'
    AND object_ID = OBJECT_ID ('Production.Product'))
DROP STATISTICS Production.Product.BikeWeights;
GO
CREATE STATISTICS BikeWeights
    ON Production.Product (Weight)
WHERE ProductSubcategoryID IN (1,2,3);
GO

The query optimizer can use the BikeWeights filtered statistics to improve the query plan for the following query that selects all of the bikes that weigh more than 25.

SELECT P.Weight AS Weight, S.Name AS BikeName
FROM Production.Product AS P
    JOIN Production.ProductSubcategory AS S 
    ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE P.ProductSubcategoryID IN (1,2,3) AND P.Weight > 25
ORDER BY P.Weight;
GO

Referencing Dependencies

The sys.sql_expression_dependencies catalog view tracks each column in the filtered statistics predicate as a referencing dependency. Consider the operations that you perform on table columns before creating filtered statistics because you cannot drop, rename, or alter the definition of a table column that is defined in a filtered statistics predicate.

Sampling Rates

Filtered indexes use the default sampling rate for creating filtered statistics. When you create filtered statistics with the CREATE STATISTICS statement, you can choose the sampling rate. When only a few values are returned, however, sampling might not provide accurate statistics. For highly selective predicates, which return relatively few values, using the WITH FULLSCAN option will improve accuracy because it will scan all of the values instead of sampling. For usage information on the WITH FULLSCAN option, see CREATE STATISTICS (Transact-SQL).

Automatic Statistical Updates

Filtered statistics are updated automatically unless you disable automatic statistics recomputation and then update them manually. Under automatic statistical updating, the default sampling rate for filtered statistics varies and is based on the selectivity of the filtered statistics predicate. For more information about updating statistics, see UPDATE STATISTICS (Transact-SQL).

Filtered Statistics

Design Considerations

Filtered Indexes

Nonindexed Columns

Correlated Columns

Referencing Dependencies

Sampling Rates

Automatic Statistical Updates

See Also

Concepts

Help and Information

Additional resources