.gif)
SQL Server
Technical Article
Writer: Eric N. Hanson and Yavor Angelov
Contributor: Lubor Kollar
Published: February 2009
Applies to: SQL Server 2008
Summary: Microsoft SQL Server 2008 collects
statistical information about indexes and column data stored in the database.
These statistics are used by the SQL Server query optimizer to choose the most
efficient plan for retrieving or updating data. This paper describes what data
is collected, where it is stored, and which commands create, update, and delete
statistics. By default, SQL Server 2008 also creates and updates statistics
automatically, when such an operation is considered to be useful. This paper
also outlines how these defaults can be changed on different levels (column,
table, and database). In addition, it presents how certain query language
features, such as Transact-SQL variables, interact with use of statistics by
the optimizer, and it provides guidance for using these features when writing
queries so you can obtain good query performance.
Statistical Data in SQL
Server 2008
Microsoft SQL
Server 2008 collects statistics about individual columns (single-column
statistics) or sets of columns (multicolumn statistics). Statistics are used by
the query optimizer to estimate the selectivity of expressions, and thus the
size of intermediate and final query results. Good statistics allow the
optimizer to accurately assess the cost of different query plans and then
choose a high-quality plan. All information about a single statistics object is
stored in several columns of a single row in the sysindexes table, and in a statistics binary large object (statblob) kept in an internal-only
table. In addition, information about statistics can be found in the new
metadata views sys.stats and sys.indexes.
Summary of Statistics
Features
SQL Server
2008 has a number of features for maintaining statistics. The most important
one is the ability to automatically create and update statistics. This feature,
also called auto stats, is on by default starting in SQL Server 2000.
Approximately 98% of SQL Server 2000 installations leave this feature enabled,
which is a best practice. For the majority of databases and applications,
developers and administrators can rely on the automatic creation and update of
statistics to provide comprehensive and accurate statistics about their data.
This allows the SQL Server 2008 query optimizer to produce good query plans
consistently, while keeping development and administration costs low. If you
need more control over statistics creation and update to get good query plans
and manage the overhead of gathering statistics, you can use manual statistics
creation and update capabilities.
An important
feature for high-throughput online transaction processing application
environments is the ability to asynchronously
update statistics automatically. This can improve predictability of query
response time in such environments.
SQL Server
2008 statistics features allow you to:
·
Implicitly
create and update statistics with the default sampling rate (in the SELECT,
INSERT, DELETE, UPDATE and MERGE commands, use of a column in a query condition
such as a WHERE or JOIN clause causes statistics to be created or updated if
necessary when automatic statistics creation and update is enabled).
·
Manually
create and update statistics with any desired sampling rate, and drop
statistics (CREATE STATISTICS, UPDATE STATISTICS, DROP STATISTICS, CREATE
INDEX, DROP INDEX).
·
Create
and maintain filtered statistics, which are defined over a subset of the
rows in the table by using a WHERE clause.
·
Automatically
create filtered statistics whenever a filtered index is created.
·
Manually
create statistics in bulk for all columns of all tables in a database (sp_createstats).
·
Manually
update all existing statistics in the database (sp_updatestats).
·
List
statistics objects that exist for a table or database (sp_helpstats, catalog views sys.stats, sys.stats_columns).
·
Display
descriptive information about statisticsobjects (DBCC SHOW_STATISTICS).
·
Enable
and disable automatic creation and update of statistics database-wide or
for a specific table or statistics object (ALTER DATABASE options
AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS, sp_autostats,
and the NORECOMPUTE option on CREATE STATISTICS and UPDATE STATISTICS).
·
Enable
and disable asynchronous automatic update of statistics (ALTER DATABASE
option AUTO_UPDATE_STATISTICS_ASYNC).
In addition,
SQL Server Management Studio allows you to graphically browse and control
statistics objects within its Object Explorer view. Statistics are listed in
Object Explorer in a folder underneath each table object.
What's New for Statistics in
SQL Server 2008?
SQL Server
2008 provides the option to create filtered
statistics, which are restricted to a subset of the rows in the table
defined by a predicate, also called a filter. Filtered statistics are either
created explicitly, or more commonly, through creating a filtered index.
- Automatic
creation: As mentioned, filtered statistics are automatically created as a
by-product of filtered indexes. Unlike regular statistics, filtered statistics
are never automatically created by the query optimizer as a result of query and
update statements.
- Automatic
update: To determine when statistics need to be automatically updated, the
query optimizer uses column modification counters. For filtered statistics, the
counter is multiplied by the selectivity of the filter, which ranges from 0 to
1, to compensate for the fact that the column counter tracks changes to all
rows in the table, not the changes to the rows in the filtered statistics
object.
- Sampling:
To build the statistics, SQL Server 2008 reads every nth data page, according
to the sampling rate, and then it applies the filter, if any.
- If
you don’t specify sampling rate, the default behavior is to sample based on the
number of rows in the table and, for filtered statistics, based on the
selectivity of the filter. The more selective the filter, the more rows need to
be sampled to gather enough data to build the histogram.
- Cardinality
estimate: As with regular statistics, the optimizer attempts to pick the most
relevant statistics when performing cardinality estimate. There is an
additional check that the optimizer must make for filtered statistics: whether
the statistics predicate contains the query predicate (that is, whether the
query predicate must be true for a row if the statistics predicate is true for
that row). Out of the possible filtered statistics, the query optimizer favors
those that are as close to the query predicate as possible.
Other Important Enhancements
in Statistics
The previous
version, SQL Server 2005, introduced a number of important statistics features
that allow the query optimizer to choose better query plans for a broader range
of queries, or otherwise improve the management of statistics:
·
Large
object support: Columns of type ntext, text, and image, as well as the new
types nvarchar(max), varchar(max), and varbinary(max) can be specified as statistics columns.
·
Improved
statistics loading framework: The optimizer internally improves the loading
of statistics compared with SQL Server 2000. It now loads all and only the
statistics it needs, improving optimization result quality and performance.
·
Minimum
sample size: A minimum of 8 megabytes (MB) of data, or the size of the
table if smaller, are now sampled during statistics gathering.
·
Increased
limit on number of statistics: The number of column statistics objects
allowed per table has been increased to 2,000. An additional 249 index
statistics may also be present, for a maximum of 2,249 statistics per table.
·
Clearer
and more consistent display of histograms: DBCC SHOW_STATISTICS is improved
because histograms are always scaled before they are stored in the catalogs.
·
Enhanced
DBCC SHOW_STATISTICS output: DBCC SHOW_STATISTICS now displays the name of
the statistics object being displayed, to avoid ambiguity.
·
Single
rowset output for DBCC SHOW_STATISTICS: DBCC SHOW_STATISTICS supports
options to output the header, density vector, and histogram individually as
single rowsets. This allows easier programming when DBCC SHOW_STATISTICS output
is automatically processed.
·
Statistics
on internal tables: Statistics are fully supported on the tables listed in sys.internal_tables, including XML and full-text indexes, Service
Broker queues, and query notification tables.
·
Statistics
on up to 32 columns: The limit on the number of columns in a statistics
object has been increased to 32 from 16.
·
Statistics
on partitioned tables: Statistics are fully supported on partitioned
tables, which were introduced in SQL Server 2005. Histograms are
maintained on a per-table basis (not per-partition).
·
Parallel
statistics gathering for fullscan: For statistics gathered with fullscan, creation of a single
statistics object can be done in parallel for both nonpartitioned and
partitioned tables.
·
Improved
recompiles and statistics creation in case of missing statistics: In the case
where auto create of statistics failed, on a subsequent execution of a plan
generated with missing statistics, auto create is performed and the plan is
recompiled; the missing statistics condition does not persist. For more
information, see the Batch
Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005
white paper by Marathe.
·
Improved
recompilation logic and statistics update for empty tables: Changing from 0
to > 0 rows in a table results in query recompilation and update of
statistics. For more information, see the Batch
Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005
white paper by Marathe.
·
Inferred
date correlation constraints: By enabling the DATE_CORRELATION_OPTIMIZATION
database setting, you can cause SQL Server to maintain information about the
correlation between datetime fields
across a pair of tables linked by a foreign key. This information is used to
allow implied predicates to be determined for some queries. The information is
not used directly for selectivity estimation or costing by the optimizer, so it
is not statistics in the strictest sense, but it is closely related to
statistics because it is auxiliary information used to help obtain a better
query plan.
·
sp_updatestats:
In SQL Server 2005 and SQL Server 2008, sp_updatestats
updates only those statistics that require updating based on the rowmodctr information in the sys.sysindexes
compatibility view, thus eliminating unnecessary updates of unchanged items.
For databases running under compatibility level 90 or higher, sp_updatestats preserves
the automatic UPDATE STATISTICS setting for any particular index or statistics.
There are
also some other minor changes in behavior for statistics. In particular, the statblob column in sys.sysindexes is now always given as NULL. The statblob itself is kept in an
internal-only catalog table.
Definitions
We define the
following terms related to SQL Server 2008 statistics:
·
statblob:
Statistics binary large object. This object is stored in the system catalogs in
an internal catalog view, sys.sysobjvalues.
·
String
summary: A form of statistics that summarizes the frequency distribution of
substrings in a string column. This is used to help estimate selectivity of
LIKE predicates. It is stored in the statblob
for a string column.
·
sysindexes:
The sys.sysindexes catalog view, which contains
information about tables and indexes.
·
Predicate:
A condition that evaluates to true or false. Predicates appear in a WHERE or
JOIN clause in a database query.
·
Selectivity:
The fraction of rows from the input set of the predicate that satisfy the
predicate. More sophisticated selectivity measures are also used to estimate
the number of rows produced by joins, DISTINCT, and other operators. For
example, SQL Server 2008 estimates the selectivity of the predicate "Sales.SalesOrderHeader.OrderID = 43659"
in the AdventureWorks
database as 1/31465 = 0.00003178.
·
Cardinality
estimate: An estimate of the size of a result set. For example, if a table T has 100,000 rows and a query contains
a selection predicate of the form T.a=10, and a histogram shows that the
selectivity of T.a=10 is 10%, then the cardinality estimate for the fraction of
rows of T that must be considered by the query is 10% * 100,000 = 10,000.
·
LOB:
Large object (value of type text, ntext,
image, varchar(max),nvarchar(max),
varbinary(max)).
·
Filter:
a condition that is evaluated to determine whether a row must be part of the
filtered statistics. The predicate appears in the WHERE clause of the CREATE
STATISTICS or CREATE INDEX statements (in the case when statistics are
automatically created as a side effect of creating an index).
Statistics Collected by SQL
Server 2008
SQL Server
2008 maintains the following information at the table level. These are not part
of a statistics object, but SQL Server 2008 uses them in some cases during
query cost estimation:
·
Number of rows in the table or index (rows column in sys.sysindexes)
·
Number of pages occupied by the table or index (dpages column in sys.sysindexes)
SQL Server
2008 collects the following statistics about table columns and stores them in a
statistics object (statblob):
·
The time the statistics were collected.
·
The number of rows used to produce the histogram
and density information (described later in this section). (The row heading is
“Rows sampled.”)
·
The average key length.
·
A single-column histogram, including the number
of steps.
·
A string summary, if the column contains
character data. DBCC SHOW_STATISTICS output contains a column String Index, which has the value YES
if a statistics object contains a string summary.
·
The estimated number of rows matching the filter
(for filtered statistics); or all rows in the table (for regular statistics).
(The row heading is “Rows.”)
·
All rows in the table. (The row heading is
“Unfiltered rows.”)
A histogram
is a set of up to 200 values of a given column. All or a sample of the values in
a given column are sorted; the ordered sequence is divided into up to 199
intervals so that the most statistically significant information is captured.
In general, these intervals are of nonequal size. The following values, or
information sufficient to derive them, are stored with each step of the
histogram.
Histograms in
SQL Server 2008 are only built for a single column—the first column in the set
of key columns of the statistics object.
SQL Server
2008 builds the histogram from the sorted set of column values in three steps:
1. Histogram initialization: In the first
step, a sequence of values starting at the beginning of the sorted set is
processed, and up to 200 values of RANGE_HI_KEY, EQ_ROWS, RANGE_ROWS, and
DISTINCT_RANGE_ROWS are collected (RANGE_ROWS and DISTINCT_RANGE_ROWS are
always zero during this step). The first step ends either when all input has
been exhausted, or when 200 values have been found.
2. Scan with bucket merge: Each additional
value from the leading column of the statistics key is processed in the second
step, in sorted order; for each successive value, either the value is added to
the last range, or a new range at the end is created (this is possible because
the input values are sorted). If a new range is created, one pair of existing,
neighboring ranges is collapsed into a single range. This pair of ranges is
selected in order to minimize information loss. The number of histogram steps
after the ranges are collapsed stays at 200 throughout this step. This method
is based on a variation of the maxdiff
histogram.
3. Histogram consolidation: In the third
step, more ranges can be collapsed if a significant amount of information is
not lost. Therefore, even if the column has more than 200 unique values, the
histogram might have less than 200 steps.
If the
histogram has been built using a sample, the values of RANGE_ROWS, EQ_ROWS,
DISTINCT_RANGE_ROWS and AVG_RANGE_ROWS are estimated and therefore they do not
need to be whole integers.
Density is
information about the number of duplicates in a given column or combination of
columns, and it is calculated as 1/(number
of distinct values). If a column is used in an equality predicate, the
number of qualifying rows is estimated by using the density derived from the
histogram. Histograms are also used to estimate the selectivity of nonequality
selection predicates, joins, and other operators.
In addition
to the timestamp showing the time the statistics were gathered, the number of
rows in the table, the number of rows estimated to match the filter, the number
of rows sampled to produce the histogram, the Density information and average
key length, and the histogram itself, single-column
statistics information includes an All density value for each set of columns forming a prefix of the statistics
column set. This is shown in the second rowset output by DBCC
SHOW_STATISTICS. All density is an
estimate of 1/(number of distinct values in the prefix column set). The next
section gives an example of this.
Note: The Density value included in the
first row returned by dbcc
show_statistics is the density of all values sampled other than the
RANGE_HI_KEY values. The RANGE_HI_KEY values are typically the more frequent
values in the distribution. Hence, the displayed Density value gives
potentially useful information about the density of nonfrequent values.
The multicolumnstatistics for one set of columns consist of one histogram for the first column in the statistics definition,
one density value for the first column, and an All Density value for each prefix combination of columns (including
the first column alone). Each set of multicolumn statistics (a histogram and
two or more density values) is stored in one statblob together with the timestamp of the last statistics update,
the number of rows in the sample used to produce the statistical information,
the number of steps in the histogram, and the average length of the key. A
string summary is included for only the first column, only if it contains
character data. Multicolumn densities are not supported for filtered
statistics.
Use sp_helpindex and sp_helpstats to display the list of all statistics available for a
given table. sp_helpindex lists all
indexes on the table, and sp_helpstats
lists all the statistics on the table. Each index also carries the statistical
information for its columns. The statistical information created by using the
CREATE STATISTICS command is equivalent to the statistics built by a CREATE
INDEX command on the same columns. The only difference is that the CREATE
STATISTICS command uses sampling by default while the CREATE INDEX command
gathers the statistics with fullscan, because it has to process all rows for
the index anyway.
>>Creating and
Displaying Statistics: Example
The
following example illustrates how statistics are created both automatically and
manually, and shows how to list and display information about statistics.
Results are given for some but not all commands; the output produced by SQL
Server 2008 is shown when this is useful to illustrate its behavior. You can
run this example yourself to see the complete set of output.
USE tempdb
GO
-- Clean up objects from any previous runs.
IF object_id(N'Person.Contact','U') IS NOT NULL
DROP TABLE Person.Contact
GO
IF EXISTS (SELECT * FROM sys.schemas WHERE name = N'Person')
DROP SCHEMA Person
GO
-- Create a sample schema and table.
CREATE SCHEMA Person
GO
CREATE TABLE Person.Contact(
FirstName nvarchar(60),
LastName nvarchar(60),
Phone nvarchar(15),
Title nvarchar(15)
)
GO
-- Populate the table with a few rows.
INSERT INTO Person.Contact
VALUES(N'John',N'Smith',N'425-555-1234',N'Mr')
INSERT INTO Person.Contact
VALUES(N'Erik',N'Andersen',N'425-555-1111',N'Mr')
INSERT INTO Person.Contact
VALUES(N'Erik',N'Andersen',N'425-555-3333',N'Mr')
INSERT INTO Person.Contact
VALUES(N'Jeff',N'Williams',N'425-555-0000',N'Dr')
INSERT INTO Person.Contact
VALUES(N'Larry',N'Zhang',N'425-555-2222',N'Mr')
GO
-- Show that there are no statistics yet on the Person.Contact table.
sp_helpstats N'Person.Contact', 'ALL'
GO
-- Implicitly create statistics on LastName.
SELECT * FROM Person.Contact WHERE LastName = N'Andersen'
GO
-- Show that statistics were automatically created on LastName.
sp_helpstats N'Person.Contact', 'ALL'
GO
Results:
-- Create an index, which also creates statistics.
CREATE NONCLUSTERED INDEX Phone on Person.Contact(Phone)
GO
-- Show that creating the index created an associated statistics object.
sp_helpstats N'Person.Contact', 'ALL'
GO
Results:
-- Create a multicolumn statistics object on first and last name.
CREATE STATISTICS FirstLast ON Person.Contact(FirstName,LastName)
GO
-- Show that there are now three statistics objects on the table.
sp_helpstats N'Person.Contact', 'ALL'
GO
Results:
-- Display the statistics for LastName.
DBCC SHOW_STATISTICS (N'Person.Contact', LastName)
GO
Results:
Column set prefixes and associated
densities and lengths:
Histogram steps:
-- If you take the name of the statistics object displayed by
-- the command above and subsitute it in as the second argument of
-- DBCC SHOW_STATISTICS, you can form a command like the following one
--(the exact name of the automatically created statistics object
-- will typically be different for you).
DBCC SHOW_STATISTICS (N'Person.Contact', _WA_Sys_00000002_2D7CBDC4)
-- Executing the above command illustrates that you can show statistics by
-- column name or statistics object name.
GO
-- The following displays multicolumn statistics. Notice the two
-- different density groups for the second rowset in the output.
DBCC SHOW_STATISTICS (N'Person.Contact', FirstLast)
Results (second rowset only):
Column set prefixes and associated
densities and lengths:
To
see a fully populated histogram for a larger table, run the following commands:
USE AdventureWorks
-- Clean up objects from previous runs.
IF EXISTS (SELECT * FROM sys.stats
WHERE object_id = object_id('Sales.SalesOrderHeader')
AND name = 'TotalDue')
DROP STATISTICS Sales.SalesOrderHeader.TotalDue
GO
CREATE STATISTICS TotalDue ON Sales.SalesOrderHeader(TotalDue)
GO
DBCC SHOW_STATISTICS(N'Sales.SalesOrderHeader', TotalDue)
GO
To
try out filtered statistics on the AccountNumber
column for all purchase orders that are not NULL, run:
-- Clean up objects from previous runs.
IF EXISTS (SELECT * FROM sys.stats
WHERE object_id = object_id('Sales.SalesOrderHeader')
AND name = 'ActNumberPONotNull')
DROP STATISTICS Sales.SalesOrderHeader.ActNumberPONotNull
GO
CREATE STATISTICS ActNumberPONotNull ON Sales.SalesOrderHeader (AccountNumber) WHERE PurchaseOrderNumber IS NOT NULL
GO
DBCC SHOW_STATISTICS(N'Sales.SalesOrderHeader', ActNumberPONotNull)
GO
Creating Statistics with SQL Server
2008
You
can create statistics in SQL Server 2008 in a number of different ways, as
described below:
·
The optimizer automatically creates
single-column statistics as needed as a side effect of optimizing SELECT,
INSERT, UPDATE, DELETE, and MERGE statements if AUTO_CREATE_STATISTICS is
enabled, which is the default setting.
Note:
The optimizer only creates nonfiltered statistics in these cases.
·
There are two basic statements in SQL
Server 2008 that explicitly generate the statistical information described
above: CREATE INDEX generates the declared index in the first place, and it
also creates one set of statistics for the column combinations constituting the
index keys (but not other included columns). CREATE STATISTICS only generates
the statistics for a given column or combination of columns.
Note:
If the CREATE INDEX defines a predicate, the corresponding statistics are
created with the same predicate.
·
In addition, there are several other ways to
create statistics or indexes. Ultimately, though, each issues one of the above
two commands. Use
sp_createstatsto create
statistics for all eligible columns (all except XML columns) for all user
tables in the current database. A new statistics object will not be created for
columns that already have a statistics object.
·
Use
dbcc dbreindexto
rebuild one or more indexes for a table in the specified
database.
·
In SQL Server Management Studio, expand the
folder under a Table object, right click the Statistics folder, and choose New
Statistics.
·
Use the Database Engine Tuning Advisor to create
indexes.
Here
is an example of a CREATE STATISTICS command on the AdventureWorks.Person.Contact table.
CREATE STATISTICS FirstLast2 ON Person.Contact(FirstName,LastName)
WITH SAMPLE 50 PERCENT
Usually,
statistics with default sampling are sufficient to allow good query plans to be
produced. However, there may be cases when statistics with larger sample sizes
can benefit query optimization, such as when the values in the given column
sample are not random. Nonrandom samples can occur if the data is sorted or
clustered. Sorting or clustering can be caused by the creation of indexes, or
by loading data into a heap structure that is already sorted or clustered. The
most commonly used larger sample size is fullscan,
because it gives the most accurate statistics. The price for using statistics
with larger sample sizes is the time required to create the statistics.
The
above command creates a single two-column statistics object. In this case,
SAMPLE 50 PERCENT is ignored and a full scan is performed because the table is
too small. Sampling is used primarily to avoid excessive scans of data and
affects only tables and indexes with 1,024 or more pages (8 MB).
In
SQL Server 2008, statistics are created for all indices at the index creation
time. SQL Server creates single-column statistics automatically when it
compiles queries. These statistics are created for columns where the optimizer
would otherwise have to estimate the approximate density or distribution. There
are the following exceptions to this rule. Statistics might not be created
automatically if (1) the database is read-only, (2) there are too many
outstanding compilations in progress, and (3) the column's data type is not
supported for auto stats.
The
automatic statistics creation function can be disabled at the database level by
executing the following command.
ALTER DATABASE dbname SET AUTO_CREATE_STATISTICS OFF
Similarly,
automatic statistics creation can be enabled at the database level by executing
the following command.
ALTER DATABASE dbname SET AUTO_CREATE_STATISTICS ON
It's
recommended that you leave this setting ON. Only disable it if you need to work
around known performance issues, such as the need to specify a sampling rate other
than the default for some tables.
By
default, statistics are created by sampling
the data set when the CREATE STATISTICS command is executed or when the
statistics are automatically created. CREATE INDEX scans the whole data set
anyway; therefore, the index statistics are initially created without sampling
(the equivalent of fullscan). The
CREATE STATISTICS command allows you to set the sample size in the WITH clause
either by specifying fullscan or the
percentage of data or number of rows to scan. The latter is interpreted as an
approximation. It is also possible to inherit the previous sample size when you
specify WITH RESAMPLE on the UPDATE STATISTICS command. This is particularly
useful when there are indexes on some columns or column sets (originally
created with fullscan statistics)
and only statistics on some other columns or column sets (originally created
with SAMPLE statistics). Using the resample
option on UPDATE STATISTICS will then maintain the fullscan statistics for the indexes and sample statistics for the
rest of the columns.
For
small tables, a minimum of 8 MB of data is sampled. If a table starts small,
and you sample with the default sampling rate, and thereafter use the resample option when updating the
statistics, you will get the equivalent of fullscan
even as the table grows beyond 8 MB. Avoid the use of resample if you want the default sampling rate as table size
varies.
The
resample sampling rate is computed as a function of the number of rows sampled
and the total number of rows in the table during the previous statistics
calculation. Because the actual sampling rate may vary due to the random nature
of sampling, the resample rate is only an approximation of the prior sampling
rate for non-fullscan samples. For consistently repeatable sampling, explicitly
specify the same sampling rate you used before with UPDATE STATISTICS instead
of using resample.
The
dbcc show_statistics command
displays the sample size under the Rows Sampled heading. Statistics created
automatically, or updated automatically (as described in the next section) are
always generated using default sampling. The default sampling rate is a
slow-growing function of the table size, which allows statistics to be gathered
relatively quickly even for very large tables.
When
statistics are created and updated, the optimizer must choose an access path to
gather the statistics. An access path may include a heap, clustered index, or
nonclustered index. For sampled statistics, the optimizer tries to avoid access
paths that are physically sorted on the leading statistics key column. This
helps provide a more random sample, which leads to more accurate statistics.
For those access paths that are not sorted on the statistics key (if such an
access path exists), the lowest cost access path is chosen. This is the
narrowest index or the heap. For fullscan
statistics, the sort order of the access paths is not significant for
statistical accuracy, so the lowest cost access path is chosen.
SQL
Server Profiler can monitor automatic statistics creation. The Auto Stats event
is in the group of Performance trace events. Select the IntegerData, Success,
and ObjectID columns for the Auto
Stats event when you define the trace. After the Auto Stats event is captured,
the Integer Data column contains the
number of statistics created or updated for a given table, the Object ID is the
ID of the table and the TextData
column (included in the trace definition by default) contains names of the
columns for which statistics creation or update was performed, together with
either an Updated: or Created: prefix. The Success column contains an indication
of the success or failure of the Auto Stats operation. Specifically, Success has three possible values.
In
some cases, you might also observe an Auto Stats event in which no statistics
are created or updated. Such an event is generated when auto update statistics is turned off, or when there have been a
substantial number of changes to a table referenced in a query, and the
optimizer has been able to remove all references to that table from the query
plan because of the structure of the query and the existence of a foreign key
constraint.
The
DROP STATISTICS command is used to drop statistics, but it is not possible to
drop statistics that are a byproduct of an index. Such statistics are removed
only when the index is dropped.
Maintaining Statistics in SQL Server
2008
After
a series of INSERTs, DELETEs, and/or UPDATEs are performed on a table the
statistics may not reflect the true data distribution in a given column or
index. If the SQL Server query optimizer requires statistics for a particular
column in a table that has undergone substantial update activity since the last
time the statistics were created or updated, SQL Server automatically updates
the statistics by sampling the column values (by using auto update statistics). The statistics auto update is triggered by
query optimization or by execution of a compiled plan, and it involves only a
subset of the columns referred to in the query. Statistics are updated before
query compilation if AUTO_UPDATE_STATISTCS_ASYNC is OFF, and asynchronously if
it is ON. When statistics are updated asynchronously, the query that triggered
the update proceeds using the old statistics. This provides more predictable
query response time for some work loads, particularly those with short running
queries and very large tables.
When
a query is first compiled, if the optimizer needs a particular statistics
object, and that statistics object exists, the statistics object is updated if
it is out of date. When a query is executed and its plan is in the cache, the
statistics the plan depends on are checked to see if they are out of date. If
so, the plan is removed from the cache, and during recompilation of the query,
the statistics are updated. The plan also is removed from the cache if any of
the statistics it depends on have changed.
SQL
Server 2008 determines whether to update statistics based on changes to column modification counters (colmodctrs).
A
statistics object is considered out of date in the following cases:
·
If the statistics is defined on a regular table,
it is out of date if:
o The
table size has gone from 0 to >0 rows (test 1).
o The
number of rows in the table when the statistics were gathered was 500 or less,
and the colmodctr of the leading column of the statistics object has changed by
more than 500 since then (test 2).
o The
table had more than 500 rows when the statistics were gathered, and the
colmodctr of the leading column of the statistics object has changed by more
than 500 + 20% of the number of rows in the table when the statistics were
gathered (test 3).
·
For filtered statistics, the colmodctr is first
adjusted by the selectivity of the filter before these conditions are tested.
For example, for filtered statistics with predicate selecting 50% of the rows,
the colmodctr is multiplied by 0.5.
·
One limitation of the automatic update logic is
that it tracks changes to columns in the statistics, but not changes to columns
in the predicate. If there are many changes to the columns used in predicates
of filtered statistics, consider using manual updates to keep up with the
changes.
·
If the statistics object is defined on a
temporary table, it is out of date as discussed above, except that there is an
additional threshold for recomputation at 6 rows, with a test otherwise
identical to test 2 in the previous list.
Table
variables do not have statistics at all.
The
auto update statistics feature
described above may be turned off at different levels.
·
On the database level, disable auto update statistics by using
the following command.
ALTER DATABASE dbname SET AUTO_UPDATE_STATISTICS OFF
·
At the table level, disable auto update
statistics using the NORECOMPUTE option of the UPDATE STATISTICS command or
CREATE STATISTICS command.
·
Use
sp_autostats to display and change the auto update statistics
setting for a table, index, or statistics object.
Re-enabling
the automatic updating of statistics can be done similarly using ALTER
DATABASE, UPDATE STATISTICS, or sp_autostats.
SQL
Server 2008 maintains the automatic statistics update setting on per-database,
per-table, and per-index or statistics object levels. Although you can change
this setting for all statistics on one table by using a single sp_autostats command, this is
implemented by changing the setting individually for each statistics object and
index on a given table. No metadata explicitly records that auto update
statistics is ON or OFF for the whole table.
The
following table shows the combined effect of different database, table, and
index settings.
It
is not possible to override the database setting of OFF for auto update
statistics by setting it ON at the statistics object level.
Auto
statistics update is always performed by sampling the index or table using the
default sampling rate. To set the sampling rate explicitly, run CREATE
STATISTICS or UPDATE STATISTICS.
Statistics
update is covered by the same SQL Server Profiler event as statistics creation.
String Summary Statistics
SQL
Server 2008 includes patented technology for estimating the selectivity of LIKE
conditions. It builds a statistical summary of substring frequency distribution
for character columns (a string summary). This includes columns of type text, ntext, char, varchar, and nvarchar. Using the string summary,
SQL Server can accurately estimate the selectivity of LIKE conditions where the
pattern may have any number of wildcards in any combination. For example, SQL
Server can estimate the selectivity of predicates of the following form.
Column LIKE 'string%'
Column LIKE '%string'
Column LIKE '%string%'
Column LIKE 'string'
Column LIKE 'str_ing'
Column LIKE 'str[abc]ing'
Column LIKE '%abc%xy'
If
there is a user-specified escape character in a LIKE pattern (that is, the
pattern is of the form LIKE pattern
ESCAPE escape_character), SQL Server
2008 guesses selectivity.
This
is an improvement over SQL Server 2000, which uses a guess for selectivity when
any wildcard other than a trailing wildcard % is used in the LIKE pattern, and
it has limited accuracy in its estimates in that case.
The
String Index field in the first row set returned by DBCC SHOW_STATISTICS
includes the value YES if the statistics object also includes a string summary.
The contents of the string summary are not shown. The string summary includes
additional information beyond what is shown in the histogram.
For
strings longer than 80 characters, the first and last 40 characters are
extracted from the string and concatenated prior to considering the string in
the creation of the string summary. Hence, accurate frequency estimates for
substrings that appear only in the ignored portion of a string are not
available.
Statistics on Computed Columns
SQL
Server 2008 and SQL Server 2005 support the creation, update, and use of
statistics on computed columns even when a query doesn't contain the computed
column by name, but rather contains the computed column expression. Earlier
versions can only automatically create, update, and use statistics on computed
columns if the column is named explicitly in the query.
You
can observe auto creation of computed column statistics for the AdventureWorks database table column Sales.SalesOrderHeader.TotalDue if you
execute the following Transact-SQL script in SQL Server 2008.
USE AdventureWorks
GO
-- Remove all statistics for Sales.SalesOrderHeader
DECLARE c CURSOR FOR
SELECT name FROM sys.stats
WHERE object_id = object_id('Sales.SalesOrderHeader')
AND auto_created <> 0 AND user_created <> 0
DECLARE @name NVARCHAR(255)
OPEN c
FETCH next FROM c INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
EXEC ('drop statistics Sales.SalesOrderHeader.' + @name)
FETCH NEXT FROM c INTO @name
END
CLOSE c
DEALLOCATE c
-- Query Sales.SalesOrderHeader with an expression equivalent
-- to the TotalDue computed column,
-- ((isnull(([SubTotal]+[TaxAmt])+[Freight],(0))).
SELECT *
FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND (isnull(([SubTotal]+[TaxAmt])+[Freight],(0))) > 200000.00
ORDER BY TotalDue DESC
-- List the created statistics. Observe that statistics
-- are created for TotalDue even though it is not explicitly
-- referenced in the query.
sp_helpstats 'Sales.SalesOrderHeader'
SQL
Server 2008 does not support statistics on non-persisted computed columns that
involve a common language runtime (CLR) expression, such as an invocation of a
CLR user-defined function. To create statistics on a CLR computed column, the
column must be marked PERSISTED.
Statistics on CLR User-Defined Type
Columns
SQL
Server 2008 supports the creation, update, and use of statistics on CLR
user-defined type columns if the type supports binary ordering. Statistics are
not supported on user-defined types that don't support binary ordering. A type
is binary ordered if it has the IsByteOrdered flag set to true in the
SqlUserDefinedType attribute that is specified as part of the type definition.
If a type supports binary ordering, standard binary sort ordering is a
semantically correct order for the type.
Statistics and Indexed Views
Normally,
statistics are not required on indexed views. This is because the substitution
of the indexed views into the query plan is considered only after all the
statistics for the underlying tables and indexes are attached to the query
plan. There is one exception: Statistics are used if the view is directly
referenced by the NOEXPAND hint in a FROM clause. Note that an error is
generated and the plan is not created if the NOEXPAND hint is used on a view
that does not also contain an index.
Because
of their limited use, the statistics on indexed views are not created by using sp_createstats or updated by using sp_updatestats. The auto update and
auto create statistics features work for indexed views. But as noted before,
such statistics would be required by the optimizer, and they are subsequently
created only if the indexed view is used with the NOEXPAND hint in the query
and if the AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS options are set to
ON. You can also manually perform CREATE STATISTICS on the indexed view columns
or use UPDATE STATISTICS to update column or index statistics on indexed views.
Best Practices for Managing Statistics
The
goal for using statistics in SQL Server is to allow the optimizer to get good
cardinality estimates so it can find good query plans, while at the same time
keeping any overhead or delay associated with gathering statistics reasonable.
Below we list best practices for managing statistics in SQL Server, with the
most important ones first.
Use
auto create and auto update statistics
For a large majority of SQL Server
installations, the most important best practice is to use auto create and auto
update statistics database-wide. Auto create and auto update statistics are on
by default. If you observe bad plans and suspect that missing or out of date
statistics are at fault, verify that auto create and auto update statistics are
on.
When statistics are created or
maintained automatically, SQL Server samples the rows by default, instead of
performing a full scan. (An exception is when statistics are created as a
by-product of index creation, when SQL Server uses full scan.) Update of
statistics with a different sampling rate (from full scan to sampled, or vice
versa), can cause query plan changes. If query plan stability is a high
priority, drop and re-create with sampling any statistics that are the
by-product of index creation.
If
needed, selectively use FULLSCAN statistics
If you are using auto create and auto
update statistics and you are getting a bad query plan because the statistics
are not accurate or current, do the following:
1.
Leave auto create and auto update statistics on.
2.
Only for those statistics that are not accurate
or current enough, use
CREATE
STATISTICS … WITH FULLSCAN, NORECOMPUTE
and a batch job that does
UPDATE
STATISTICS … WITH FULLSCAN, NORECOMPUTE periodically.
The frequency with which you should
update statistics depends on your application, and it may require some
experimentation to determine. A good starting point for the frequency of fullscan update is your update rate. If
the table under consideration has a high update rate, run a fullscan statistics update nightly. If
the table has a low update rate, run a fullscan
statistics update weekly.
>Avoid
use of local variables in queries
If you use a local variable in a query
predicate instead of a parameter or literal, the optimizer resorts to a
reduced-quality estimate, or a guess for selectivity of the predicate. Use
parameters or literals in the query instead of local variables, and the
optimizer typically can pick a better query plan. For example, consider this query,
which uses a local variable.
declare @StartOrderDate datetime
set @StartOrderDate = '20040731'
select * from Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderId
AND h.OrderDate >= @StartOrderDate
The number of rows from Sales.SalesOrderHeader that the
optimizer estimates will qualify versus the condition h.OrderDate >=
@StartOrderDate is 9439.5, which is exactly 30 percent of the size of the
table. You can use the graphical showplan for the query and right-click the
plan node for Sales.SalesOrderHeader
to display this cardinality estimate. In a prerelease version of SQL Server
2008 used while preparing this paper, the plan chosen uses a merge join (the
observations that follow are based on this same SQL Server 2008 version; your
results may differ depending on your SQL Server version, available memory, and
so on.). Now, consider this equivalent query, which doesn't use a local
variable:
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderId
AND h.OrderDate >= '20040731'
The cardinality of the result set for
the predicate "h.OrderDate >= '20040731'" is estimated as 40 in
the graphical showplan for the query (right-click the Filter operator), for a selectivity of 0.13%. The plan chosen for
this query uses a nested loop join instead of a merge join because of this
improved estimate.
Even when local variables are used in
a query, an estimate that is better than a guess is used in the case of
equality predicates. Selectivity for conditions of the form
"@local_variable = column_name" is estimated using the average value
frequency from the histogram for column_name. So, for example, if the column
column_name contains all unique values, a selectivity estimate of 1/(number of
unique values in column) is used, which is accurate.
To eliminate the use of local
variables, consider (1) rewriting the query to use literals instead of
variables, (2) using sp_executesql
with parameters that replace your use of local variables, or (3) using a stored
procedure with parameters that replace your use of local variables. Dynamic SQL
via EXEC may also be useful for eliminating local variables, but it typically
results in higher compilation overhead and more complex programming. A new
enhancement in SQL Server 2008 is that the OPTION(RECOMPILE) hint expands both
variables and parameters inline as constant literals before optimization. If
you can tolerate the extra compile time, this is an excellent way to work
around the problem of poor selectivity estimates associated with variables in
query predicates.
Consider
limiting use of multistatement table-valued user-defined functions and table
variables
Multistatement table-valued
user-defined functions (also known as TVFs) have no statistics. The optimizer
must guess the size of their results. Similarly, table variables don't have
statistics, and the optimizer must resort to guesses for their cardinality. If
a bad plan results because of these guesses, consider using a standard table or
temporary table as a temporary holding place for the results of the
table-valued user-defined function, or a replacement for the table variable.
This allows the optimizer to use better cardinality table-valued user-defined
function estimates.
Use
multicolumn statistics when queries have multicolumn conditions
If a query has a multicolumn
condition, consider using multicolumn statistics if you suspect that the
optimizer is not producing the best plan for the query. You get multicolumn
statistics as a by-product of creating a multicolumn index, so if there is
already a multicolumn index that supports the multicolumn condition, there is
no need to create statistics explicitly. Auto create statistics only creates
single-column statistics, never multicolumn statistics. So if you need
multicolumn statistics, create them manually, or create a multicolumn index.
If one of the columns has a low
distinct value count, you may achieve higher accuracy if you create multiple
filtered statistics, one per each distinct value. For more information, see the
following best practice, “Consider filtered statistics when querying correlated
columns.”
Consider a query that accesses the AdventureWorks.Person.Contact table,
and contains the following condition.
FirstName = 'Kim' AND LastName = 'Akers'
To make selectivity estimation more
accurate for this query, create the following statistics object:
CREATE STATISTICS LastFirst ON Person.Contact(LastName,FirstName)
This statistics object will be useful
for queries that contain predicates on LastName
and FirstName, as well as LastName alone. In general, the
selectivity of a predicate on any prefix of the set of columns in a multicolumn
statistics object can be estimated using that statistics object.
For a statistics object to fully
support a multi-column condition, a prefix of the columns in the multicolumn
statistics object must contain the columns in the condition. For example, a
multicolumn statistics object on columns (a,b,c) only partially supports the
condition a=1 AND c=1; the histogram is used to estimate the selectivity for
a=1, but the density information for c is not used, because b is missing from
the condition. Multicolumn statistics on (a,c) or (a,c,b) would support the
condition a=1 AND c=1, and density information could be used to improve
selectivity estimation.
Consider
filtered statistics when querying correlated columns
If your table contains columns
correlated to each other, the query optimizer will have difficulties predicting
the selectivity of any expressions involving two or more such columns. Think
about your favorite car make and model. Is there a statistical correlation
between the two? For example, if a table contains the columns CarMake and CarModel, a query with the predicate WHERE CarMake=constant and
CarModel=constant can result in underestimated expected rows in the query plan.
This is because the query optimizer does not know how closely the two
conditions are related (multicolumn densities can only be of limited use,
because they count the total distinct values across the table). Filtered
statistics can be a much better option in such cases. Define separate filtered
statistics on the CarModel column
for each CarMake.
CREATE STATISTICS model1 ON autos(CarModel) WHERE CarMake=constant1
CREATE STATISTICS model2 ON autos(CarModel) WHERE CarMake=constant2
...
At query compilation time, the
optimizer can make a much better estimate by picking the correct statistics
object, based on the value of CarMake,
and then looking up the number of models from the steps of the histogram.
Be
alert for any situation where SQL Server falls back to guessing selectivity
SQL Server guesses selectivity in a
number of situations. Often, the guesses are reasonable, the data sizes are
small enough that this is not a problem, or the guess doesn't lead to a bad
query plan. Nevertheless, when SQL Server must guess selectivity of a query
predicate, sometimes less-than-optimal query plans result. If a query is not
performing as well as you would like, and you suspect that a suboptimal query
plan is being selected, look at the query and its resulting plan for signs that
selectivity was guessed rather than estimated based on statistics. In many
cases, you can modify the query or a small part of your application to avoid
the guess. The following list gives some situations that can lead to guesses,
and possible workarounds:
·
Missing
statistics: Verify that auto create statistics is enabled, or manually
create statistics by using CREATE STATISTICS or sp_createstats. Check to see whether the database is read-only;
auto create statistics does not work on read-only databases.
·
Use of
local variables in the query condition (this was addressed previously).
·
Non-constant-foldable
expressions in the query condition (for example, T.x+1 = 0).
Rewrite the query to eliminate the expression, or evaluate the expression prior
to executing the query and pass the result into the query as a parameter (not a
local variable). In the case of T.x+1 = 0, rewrite the expression as
T.x = -1. This is an equivalent expression, and it allows an accurate
estimate instead of a guess.
·
Complex
expressions such as "Price + Tax > 100" or
"Price * (1+TaxRate) > 100". If you see
slower-than-desired query performance in such a case, consider creating a
computed column with the equivalent expression, and creating statistics or an
index on the computed column. Auto create statistics also creates statistics
for the computed column if it exists, so you need not create the computed
column statistics manually if auto create statistics is enabled.
Avoid
modification of parameters in a stored procedure prior to use in a query
For best query performance, in some
situations you'll need to avoid assigning a new value to a parameter of a
stored procedure within the procedure body, and then using the parameter value
in a query. The stored procedure and all queries in it are initially compiled
with the parameter value first passed in as a parameter to the query. This is
sometimes called parameter sniffing.
Consider the following stored procedure, which gets sales on or after a certain
date, or if NULL is passed as an argument, the last three months of sales.
CREATE PROCEDURE GetRecentSales (@date datetime) WITH RECOMPILE AS
BEGIN
IF @date IS NULL
SET @date = dateadd("mm",-3,(SELECT MAX(OrderDATE)
FROM Sales.SalesOrderHeader))
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date
END
This stored procedure, if called with
NULL, has the final SELECT statement optimized for @date = NULL. Because no
rows have NULL OrderDate, the cardinality estimate for the result of applying
this filter to SalesOrderHeader is
very low (1 row). However, at run time, the date is not NULL, but three months
before the latest OrderDate. The actual number of SalesOrderHeader rows that qualify is 5,736. The optimizer chooses
a nested loop join for the query when NULL is passed to GetRecentSales, whereas the optimal plan contains a merge join. You
can see the plan selected, and the expected and actual cardinalities, using
this script.
SET STATISTICS PROFILE ON
GO
EXEC GetRecentSales NULL
GO
SET STATISTICS PROFILE OFF
GO
Notice that the WITH RECOMPILE option
specified on the GetRecentSales
stored procedure above does not eliminate the cardinality estimation error. One
way to ensure that the queries in this example are optimized with appropriate
parameter values that allow good estimates to be obtained is to modify the
stored procedure as follows, breaking it down into parts.
CREATE PROCEDURE GetRecentSales (@date datetime) AS
BEGIN
IF @date IS NULL
SET @date = dateadd("mm",-3,(SELECT MAX(OrderDATE)
FROM Sales.SalesOrderHeader))
EXEC GetRecentSalesHelper @date
END
CREATE PROCEDURE GetRecentSalesHelper (@date datetime) WITH RECOMPILE AS
BEGIN
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date -- @date is unchanged from compile time,
-- so a good plan is obtained.
END
Consider
more frequent statistics gathering for ascending keys
Ascending key columns, such as
IDENTITY columns or datetime columns
representing real-world timestamps, can cause inaccurate statistics in tables
with frequent INSERTs because new values all lie outside the histogram.
Consider updating statistics on such columns frequently with a batch job if
your application seems to be getting inadequate query plans for queries that
have a condition on the ascending key column. How often to run the batch job
depends on your application. Consider daily or weekly intervals, or more often
if needed for your application.
Use
asynchronous statistics update if synchronous update causes undesired delay
If you have a large database and an
OLTP workload, and if you enable AUTO_UPDATE_STATISTICS, some transactions that
normally run in a fraction of a second may very infrequently take several
seconds or more because they cause statistics to be updated. If you want to
avoid the possibility of this noticeable delay, enable
AUTO_UPDATE_STATISTICS_ASYNC. For workloads with long-running queries, getting
the best plan is more important than an infrequent delay in compilation. In
such cases, use synchronous rather than asynchronous auto update statistics.
Consider
filtered statistics for heterogeneous data
Sometimes rows with different schema
are mapped to a single physical table, with multipurpose columns such as ntext1, ntext2, bigint1, bigint2
storing semantically unrelated data. Typically, there is also a special-purpose
rowtype column that defines what is
the semantic meaning of the data stored in each column. Such design is useful
for storing arbitrary user-defined lists without changing the underlying
database schema. As a result, the same column may end up storing telephone
numbers and city names, and a histogram on such column may not be very useful,
due to the limit of 200 steps. To avoid this, define separate statistics for
each rowtype in this table.
Consider
filtered statistics for partitioned tables
Statistics are defined at the table
level. Changes to partitions affect statistics only indirectly, through the
column modification counters. Switching in a partition is treated as an insert
of its rows into the table, triggering statistics update based on the 20% rule,
as outlined above. Filtered statistics, through their predicates, can target
only rows in certain partition or partitions. There is no requirement to align
to the boundaries of the partitions when defining the statistics.
Often, customers partition a table by
the Date column, keeping a partition
for every month, and updating only the last partition; older months receive the
majority of the complex, read-only queries. In this scenario, creating
separate, full-scan statistics on the read-only region of the table results in
more accurate cardinality estimates. In order to benefit from the separate
statistics object, queries must be contained within the read-only region.
Similarly, separate statistics objects can be created for different regions
based on the different access patterns.
Conclusion
SQL Server
2008 contains many enhancements to its statistics management capabilities. Most
importantly, you can rely on the automatic creation and update of statistics to
help ensure good query plans in most cases. When automatic statistics with the
default sampling rate are not sufficient, you can control statistics sampling
rates and creation and update time explicitly. If you are observing what
appears to be a suboptimal query plan related to statistics or cost estimation,
consider the best practices described in this paper.
For more information:
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Did this
paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to
5 (excellent), how would you rate this paper and why have you given it this
rating? For example:
- Are
you rating it high due to having good examples, excellent screen shots, clear
writing, or another reason?
- Are
you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback
will help us improve the quality of white papers we release.
Send feedback.
References
SQL Server
2008 contains many enhancements to its statistics management capabilities. Most
importantly, you can rely on the automatic creation and update of statistics to
help ensure good query plans in most cases. When automatic statistics with the
default sampling rate are not sufficient, you can control statistics sampling
rates and creation and update time explicitly. If you are observing what
appears to be a suboptimal query plan related to statistics or cost estimation,
consider the best practices described in this paper.
[Mar04] Arun Marathe, Batch Compilation, Recompilation, and Plan
Caching Issues in SQL Server 2005, http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx,
July 2004.