SYSTEM specifies an ANSI SQL implementation-dependent sampling method. Specifying SYSTEM is optional, but this option is the only sampling method available in SQL Server and is applied by default.
TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table. Based on the random value for a page and the percentage specified in the query, a page is either included in the sample or excluded. Each page that is included returns all rows in the sample result set. For example, if you specify TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table. If the rows are evenly distributed on the pages of the table, and if there is a sufficient number of pages in the table, the number of rows returned should approximate the sample size that is requested. However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned. The TOP(n) operator can be used to limit the number of rows to a specified maximum.
When a number of rows is specified, instead of a percentage based on the total number of rows in the table, that number is converted into a percentage of the rows and, therefore, pages that should be returned. The TABLESAMPLE operation is then performed with that computed percentage.
If the table is made up of a single page, either all rows on the page are returned or none of the rows are returned. In this case, TABLESAMPLE SYSTEM can only return 100 percent or 0 percent of the rows on a page, regardless of the number of rows on the page.
Using TABLESAMPLE SYSTEM for a specific table limits the execution plan to use a table scan (a scan of the heap, or of the clustered index if one exists) on that table. Although the plan shows a table scan is performed, only those pages that are included in the result set are actually required to be read from the data file.
Important: |
|---|
|
The TABLESAMPLE SYSTEM clause should be used with some caution, and with some understanding of some of the implications of using sampling. For example, a join of two tables is likely to return a match for each row in both tables; however, if TABLESAMPLE SYSTEM is specified for either of the two tables, some rows returned from the unsampled table are unlikely to have a matching row in the sampled table. This behavior might lead you to suspect that a data consistency problem exists in the underlying tables, when the data is actually valid. Similarly, if TABLESAMPLE SYSTEM is specified for both tables that are joined, the perceived problem could be even worse.
|