SQL Server 2005 Books Online 
Grouping Members (Discretization) 

 

The grouping of members of an attribute into buckets is called discretization. When end users browse a level of a hierarchy that is based on an attribute, they see the names and values of the buckets, instead of the members themselves. To end users, the buckets look like regular members.

Discretization significantly reduces the number of members that are displayed for an attribute, without changing the structure of the attribute. If there are a lot of members in a level, this savings can greatly facilitate browsing a hierarchy. For example, an end user can expand a member that contains 30,000 children without waiting for all 30,000 members to expand under the parent.

Note:
Discretization does not replace good data warehouse design. Many thousands of members under a single parent may indicate an inefficient data warehouse design.

 

 

Discretization of an attribute is determined by its DiscretizationMethod property setting. The following table describes the different settings for this property.

 

Discretization Method Description

None

No grouping of members occurs.

Automatic

The server automatically chooses the best of the following grouping methods, depending on the structure of the attribute: EqualAreas, Clusters, EqualRanges, or Thresholds.

EqualAreas

Examines the distribution of values across the population and creates bucket ranges so that the total population is distributed equally across the buckets. In other words, if the distribution of continuous values were plotted as a curve, the areas under the curve covered by each bucket range would be equal.

Clusters

Groups members by performing single-dimensional clustering on the input values by using the K-Means algorithm. It uses Gaussian distributions. This can only be used for numeric columns.

Thresholds

Breaks the input range of continuous values into buckets based on the inflection (turning) points in their distribution curve—these would be the points where the gradient changes direction. If the number of points is more than the requested number of buckets N, it sorts by height and selects the first N. This can only be used for numeric columns.

Note:
This feature will be removed in the next version of Microsoft SQL Server. Do not use this feature in new development work, and modify applications that currently use this feature as soon as possible. 

 

 

UserDefined

Applies only for dimensions in session cubes.

You can determine the number of buckets used to group members by specifying a value for the DiscretizationBucketCount property. The default setting for this property is 0. If the DiscretizationBucketCount property is set to 0 and the DiscretizationMethod property is set to something other than EqualAreas or EqualRanges, the server determines the sizes and values of the buckets by sampling the data. If the DiscretizationMethod property is set to EqualAreas or EqualRanges, the server reads all the members of an attribute (or data mining column) to calculate the size of the buckets. The default size is the square root of the count of the members. If you know how many buckets you want, or if the server creates more or fewer buckets than you want, you can set DiscretizationBucketCount manually.

Naming Buckets

By default, a bucket is labeled with the first and last member names in the bucket separated by a hyphen (-). To use a different naming scheme, you can specify a naming template. You specify the naming template for an attribute by using the Format field on the NameColumn property. If there are multiple translations, you can redefine the Format field for every translation.

The template string uses the following format:

        first bucket name 
        ;
        intervening buckets name 
        ;
        last bucket name
      

First bucket name is applied to the first bucket of members; intervening buckets name is applied to all intervening buckets of members; last bucket name is applied to the final bucket of members in the attribute. The bucket names are separated by the semicolon (;) character.

The bucket names can be composed of any character string, with a member indicated by one of the operators described in the following table.

 

Operator Description

%{First bucket member}

The first member in the current bucket

%{Last bucket member}

The last member in the current bucket

%{Previous bucket last member}

The last member of the previous bucket

%{Next bucket first member}

The first member of the next bucket

%{Bucket Min}

The smallest (minimum) member in the current bucket

%{Bucket Max}

The largest (maximum) member in the current bucket

%{Previous Bucket Min}

The smallest (minimum) member in the previous bucket

%{Previous Bucket Max}

The largest (maximum) member in the previous bucket

%{Next Bucket Min}

The smallest (minimum) member in the next bucket

%{Next Bucket Max}

The largest (maximum) member in the next bucket

The following format strings are supported:

         string1;string2;string3

where string1 applies to the first bucket, string2 applies to the intervening buckets, and string3 applies to the last bucket.

         string       

where string applies to all the buckets. To include a literal semicolon (;) in a string, prefix it with the percent sign (%) escape character.

If you do not specify a format string, the default format described earlier corresponds to a single string that formats all the buckets identically, as follows:

%{First Bucket Member} - %{Last Bucket Member}

To specify separate formats for the first bucket, the intervening buckets, and the last bucket, you can create a template string like that shown in the following example:

Salary less than %{Last bucket member}; Salary range from %{First bucket member} to %{Last bucket member}; Salary from %{First bucket member} and higher

For measures that range from less than 9,000 to greater than 100,000, this string displays the following result in the Browser tab of Dimension Designer:

Salary less than 9,999

Salary range from 10,000 to 19,999

Salary range from 20,000 to 29,999

Salary range from 30,000 to 39,999

Salary range from 40,000 to 49,999

Salary range from 50,000 to 59,999

Salary range from 60,000 to 69,999

Salary range from 70,000 to 79,999

Salary range from 80,000 to 89,999

Salary range from 90,000 to 99,999

Salary from 100,000 and higher

If you discretize an attribute, make sure that you order the members correctly. You can set the OrderBy property for the attribute to order by key or by name of a member property. The groups shown in this example are ordered by the key value of the attribute. If the OrderBy property is set to Name, the order is based on the entire template string instead of on the values of the members, and the three categories of buckets may not be in order.

Updating a Dimension with Discretized Attributes

When you process a dimension, a discretized attribute is rediscretized only with a full update (ProcessFull). To rediscretize an attribute, you must perform a full update of the dimension. If the dimension table of a discretized attribute is updated and you process the dimension with an incremental update (ProcessAdd), the discretized attribute is not rediscretized. The names and children of the new buckets remain the same. For more information about processing dimensions, see Processing OLAP Objects.

See Also

Other Resources

Configuring Attributes

Help and Information

Getting SQL Server 2005 Assistance
Page view tracker