Modify Count Table Parameters

 

Updated: August 25, 2016

Modifies the parameters used to create features from counts.

You can use the Modify Count Table Parameters module to change the way that features are generated from a count table.

Typically, to create count-based features, you use the Build Counting Transform module to process a dataset and create a count table, and then generate a new set of features from that count table. If you have already created the count table, you can instead use the Modify Count Table Parameters module to change how the count data is processed. This lets you create a new set of count-based statistics from data that has already been processed, without having to re-analyze the dataset.
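The separation between the count table and the featurization settings can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the module's implementation: `CountingTransform`, `build_counting_transform`, and `modify_count_table_parameters` are hypothetical names, labels are assumed to be binary (0 or 1), and only two of the module's settings are modeled. The point it shows is that modifying parameters yields a new transform that reuses the existing table instead of re-reading the data.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CountingTransform:
    # Hypothetical count table: value -> [count with label 0, count with label 1].
    table: dict
    # Featurization settings; changing these never touches the table.
    garbage_bin_threshold: float = 10.0
    prior_pseudo_examples: float = 42.0


def build_counting_transform(rows):
    """Make one pass over (value, label) pairs, with label in {0, 1}."""
    table = defaultdict(lambda: [0, 0])
    for value, label in rows:
        table[value][label] += 1
    return CountingTransform(table=dict(table))


def modify_count_table_parameters(transform, **changes):
    """Return a new transform that shares the existing table but uses new settings."""
    return replace(transform, **changes)


original = build_counting_transform([("a", 1), ("a", 0), ("b", 1)])
modified = modify_count_table_parameters(original, garbage_bin_threshold=1.0)
print(modified.table is original.table)  # True: the data was not counted again
```

Because the dataclass is frozen, the input transform keeps its original settings, which mirrors the module taking a transform as input and returning a separate modified transform.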

  1. Add the count transformation that you want to modify to your experiment.

    You should have previously run an experiment that created a count transformation.

    • To modify a saved transform

      Locate the transformation, in the Transforms group, and add it to your experiment.

    • To modify a count transformation created within the same experiment

      If the transformation has not been saved but is available as an output in the current experiment (for example, the output of the Build Counting Transform module), you can use it directly by connecting the modules.

  2. Add the Modify Count Table Parameters module and connect the transformation as an input.

  3. In the Properties pane of the Modify Count Table Parameters module, type a value to use as the Garbage bin threshold.

    This value specifies the minimum number of occurrences that must be found for each feature value, in order for counts to be used. If the frequency of the value is less than the garbage bin threshold, the value-label pair is not counted as a discrete item; instead, all items with counts lower than the threshold value are placed in a single "garbage bin".

    Tip

    If you are using a small dataset and you are counting and training on the same data, a good starting value is 1.

  4. For Additional prior pseudo examples, type the number of additional pseudo examples to include.

    You do not need to provide these examples; the pseudo examples are generated based on the prior distribution.

  5. For Laplacian noise scale, type a non-negative floating-point value that sets the scale of the Laplacian distribution from which noise is sampled. Setting a nonzero scale incorporates an acceptable level of noise into the model, so the model is less likely to be affected by unseen values in the data. The default, 0, adds no noise.

  6. In Output features include, choose the method to use when creating count-based features for inclusion in the transformation.

    • CountsOnly.   Create features using counts.

    • LogOddsOnly.   Create features using the log of the odds ratio.

    • BothCountsAndLogOdds.   Create features using both counts and log odds.

  7. Select the Ignore back off column option if you want to override the IsBackOff flag in the output when creating features.

    When you select this option, count-based features will be created even if the column doesn’t have significant count values.

  8. Run the experiment. You can then save the output of Modify Count Table Parameters as a new transformation, if desired.
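The interaction of the Garbage bin threshold (step 3), the Additional prior pseudo examples (step 4), and the Output features include choice (step 6) can be sketched in Python. This is an illustrative sketch under assumptions, not the module's actual algorithm: labels are binary; a value whose total count is below the threshold is featurized with the pooled counts of all such values (the garbage bin); and the pseudo examples are split between the two labels according to the overall label ratio in the table, which is one common smoothing scheme and may differ from what the module does. `COUNTS`, `build_garbage_bin`, and `featurize` are hypothetical.

```python
import math

# Hypothetical count table for one categorical column:
# value -> [count with label 0, count with label 1]
COUNTS = {
    "red": [40, 60],
    "green": [30, 10],
    "blue": [2, 1],    # rare value
    "violet": [0, 1],  # rare value
}

OUTPUT_MODES = ("CountsOnly", "LogOddsOnly", "BothCountsAndLogOdds")


def build_garbage_bin(counts, threshold):
    """Pool the label counts of every value whose total count is below threshold."""
    garbage = [0, 0]
    for c0, c1 in counts.values():
        if c0 + c1 < threshold:
            garbage[0] += c0
            garbage[1] += c1
    return garbage


def featurize(value, counts, threshold=10.0, prior_ex=42.0,
              mode="BothCountsAndLogOdds"):
    """Return count-based features for one column value (illustrative only)."""
    if mode not in OUTPUT_MODES:
        raise ValueError(f"unknown output mode: {mode}")
    c0, c1 = counts.get(value, [0, 0])
    if c0 + c1 < threshold:
        # Rare or unseen values are featurized against the garbage bin.
        c0, c1 = build_garbage_bin(counts, threshold)
    if mode == "CountsOnly":
        return [c0, c1]
    # Assumed prior: share of label-1 examples across the whole table.
    total0 = sum(c[0] for c in counts.values())
    total1 = sum(c[1] for c in counts.values())
    p1 = total1 / (total0 + total1)
    # Add prior_ex pseudo examples split by the prior. With prior_ex > 0 and
    # both labels present in the table, both terms of the ratio stay positive.
    log_odds = math.log((c1 + prior_ex * p1) / (c0 + prior_ex * (1.0 - p1)))
    if mode == "LogOddsOnly":
        return [log_odds]
    return [c0, c1, log_odds]


print(featurize("blue", COUNTS))  # [2, 2, 0.0]: pooled with "violet"
print(featurize("blue", COUNTS, threshold=1.0, mode="CountsOnly"))  # [2, 1]
```

Lowering the threshold to 1 lets the rare value "blue" keep its own counts instead of sharing the garbage bin, which is the trade-off the tip in step 3 refers to.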

You can see examples of how this module is used by exploring these sample experiments in the Model Gallery:

It is statistically safe to count and train on the same dataset if you set the Laplacian noise scale parameter to a value greater than 0.
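Sampling Laplacian noise with a given scale can be sketched with only the Python standard library. The assumptions are labeled here and in the code: the noise is taken to be added independently to each label count of a count-table row before features are derived (the module documentation says only that noise is sampled from a Laplacian distribution with this scale), and `laplace_noise` and `noisy_counts` are hypothetical helpers. The sampler relies on the fact that the difference of two independent exponential variables with mean b follows a Laplace(0, b) distribution.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale); a scale of 0 means no noise."""
    if scale < 0.0:
        raise ValueError("scale must be >= 0.0")
    if scale == 0.0:
        return 0.0
    # The difference of two independent Exp(mean=scale) draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, scale, seed=None):
    """Assumption: add independent Laplacian noise to each label count."""
    rng = random.Random(seed)
    return [c + laplace_noise(scale, rng) for c in counts]


print(noisy_counts([5, 3], 0.0))           # [5.0, 3.0]
print(noisy_counts([5, 3], 1.0, seed=42))  # noisy, reproducible for a fixed seed
```

Noisy counts can be fractional or even negative, so code that takes logarithms of them may need to clamp or smooth them first.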

Expected inputs

Name | Type | Description
Counting transform | ITransform interface | The counting transform to apply.

  

Module parameters

Name | Internal name | Type | Range | Optional | Default | Description
Garbage bin threshold | garbageBinThreshold | Float | >=0.0f | Required | 10.0f | The threshold under which a column value is featurized against the garbage bin.
Additional prior pseudo examples | priorEx | Float | >=0.0f | Required | 42.0f | The number of additional pseudo examples, following the prior distribution, to include.
Laplacian noise scale | noiseScale | Float | >=0.0f | Required | 0.0f | The scale of the Laplacian distribution from which noise is sampled.
Output features include | outputFeatureInclude | OutputFeatureType | | Required | BothCountsAndLogOdds | The features to output.
Ignore back off column | ignoreBackOff | Boolean | | Required | false | Whether to ignore the IsBackOff column in the output.
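The parameter table can be mirrored as a small Python configuration object, for example to validate settings in your own count-featurization code before applying them. This is a hypothetical sketch: `CountTableParameters` and its snake_case field names are not part of any Azure Machine Learning API; only the defaults, ranges, and allowed output types come from the parameter table and the configuration steps. The example values in the last lines are illustrative: a threshold of 1 follows the tip for small datasets, and the noise scale of 0.5 is an arbitrary nonzero choice.

```python
from dataclasses import dataclass

# Allowed values for outputFeatureInclude, as listed in the configuration steps.
OUTPUT_FEATURE_TYPES = ("CountsOnly", "LogOddsOnly", "BothCountsAndLogOdds")


@dataclass(frozen=True)
class CountTableParameters:
    garbage_bin_threshold: float = 10.0   # garbageBinThreshold, >= 0.0
    prior_ex: float = 42.0                # priorEx, >= 0.0
    noise_scale: float = 0.0              # noiseScale, >= 0.0
    output_feature_include: str = "BothCountsAndLogOdds"  # outputFeatureInclude
    ignore_back_off: bool = False         # ignoreBackOff

    def __post_init__(self):
        # Enforce the documented ">= 0.0" range on the three float parameters.
        for name in ("garbage_bin_threshold", "prior_ex", "noise_scale"):
            if getattr(self, name) < 0.0:
                raise ValueError(f"{name} must be >= 0.0")
        if self.output_feature_include not in OUTPUT_FEATURE_TYPES:
            raise ValueError(
                f"output_feature_include must be one of {OUTPUT_FEATURE_TYPES}")


# Illustrative settings for a small dataset that is counted and trained on.
params = CountTableParameters(garbage_bin_threshold=1.0, noise_scale=0.5)
print(params.prior_ex)  # 42.0
```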

Outputs

Name | Type | Description
Modified transform | ITransform interface | The modified transform.

Exceptions

Exception | Description
Error 0003 | Exception occurs if one or more of the inputs are null or empty.
Error 0086 | Exception occurs when a counting transform is invalid.
