Viewing with Data Mining Model Browser
Data Mining Model Browser allows you to view data mining content from the vantage point of a single attribute and its relationships. It shows the mining model content for each node that is influenced by a single attribute, as well as histogram data for each node. It displays the data mining model nodes used in the mining model, including the relationships between the nodes and the rules or attributes assigned to them, as an interconnected network of boxes. Each box represents a node in a single decision tree or a single cluster.
The nodes are color-coded to represent data density: the number of cases in a node that match the selected attribute, relative to the total number of cases processed by that node. To change the color coding and the selected attribute, use the tree color drop-down list on the legend pane.
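The density behind the color coding can be illustrated with a small calculation. The following is a minimal sketch, not the browser's actual implementation: the function names `attribute_density` and `shade`, the five shade levels, and the sample cases are all assumptions made for illustration.

```python
def attribute_density(node_cases, attribute, value):
    """Fraction of a node's cases whose `attribute` equals `value`."""
    if not node_cases:
        return 0.0
    matching = sum(1 for case in node_cases if case.get(attribute) == value)
    return matching / len(node_cases)


def shade(density, levels=5):
    """Map a density in [0, 1] to a shade index: 0 is lightest, levels - 1 is darkest.

    The number of levels is a hypothetical choice, not a documented setting.
    """
    return min(int(density * levels), levels - 1)


# Hypothetical cases processed by one node.
node_cases = [
    {"product": "tools"},
    {"product": "tools"},
    {"product": "garden"},
    {"product": "tools"},
]

density = attribute_density(node_cases, "product", "tools")
print(density, shade(density))
```

In this sketch, a node in which most cases match the selected attribute value maps to a darker shade than a node in which few cases match.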
The nodes are shown in ranking order of attribute factors, from left to right, in the content detail pane. The further down the tree a split appears, the less influence the factor that caused the split carries in the data mining model. Additionally, the attributes pane allows you to sort attributes by number of cases or by probability of occurrence in the selected node, which helps you understand the relevance of a given attribute to a node.
The benefit of visualizing data mining model content with Data Mining Model Browser is an understanding of the patterns and rules that encompass a case set, and the ability to fine-tune these patterns and rules to better fit training data. For example, you can use the visualization capabilities of Data Mining Model Browser to eliminate a common problem in data mining called overfitting. Overfitting occurs when the data mining model starts constructing rules that are specific to single cases; the model starts attaching importance to unimportant patterns.
Assume, for instance, that a customer case set for a department store data mining model includes the last name of the customer as an attribute field. The data mining model might create a rule that a customer named Smith is most likely to purchase tools because a single customer named Smith purchased tools. This rule is an example of overfitting: it is based on a random pattern, and the correlation between the last name of a customer and the type of products purchased is meaningless. Overfitting occurs most often when attributes that do not supply meaningful content are added to a data mining model. In such cases, the model attempts to construct rules where none should exist.
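The Smith example can be made concrete by counting how many cases support each candidate rule. The following is a minimal sketch of that idea, assuming a simple minimum-case threshold; it is an illustrative heuristic, not the method the data mining model or the browser uses. The function names, the `min_cases` parameter, and the sample case set are hypothetical.

```python
from collections import defaultdict


def rule_support(cases, attribute, target):
    """For each attribute value, count the cases behind each target value."""
    support = defaultdict(lambda: defaultdict(int))
    for case in cases:
        support[case[attribute]][case[target]] += 1
    return support


def weakly_supported_rules(cases, attribute, target, min_cases=2):
    """Return (value, predicted_target, case_count) for every rule
    backed by fewer than `min_cases` cases (hypothetical threshold)."""
    flagged = []
    for value, targets in rule_support(cases, attribute, target).items():
        predicted, count = max(targets.items(), key=lambda item: item[1])
        if count < min_cases:
            flagged.append((value, predicted, count))
    return sorted(flagged)


# Hypothetical department store case set.
cases = [
    {"last_name": "Smith", "age_group": "adult", "purchase": "tools"},
    {"last_name": "Jones", "age_group": "adult", "purchase": "tools"},
    {"last_name": "Lee", "age_group": "adult", "purchase": "tools"},
    {"last_name": "Patel", "age_group": "teen", "purchase": "games"},
    {"last_name": "Garcia", "age_group": "teen", "purchase": "games"},
]

by_name = weakly_supported_rules(cases, "last_name", "purchase")
by_age = weakly_supported_rules(cases, "age_group", "purchase")
print(by_name)
print(by_age)
```

In this sketch, every rule keyed on last name rests on a single case, so all of them are flagged, while the rules keyed on age group are each backed by several cases and none are flagged. An attribute whose rules are all flagged this way is a candidate for removal from the model.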
The information shown in Data Mining Model Browser represents the statistical model of trends learned by the data mining model through the review of training data. As such, you will find it useful to review the attributes and node paths that define the knowledge gained by training a data mining model to better understand the general patterns and rules represented by the training data.