# Determining the Entropy of the Values We Added

Visual Studio 6.0

This next procedure, `determineEntropy`, is the last one called from the `cmdAnalyze` button. What we want to do here is calculate something called (surprise!) entropy. The term comes from communication theory and is a precise measurement of information. Remember that we added the fields `CountryLanguage`, `CountryRegion` and `Country` to our `ID3` table? What we now want to do is determine the relative impact of each of these categories on total sales growth. Entropy is a measure of the uncertainty of the classification. The smaller the number, the less 'uncertainty' that the specific classification is important to the result, sales growth. As the number increases, the more uncertainty that this classification really does influence the sales growth.

We want to see how the three categories: `CountryLanguage`, `CountryRegion` and `Country` compare in importance to sales growth. We will do that by calculating the entropy factor for each. The lesser of the three numbers indicates the factor that is the most important in sales growth.

#### Try It Out - Determining which Factors Impact Sales the Most!

1.  Please add another sub to the `frmID3` form called `determineEntropy`. Add the following code to that procedure.

``````Private Sub determineEntropy()

Dim adoTemp As ADODB.Recordset
Dim sSql As String
Dim totalSamples As Integer
Dim entropyCountry As Single
Dim entropyCountryLanguage As Single
Dim entropyCountryRegion As Single
Dim Position(2, 1) As Variant 'holds the classifications

sbStatus.Panels.Item(1).Text = "Determining Entropy..."
sbStatus.Panels.Item(2).Text = "Working..."

adoConnection.Execute("DELETE * FROM ID3 WHERE Category = NULL")
'-----------------------------------------------------
'-- Determine how many records are in the ID3 Table --
'-----------------------------------------------------
Set adoTemp = New ADODB.Recordset
sSql = "SELECT count(*) as HowMany from ID3"
adoTemp.Open sSql, adoConnection
totalSamples = adoTemp!HowMany
adoTemp.Close

'----------------------------------------------------------
'-- Determine the relative Entropy on each of the fields --
'----------------------------------------------------------
entropycountryLanguage = getEntropy _
("CountryLanguage", totalSamples)
Position(0, 0) = entropycountryLanguage
Position(0, 1) = "CountryLanguage"
entropyCountryRegion = getEntropy("CountryRegion", totalSamples)
Position(1, 0) = entropyCountryRegion
Position(1, 1) = "CountryRegion"
entropyCountry = getEntropy("Country", totalSamples)
Position(2, 0) = entropyCountry
Position(2, 1) = "Country"

Call qsort(Position, LBound(Position), UBound(Position))

txtAnalysis = "ID3 Analysis of the Category " & lstCategory & _
" and the Product " & lstProduct & vbCrLf
txtAnalysis = txtAnalysis & "The lesser the entropy, the more "
txtAnalysis = txtAnalysis & "important is this Attribute "
txtAnalysis = txtAnalysis & "to overall Sales" & vbCrLf
txtAnalysis = txtAnalysis & Position(0, 1) & " Entropy: " & _
Position(0, 0) & vbCrLf
txtAnalysis = txtAnalysis & Position(1, 1) & " Entropy: " & _
Position(1, 0) & vbCrLf
txtAnalysis = txtAnalysis & Position(2, 1) & " Entropy: " & _
Position(2, 0) & vbCrLf
txtAnalysis = txtAnalysis & "Product Manager - Review sales to: "
txtAnalysis = txtAnalysis & Position(0, 1) & vbCrLf

Call gridTheResults

sbStatus.Panels.Item(1).Text = ""
sbStatus.Panels.Item(2).Text = "Ready."

End Sub``````

### How It Works

We will essentially determine the entropy factor on each of our three classifications and present the results to the product manager. But we want to ensure we are working with clean and valid data. In our effort to scrub the data, we want to delete any records from our `ID3` table that could have a `NULL` value in the `Category` field. Remember that we are working with legacy data here. We can never assume that it is perfect. We must be on guard to ensure the data we are working with is clean. Once we do that, we just determine how many records are left in our `ID3` table. This number, `totalSamples`, is used to see how each of the categories we are looking at did compared with all of the samples.

``````adoConnection.Execute ("DELETE * FROM ID3 WHERE Category = NULL")
'-----------------------------------------------------
'-- Determine how many records are in the ID3 Table --
'-----------------------------------------------------
Set adoTemp = New ADODB.Recordset
sSql = "SELECT count(*) as HowMany from ID3"
adoTemp.Open sSql, adoConnection
totalSamples = adoTemp!HowMany
adoTemp.Close``````

Now we have a clean `ID3` table. It only contains records where we have valid sales data for each country that actually ordered this product we are analyzing. Now that the table is clean, we are prepared to determine the entropy factor for the `CountryLanguage`, `CountryRegion` and `Country`. We have another routine, `getEntropy`, that will do the calculations for us.

``````entropyCountryLanguage = getEntropy _
("CountryLanguage", totalSamples)
Position(0, 0) = entropyCountryLanguage
Position(0, 1) = "CountryLanguage"
entropyCountryRegion = getEntropy("CountryRegion", totalSamples)
Position(1, 0) = entropyCountryRegion
Position(1, 1) = "CountryRegion"
entropyCountry = getEntropy("Country", totalSamples)
Position(2, 0) = entropyCountry
Position(2, 1) = "Country"``````

We call the `getEntropy` routine and pass it two parameters, the classification we want to get the entropy factor on and `totalSamples` - just how many records are in the recordset.

So in the first line, we call the `getEntropy` routine and pass in `"CountryLanguage"` and the `totalSamples`. We assign the value that is returned to us to `entropyCountryLanguage`. We then update the local variant array, `Position`. In the first row, first position `(0, 0)` put in the entropy value that was returned for the `CountryLanguage`. Then in the first row, second position `(0, 1)` we add the title, `"CountryLanguage"`. We then do this for the `CountryRegion` and then `Country`. Now we don't know how the values should be ranked.

In order to sort our array, I have included a modified quick sort routine that will sort the contents of our array, `Position`, from the lowest entropy factor to the highest. We simply call this routine and pass in the array with its lower and upper bounds. This routine, which we will write in a minute, will place the lowest entropy value first. As mentioned, the lower the score, the better. This means that this particular classification has the least uncertainty.

``Call qsort(Position, LBound(Position), UBound(Position))``

Once the array has been sorted, we display the results in the `txtAnalysis` text box. We also instruct the product manager to look at the value in the lowest category - whatever that might be. This is the category that is the most relevant to sales growth for that particular product.

``````txtAnalysis = "ID3 Analysis of the Category " & lstCategory & _
" and the Product " & lstProduct & vbCrLf
txtAnalysis = txtAnalysis & "The lesser the entropy, the more "
txtAnalysis = txtAnalysis & "important is this Attribute "
txtAnalysis = txtAnalysis & "to overall Sales" & vbCrLf
txtAnalysis = txtAnalysis & Position(0, 1) & " Entropy: " & _
Position(0, 0) & vbCrLf
txtAnalysis = txtAnalysis & Position(1, 1) & " Entropy: " & _
Position(1, 0) & vbCrLf
txtAnalysis = txtAnalysis & Position(2, 1) & " Entropy: " & _
Position(2, 0) & vbCrLf
txtAnalysis = txtAnalysis & "Product Manager - Review sales to: "
txtAnalysis = txtAnalysis & Position(0, 1) & vbCrLf``````

Finally, we call our generalized routine, `gridTheResults`. This will display the details of the ID3 algorithm to the user so she can visually see the relationships between the classifications.

``````Call gridTheResults

sbStatus.Panels.Item(1).Text = ""
sbStatus.Panels.Item(2).Text = "Ready."``````

Let's add the `gridTheResults` subroutine now!

#### Try It Out - Displaying Results in the resultsGrid FlexGrid

1.  Please add the following new subroutine called `gridTheResults` to the `frmID3` form.

``````Public Sub gridTheResults()

'------------------------------------------------
'-- Now let's update the grid with the regions --
'------------------------------------------------
Dim adoID3 As ADODB.Recordset
Dim sSql As String
Dim iRows As Integer
Dim iCols As Integer
Dim iRowLoop As Integer
Dim iColLoop As Integer

sSql = "SELECT UpByHowMuch, OldQuantity, NewQuantity,"
sSql = sSql & " CountryRegion, CountryLanguage, Country FROM"
sSql = sSql & " ID3 ORDER BY UpByHowMuch DESC,"
sSql = sSql & " CountryRegion, CountryLanguage"

Set adoID3 = New ADODB.Recordset
adoID3.CursorLocation = adUseClient

adoID3.Open sSql, adoConnection, , , adCmdText

adoID3.MoveFirst

iRows = adoID3.RecordCount
iCols = adoID3.Fields.Count

resultsGrid.Rows = iRows
resultsGrid.Cols = iCols
'--------------------------
'-- Set up the grid here --
'--------------------------
resultsGrid.Row = 0

For iColLoop = 0 To resultsGrid.Cols - 1
With resultsGrid
.Col = iColLoop
.ColWidth(iColLoop) = 1400
.ColAlignment(iColLoop) = 7
Select Case iColLoop
Case 0
.Text = "Growth Factor"
.MergeCol(iColLoop) = True
Case 1
.Text = "Previous Qty"
.MergeCol(iColLoop) = True
Case 2
.Text = "Recent Qty"
.MergeCol(iColLoop) = True
Case 3
.Text = "Country Region"
Case 4
.Text = "Country Language"
Case 5
.Text = "Country"
End Select
End With
Next

resultsGrid.MergeCells = flexMergeFree

For iRowLoop = 1 To iRows - 1
For iColLoop = 0 To iCols - 1
resultsGrid.Row = iRowLoop
resultsGrid.Col = iColLoop
resultsGrid.Text = adoID3.Fields(iColLoop)
Next
adoID3.MoveNext
Next

adoID3.Close

Set adoID3 = Nothing

End Sub``````

### How It Works

This code is very similar to that in the `gridID3` subroutine so I won't discuss this subroutine in much detail.

We build an SQL statement taking the `UpByHowMuch`, `OldQuantity`, `NewQuantity`, `CountryRegion`, `CountryLanguage` and `Country` fields from the `ID3` table. We then `ORDER BY` `UpByHowMuch` (in `DESC` order), `CountryRegion` and `CountryLanguage`.

``````sSql = "SELECT UpByHowMuch, OldQuantity, NewQuantity,"
sSql = sSql & " CountryRegion, CountryLanguage, Country FROM"
sSql = sSql & " ID3 ORDER BY UpByHowMuch DESC,"
sSql = sSql & " CountryRegion, CountryLanguage"``````

As with the `gridID3` code, we loop through all the rows and columns in our `adoID3` recordset. We use a `Select Case` statement to set up the headings for the columns - renaming `UpByHowMuch` as `Growth Factor`, `OldQuantity` as `Previous Qty` and `NewQuantity` as `Recent Qty`. We allow the cells for `UpByHowMuch`, `OldQuantity` and `NewQuantity` to be merged by setting their `MergeCol` properties to `True`.

``````Select Case iColLoop
Case 0
.Text = "Growth Factor"
.MergeCol(iColLoop) = True
Case 1
.Text = "Previous Qty"
.MergeCol(iColLoop) = True
Case 2
.Text = "Recent Qty"
.MergeCol(iColLoop) = True
Case 3
.Text = "Country Region"
Case 4
.Text = "Country Language"
Case 5
.Text = "Country"
End Select``````
Show:
© 2014 Microsoft