Mine Your Business

AMO Lets You Dig Deeper into Your Data from Your Own Applications

Liu Tang and Paul Bradley

This article is based on a prerelease version of SQL Server 2005. All information contained herein is subject to change.

This article discusses:
  • Business Intelligence Development Studio
  • Creating and processing mining models programmatically with AMO
  • Using AMO to manage Analysis Services objects
This article uses the following technologies:
SQL Server 2005

Contents

Benefits of AMO
What's in it for Developers?
Starting the Project
Create an Analysis Database
Define a Data Source
Define a Data Source View and Create a View Object
Build a Mining Structure
Create a Mining Model
Train the Mining Model
Secure What You Have Built
Replicate a Mining Model by Cloning
Replicate a Mining Model by Clicking
View Results
Conclusion

Remember the day you got your first big box of crayons—that colossal, multitiered package with an extraordinary number of colors? You needed every color to express your ideas more fully, down to every nuanced detail. For client-side developers, Analysis Management Objects (AMO) for SQL Server™ 2005 is a lot like that big box of crayons. Until now, a lack of basic tools has left many developers unable to put their own ideas into action or to answer fundamental data mining questions. Developers have found it challenging to derive and draw the vivid pictures that data mining delivers (such as market trends, key prospect profiles, and how customers define satisfaction) without bringing in extra resources.

Benefits of AMO

AMO lets developers put their own programs into action. It facilitates client-side custom programming, just as Decision Support Objects (DSO), the object model in SQL Server 2000 Analysis Services, did before it. With AMO, a whole range of data mining questions can be answered at the operational level. This means that sales and marketing departments get answers more quickly and thus can make informed decisions. Specialized resources, like the IT team and analysts, can be brought in when they're needed most.

AMO puts data to work on basic business problems. Are retail sales eating into catalog sales? Do inventory patterns validate the new merchandising strategy? With AMO, business questions can be answered easily and directly.

Suppose you are a marketing manager at a bicycle manufacturing firm in the upper Midwest, looking for new income streams in the winter months. You have a hunch that your customers might be good prospects for local businesses looking for leads. You want to see if the data makes a case for your idea and get the numbers you need to forecast how much income you could generate by renting the customer list to real estate agents, one of three target segments you've identified. We'll use this example later on in the article to show just how AMO works.

What's in it for Developers?

AMO provides an object model to program against and a varied object set to work with, including servers, databases, mining structures, and models, as well as online analytical processing (OLAP) objects. When you use AMO, you can directly manipulate a whole range of data mining objects in order to seamlessly embed data mining functionality into your own applications.

AMO provides developers with access to seven high-powered mining algorithms, the same algorithms that can be accessed from two GUI applications included with SQL Server 2005: the Business Intelligence Development Studio and the SQL Server Management Studio. Both development workspaces install automatically when you choose either a full setup or the client component of SQL Server. These algorithms are named as follows: Microsoft® Association Rules, Microsoft Clustering, Microsoft Decision Trees, Microsoft Naïve Bayes, Microsoft Neural Network, Microsoft Sequence Clustering, and Microsoft Time Series.

Each algorithm is designed to detect telltale patterns hidden in your data. Using them independently or in combination, you can answer a wide range of practical questions. For example, using the algorithms, you can perform a market share analysis to help evaluate the performance of a specific trading area. You can determine product affinity across a market basket of products for guidance in structuring a cross-sell campaign or to develop list selection criteria for targeted mailings.

Figure 1 Data Mining Using AMO


The Business Intelligence Development Studio is a working platform that lets you visualize the data mining process in a way that's comfortable whether you're a developer or database administrator. As you can see in Figure 1, you can go deep into the data with full access to the entire data mining process. You can manipulate the data directly to create mining models, view mining results, and predict missing values for new data. Anything you can do through the user interface to build a mining model can also be achieved programmatically with AMO, following exactly the same procedure.

SQL Server Management Studio is an administrative platform that lets you leverage existing mining models and manage databases. Here, you can browse mining models and make predictions based on them, or back up and restore databases.

SQL Server 2005 comes with a practice database called AdventureWorksDW to help you find your way around AMO. AdventureWorksDW gives you access to sales data from the fictional bicycle manufacturer referred to earlier in this article.

Let's go through a step-by-step sample exercise using AdventureWorksDW. We're going to find out how many customers are homeowners and where those customers are from. To recap our example, you are a marketing manager for a bicycle manufacturer exploring the possibility of renting the customer list to local real estate agents and other businesses. You want to profile the customer base for home ownership.

Starting the Project

Using AdventureWorksDW, you'll work with data found in the DimCustomer table that is shown in Figure 2. This table summarizes demographic information, including home ownership, car ownership, marital status, number of cars owned, and annual income for more than 10,000 customers.

Figure 2 DimCustomer Table


To start writing code with AMO, all you need are references to the System.Data.dll and Microsoft.AnalysisServices.dll assemblies. By referencing the assembly Microsoft.AnalysisServices (installed as part of client component or full installation setup), you import all object definitions required by the Analysis Server. System.Data (installed as part of the Microsoft .NET Framework 2.0) is needed for data manipulations for OLE DB, data sets, and other data-related operations you may perform.

Next, so you won't need to use the namespace-qualified typename each time it's used in the code, reference a few core namespaces from these assemblies in imports directives, as shown in the following code snippet:

Imports System.Data
Imports System.Data.OleDb
Imports Microsoft.AnalysisServices

Create an Analysis Database

The analysis database contains all other objects. To create one, make sure the server is already started. Then define an analysis Server object and establish a connection to the server:

Dim srv As New Server 

srv.Connect("servername")

This code establishes which server the new database will reside on. The server supports multiple concurrent connection sessions. To specify a session ID, pass an ID string to the Connect method as the second parameter (omitted in this example for simplicity).
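As a minimal sketch, the two-parameter overload of Connect looks like this (the session ID string here is just a placeholder):

```vb
Imports Microsoft.AnalysisServices

' Connect to the server under an explicit session ID
' ("mySessionID" is a placeholder value)
Dim srv As New Server
srv.Connect("servername", "mySessionID")
```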

Next, a database object can be initialized and added to the server's collection of databases. During initialization, the system assigns a unique database ID through the Utils class, an AMO class that manages the binding of object IDs to object names and frees developers from having to manage object IDs themselves. The mapping of database names to IDs is automatic. Figure 3 shows how to initialize a Database object and add it to the Server object. The Utils class can also be used to serialize and deserialize the major server objects (database, data source, mining structure, mining model, and others), though only a few are included in this exercise.

Figure 3 Creating a Database Object

Function CreateDatabase( _
        ByVal srv As Server, ByVal databaseName As String) As Database

    ' Check to see if the database already exists
    If srv.Databases.ContainsName(databaseName) Then
       Throw New ArgumentOutOfRangeException("databaseName", _
           "Database already exists in the server.")
    End If
    
    ' Create the new database object
    Dim dbNew As Database = New Database(databaseName, _
        Utils.GetSyntacticallyValidID(databaseName, GetType(Database)))

     ' Add the new database object to the server's collection
    srv.Databases.Add(dbNew)
    dbNew.Update()

    ' Return a reference to the database
    Return dbNew
End Function

As elsewhere, unique names are important. Before adding a new database object to the server, make sure no existing database has the same name, or the old one will be overwritten.

The newly created database object must be added to the database collection on the server. The server objects are organized in the form of a hierarchical tree, as shown in Figure 4. To add a new object, link the isolated (child) node to its corresponding parent node, calling the Add method. Child nodes are referred to as a collection of their node type or server object type in AMO.

Figure 4 Server Object Hierarchy


Multiple database objects are stored on the server object and each database can contain multiple data source objects, as well as other objects, at different hierarchical levels. There are many object collections in AMO, only some of which are used in our examples (identified in the example code by an Add method). For major server objects, changes made through AMO won't materialize on the server until you call the appropriate Update method.
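As a sketch, after changing a database and the objects it contains, a single Update call can push everything to the server at once (this assumes db is a Database object obtained as in Figure 3, and that the AMO UpdateOptions.ExpandFull option is used to include contained objects):

```vb
' Commit the database object and all of the objects it contains
' (data sources, mining structures, models) in one call
db.Update(UpdateOptions.ExpandFull)
```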

Define a Data Source

A data source tells the program where the data comes from (AdventureWorksDW, in this case) and establishes a connection to a source database. Figure 5 shows how to build a data source object.

Figure 5 Building and Naming a Data Source Object

Function CreateDatasource( _ 
        ByVal db As Database, ByVal datasourceName As String, _
        ByVal connectionString As String) As DataSource

    ' Create a new data source
    Dim dsNew As RelationalDataSource = _
        New RelationalDataSource(datasourceName, _
            Utils.GetSyntacticallyValidID(datasourceName, _
                GetType(RelationalDataSource)))

    ' Add it to the database's data sources
    db.DataSources.Add(dsNew)
    ' For example: "Provider=SQLOLEDB.1;Data Source=sourcename;
    '     Integrated Security=SSPI;Initial Catalog=AdventureWorksDW"
    dsNew.ConnectionString = connectionString
    dsNew.Update()

    ' Return the new data source
    Return dsNew
End Function

Two types of data source objects, RelationalDataSource and OlapDataSource, are used to connect to databases. We create a RelationalDataSource object when connecting to a relational database. As was the case for the database object in Figure 3, the mapping of data source names and IDs is provided by AMO.

Define a Data Source View and Create a View Object

A data source view maps between the database data and your view of the data. In a data source view, you can select only those database tables needed for the model, or create new tables by joining or splitting existing ones (done through a named query in the GUI). In the data source view you can further refine data before initiating the actual mining process. (Note that the sample code in this article shows only the basic functionality for defining a data source view. There are many more possibilities in AMO.)

Now that you have both a database object and a data source object, you can create a DataSourceView object—a slightly more complex task than those completed so far. As before, a data source ID must be established. In addition, the data source view must be populated with the schema information from the table.

As shown in Figure 6, you first create an instance of DataSourceView and add it to the database, setting its DataSourceID to match the ID of the relevant DataSource. Next, you load the schema information from the selected table into the data source view. The schema information is then inserted into the DataSourceView.

Figure 6 Creating a Data Source View

Function CreateDatasourceView( _
        ByVal db As Database, ByVal ds As DataSource, _
        ByVal datasourceviewName As String, ByVal tableName As String) _
        As DataSourceView
        
    Dim rdsv As DataSourceView = _
        db.DataSourceViews.Add(datasourceviewName, _
            Utils.GetSyntacticallyValidID(datasourceviewName, _
                GetType(DataSourceView)))
    rdsv.DataSourceID = ds.ID

    Dim cn As OleDbConnection = New OleDbConnection(ds.ConnectionString)
    Dim cmd As OleDbCommand = New OleDbCommand(tableName, cn)
    cmd.CommandType = CommandType.TableDirect

    Dim dss As New DataSet
    Dim ad As OleDbDataAdapter = New OleDbDataAdapter(cmd)
    ad.FillSchema(dss, SchemaType.Source)

    ' Make sure we have the correct name
    dss.Tables(0).TableName = tableName

    ' We need to clone here because the original DataTable already 
    ' belongs to a DataSet
    rdsv.Schema = New DataSet
    rdsv.Schema.Tables.Add(dss.Tables(tableName).Clone())
    rdsv.Update()

    Return rdsv
End Function

Build a Mining Structure

A mining structure organizes mining data, defining the data columns that one or more mining models will draw from. Because algorithms uncover specific patterns or relationships, multiple models are often required to achieve a complete solution to any one problem. This allows you to solve the problem in different ways and, by comparing results, find the best solution.

It makes sense, then, to think of a mining structure as an entity that contains multiple mining models (all of which share the same column structure and column content type), detailed information about the nature of the data the model will use, and information about the relationships between or among columns.

Remember, we want to look at home ownership patterns for the customer base as a whole and for segments of the whole. This means we need to build a mining structure with NumberCarsOwned, MaritalStatus, YearlyIncome, and HouseOwnerFlag columns, plus a CustomerKey column.

An Overview of ADOMD

To facilitate programming against SQL Server 2005 data mining, Analysis Services supports two object models in addition to the AMO object model: ADOMD Server and ADOMD .NET. Like AMO, they are object models for both data mining and OLAP.

ADOMD Server is a server-side object model residing inside the Analysis Services server. With it, you can send queries to the server without having to set up a connection with the server, which makes it easy to navigate the server objects and retrieve information from them directly. You get all the data mining objects you need, from mining model column attributes to mining model contents. Plus, you don't need to flush big mining model contents into the client side before doing complicated queries and computations. ADOMD Server isolates server tasks from client tasks gracefully, letting the server do its job and reserving the complex manipulation of queries for the client-side programmer.

ADOMD .NET is a client-side query tool that complements ADOMD Server. It has nearly the same set of objects and object hierarchy as ADOMD Server, the main difference being that programming with ADOMD .NET requires a connection to the server before sending queries or navigating the server objects. All the query results are dumped to the client when the server responds.

The code that follows shows how to initialize a mining structure object, bind it to its data source view, and set its case table as DimCustomer. For binding purposes, the CaseTableName has to match the table name in the data source view.

Dim ms As MiningStructure = db.MiningStructures.Add(miningstructureName, _
    Utils.GetSyntacticallyValidID(miningstructureName, _
        GetType(MiningStructure)))
ms.Source = New DataSourceViewBinding(dsv.ID)
ms.CaseTableName = "DimCustomer"   'binding to data source table name

When you create a data column, you must set IsKey and Content properties. They determine how the data will be interpreted by the data mining engine during the model training. IsKey specifies that the column will be treated as a key column. Content specifies the content type: DISCRETE (of distinct categorical value applicable to all data) or CONTINUOUS (of continuous value applicable only to numeric data). Numeric data can be either DISCRETE or CONTINUOUS. For example, you could have an Age column where 1 means "less than 25" or where 1 means the value 1. After setting IsKey and Content, the data column is added to a collection of mining structure columns.

The code in Figure 7 shows how columns are created and added to the mining structure called ms. There are two kinds of mining structure columns, ScalarMiningStructureColumn and TableMiningStructureColumn. ScalarMiningStructureColumn is used for all regular, non-table data columns, and TableMiningStructureColumn is used for creating nested tables. In this example, all the columns are non-table, so all are initialized as ScalarMiningStructureColumn objects and added to the mining structure in the same way. Finally, calling the Update method will ensure that the columns are actually created in the mining structure on the server.

Figure 7 Creating and Adding Columns

Function CreateMiningStructure( _
        ByVal db As Database, ByVal dsv As DataSourceView, _
        ByVal miningstructureName As String) As MiningStructure

    ' Create the mining structure
    Dim ms As MiningStructure = db.MiningStructures.Add( _
        miningstructureName, _
        Utils.GetSyntacticallyValidID(miningstructureName, _
            GetType(MiningStructure)))
    ms.Source = New DataSourceViewBinding(dsv.ID)
    ms.CaseTableName = "DimCustomer" ' binding to data source table name

    ' CustomerKey
    Dim smsc As ScalarMiningStructureColumn = _
        New ScalarMiningStructureColumn("CustomerKey", _    
            Utils.GetSyntacticallyValidID("CustomerKey", _
                GetType(ScalarMiningStructureColumn)))
    smsc.IsKey = True
    smsc.Content = "Key"
    smsc.KeyColumns.Add("DimCustomer", "CustomerKey", OleDbType.Integer)
    ms.Columns.Add(smsc)

    ' HouseOwnerFlag
    smsc = New ScalarMiningStructureColumn("HouseOwnerFlag", _   
        Utils.GetSyntacticallyValidID("HouseOwnerFlag", _
            GetType(ScalarMiningStructureColumn)))
    smsc.IsKey = False
    smsc.Content = "Discrete"
    smsc.KeyColumns.Add("DimCustomer", "HouseOwnerFlag", OleDbType.WChar)
    ms.Columns.Add(smsc)

    ... 'for MaritalStatus and NumberCarsOwned

    ' YearlyIncome
    smsc = New ScalarMiningStructureColumn("YearlyIncome", _
        Utils.GetSyntacticallyValidID("YearlyIncome", _
            GetType(ScalarMiningStructureColumn)))
    smsc.IsKey = False
    smsc.Content = "Continuous"
    smsc.KeyColumns.Add("DimCustomer", "YearlyIncome", OleDbType.Integer)
    ms.Columns.Add(smsc)
      
    ms.Update()

    Return ms
End Function

Create a Mining Model

As noted, a mining model is contained inside a mining structure, which may have more than one model, each using different algorithms and parameter settings. We will first initialize a mining model object and add it to a mining structure, then specify the algorithm and parameter. Because each algorithm can have multiple algorithm parameters, AlgorithmParameters is a hash table collection that maps each parameter name with its corresponding value. Parameter settings are algorithm-specific. That is, each algorithm has its own set of parameters, which affects how patterns are generated by the algorithm. Details of parameter settings can be found in SQL Server Books Online. The following code is what we have so far:

Dim mm As MiningModel = ms.MiningModels.Add(miningmodelName, _
    Utils.GetSyntacticallyValidID(miningmodelName, GetType(MiningModel)))
mm.Algorithm = "Microsoft_Decision_Trees"
mm.AlgorithmParameters.Add("COMPLEXITY_PENALTY", 0.3)

Each model column needs to bind with its corresponding mining structure column, by column ID. The binding is shown in Figure 8, in the second line of each model column definition.

Figure 8 Creating the Mining Model Columns

Function CreateMiningModel( _
        ByVal ms As MiningStructure, ByVal miningmodelName As String) _
        As MiningModel
   
    Dim mm As MiningModel = ms.MiningModels.Add(miningmodelName, _
        Utils.GetSyntacticallyValidID(miningmodelName, _
            GetType(MiningModel)))
    mm.Algorithm = "Microsoft_Decision_Trees"
    mm.AlgorithmParameters.Add("COMPLEXITY_PENALTY", 0.3)

    Dim mc As MiningModelColumn  
    mc = New MiningModelColumn("CustomerKey", _ 
        Utils.GetSyntacticallyValidID("CustomerKey", _
            GetType(MiningModelColumn)))
    mc.SourceColumnID = ms.Columns("CustomerKey").ID
    mc.Usage = "Key"
    mm.Columns.Add(mc)

    mc = New MiningModelColumn("HouseOwnerFlag", _
        Utils.GetSyntacticallyValidID("HouseOwnerFlag", _ 
            GetType(MiningModelColumn)))
    mc.SourceColumnID = ms.Columns("HouseOwnerFlag").ID
    mc.Usage = "PredictOnly"
    mm.Columns.Add(mc)

    mc = New MiningModelColumn("MaritalStatus", _ 
        Utils.GetSyntacticallyValidID("MaritalStatus", _ 
            GetType(MiningModelColumn)))
    mc.SourceColumnID = ms.Columns("MaritalStatus").ID
    mc.Usage = "Input"
    mm.Columns.Add(mc)

    mc = New MiningModelColumn("NumberCarsOwned", _
        Utils.GetSyntacticallyValidID("NumberCarsOwned", _ 
            GetType(MiningModelColumn)))
    mc.SourceColumnID = ms.Columns("NumberCarsOwned").ID
    mc.Usage = "Input"
    mm.Columns.Add(mc)

    mc = New MiningModelColumn("YearlyIncome", _ 
        Utils.GetSyntacticallyValidID("YearlyIncome", _
            GetType(MiningModelColumn)))
    mc.SourceColumnID = ms.Columns("YearlyIncome").ID
    mc.Usage = "Input"
    mm.Columns.Add(mc)

    mm.Update()

    Return mm
End Function

Data mining columns have several property types, the most important of which are data type, content type, and usage. Look back to Figure 7 and you'll see that you've already completed the first level of defining the data and content type of the columns. Now, you need to further define the usage of the columns.

In data mining with Analysis Services, there are four types of column usage: input, predict, predict only, or key. Only one key column can be set; it's a unique identifier for the rows feeding into the processing. The last update statement shown in Figure 8 sends the column metadata to the server.

Train the Mining Model

All necessary definitions for the mining model are complete. Before it can be of any use, though, you have to train the model (or process it) to find useful patterns in the data. Processing happens inside the server. By way of a complex computation that is dictated and run by the algorithm, data is pulled from the data source provided. Once processing is triggered, you simply wait for the results. The time needed for processing depends on the amount of data involved and the complexity of the problem. Before processing, the model has only metadata defined. After processing, the patterns that are found are stored in the model.

There are a number of process options, including ProcessFull, ProcessStructure, and ProcessDefault. Suppose you have a mining structure with two mining models. You can process both models in one batch by fully processing the mining structure and mining models together. Or you can first process just the mining structure, then the models individually. ProcessStructure forms metadata by collecting data from the source, then prepares the data according to its content type for future model use. The following code shows the second choice, but with only one model:

'Process mining structure first
ms.Process(ProcessType.ProcessStructure)

'Process the mining model only
mm.Process(ProcessType.ProcessDefault)

The ProcessDefault option guarantees that after running, any object will be fully processed. So we could have written ProcessDefault without first processing the structure.

AMO can also clear the data from the mining structure and mining model through ProcessClearStructureOnly and ProcessClear. After calling ProcessClearStructureOnly, the metadata stored in the mining structure is destroyed so that no new models can be added to the mining structure. However, existing trained mining models won't be affected: the patterns are preserved and you can still make predictions based on them. The ProcessClear option will clear everything from mining structure to mining model.
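A minimal sketch of the two clearing options, assuming ms is the mining structure built earlier:

```vb
' Clear only the structure's cached data; existing trained models
' keep their patterns and can still be used for predictions
ms.Process(ProcessType.ProcessClearStructureOnly)

' Clear everything, from the mining structure down to its mining models
ms.Process(ProcessType.ProcessClear)
```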

Secure What You Have Built

AMO offers three levels of access privilege to protect mining structures and mining models from being viewed or modified by unauthorized persons. They include None (lowest), Read, and ReadWrite (highest).

The security in SQL Server 2005 data mining is organized by roles. Role objects can be defined at either the server or database level. Access privilege at the server level applies to all server objects. At the database level, only objects confined to the database are restricted. All access privileges are bound to roles and can be applied to database objects and to many OLAP objects, not only to mining structures and mining models.

Figure 9 shows how to set access privileges for the mining structure and mining model with the role defined at the database level. Here, users of the myrole group cannot change the mining structure, but may modify the node caption of the mining model and make predictions based on the model.

Figure 9 Setting Access Privileges

' Create a new role
Dim role As Role = db.Roles.Add("myrole")
role.Members.Add("someone")
role.Update()

Dim msp As MiningStructurePermission
Dim mmp As MiningModelPermission

msp = ms.MiningStructurePermissions.Add(role.ID)
msp.Access = Access.Read
msp.Update()

mmp = mm.MiningModelPermissions.Add(role.ID)
mmp.Access = Access.ReadWrite
mmp.Update()

Replicate a Mining Model by Cloning

In the Business Intelligence Development Studio, only a few mouse clicks are required to create a new mining model based on an existing one. This can also be achieved programmatically by cloning, since the new model shares the same mining structure as the old. Figure 10 shows how.

Figure 10 Cloning

Function AddNewMiningModelBasedOnExistingModel( _
        ByVal existingModel As MiningModel, _
        ByVal newMiningModelName As String, _
        ByVal newAlgorithmName As String) As MiningModel

    Dim derivedModel As MiningModel

    If existingModel Is Nothing Then
        Throw New ArgumentNullException("existingModel")
    Else
        derivedModel = existingModel.Clone()
        If existingModel.Algorithm <> newAlgorithmName Then
            derivedModel.AlgorithmParameters.Clear()
        End If
    End If

    derivedModel.Name = newMiningModelName
    derivedModel.ID = _ 
        existingModel.Parent.MiningModels.GetNewID(newMiningModelName)
    derivedModel.Algorithm = newAlgorithmName

    existingModel.Parent.MiningModels.Add(derivedModel)

    derivedModel.Update()

    Return derivedModel
End Function

In Figure 10, the MiningModel.Clone method is called to generate a derived model that is an exact copy of the existingModel (note that the derivedModel will not reside on the mining structure until Update is called). If the derivedModel uses a different algorithm, calling AlgorithmParameters.Clear is required in order to avoid incompatibility between the derivedModel algorithm and the parameter settings stored in existingModel.

To add derivedModel to the mining structure, an ID has to be assigned over and above the model and algorithm names. This is done using the MiningModels.GetNewID method. Be aware that the Parent attribute for existingModel is the mining structure shared by both existingModel and derivedModel. The derivedModel is registered with the mining structure once existingModel.Parent.MiningModels.Add(derivedModel) is executed.

The Parent attribute shows the hierarchy of server objects. Each object corresponds to a node in the server object's tree. The attribute Parent gets the direct parent node of the current object. For programming convenience, cross-level parent attributes are available. A mining model might include an attribute pointing to its mining structure, as well as Database and Server attributes, in a format like ParentMiningStructure, ParentDatabase, and so on.
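For instance, starting from the mining model built earlier, the parent chain can be walked like this (a sketch; the cross-level ParentDatabase and ParentServer attributes are assumed here):

```vb
' Walk up the server object tree from a mining model
Dim parentStructure As MiningStructure = mm.Parent  ' direct parent node
Dim parentDb As Database = mm.ParentDatabase        ' cross-level: containing database
Dim parentSrv As Server = mm.ParentServer           ' cross-level: hosting server
```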

Replicate a Mining Model by Clicking

If you've tried out SQL Server 2005 Business Intelligence Development Studio, you know you can create a new mining model directly from an existing mining structure. Figure 11 shows what the AMO code supporting this function looks like. When you replicate this way as opposed to the method shown in Figure 10, the key difference is that MiningStructure.CreateMiningModel is called from the mining structure existingStructure, which creates a new mining model composed of all the data columns from existingStructure. Passing in the argument false indicates that the model shouldn't be added to the model collection.

Figure 11 Deriving from an Existing Mining Structure

Function AddNewMiningModelBasedOnExistingStructure( _
        ByVal existingStructure As MiningStructure, _
        ByVal newMiningModelName As String, _
        ByVal newAlgorithmName As String) As MiningModel

    Dim derivedModel As MiningModel

    If existingStructure Is Nothing Then
        Throw New ArgumentNullException("existingStructure")
    Else
        derivedModel = existingStructure.CreateMiningModel(False)
    End If

    derivedModel.Name = newMiningModelName
    derivedModel.ID = _ 
        existingStructure.MiningModels.GetNewID(newMiningModelName)
    derivedModel.Algorithm = newAlgorithmName

    existingStructure.MiningModels.Add(derivedModel)

    derivedModel.Update()

    Return derivedModel
End Function

There is a difference between deriving models from existing mining structures and deriving models from existing mining models. The model derived from an existing structure has the same number of data columns as the structure; the one derived from the existing model has the same number of data columns as the model, which could be less than the structure.

View Results

After processing the mining model, you can view the outcome manually by going to SQL Server Management Studio. Right-click on the model in question and choose Browse from the context menu. You will see a decision tree for HouseOwnerFlag, shown in Figure 12. The complete tree has six levels, though Figure 12 shows only three for the sake of brevity. Yellow represents customers who own a home (HouseOwnerFlag = 1), while red represents customers who rent or live with family (HouseOwnerFlag = 0).

Figure 12 Decision Tree


Trained model result viewers, like the one shown in Figure 12, are redistributable, meaning you can view results by embedding them in your own custom client applications. In order to do so, simply write a program in ADOMD (either ADOMD Server or ADOMD .NET, depending on your access), where you can browse the results and perform predictions.

ADOMD is a query tool, perfect for manipulating trained model results. In addition to directly sending queries to the server, ADOMD exposes the details of model attributes and content information piece by piece. It gives you an in-depth view of a whole range of data, including model column types, model content nodes, content distributions (nested in each row of a content node), mining services, algorithm parameters, and much more. Without ADOMD, getting this information requires a fair bit of choreography: querying the server through OLE DB commands, retrieving each data element one by one by splitting the data set, and other, more complicated steps. For more information, see the sidebar "An Overview of ADOMD" and go to www.sqlserverdatamining.com, where you'll find a downloadable sample showing how to integrate the viewer into client applications.
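As a sketch of the ADOMD .NET approach (assuming a reference to the Microsoft.AnalysisServices.AdomdClient assembly; the server, database, and model names are placeholders), a singleton prediction against the trained model might look like this:

```vb
Imports Microsoft.AnalysisServices.AdomdClient

Module PredictionSample
    Sub Main()
        ' Connect to the Analysis Services database
        Dim cn As New AdomdConnection( _
            "Data Source=servername;Catalog=mydatabase")
        cn.Open()

        ' A DMX singleton query predicting home ownership for one
        ' hypothetical customer profile
        Dim cmd As New AdomdCommand( _
            "SELECT Predict([HouseOwnerFlag]) FROM [mymodel] " & _
            "NATURAL PREDICTION JOIN " & _
            "(SELECT 'M' AS [MaritalStatus], 2 AS [NumberCarsOwned], " & _
            "60000 AS [YearlyIncome]) AS t", cn)

        Dim predicted As Object = cmd.ExecuteScalar()
        Console.WriteLine("Predicted HouseOwnerFlag: {0}", predicted)

        cn.Close()
    End Sub
End Module
```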

Whichever way you view the results, the marketing team will be happy to know that most customers on record are homeowners, including roughly half of the unmarried customers and half of the customers at nearly all income levels. The small number of homeowners with incomes above $122,200 is an unexpected finding worth looking into down the road. In the meantime, it looks like AdventureWorksDW has a sizeable customer list that local real estate agents would be willing to rent.

Conclusion

AMO provides a complete set of objects for data mining and a great toolkit for developers who want to build, process, and manage data mining models. Fully compatible with the .NET environment, AMO lets developers embed data mining into their own applications easily and smoothly.

With SQL Server 2005 and AMO you can integrate business intelligence into applications directly and manage Analysis Services objects programmatically. Whatever your industry, AMO for data mining can add more definition to research capabilities.

Liu Tang is a Software Design Engineer in Test on the SQL Server Data Mining team at Microsoft, where she has worked on the design and testing of SQL Server Data Mining features for the past three years. She has a Master's degree from Simon Fraser University, Canada, with a focus on data mining and databases.

Paul Bradley, Ph.D., is a Principal in Data Mining Technology at Apollo Data Technologies, LLC.