Article
06/29/2015

January 2014

Volume 29 Number 1

SQL Server : Unit Testing SQL Server OLAP Cubes Using C#

I feel a little like Thomas Jefferson when I say, “We hold these truths to be self-evident, that all code has certain unalienable rights and that among these rights is the ability to be thoroughly unit tested in a simple and concise way so bugs can be easily identified and production outages minimized.” A bit dramatic, yes, but it gets my point across.

Of course, most developers believe their code should be unit tested, but what happens when the definition of “code” is blurred? This was the situation I found myself in recently when presented with a complicated issue and tasked to find a solution. A group of developers was involved with writing a complex online analytical processing (OLAP) cube using SQL Server Analysis Services (SSAS). This cube had numerous dimensions all tied to an extremely complex fact table. Because the developers were quite skilled at developing cubes, they were able to piece the cube together, but validating the results of their Multidimensional Expressions (MDX) queries was a daunting task. The difficulty was compounded by a number of factors, including the amount of data in the dimensions and fact table, as well as the time and computing resources needed to build the cube. Once the cube was built, the results (produced by MDX queries) were sent to the users. If the users found an issue with the data, it would take a long time to track the problem down. Also, once the underlying problem was discovered and fixed, the cube would need to be regenerated. To make matters worse, if dimensions were added, the underlying fact table was updated or the cube was built using different aggregations, there was no way to determine the full effects of these changes. A seemingly innocent change could have a far-reaching and cascading effect on queries to the cube.

The solution for the cube conundrum is to create an environment and a process wherein you can stage the cube’s dimensions and facts using a limited amount of data, and execute queries against the cube using the production version of the cube’s schema. Ideally the test version of the cube would be created from scratch each time the unit tests run, eliminating the chance for side effects occurring from pre-existing artifacts. Other requirements for the unit-test framework (and really these apply to most unit-test frameworks regardless of the target application) are to ensure the test validations are repeatable, and if failure occurs to quickly and easily pinpoint why the test failed (having the test case say “failed because data did not match” is less than ideal).

In this article, I’ll present a framework that lets you create a suite of unit tests to validate the MDX-based output of an OLAP cube. The architecture described enables the creation of a cube using an existing schema, within its own unit-test database, recreated each time the test suite is run. It also lets you provide MDX queries that are executed against the newly formed unit-test version of the cube. Further, it validates the results against a pre-existing template and presents them in simple HTML format. This pattern of test case validation against a template can be extended to unit-testing frameworks in which the data results are large and complex.

Cube Overview

Before delving into the solution for the cube-testing issue, I’ll briefly go over the concepts and components that comprise a cube. Cubes are a means of quickly accessing data held within a data warehouse. Cubes organize and summarize the data into a multidimensional structure. Cubes are the main component of OLAP technology, and they provide an easy-to-use mechanism for querying data with quick and predictable response times. A cube is made up of the dimension data and measures (numeric facts). The central table in a cube is known as the fact table, and it’s the source of the cube’s measures. The dimension tables are referenced by the fact table, and they contain hierarchical levels of information that can be queried. The dimension hierarchy lets users ask questions at a high level. Then, using the dimension’s hierarchy, users can get to more details.

Cubes are contained within a database. The objects that make up the cube structure for a given database are as follows:

Data sources: These are the sources of the information to be loaded into the cube.
Measures: These are the numeric values represented in the cube. They can be dates, but are usually numeric with differing levels of aggregation (such as Sum, Max, Min and Count).
Dimensions: These are attributes associated with measures. Business data, customer names and geographic regions are common examples of dimensions.
Partitions: A partition defines a portion of the fact data loaded into a measure group. By creating multiple partitions, the cube can be processed in parallel and stored and queried separately, thereby improving performance. You can also reprocess individual partitions without affecting other partitions.
Cube roles: Each cube should have at least one cube role to allow access to end users. Roles can allow access to all data or a subset of the data stored within the cube based on an individual user ID or an Active Directory group.

The definition for a cube’s schema can be extracted from SSAS in the form of XML for Analysis (XMLA). XMLA is a SOAP-based XML protocol that gives access to the cube over HTTP. The XMLA definition contains all the details for each of the five cube objects described earlier. XMLA lets you recreate the cube on different databases or servers quickly and easily. It’s the cornerstone by which the unit-testable cube is created.

Once a cube is created and processed, the data measures can be queried using a mix and match of the variety of dimensions used to create it. Cubes are queried with the aforementioned MDX syntax. Like a SQL query, an MDX query contains a data request (using the SELECT statement), a data point (using the FROM statement) and an optional data filter (using the WHERE clause). Here’s a basic example:

SELECT {[<axis specification using Measures>]...} ON AXIS(0),
       {[<axis specification using a dimension hierarchy>]...} ON AXIS(1)
FROM [<cube>]
WHERE (<filter specification>)

The Example Sales Cube

I created an example cube that lets you quickly query sales data for various stores on various dates by various customers (see Figure 1).

Figure 1 Example Sales Cube Database Diagram

The cube is made up of four dimension tables connecting to a fact table. For simplicity, the fact table contains one measure called Quantity, which represents the amount of a given item sold to a given customer at a store location on a given purchase date. With this cube configuration, the user can quickly query for various facts using various dimensions. For example, corporate executives can get a clear idea of which products are selling at which stores, or they can delve into details such as which items sell best during a particular year or month. Having a large number of dimensions lets an executive view sales data in a variety of ways, providing better insight into the performance of stores.

Creating the Unit-Test Version of the Cube

As mentioned, the cube definition can be extracted from SSAS in the form of an XMLA document. This XMLA document is used to create the unit-test version of the cube. Using the production definition of the cube ensures the tests will accurately exercise all features of the cube in question. You interface with the SSAS engine using Microsoft.AnalysisServices.dll, which can be found in the SQL Server SDK assemblies.

The object used to generate the cube is XMLAtoCube. The constructor takes in a base configuration directory under which the XMLA is stored. The directory structure I chose to house the cube XMLA is <base directory>\<production server name>\<production database name>.

XMLAtoCube contains one public method, CreateCubeFromXMLA, which takes the following parameters (see the method’s code in Listing 1 in the Listings.zip file in the accompanying code download):

SourceServerName: The name of the production server that contains the cube.
TargetServerName: The name of the server that will house the unit-test version of the cube.
SourceDatabaseName: The name of the production database that contains the cube.
TargetDatabaseName: The name of the database that will house the unit-test version of the cube.
DataSourceProviderDefinition: The connection string URL that points the unit-test version of the cube to the location of its source dimensions and fact table. This will be the source of the scaled-down data for the unit test.

CreateCubeFromXMLA first establishes a connection to the version of the unit-test analytics server and drops the unit-test version of the cube database if it already exists. The dropping of the database is important because it ensures the tests are conducted on a clean environment without any residual artifacts contaminating the outcome. The connection to the analytics server is executed within the ConnectToServer method using an instance of Microsoft.AnalysisServices.Server. A call to Server.Connect(<Connection String>) using the unit-test server name passed in establishes the connection (see Listing 2 in the code download). Once the server connection is successfully established, the unit-test version of the cube database is removed if it already exists. This removal is done with the DropDatabase method, which is passed the connected Server instance and the name of the unit-test database to drop (TargetServerName). DropDatabase ensures that the server instance is connected, looks up the Microsoft.AnalysisServices.Database using the unit-test database name passed in and, if the database exists (the database instance is not null), the database is dropped (see Listing 3 in the code download).

The next step is to take the original cube’s XMLA definition and generate the unit-test version. The unit-test version contains the same dimensions, dimension hierarchies, measures, partitions, roles and so on as the original cube. The difference is that it’s generated in a new database pointing to a different location for its source data. The AddCubeToDatabase method creates the test cube (see Listing 4 in the code download). AddCubeToDatabase reads the XMLA definition from the file system using the naming convention mentioned earlier. The XMLA file name is passed to an instance of XmlTextReader at construction. The XMLA is read in using the Microsoft.AnalysisServices.Utils.Deserialize method, which is passed the XmlTextReader and a newly created instance of Microsoft.AnalysisServices.Database. The Database object instance now contains the complete definition of the cube, but that definition still has the original cube’s database name and data source. Pointing to the unit-test database simply involves setting the Name and ID properties of the “Database to unit-test database name” (targetDatabaseName) parameter. This database can then be added to the instance of the unit-test analytics server by calling Server.Databases.Add(<unit test database>) followed by the Database.Update method.

After the database has been created, you need to update the cube’s data source to the external unit-test data collection. The Database instance has a list of DataSource instances (usually just one data source is associated with a cube) and the unit-test connection string is used to replace the connection string contained within the XMLA definition. After the connection string is replaced, a call to the DataSource.Update method updates it within the SSAS server. At this point, the customization of the XMLA definition is completed and the remaining pieces of the cube (DataSourceView, Dimension and Cube) are updated and processed.

After the call to AddCubeToDatabase is complete, the unit-test version of the cube has been created within the specified server using the chosen unit-test database. It points to a custom set of source data. Although the cube has been created, it’s still empty. In order to populate the dimensions with the source data, the dimensions must be processed. The ProcessDimensions method is called and passed the Database instance. All the Dimensions within the Database are processed using the Dimension.Process(ProcessType.ProcessFull) method. Once the dimensions have been successfully processed, the cubes within the unit-test database are processed using the ProcessCube method. Like ProcessDimensions, ProcessCube takes the Database instance, loops through all the cubes in the Database and calls Cube.Process(ProcessType.ProcessFull) (see Listing 5 in the code download). At this point, the unit-test cube has been created and populated with the targeted test data. The next step is to run tests against it to ensure that the cube behaves as expected.

Validating the Cube

Although the solution for validation is targeted for the cube, the design pattern being used can apply to other unit-testing frameworks as well. The pattern employed is testing using template validation. Simply put: When the test runs, a data structure containing the results is created and validated against a previously stored version of the data structure that was deemed correct. Because it’s hard to validate a binary data structure, an HTML representation of the structure is presented to the unit tester. The tester uses the HTML representation to validate the initial test results of the unit test (to make sure the results match what is expected) and to examine what caused a unit test to fail during subsequent runs. In a failure, the HTML displays the original data structure as well as which piece of the data structure failed validation and why. This is critical to helping debug the issue.

The best way to test most scenarios is using the “black-box” testing methodology. A black-box test passes input and validates output. Because the output for a cube is a query result, the input is the query. You use MDX statements to query a cube. After the cube interprets the MDX query, it returns a result. The result is in the form of rows and columns that are a representation of the dimensions chosen for running the MDX query. The MDX query result can be quite complex because the query can involve many dimensions, resulting in a variety of rows and columns in the output. The data structure chosen to hold the cube data is a Dictionary of CubeRow instances keyed by the name of the cube row. The CubeRow class contains two alternative data structures—the one used depends on the situation. One data structure is a Dictionary of CubeTuple instances keyed by the name of the cube column. The CubeTuple is simply an object that contains the cube’s column name and the given column’s value. The Dictionary of CubeTuble objects is used when the given cube column contains a value. The second data structure is another Dictionary, mapping a row name to a CubeRow instance. Because an MDX query can have many levels of row dimensions, CubeRow contains its own Dictionary of row name and CubeRow instances.

Not only are the results of the cube stored in a Dictionary<String, CubeRow> instance, the results are also stored in an HTML string. The HTML string allows the tester to have a visual representation of the cube results. The HTML table contains multiple levels of column and row headers, and the HTML table cells contain the MDX values. Figure 2 shows the layout of the HTML representation of an MDX query result.

Figure 2 The Layout of the HTML Representation of an MDX Query Result

		Column Dimension Caption 1 (Value 1)	Column Dimension Caption 1 (Value 1)	Column Dimension Caption 1 (Value 2)	Column Dimension Caption 1 (Value 2)
		Column Dimension Caption 2 (Value 1)	Column Dimension Caption 2 (Value 2)	Column Dimension Caption 2 (Value 1)	Column Dimension Caption 2 (Value 2)
Row Dimension Caption 1 (Value 1)	Row Dimension Caption 2 (Value 1)	MDX Result Value	MDX Result Value	MDX Result Value	MDX Result Value
Row Dimension Caption 1 (Value 1)	Row Dimension Caption 2 (Value 2)	MDX Result Value	MDX Result Value	MDX Result Value	MDX Result Value
Row Dimension Caption 1 (Value 2)	Row Dimension Caption 2 (Value 1)	MDX Result Value	MDX Result Value	MDX Result Value	MDX Result Value
Row Dimension Caption 1 (Value 2)	Row Dimension Caption 2 (Value 2)	MDX Result Value	MDX Result Value	MDX Result Value	MDX Result Value

BaseMDXTest contains the code for executing an MDX query and building the MDX result data structure as well as the representative XML. BaseMDXTest uses Microsoft.AnalysisServices.AdomdClient.dll, found in the SQL Server SDK assemblies, to connect to the SSAS cube and execute the MDX queries. BuildTemplate is the method that runs the MDX query and builds the MDX result Dictionary as well as the HTML representation. First, a connection to the cube is established. In order to establish and open a connection, the connection string is passed to an instance of MicrosoftAnalysisServices.AdomdClient.AdomdConnection. The Open method is then called on the newly created connection instance and the connection instance is returned to the caller. Once a connection is created, an instance of the MicrosoftAnalysisServices.AdomdClient.AdomdCommand method is established and is passed the MDX query string and the AdomdConnection instance. A call to the AdomdCommand.ExecuteCellSet method executes the MDX query against the unit-test version of the cube and returns a MicrosoftAnalysisServices.AdomdClient.CellSet instance (see Listing 6 in the code download).

After the CellSet instance is retrieved, a check is made to ensure that the result set has two axes. Axis 0 contains the columns and Axis 1 contains the rows. Each axis contains a collection of Position objects. A Position represents a tuple on the given axis and contains one or more Member objects. A Member represents the column or row headers for the given Position (see Listing 7 in the code download).

Next, the number of rows and columns returned by the MDX query is computed. This is done by taking the number of rows (the count of Position objects on Axis 1) plus the number of column dimensions (CellSet.Axes[0].Positions[0].Members.Count). The number of columns is added to the rows because when representing the MDX results as a two-dimensional table, the columns are included within the set of rows. Likewise, the number of columns is computed by taking the number of Position objects on Axis 0 plus the number of row dimensions (see Listing 8 in the code download).

Given the number of rows and columns, the Dictionary of CubeRow objects can be generated as well as the HTML representation of the MDX output. Listing 9 in the code download contains the code for traversing the MDX result set, creating the HTML and storing the results in the CubeRow Dictionary. First, the number of rows is looped through. If the row number is greater than the number of column dimensions, then it’s known that a new row of MDX results is available. In other words, when the rows of column headers have been passed, the MDX data is available. At this point, a new CubeRow object is created.

The next step is to loop through each column within the row. If the current row number is lower than the number of column dimensions, then the current row is actually a row of column headers. For the Dictionary, the header captions are concatenated for each column location, separated by a colon. This means that if a header is made up of multiple dimensions, each column will be concatenated with the given dimension in descending order of resolution. The HTML generation for a column header is more straightforward. The dimension caption is simply surrounded by the HTML table header tags (<th></th>). For each case, the current dimension caption is retrieved by getting the header axis (CellSet.Axis[0]) and accessing the column position (current column count minus the current row dimension count) and the current Member within that Position (current row count).

If the current row number is greater than the number of column dimensions, then the column headers are no longer being processed. Instead, the MDX result set row tuples are next in line for processing. Similar to columns, which have headers, MDX result set rows may also have headers. If the column number being processed is less than the number of row dimensions, then a row header is being processed. For each row header, a new Dictionary<string, CubeRow> is created, added to the current Dictionary and set as the current MDX result Dictionary. What this means is that for each row header, there exists a Dictionary of rows that contains more granulated MDX result data.

Extracting the row header caption is similar to extracting the column header caption. The row caption is retrieved from the Member object at the current column location from the Position at the current row from CellSet.Axis[1]. The row caption is the key to the Dictionary of CubeRows, and if the current CubeRow Dictionary doesn’t have the extracted row caption, the CubeRow object is added to the Dictionary keyed by the row caption. If the row caption does exist within the current CubeRow Dictionary, the CubeRow object is retrieved and set as currentCubeRow.

After the row caption members have been exhausted (that is, the current column count is greater than the number of row dimensions) and the column header rows have been traversed (that is, the current row count is greater than the number of column dimensions), it’s time to add MDX cell values to the current CubeRow object. Each combination of a column header and column value is considered to make up a single CubeTuple instance. Each CubeRow contains a Dictionary of CubeTuple objects keyed by the column header. The column header is retrieved from an array of column headers previously constructed (recall a column header is a colon-delimited string of all column captions concatenated together). The index of the column header is the current column count minus the number of row dimensions (the total column count includes the row dimensions). The current MDX CellSet value is retrieved by accessing the appropriate two-dimensional (column, row) point. This is based on the current column count (minus the number of row dimensions) and the current row count (minus the number of column dimensions). This value is added to the CubeRow object using the AddTuple method, passing it the column header and column value. At the same time, the HTML representation is updated by adding the MDX cell value in between HTML table dimension (<td></td>) tokens. See Figure 3 for a graphical representation of the Dictionary<string, CubeRow>.

Both the Dictionary and HTML representation of the template are persisted to a previously defined file location using the BaseMDXTest.PersistTemplate method. Because the template needs to be manually validated by the unit-test developer, the test is considered to have failed, which is why the BaseMDXTest.TestMDXQuery method returns false for success.

Figure 3 A Graphical Representation of the CubeRow Dictionary

Once the template has been created, subsequent runs of the same test are validated against the previously stored template. When the TestMDXQuery method is called, a check is first made to see if a test with the given name exists. If it does and a new template creation isn’t requested (requesting to recreate the template may occur if the current template is incorrect), then the test result template is loaded into memory. The template includes both the object representation and HTML representation of the MDX result set. The BaseMDXTest.RunComparison method executes the MDX query and compares the results against the stored template. The MDX query results are traversed in the same way as they were during the template creation. The main difference between the creation of the original template and the validation against the template is that instead of creating the Dictionary<string, CubeRow>, lookups are done against the template Dictionary, checking to see if the same MDX query results exist. When looping through the rows and columns, the HTML table is created the same way as during template creation, except now the cells within the HTML table are colored. A green cell indicates that the cell matches the original template; a red cell shows a mismatch. By coloring the cells and presenting them in an HTML table, the unit tester has an immediate view of why the test case passed or failed. Anytime a mismatch is found, a Boolean (testPass) is set to false to indicate the test case failed.

While traversing the MDX query results and validating them against the template Dictionary, each CubeTuple (an object that contains the column’s dimension, concatenated names and the column’s value) found is removed from the current CubeRow Dictionary of CubeTuple objects. Therefore, after the entire MDX query result is passed, the original template Dictionary should have CubeRow objects with an empty Dictionary of CubeTuple objects if the MDX result was a complete match. Otherwise the new MDX query result had missing data that was contained within the original result. The BaseMDXTest.CheckForExtraDataInTemplate method examines the template Dictionary for remaining CubeTuple objects, executing recursively, and returns a value of true if CubeTuple objects remain. The testPass Boolean within the RunComparison method is set to false if extra data is found, and the test case fails.

After the MDX results have been completely traversed and validated against the template Dictionary, an instance of the CubeComparisonResult object is returned. It’s constructed with the testPass Boolean and the HTML table showing the result. The BaseMDXTest.TestMDXQuery method uses CubeComparisonResult to build an HTML page showing the original MDX query result HTML table and the comparison HTML table. The HTML is persisted to the file system by executing a call to the BaseMDXTest.PersistTestReport method, which creates a TestReport.html summary Web page listing all test runs and links to their HTML result pages, as well as a summary of the number of test cases that passed and failed.

Testing the Purchase Cube

Using both components of the cube-testing framework—the cube-creation code (XMLAtoCube) and the MDX query result template (BaseMDXTest)—you can create unit-test cases that validate the cube. Although the code for the framework is extensive, creating the test cases is simple and straightforward. Listing 10 in the code download contains sample unit tests for validation of the Purchase cube. These test cases use the Microsoft testing framework, but any testing framework can be incorporated.

The unit-test object (PurchaseCubeTest in the example) inherits from BaseMDXTest. The default constructor of PurchaseCubeTest constructs BaseMDXTest with the URL of the SSAS server where the cube is located and the base directory in which to store the MDX query result template and subsequent test results.

A [TestInitialize] method is used to create the unit-test version of the Purchase cube. It uses the XMLA definition of the original cube and creates it on the unit-test SSAS server (targetServerName) using a unit-test database (targetDatabaseName). It also points the source data URL to the location of the test dimension and fact data. [TestInitialize] is run only once for a given [TestClass], which ensures the cube is created only at the start of testing.

The test cases themselves are executed within [TestMethod] annotated methods. Each test case is simple. The MDX query is defined and then executed using the inherited BaseMDXTest.TestMDXQuery method, naming the test case and passing it the MDX query. TestMDXQuery returns true if the test passes or false if it doesn’t, and an Assert.IsTrue method is used to pass or fail the unit test. After all tests have been run, the resulting HTML test document can be opened and the failing test cases can be examined. Figure 4 contains an example HTML output of one of the tests.

Figure 4 The HTML Output of an Example Test

Properly Tested Code

Although it isn’t straightforward, even an OLAP cube can be unit tested using C#. The inclusion of both the Microsoft.AnalysisServices.dll and the Microsoft.AnalysisServicesAdomdClient.dll files within the SQL Server assemblies provides you with the APIs for both creating and querying SSAS cubes. The architecture presented lets you add a rich set of unit tests that produces output in a readable format so test case failures can be quickly identified. The method of test-case validation by template can be applied to other unit-test architectures where output validation isn’t straightforward. Examples of these range from applications that rely on traditional relational database persistence where the template contains the data expected to be stored in the database after the application runs, to UI applications that store the state of the UI in a data structure and display the UI in an HTML form.

I’m not quite sure if Thomas Jefferson or our founding fathers would compare the freedoms of a nation to the rights of properly tested code, but I’m pretty sure your users and supervisors would be pleased to know your application has been correctly put through its paces.

Mark Nadelson is a professional software developer with 22 years of experience in the telecommunications, Internet and finance industries. Throughout his career he has used a number of programming languages including assembly, C, C++, Java and C#. Nadelson is also the author of two books and a number of technical articles.

Thanks to the following technical experts for reviewing this article: David Neigler and Joe Sampino
David Neigler is a development manager in the financial services industry. His focus is the development of systems that solve big data problems faced by financial companies doing large volumes of trading.

Joe Sampino develops enterprise business intelligence platforms for large banks and investment firms. He creates and manages data warehouses, SQL Server Analysis Services cubes, and data analysis/reporting systems for internal, external, regulatory and compliance needs.