Part 3: Replicating Excel 2003 Formulas to SQL Server
Microsoft Office Editions 2003
Summary: In the third of a three-part series, learn how to automate a process for storing Excel 2003 formulas in a SQL Server database. This allows access from the database to the calculated spreadsheet values available only in the worksheet. (8 printed pages)
Creating an Excel Workbook with Metadata Tags
Using the Formula Parser Application
Adding Metadata to a SQL Server Database
Creating the Database Objects
Performing Calculations in the Database
In a previous article, Part 2: Mapping XML from SQL Server to a Single Cell in Excel 2003, we showed a solution for dynamically creating XML maps in Microsoft Office Excel 2003 by using single-mapped cells. The maps enable us to import and export the spreadsheet data to a Microsoft SQL Server database from the mapped cells. However, in the database we also need to have access to calculated spreadsheet values, which use many Excel formulas and are available only in the spreadsheet. We couldn't use XML single-cell mapping for cells with formulas because the XML map import process overwrites the formulas with values.
We designed an automated process for translating the Excel formulas into a SQL-compatible format and storing them in the database to address this problem. This approach enables us to:
- Replicate all calculations in the database without using the spreadsheet as a calculation engine.
- Perform calculations on aggregated data for easy reporting.
This solution eliminates the problem of trying to maintain the same business rules and logic in two environments: the spreadsheet and the database.
In the first article of this three-part series, Part 1: Automating the XML Data Mapping Process in Excel 2003, we described our system of embedding metadata identifiers in hidden control rows and columns, which are used to locate data cells. We showed how to use a simple macro to generate an XML map from these tags, which you can use to import or export data. In this article, we expand on this model by including formulas in our sample workbook and extending the concept of control tags to the cells containing formulas.
We created a Microsoft Windows Forms application in Microsoft Visual Basic .NET to parse an Excel workbook that already contains our control tags. It examines every cell at the intersection of our column and row tags and inserts a row in an ADO.NET Datatable. If the cell is found to contain a formula, we set an "OnSheetCalcInd" flag to "1" and pass the range to our formula parser. The parser converts Excel formulas to a syntax that SQL can interpret. After parsing is complete, we persist the information to data files. The data files are uploaded to the FormulaComponent and Cell database tables, and then the calculations are replicated in the database using dynamic SQL by executing the stored procedure spReportDataRefresh. Now that we've described the steps, it may be helpful to view a diagram of the whole process.
Figure 1. Overall process diagram
Our users create conventional Excel spreadsheets containing a blueprint layout for displaying corporate metrics. We use the BuildGrid macro from the first article of this series; Figure 2 shows the result. Note that cell AD30 contains a formula, not raw data. The formula in cell AD30 is SUM(AB30:AC30) and it refers to other cells that also contain formulas. For example, cell AC30 contains the formula SUM(AC27:AC29). Our Visual Basic .NET utility (included in the download accompanying this article) recursively examines any cell references to see if the referred cell also contains a formula.
Figure 2. Sample worksheet after running BuildGrid macro to create element names (click to see larger image)
The most challenging part of this solution is turning the Excel formula into a form that SQL can understand. For example, after parsing the sample workbook, we convert the formula in cell AC30, which contained a SUM function, into (@AC27+@AC28+@AC29). This is because SQL variables must begin with an @ sign, and although SQL has a SUM function, it is used quite differently. The sample code presented in this article handles the following operators and functions. They can be nested to any depth.
- + - / * ( )
- SUM(Cell1:Celln) or SUM(Cell1,Cell2,...Celln)
- Linked cells (for example, Sheet2!AB23 + AC75)
Note You can extend the code to handle other formulas and functions by adding other patterns or formula signatures. If you need to recognize more complex functions, you can follow the pattern and add additional function recognizers to the Formula class. The previous capabilities were sufficient for us to convert more than 1,500 formulas in our project's Excel template. There is another possible modification that you should consider for scenarios that use a lot of linked cells between the sheets. The change is to append the worksheet name to the cell address for the variable name.
We created a Windows Forms application using Visual Basic .NET 1.1 to gather metadata about all data and formula cells used in the workbook.
To use the Formula Parser application
- Download the accompanying .zip file, named ReplicatingExcelFormulasToSQL.zip, and extract its files. The main folder, "ReplicatingExcelFormulasToSQL," contains three subfolders: "Sample Workbook", "SQL", and "VB".
- In the VB subfolder, double-click FormulaParser.exe.
- In the Formula Parser application, click Parse Excel, and then browse to the Sample.xls file, which is located in the "Sample Workbook" subfolder. Click Open. This loads the data in the spreadsheet into the Formula Parser application.
This sample is set up to use the BuildGrid macro, included in the first article of this series, Part 1: Automating the XML Data Mapping Process in Excel 2003. The sample file displays cell data in two DataGrids from the System.Windows.Forms namespace (see Figure 3 and Figure 4), whose contents are written to text files when you click the Save to File button. You could extend the sample to update your database directly.
Figure 3. The Cells tab of the Formula Parser application (click to see larger image)
Figure 4. The Formula Components tab of the Formula Parser application (click to see larger image)
All of the Visual Basic .NET code for the parser is contained in the Formula class. The hardest working subroutine in the class is ParseFormula, which also gets called recursively.
Private Shared Sub ParseFormula(ByRef wks As Excel.Worksheet , ByRef r As Excel.Range, ByRef wksFormula As Excel.Worksheet) [...] If wks.Name = wksFormula.Name Then ParseFormula(wks, rngComponent, wksFormula) Else ParseFormula(wksFormula, rngComponent, wksFormula) End If [...]
After we save our workbook metadata to text files, we can import the data into our SQL Server database tables.
The Save to File button in the Formula Parser application creates two text files: FormulaComponent.dat and Cell.dat. One easy way to load the data into the FormulaComponent and Cell tables from these files is to use the DTS Import/Export Wizard in SQL Enterprise Manager. In the "Performing Calculations in the Database" section of this article, we list an SQL script that will load the tables with sample data.
Figure 5 shows the database structure for trying out the code.
Figure 5. The database structure used by the Formula Parser application (click to see larger image)
After you create a test database on your SQL Server, you can run the SQL code included in the download accompanying this article. These two files are stored in the folder titled "SQL". First run DBObjectsCreationScript.sql from SQL Query Analyzer (QA) to create the six tables and three stored procedures used in this article. Then load the sample data by running the script in DBSampleDataLoader.sql.
Now you can run the following statement in Query Analyzer:
EXECUTE spReportDataRefresh @TimeId = 1
The spReportDataRefresh stored procedure performs the following steps:
- Updates the ReportData table with current single-mapped cell data from the source workbook.
- Sums the regional data for the parent locations.
- Loops through each row in the Cell table that contains a formula, and calls spCalculatedValueSel to perform the calculation for each LocationId variable.
- Updates calculated values in the ReportData table by applying the SQL-compatible formulas.
It works by looping through the components (identified in the FormulaComponent table) for each formula and dynamically creating an SQL script to declare variables, assign values, and select the result of the formula. If we find that we're missing a value, we substitute a zero. If we encounter a divide-by-zero error, we leave the NULL in the DataValue column of the ReportData table for that row, and then continue. The TimeId variable adds time dimensionality to our data and allows us to update calculations for only the current period, thus maintaining our historical data.
A detailed explanation of how the dynamic SQL code works is beyond the scope of this article. See the inline comments in the stored procedure scripts for additional information about how they work.
Figure 6 shows a portion of the ReportData table after we refreshed our calculations.
Figure 6. Sample data produced by stored procedure spReportDataRefresh in the ReportData table (click to see larger image)
You can obtain a richer view of the data by displaying an SQL view we created (joining the ReportData table with its dimension tables):
SELECT * FROM Report
This article described how we took advantage of our grid structure to programmatically extract and store Excel formulas in a SQL Server database and replicated the UI-side calculations using dynamic SQL. In Part 1 of this article series, we showed how to take advantage of XML maps in Excel to create a template for data entry. Our stored procedures were then able to perform aggregations for parent locations and update calculated data as well. You can extend the sample code in this article as far as needed for most scenarios that require support of flexible XML mapping and Excel formulas applied to aggregate values.
Currently we use a variation of this code in a project called "The Rhythm of the Business 2.0," which supports a flexible and constantly changing Excel template used by more than 1,000 employees. One of our additions to the sample code is the ability to scrape the row and column descriptions; we use these for a reporting application using SQL Reporting Services. We did this by adding meta tags framed by square brackets for marking up the description row or column; our Windows Forms application added this information to our Cell table. We hope this information gives you ideas for getting additional benefits from your investment in Microsoft Office Excel 2003.