Maintaining OLAP Data
Microsoft® SQL Server™ 2000 Analysis Services provides rapid analytical access to data warehouse data. To accomplish this, Analysis Services creates multidimensional cubes from data in the data warehouse fact and dimension tables. Numerical measures are also summarized into preaggregated values during cube construction. Cubes are stored in multidimensional structures designed for rapid query response, combining preaggregated information with raw fact data to answer a wide variety of queries.
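The interplay between preaggregated values and raw fact rows can be illustrated with a minimal sketch. The fact table, dimension names, and values below are invented for illustration; this is not how Analysis Services stores cube data internally.

```python
# Illustrative only: a miniature "cube" that preaggregates a fact table
# by one dimension level, the way a cube precomputes rollups at build time.
from collections import defaultdict

facts = [
    # (product_category, city, sales_amount) -- hypothetical fact rows
    ("Bikes",   "Seattle", 120.0),
    ("Bikes",   "Tacoma",   80.0),
    ("Helmets", "Seattle",  30.0),
]

# Preaggregate sales by category when the cube is built.
agg_by_category = defaultdict(float)
for category, city, amount in facts:
    agg_by_category[category] += amount

# A query at the category level is answered from the stored aggregation
# rather than by rescanning the raw fact rows.
print(agg_by_category["Bikes"])  # 200.0
```

A real cube precomputes many such rollups across combinations of dimension levels, which is why queries at summarized levels return quickly.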
Cubes can contain data summarized, copied, or read directly from the data warehouse. Changes to the structure of the data warehouse or the data contained in it can affect the integrity and accuracy of cubes that have been created from the data warehouse. Because Analysis Services provides continuous online access to cubes, changes to the underlying data warehouse must be approached with a clear understanding of their effects on cubes and how to manage the synchronization of data in the data warehouse with data in cubes.
OLAP data must be updated after data warehouse data is changed. You process OLAP cubes, dimensions, and partitions to incorporate new or changed data from the data warehouse. The method of processing an OLAP object depends on the object and type of change made to the data warehouse, such as data addition, data change, or structural change.
Real-time OLAP is a feature that uses real-time cubes to automatically synchronize cube data with changes in the underlying relational database. Real-time cubes can be used for applications that need to monitor and analyze live data, and are intended to extend OLAP capabilities rather than replace traditional cube designs and applications.
Changes in the Data Warehouse
Data is usually added periodically to the data warehouse to include more recent information about the organization's business activities. Changes to data already in the data warehouse are less frequent and usually made only to incorporate corrections to errors discovered in the source from which the data was extracted, or to restructure data due to organizational changes. Structural changes to the data warehouse design typically are the least common.
It is common to add new data to the data warehouse. Cube information available online to client applications can be affected when data is added to the data warehouse due to interaction between the data and cube partitions. You can manage the effects of adding data to the data warehouse by carefully defining partition filters, and by designing a strategy to synchronize OLAP and data warehouse data.
Changes to correct errors in a data warehouse can be minimized by applying care during the data transformation, validation, and scrubbing operations. Other changes to existing data warehouse data can arise from changes in the structure of an organization or its products. For example, reorganizing products into different categories can require significant changes to data in the data warehouse, as well as to reports derived from the data warehouse. In some cases, such changes can require the complete redesign of cubes. In other cases, the redesign of dimensions and the processing of all cubes that use those dimensions may be all that is required.
Changes that correct errors in basic data should be incorporated in the source database, usually the OLTP business database, and then migrated to the data warehouse in a controlled manner. Many business OLTP database designs require corrections to be made by a transaction that offsets the incorrect data and then applies the correct data. The impact of such correction transactions on OLAP data is often easy to manage: cubes can incorporate new transactions that correct value errors, such as an incorrect sale amount. However, transactions that move a fact from one dimension member to another, such as a sale posted to the wrong customer, can affect the results of aggregate functions such as Avg. This is true for non-OLAP databases as well: if the original sale is zeroed out but its record remains in the database, it is still included in the count of sales records and distorts the calculation.
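The Avg distortion described above can be shown with a small sketch. The sale amounts are invented for illustration:

```python
# Illustrative only: why an offsetting correction distorts Avg but not Sum.
# A $500 sale was posted to the wrong customer; the correction zeroes it
# out with an offsetting record and reposts it, leaving three records
# where only one sale actually occurred.
sales = [500.0]           # original record, posted to the wrong customer
sales += [-500.0, 500.0]  # offsetting correction plus corrected repost

total = sum(sales)
average = total / len(sales)

print(total)    # 500.0 -- Sum is still correct
print(average)  # ~166.67 -- Avg divides one sale's value by three records
```

Sum and similar additive aggregates absorb the offset cleanly; count-based aggregates such as Avg see three records where there is one sale.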
Depending on cube storage design, changes to data in the fact table can affect the accuracy of queries to a cube until the cube is processed. The Refresh data processing option can be used to reload the cube's data and recalculate the aggregations. Because the aggregation design remains the same, the Refresh data option is faster than the Full process option, which rebuilds the cube completely.
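The relationship between the kind of change in the data warehouse and the processing option it calls for can be summarized in a sketch. This is not the DSO object model; the function and the change labels are hypothetical, standing in for the decision an administrator makes:

```python
# Illustrative sketch (not the actual DSO API): choosing a processing
# option based on the kind of change made in the data warehouse.
def choose_processing_option(change: str) -> str:
    # Hypothetical change labels for this sketch.
    if change == "structural change":       # schema or design altered
        return "Full process"
    if change == "existing facts changed":  # aggregation design unchanged
        return "Refresh data"
    if change == "new facts added":         # new batch of records
        return "Incremental update"
    raise ValueError(f"unknown change type: {change}")

print(choose_processing_option("existing facts changed"))  # Refresh data
```

The actual choice is made in Analysis Manager or through Decision Support Objects (DSO); the sketch only captures the mapping the text describes.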
Dimension hierarchies can be affected by changes to data in the data warehouse dimension tables even though the table schema remains the same. The dimension hierarchy is based on relationships between members in a dimension table. When these relationships are changed (for example, when cities are reorganized into different sales regions), the dimension structure must be rebuilt.
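A sketch makes the point concrete: the hierarchy is derived from the rows of the dimension table, so changing a row's parent changes the tree even though the table schema is untouched. The city and region names are invented for illustration:

```python
# Illustrative only: a dimension hierarchy built from member
# relationships in the dimension table. Reassigning a city to a
# different region changes the tree without changing the schema.
dimension_rows = [
    # (city, sales_region) -- hypothetical dimension table rows
    ("Seattle",  "Northwest"),
    ("Portland", "Northwest"),
    ("Boise",    "Mountain"),
]

def build_hierarchy(rows):
    tree = {}
    for city, region in rows:
        tree.setdefault(region, []).append(city)
    return tree

before = build_hierarchy(dimension_rows)

# Boise is reorganized into the Northwest region: same schema, new data,
# different structure -- the dimension must be rebuilt.
dimension_rows[2] = ("Boise", "Northwest")
after = build_hierarchy(dimension_rows)

print(sorted(after["Northwest"]))  # ['Boise', 'Portland', 'Seattle']
```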
Referential integrity must be maintained when data warehouse data is added, changed, or deleted. Loss of referential integrity can result in errors during cube processing, fact table records being bypassed, or inaccurate OLAP information.
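A typical integrity check before processing is to look for fact rows whose dimension keys have no match in the dimension tables; such orphaned rows are the ones that would be bypassed or cause errors. The keys and values below are invented for illustration:

```python
# Illustrative only: detecting fact rows that have lost referential
# integrity (no matching dimension member). In a real warehouse this
# would be a SQL query joining the fact table to the dimension table.
dimension_keys = {1, 2, 3}  # customer keys present in the dimension table
fact_rows = [
    # (customer_key, sales_amount) -- hypothetical fact rows
    (1, 100.0),
    (2, 250.0),
    (9,  75.0),  # orphan: customer 9 is missing from the dimension table
]

orphans = [row for row in fact_rows if row[0] not in dimension_keys]
print(orphans)  # [(9, 75.0)]
```

Rows found this way should be repaired in the warehouse before the cube is processed, so they are neither skipped nor misaggregated.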
The structure of OLAP cubes and dimensions can be affected by changes to the design of the data warehouse such as the addition, deletion, or alteration of tables, or relationships between tables. When the structure changes, you must modify the design of affected cubes and dimensions, redefine partitions and aggregations, and completely process the modified cubes and dimensions.
Synchronizing OLAP and Data Warehouse Data
Valid cubes are online and available to client applications at all times when the Analysis server is running. Because OLAP cube partitions interact with data in the data warehouse, the data warehouse design should include a synchronization strategy that allows data to be added without causing online cubes to return incorrect answers to client queries.
One strategy for managing additions to data warehouse and OLAP data is a batch update system. In this strategy, every record in the data warehouse fact table carries a batch number. When you design a cube, add an expression to the filter for each of the cube's partitions to specify the largest applicable batch number, for example, "... AND DWBatch <= 33 ...". When you add records to the fact table, give them a new, higher batch number. Cubes are unaffected by the added records because the cube partitions are restricted to reading data from previous batches only.
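The batch filter scheme can be sketched as follows. The base filter, column name handling, and function are hypothetical; only the "... AND DWBatch <= 33 ..." fragment comes from the example above:

```python
# Illustrative only: building a partition filter that pins the partition
# to completed batches. Rows loaded with a higher batch number stay
# invisible to the cube until the filter is advanced and the partition
# is processed.
def partition_filter(base_filter: str, max_batch: int) -> str:
    return f"{base_filter} AND DWBatch <= {max_batch}"

current_filter = partition_filter("Region = 'West'", 33)
print(current_filter)  # Region = 'West' AND DWBatch <= 33

# New rows are loaded with DWBatch = 34; the partition ignores them
# until the filter is rebuilt with the new high-water mark.
next_filter = partition_filter("Region = 'West'", 34)
```

The high-water mark makes loading and cube visibility independent: loading a batch is safe at any time, and the cube adopts it only when the filter is advanced and the partition is processed.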
Data added to a dimension table does not affect existing cube private or shared dimensions until the dimensions are processed. A batch number in dimension table records is not necessary, but it can be useful in ensuring continued referential integrity.
Dimensions and cubes or partitions can be processed to incorporate new data after a batch of data has been added to the fact table and dimension tables. Shared dimensions should be processed before the cubes that use them. To add new members to a dimension that do not affect the dimension's structure, use the Incremental update option. To add new members and rebuild the dimension's structure, use the Rebuild the dimension structure option. Note that when a shared dimension is processed with the Rebuild the dimension structure option, all cubes that incorporate that dimension will immediately become unavailable to client applications and must be processed before they can be used again. However, when a shared dimension is processed using the Incremental update option, a cube that uses the shared dimension will display the new members, but the cells associated with those members will remain empty until the cube is updated with new data from the fact table that relates to the new members.
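The required ordering — shared dimensions first, then the cubes that use them — can be sketched as a simple processing sequence. The object names and functions are hypothetical stand-ins, not the DSO object model:

```python
# Illustrative sketch (not the DSO object model): after a batch load,
# process shared dimensions before the cubes that use them.
processing_log = []

def process_dimension(name: str, option: str) -> None:
    processing_log.append((name, option))

def process_cube(name: str, option: str) -> None:
    processing_log.append((name, option))

shared_dimensions = ["Customer", "Product"]  # hypothetical names
cubes = ["Sales"]

# New members only, structure unchanged: Incremental update keeps
# dependent cubes online, unlike Rebuild the dimension structure.
for dim in shared_dimensions:
    process_dimension(dim, "Incremental update")
for cube in cubes:
    process_cube(cube, "Incremental update")

print(processing_log[-1])  # ('Sales', 'Incremental update')
```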
To incorporate a new data batch in a cube, update the filter expression in each of the cube's partitions to include the new batch number, and then process or incrementally update the cube. If a cube's data is divided among multiple partitions, you can use one of the partitions to accumulate new data batches and process that partition only. The cube's other partitions must have filters that exclude new data so that data will be added only to the accumulation partition.
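An accumulation-partition scheme can be sketched as follows. The partition names and filter strings are invented for illustration; the point is that only one partition's filter moves, so only that partition needs processing:

```python
# Illustrative only: an accumulation-partition scheme. Historical
# partitions have fixed filters that exclude new batches; a single
# accumulation partition advances its filter and is the only partition
# processed when a new batch arrives.
partitions = {
    "Sales1997":    "DWBatch <= 20",                    # frozen history
    "SalesCurrent": "DWBatch > 20 AND DWBatch <= 33",   # accumulation
}

def incorporate_batch(partitions: dict, new_batch: int) -> list:
    """Advance the accumulation partition's filter to include new_batch;
    return the partitions that must be processed."""
    partitions["SalesCurrent"] = f"DWBatch > 20 AND DWBatch <= {new_batch}"
    return ["SalesCurrent"]

to_process = incorporate_batch(partitions, 34)
print(to_process)  # ['SalesCurrent']
```

Because the historical partitions' filters exclude the new batch numbers entirely, their stored data and aggregations remain valid without reprocessing.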
Visibility to Client Applications
When a cube that is currently online is processed by any one of the three processing options (Full process, Incremental update, or Refresh data), the cube remains online until the processing has been completed, at which time the online cube is replaced by the new cube version. When a cube is processed using the Full process option, online client applications will be disconnected from the cube when the switch is made to the new version of the cube, and the client applications must individually reconnect to access the new version. When a cube is processed using either the Incremental update or the Refresh data option, online client applications will not be disconnected from the cube when the processing completes. The new version of the cube will be immediately visible with no break in service.
The processing of a shared dimension can affect cubes that incorporate the dimension in their design. If a shared dimension is processed using the Rebuild the dimension structure option, all cubes that use the dimension will immediately become unavailable to client applications and must be processed before they can be used again. If a shared dimension is processed using the Incremental update option, cubes that use the dimension remain available to client applications and any new members added to the dimension automatically become available to client applications when the dimension processing is complete. Any such new members will not have fact data associated with them until the cube is updated with new related facts.