Updating Data Warehouse Data
Updating data warehouse data includes periodically extracting data from operational systems, cleansing and transforming the data, and loading the new data into the data warehouse. Each data update also includes tasks that must be accomplished to synchronize cubes if Microsoft® SQL Server™ 2000 Analysis Services is used for online analytical processing (OLAP), and to update any data marts that are part of the data warehouse.
The process of extracting, cleansing, and transforming data for a periodic update is essentially the same as the process used in the initial loading of the data warehouse, although the update process is often much less complex and more automated than the initial load process. Procedures and automated tasks developed during the initial load process can reduce the amount of manual effort required during updates. Corrections to source operational systems identified and implemented during the initial load also reduce the number of inconsistencies and errors that must be addressed during updates. However, it is often the case that manual intervention is required during updates to ensure the data is ready for loading into the data warehouse.
One difference between the initial data load and data updates is that verifying the referential integrity should be performed incrementally on update data before it is loaded into the data warehouse and made available to users. Updates often include additions and changes to dimension tables as well as the addition of rows to the fact tables. The new and changed data should be checked for internal consistency as well as verified against existing data in the data warehouse before it is loaded into the data warehouse.
After the update data has been made ready for loading into the data warehouse, you can use Transact-SQL, Data Transformation Services (DTS), or the bcp utility to update the data warehouse tables. Depending on the design and implementation of the presentation applications that provide access to data warehouse data for end users, you may need to take the data warehouse offline during the update to prevent inconsistencies in query results.