Microsoft's Contribution to the Data Warehousing Industry
Summary: Data warehousing has become one of the most dynamic, interesting, and fast-growing segments of the information technology industry. Microsoft Corporation is contributing to the rapid growth of data warehousing for decision support. A combination of products from Microsoft, and from an alliance of independent software and service providers, enables customers to operate powerful, yet affordable, data warehouse systems. Microsoft® SQL Server™ database, implemented as a data warehouse storage engine, provides the price/performance, ease of management, and Windows NT® and BackOffice® integration that makes SQL Server the preferred solution platform for many data warehouse and data mart systems.
Improved Decision-Making with Better Data Access and Analysis
Data Warehouses or Data Marts?
Why the Increased Popularity of Data Marts?
Microsoft Multidimensional Database Capabilities
Enterprise Data Warehousing Architecture
For the past twenty years, Microsoft Corporation has helped make information technology available to a larger number of people each year by lowering the cost and implementation challenges associated with that technology. Microsoft is making the same contribution today to data warehousing and is, thereby, contributing to the rapid growth of that segment of the information technology industry.
The foundation of Microsoft data warehousing is the Microsoft SQL Server relational database management system (RDBMS) on the Windows NT operating system. Microsoft SQL Server is a scaleable, high-performance database management system designed specifically for distributed client/server computing on Windows NT. Some of the reasons customers have chosen SQL Server for their data warehousing decision support system implementations include:
- Price/performance leadership and total cost of ownership.
- Compatibility with many other software components and tools used in the deployment of data warehouses.
- Internet integration.
- Connectivity choices.
- System administration capabilities.
- Best performing database on the Windows NT operating system.
- Built-in data replication.
- Ability to easily incorporate data findings into desktop office applications such as Microsoft Word, Microsoft Excel, Microsoft PowerPoint®, and e-mail.
- Microsoft's continued investment in new software products and capabilities.
Microsoft provides the preferred platform for many data warehouse systems deployments, and is investing in even more exciting capabilities in this important market segment in response to customer suggestions.
Data warehousing is a decision-support process involving a number of information technology components and services.
A data warehouse gathers operational data from a single or multiple sources, transforms it into meaningful subjects and information groups, and then stores it with time and history information to allow for effective decision support. It can provide multiple views of information to a spectrum of users. The power of this concept is that it provides users with access to trend analysis and with answers to business questions that are extracted from previously unrelated data sources, and it enables users to derive meaning from previously independent data. While production Online Transaction Processing (OLTP) systems are configured and optimized to provide quick response times to individual transactions, data warehouse systems store data in static form, and are then configured and optimized to support complex decision support.
The data warehouse market has come of age!
"The number of Global-2000 companies implementing DW strategies has surged from 20 percent to 80 percent over the past 18 months."
—DM Review, January 1997
Data warehouse implementations studied by IDC in their 1996 report, The Foundations of Wisdom: A Study of the Financial Impact of Data Warehousing, generated an average three-year return on investment of 401 percent.
For many years, all systems that extracted production and other data from source systems and stored it in useful ways in a decision-support system were called data warehouses. Over the past few years, a distinction has been articulated between data warehouses and data marts, even as the overall market continues to be called data warehousing.
General industry discussions indicate that, as of January 1997, approximately 70 percent to 80 percent of all data warehouses currently in production are, in fact, data marts.
Microsoft SQL Server on Windows NT is an especially popular platform for the deployment of data marts. The price/performance and capabilities of the Microsoft platform are a strong fit with the selection priorities of customers who are implementing data marts.
Intelligent Solutions, Inc., compares data warehouses to data marts is the following way:
|Data Warehouse||Data Mart|
|Constructed to meet the information needs of the entire enterprise.||Constructed to meet the needs of a specific business unit or function.|
|Designed to optimize integration and management of source data.||Designed to optimize information delivery of decision support information.|
|Manages extensive amounts of atomic level history.||Primarily focused at managing summary and/or sample data.|
|Owned and managed by enterprise Information Systems (IS) organizations.||Can be owned and managed by Line of Business IS group.|
There are a number of good reasons for the increased popularity of more narrowly defined data marts over enterprise data warehouse systems.
Data marts have dramatically lowered the cost for decision-support system creation and operation. The data mart approach has put the deployment of this decision support technology within the affordability range of a much larger number of users. While deployment budgets for data warehouses typically are in the range of two to five million dollars, data marts typically cost between $100K-$1 million for the total project budget. Microsoft SQL Server RDBMS on Windows NT Server contributes to reducing the cost of acquisition, deployment, and on-going operation of data marts. Among the drivers of data marts has been the deployment of low-cost servers—with the Windows NT operating system—that are powerful, low-cost, and provide sufficient RAM and disk storage to meet the needs of data marts
Wayne W. Eckerson, of the Patricia Seybold Group, Inc., pointed out in his report, "Understanding Data Marts," August 1996, that data marts are preferred by autonomous departments and small business units as a way to build their own decision-support systems. But data marts have also become the preferred way for IS departments to build large, central data warehouses. The idea is to build a data warehouse one subject or data mart at a time, gradually gaining experience and the support of key business managers who see concrete benefits every 3-6 months.
With data marts, it is easier to identify a committed customer or sponsor within an organization. Compared to enterprise data warehouses, data marts are more limited in scope and more focused on a set of user needs. The key here is a focused challenge and a focused team.
Data marts allow for faster prototyping to capture decision-support system requirements. Customer surveys have indicated that data marts are typically piloted in 30-120 days. Complete system development is often in the range of 3-6 months.
It is inherently appealing to be able to start a decision-support system with a modest plan and then grow, if necessary, after learning more about the source data and the end users' needs. This allows organizations to build their justification for data marts as they move forward.
Projects that began as data warehouses sometimes evolve into data marts. Some organizations have accumulated a lot of historical data for decision support that turned out to be never, or rarely, used. This discovery has allowed them reduce storage, or to archive information based on some criteria (such as date), and thereby shrink their data warehouse to a more focused data mart. Or the organization might split the data warehouse out to individual data marts that provide:
- smaller data marts with faster response time.
- less complex data marts for end users to access.
- data marts designed for target user groups.
The result is a data warehouse architecture that looks like the following:
Figure 1. Data warehouse architecture
New data warehouses are often implemented with large amounts of detailed data. After some period of usage, it is common for some ad-hoc requests to evolve into routine reports. Those reports can often use pre-summarized data tables. Or the data warehouse administrator may observe patterns of query access. The answers to queries can be pre-computed and stored in lower cost data marts, perhaps deployed in workgroups to simplify network access and further improve response times. This improves performance and reduces data warehouse/data mart complexity and size. Sometimes data warehouses become data marts in this scenario.
All of these considerations make it inherently appealing for organizations to implement their decision-support systems in more manageable data mart sized projects. The META Group observed, in their report on the META Group/DCI 1997 Data Warehouse World conference in February 1997, that: "IT focus has shifted from Enterprise DW cost justification to internal application delivery of data marts."
In a report dated May 1996, the Data Warehouse Institute's large scale survey—covering more than 1,500 organizations that have data warehousing projects underway—showed that the Microsoft Windows NT operating system was used in some way in 46.8 percent of those sites. Growth of data warehousing on the Microsoft Windows NT operating systems is particularly rapid and is now believed to be well over 50 percent of all system deployments.
The Data Warehouse Institute identified three factors that are shaping the growth of Windows NT in data warehousing:
First, data warehousing planners have shifted their focus from enterprise warehouse projects to architected collections of departmental systems, or data marts…Second, the "smaller systems" reputation of Windows NT may be fading. [Windows] NT is already supported on Digital's 64-bit Alpha processors, and upcoming systems based on Intel's new P6 and...P7 processors, combined in multiprocessor systems, promise to deliver enormous power. Most of those Intel systems will use Windows NT…Third, and most importantly, the distribution channel for data warehousing is changing. Data warehousing was once the exclusive preserver of database experts inside large IT organizations; today smaller companies and end user departments feel comfortable buying technology for data warehousing. The distribution channels that they use, more often than not, also distribute Intel-based systems, where Windows NT is considered a "top end" operating system offering high reliability, security, and manageability.
"The Foundations of Wisdom: A Study of the Financial Impact of Data Warehousing," report by IDC, 1996
The size of most production data warehousing projects is well within the range of production systems on SQL Server and Windows NT. There is general agreement that, on SQL Server Version 6.5 or Version 6.5 Enterprise Edition, databases of 100 to 200 GB can be supported. This is sufficient to accommodate a majority of current data warehouse projects, as evidenced by a Forrester study published in April 1996 that showed the following size distribution of production data warehouses:
Figure 2. Data warehouse size distribution
Microsoft continues to enhance the scalability of SQL Server to accommodate the growth of data marts and data warehouse systems. Incremental enhancements in hardware and in software functionality increase the number of concurrent users and the size of the data bases that can be supported.
SQL Server is the fastest database for Windows NT, proven by TPC benchmark tests.
SQL Server is part of the Microsoft BackOffice family of products; this ensures that you have the RDBMS designed and tested with other Windows NT best-of-breed applications. In addition to SQL Server RDBMS, those applications include: Windows NT Server, Internet Information Server, SNA Server, Systems Management Server, and Mail Server. Microsoft BackOffice provides a single, unified architecture sharing programming interfaces and tools, administration, security and network models and broad hardware options.
Heterogeneous replication technology in SQL Server allows data to be automatically published to some non-SQL Server Systems, including Oracle, DB2, Sybase, and Microsoft Access. Work is underway on an open interface that will enhance this capability to replicate data among information systems.
Microsoft price/performance leadership with SQL Server RDBMS on the Microsoft Windows NT Server platform has helped bring the cost of deploying decision support data warehouse systems into the affordability range of many new customers. Factors that contribute to lower cost of ownership include:
- runs on low cost hardware
- fast deployment
- easier system administration
Sales and Market Momentum
The more data warehouses that are deployed on Microsoft SQL Server and Windows NT, the more solutions and expertise become available on that platform.
- Sales of Microsoft SQL Server are growing much more rapidly than the overall database market, bringing the SQL Server installed base to over 2 million users.
- According to International Data Corp., Microsoft SQL Server is the market share leader of databases running on Windows NT Server—at unit share.
- In August 1996, fifty-seven percent of respondents to a survey conducted by Market Perspectives Inc., the META Group, and DCI identified Microsoft Windows NT as their organizations' data warehousing environment.
- More than 3,200 Microsoft Solution Providers offer products and services for Microsoft SQL Server—an increase of nearly 300 percent during the last two years.
Superior Connectivity Inside and Out
Microsoft SQL Server allows access to internal data across multiple platforms, delivers information reliably to remote sites, and publishes information in Web pages on the Internet. Microsoft SQL Server provides the following:
- Powerful data replication.
- Secure distribution of data across remote servers and mainframes.
- Seamless internet and intranet integration.
- Web Assistant to generate HTML pages from SQL Server data on a scheduled or trigger-driven basis.
- Tight mail integration.
- Ensures that the most current information is available to everyone who needs it, without delay.
- Microsoft SNA Server provides connectivity via LU 6.2 to many SNA legacy environments.
Some of the most popular Microsoft desktop user access products used for data warehousing systems are Microsoft Access, Microsoft Excel, Microsoft Query, and Microsoft Internet Explorer. In September 1996, seventy-four percent of respondents to a survey conducted by Market Perspectives Inc. and the META Group for DCI indicated their use of these Microsoft information access products in their organizations. Microsoft desktop software products have some strong links into Microsoft SQL Server. Although they work with any ODBC-compliant data base systems, they are developed and tested with Microsoft SQL Server. They utilize native ODBC that doesn't layer on top of other drivers; this improves performance. Customers appreciate being able to use the information access tools that they already own and know how to use, when those tools adequately meet their needs.
Microsoft Internet Explorer is rapidly gaining popularity for Web browser access to information. A product released in 1997 called Microsoft Visual InterDev™ helps to make SQL Server one of the best databases for Web development. It provides point-and-click, dynamic Web-to-database connectivity, including wizards and controls that automatically generate the logic for custom reports or databound HTML forms. Visual InterDev allows users to graphically build SQL queries or to create new SQL Server databases. And it provides a graphical user interface to administer SQL Server data bases with capabilities that include adding new tables, changing table structures on the fly, and creating/altering indexes and relationships.
Because the OLE standard is common across the Microsoft Office, data findings can be quickly incorporated into a Microsoft Word document, mailed elsewhere in the company, or incorporated into a PowerPoint presentation.
Also, many third-party user access tools providing a variety of capabilities have been ported to work with Microsoft SQL Server on Windows NT. Information about some of them can be found at the Microsoft Web site: www.microsoft.com/sql/default.htm.
Implementation by Many Independent Solution Providers
Many Systems Integrators and other solution providers now implement data warehousing solutions for customers that employ Microsoft SQL Server as the data warehouse storage engine. They find that the ease of implementation and administration of Microsoft SQL Server on Windows NT helps them deliver a solution to the customer more efficiently. And it reduces the effort associated with empowering customers to administer their own systems after they have been implemented.
- Server cursors ,which are useful for desktop browser applications. These are used for the type of decision support that occurs when users don't know what information they want until they see it while looking over what is in the data base.
- A cost-based optimizer with statistics-based query cost analysis for improved response in processing a mix of queries against the data base.
- Parallel data scanning for improved performance of ad hoc decision support queries.
- OLAP query extensions, CUBE and ROLLUP, that allow a query to return detail data and aggregates across multiple dimensions, simplifying information retrieval for analytical purposes.
- Data Pipe capability that allows SQL Server to programmatically retrieve information from multiple sources and populate SQL Server tables with the results.
- Insert Exec enables results from remote or extended stored procedures to be stored in tables.
- Index rebuilding without dropping and recreating the index.
- Asynchronous I/O supports parallel access to multiple disk devices for greater throughput when processing large queries that need to read data from multiple disk drives.
- Integrated security to protect your decision support data from intrusion.
- Centralized database management with scheduling and exception handling.
- Built-in support for symmetric multi-processing (SMP) with automatic workload balancing across multiple processors to increase total CPU throughput.
In 1998, Microsoft is planning to release an OLAP (Online Analytical Processing) solution which manages multidimensional data. The development effort is code-named the "Plato Project." It will include multidimensional OLAP (MOLAP) and relational OLAP (ROLAP) capabilities. Many customers who deploy data warehousing decision support systems discover that a portion of their decision-support needs are constrained by the two-dimensional data structure of relational database management systems. The Microsoft OLAP product is being designed with the following capabilities:
- Microsoft's OLAP solution will work with any data provider that exposes ODBC or OLE DB.
- Microsoft will offer a very unique and compelling integration of its OLAP and SQL servers.
- Microsoft's OLAP solution will expose an object model that will be programmable using Microsoft Visual Basic®, Java™, and C++® programming languages.
- Microsoft is working with many third-party applications providers to help them develop specialized front end products.
- Microsoft's OLAP products should make a contribution to the total cost of ownership (TCO) of OLAP systems—thus making OLAP feasible for customers who may not have been able to afford OLAP solutions previously available.
The understandable preference for, and proliferation of, data marts instead of enterprise data warehouses is causing a future systems integration issue for some organizations. Creation of a data warehousing architecture for the enterprise saves each data mart project team from having to recreate the elements of the architecture. If an organization's structure is one that may require future data mart integration, it is less work to plan and architect for this before the data marts are created than after the fact.
It behooves organizations that will operate multiple data marts to establish an information architecture or model. The goal of the data warehousing architecture is to identify the following common items:
- source systems for data to populate the data marts
- business dimensions
- business rules
- semantics (data definitions, measurements, and aggregation formulas)
If an organization requires, or will require, that multiple data marts each download similar data from production source systems, then the most maintainable architecture may be to move data first from the source systems to a well designed enterprise data warehouse or operational data store. Then, each data mart can be populated from the enterprise data warehouse, inheriting in the process all of the architectural elements that were implemented in the enterprise data warehouse. This approach also reduces the number of interface links between data marts and source systems that must be maintained.
Microsoft SQL Server on the Windows NT Server operating system is popular for the deployment of data warehousing systems because it lowers to cost of acquisition, deployment, and administration. The Microsoft solution is especially attractive to the many customers whose preferences have shifted from complex enterprise data warehouse systems to more manageable data mart projects. Data marts on Microsoft SQL Server and Windows NT Server provide faster delivery and payback of needed decision support functionality. Capabilities developed at Microsoft for desktop software products are now extended to multi-user server products, including integration of the desktop with the database server. And Microsoft is continuing to invest in future product development and enhancements to assure an attractive selection of data warehousing products for the future.
© 1997 Microsoft Corporation. All rights reserved.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.