Service-Oriented Business Intelligence
by Sean Gordon, Robert Grigg, Michael Horne, and Simon Thurman
Summary: This discussion looks at the similarities and differences between Business Intelligence (BI) and Service Orientation (SO), two architectural paradigms that have developed independently. Here we define an architectural framework that leverages the strengths of BI and SO while defining guiding principles to ensure that the fundamental tenets of each of the component architectures are not broken.
EAI and ETL Convergence
A Wider Range
Service-Oriented Data Provision
A New Breed of Business Services
Enterprise Data Quality
SoBI Integration Patterns
Short Life Expectancy
About the Authors
The noun synergy means the combined action of discrete entities or conditions such that the total effect is greater than the sum of their individual effects. Service-Oriented Business Intelligence (SoBI) is the synergy of the Business Intelligence (BI) and Service Orientation (SO) paradigms and describes patterns and architecture to accomplish this synergy by providing a best practice implementation framework; the ability to integrate at the most appropriate architectural level; the data modeling of a BI project within the SO strategy of leaving the source systems in place; and a common implementation for data transformations and data logic: data to data, data to service, service to data, and service to service.
Figure 1. BI and SO views
BI and SO are broad paradigms. Let's start by defining BI. Data warehousing is a broad discipline that allows for the gathering, consolidating, and storing of data to support BI. The components of a successful data warehouse can be summarized as data collection, cleansing, and consolidation (Extract, Transform, and Load, or ETL) and data storage. BI is the delivery of information to support the decision-making needs of the business. It can be described as the process of enhancing data into information and then into knowledge.
Figure 2. BI view of SO
Every BI system has a specific goal, which is derived from the requirements of the business.
For ease of reading, data warehousing and business intelligence will be consolidated under the single acronym BI throughout this discussion.
Figure 3. SO view of BI
SO is a means of building distributed applications; at its most abstract, SO views everything as a service provider: from applications, to companies, to devices. The service providers expose capabilities through interfaces. These interfaces define the contract between the caller of the service and the service itself. The consumer of a service does not care how the service is implemented, only what the service does and how to invoke the service.
The services themselves are the building blocks of service-oriented applications. Services encapsulate the fundamental aspects of service orientation, namely, the separation between interface and implementation. SO is essential to delivering the business agility and IT flexibility promised by Web services. These benefits are delivered not just by viewing service architecture from a technology perspective or by adopting Web service protocols, but also by requiring the creation of a service-oriented environment that is based on specific key principles.
BI has been around for years; its practices are well established, and people are comfortable with the concepts involved in delivering a BI solution. To many, BI is simply the presentation of information in a timely manner through a sophisticated client interface. To those involved with the delivery of BI solutions, this aspect of the overall BI solution is only the tip of the iceberg. Under the covers is a huge exercise in data quality improvement and data integration among disparate corporate applications and systems and the consolidation of that data in a data warehouse. The integration of data is typically the biggest cost on a BI project.
Until recently, SO has had little or no part to play in the world of BI, primarily because the SO approach to data integration seemed laborious and overly complex to a community used to moving data of any volume around by connecting directly to the source system at database level. In BI, data integration is accomplished through the ETL process, which is the keystone of every BI solution, and BI solutions tend to look for the most direct and efficient way of accomplishing it.
Enterprise application integration (EAI), like BI, has been around for many years. It is a common problem encountered within an enterprise where systems have been introduced or grown in an organic manner. EAI itself can be defined as the sharing of both process and data between applications within the enterprise. Specifically, when we use the term EAI, we are referring to the integration of systems within the enterprise—for example, application, data, and process integration.
Application-to-application integration refers to the exchange of data and services between applications within the enterprise. Notably, this form of integration is often between applications that reside on different technology platforms, based on differing architectures. EAI is often difficult. Typically it requires connectivity between heterogeneous technology platforms; involves complex business rules and processes; involves long-running business processes where logical units of work may span days or weeks as they move through different processes within the organization; and is generally driven by the need to extend or enhance an existing automated business process or to introduce an entirely new one.
Solutions to EAI problems can involve solving the application integration problem at a number of different architectural levels such as data, application, process, and so forth. In this way both ETL and SO can form part of an EAI solution. In many ways, SO grew out of the need to find common, open, and interoperable solutions to the EAI problem.
Figure 4. SO as a data source to BI
Traditional ETL is a batch-driven process focused on integrating data during business downtime. In today's connected marketplace, business does not have a quiet time for this process to occur. The corporate data pool has the potential to increase significantly as initiatives such as clickstreams, e-commerce, and RFID are increasingly used, and ETL must be flexible enough to emerge from its batch-driven roots to deliver data on an event-driven basis.
Within an SO solution, these events are readily routed, consumed, and integrated as part of an event-driven architecture. BI grew out of the inability of operational systems to efficiently handle analytical capabilities (aggregation, trend, exception, and so on); hence, distinct BI systems were developed to meet this need. This artificial divide is no longer acceptable: today's businesses are increasingly looking to use analytical capabilities to guide their operational decisions, for example in the detection of a suspect credit card swiped at the checkout based on historical data mined for fraudulent patterns.
Figure 5. BI consuming SO events
Organizations are also increasingly keen to unlock this data and make it widely available to other parts of the organization, to a wider audience of users and tools. Traditionally, access to BI information has required access to a specific set of data-manipulation tools. In this current SO environment, the goal must be to open up this organizational data to a wider audience and enable the value of the data to be realized more widely.
As the variety and number of data sources considered in-scope for BI increases, so does the potential complexity of the required ETL solution. For example, Web services, RSS, and unstructured and semistructured data are sources of data that now fall under the data integration umbrella, but they are not traditionally associated with ETL. With the widespread adoption of service-oriented principles and the associated enabling technologies, access to a greater range of systems has become possible, unlocking a much greater range of business data.
Organizations continue to scrutinize the economics of data integration with every project, affecting the products they select, which is causing a change in the data-integration landscape and an associated change in the functionality provided by the ETL or EAI software. Vendors have to increase the flexibility and functionality offered by their platforms in response to these new challenges. The net result to the customer is a set of tools with an increasingly merged set of functionality.
Users familiar with Microsoft's product set cannot fail to have noticed this pattern. BizTalk was developed in the EAI and business-to-business (B2B) space but has applicability in some ETL scenarios. Data Transformation Services (DTS), Microsoft's ETL tool that forms part of the SQL Server 2000 platform, grew from an ETL background. It is no coincidence that the SQL Server 2005 version of this product has been re-architected to support a wider range of integration scenarios, and was renamed Integration Services to emphasize this.
Although service-oriented architectures (SOAs) and BI architectures have evolved separately and include technologies and disciplines specific to their own individual architectural aims, many of the technologies that they utilize overlap. There is also a clear mapping between the concepts used, with each able to view the other in their own terms. We have called these "views from the other side," and the overview can be seen in Figure 1. Let's discuss each scenario in more detail.
From a BI perspective, it is possible to view an SO application as a collection of data sources and event sources. There are two primary modes in which a service can operate as a data source in a BI context: service as the provider of data upon request and service as the publisher of events that are of interest (see Figure 2). In both scenarios, the message sizes are small. The solution for large-scale data transfer and transformation will still be through the normal data warehouse import techniques such as ETL. Such physically large messages are not the normal domain of SO.
From the SO viewpoint, BI can be seen as a collection of services. A data source can readily be exposed as a service with the introduction of a simple façade layer that receives the service request from the service bus and calls the appropriate query. The façade then transforms the results of the query (if necessary) into the data schema used on the service bus and returns the results to the caller (see Figure 3).
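A minimal Python sketch of such a façade, assuming an in-memory SQLite database stands in for the data warehouse. The `sales` table, the `MonthlySalesResponse` message, and the `SalesFacade` class are hypothetical names chosen for illustration; they are not part of the SoBI framework.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlySalesResponse:
    """Hypothetical message schema returned to service callers."""
    region: str
    month: str
    total: float


class SalesFacade:
    """Receives a service request, runs the matching query, and maps the
    result into the message schema."""

    def __init__(self, conn):
        self._conn = conn

    def get_monthly_sales(self, region, month):
        row = self._conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales "
            "WHERE region = ? AND month = ?",
            (region, month),
        ).fetchone()
        return MonthlySalesResponse(region, month, float(row[0]))


def build_demo_warehouse():
    """Create a tiny stand-in warehouse (assumed structure)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("north", "2005-01", 100.0),
         ("north", "2005-01", 50.0),
         ("south", "2005-01", 75.0)],
    )
    return conn


facade = SalesFacade(build_demo_warehouse())
response = facade.get_monthly_sales("north", "2005-01")
print(response)
```

The caller sees only the request parameters and the response message; the SQL and the table layout stay private to the façade.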
Figure 6. Message size versus message volume
The SoBI architecture makes BI data in the data warehouse available as a service to other applications within the architecture. This availability gives applications a clean way of accessing consolidated data to support the requirements of BI. In this way, the BI architecture becomes an integrated component of SO application architecture. Note that there will be occasions when the type of data that is required from the system is purely of a BI nature, such as large-scale data export. In these scenarios the service interface approach will not be suitable.
Table 1. SO and BI benefits
|Service Orientation (SO)||Business Intelligence (BI)|
|Best suited to application-to-application integration and well suited to low-volume, high-frequency events||Best suited for data-to-data integration and able to handle large data volumes|
|Provides an operational platform, tightly defines data formats and structures, and encapsulates and abstracts functionality||Provides a combined model of the enterprise data and provides foundation for business decisions and the ability to ask any question of the data|
|Supports reuse of enterprise components and allows agile change in business processes||Good tools and mechanisms for transforming data|
From a BI perspective, a service can readily be exposed as a data source with the introduction of a simple façade layer that provides a mapping between the BI interface and the interface exposed by the service. The façade then transforms the results of the call from the data schema used on the service bus to the data format expected by the BI platform and returns the results to the caller (see Figure 4).
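The reverse façade can be sketched in the same way. This is an illustration only: the nested message layout, the `get_customer` service call, and the `CustomerRowSource` adapter are all assumed names, and a real BI platform would define its own data source interface.

```python
# Hypothetical messages as they might appear on the service bus.
_BUS_MESSAGES = {
    1: {"customer": {"id": 1, "name": {"first": "Ada", "last": "Lovelace"}},
        "status": "ACTIVE"},
    2: {"customer": {"id": 2, "name": {"first": "Alan", "last": "Turing"}},
        "status": "CLOSED"},
}


def get_customer(customer_id):
    """Stand-in for a call to an operational service."""
    return _BUS_MESSAGES[customer_id]


class CustomerRowSource:
    """Facade that presents the service to the BI platform as flat rows."""

    columns = ("customer_id", "full_name", "is_active")

    def __init__(self, service_call):
        self._service_call = service_call

    def rows(self, customer_ids):
        for customer_id in customer_ids:
            message = self._service_call(customer_id)
            customer = message["customer"]
            full_name = f'{customer["name"]["first"]} {customer["name"]["last"]}'
            # Map the bus schema onto the tabular format the BI side expects.
            yield (customer["id"], full_name, message["status"] == "ACTIVE")


source = CustomerRowSource(get_customer)
rows = list(source.rows([1, 2]))
print(rows)
```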
Some services expose information through events that are published when an interesting change occurs in the service. Other services and applications within the organization can subscribe to the events published by services. Integrating event-publishing services into a BI platform is achieved through the use of an agent that collates the subscribed events and periodically transfers them in bulk to the BI platform (see Figure 5).
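A collating agent of this kind might look like the following sketch. It assumes a size threshold plus an explicit `flush()` that a scheduler would call periodically; `EventAgent`, `bulk_load`, and `batch_size` are hypothetical names, and the bulk load is simulated by appending to a list.

```python
class EventAgent:
    """Buffers subscribed events and hands them to the BI platform in bulk."""

    def __init__(self, bulk_load, batch_size):
        self._bulk_load = bulk_load    # callable that loads one batch
        self._batch_size = batch_size
        self._buffer = []

    def on_event(self, event):
        """Called once per published event."""
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Transfer whatever has been collected so far (safe when empty)."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._bulk_load(batch)


loaded_batches = []    # stands in for the warehouse bulk-import interface
agent = EventAgent(bulk_load=loaded_batches.append, batch_size=3)
for event_id in range(7):
    agent.on_event({"event_id": event_id})
agent.flush()          # the periodic transfer picks up the remainder
print([len(batch) for batch in loaded_batches])
```

Batching keeps the many small messages on the SO side while the warehouse still receives the larger loads it handles efficiently.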
One of the challenges in the development of SoBI was to find an approach that leveraged the core strengths of each architecture and identified the areas where the integration caused challenges. Let's look at some of the main challenges inherent in implementing BI in an SO manner. These challenges generally arise because of the specific requirements that each architecture was developed to address.
Figure 6 shows the differences in data granularity that separate the BI and SO approaches. At the extreme ends we have small-grain messages or events, which are naturally in the SO event space. Such small-grain messages are not readily or efficiently consumed in a BI architecture without the use of event agents to collate the events and import them into the data warehouse on a regular basis. Large-grain data, the bulk import or movement of large quantities of data, is most efficiently handled through data warehouse techniques such as ETL. Services in an SO system that expose large amounts of data are inefficient to use and rarely implemented. In the middle ground we have the medium-grained typical services, defined to consume sufficient data to fulfill a particular requirement. This middle ground is key to SoBI's value add.
Service-oriented applications exhibit loose coupling between their constituent services. This loose coupling is one of the fundamental principles of SO and supports the development of agile/flexible applications that can adapt to business change. See Table 1 for specific SO and BI benefits.
BI solutions are tightly coupled to the data sources that feed the data warehouse and to the applications that use it. BI has evolved from a batch-centric environment where ETL is used as the means to directly consume and consolidate large amounts of source system data on a known schedule for population of the data warehouse.
Figure 7. The SoBI framework
SO requires that the message interface and the formats of both the incoming message and the eventual response be tightly defined. When exposing services that describe business capabilities, the questions that can be asked within an SO application are in effect known in advance. However, there is also an unknown aspect relating to the exposing of services that are aggregated or orchestrated by another application. BI is concerned with the ability to allow applications to ask any question of the data warehouse within the limitations of the data warehouse data model. The question, and the size, content, and format of the result, are not known until the question is asked.
An SO approach is best suited to the exposing of services that encapsulate business capabilities or services that publish events of interest to other systems. BI provides a closed environment for satisfying the information requirements of the business. BI enables the consumer of data to view it in potentially many different new ways. This flexibility provides the ability to identify trends and relationships that may be overlooked.
Previously we introduced the core strengths of each of the paradigms, each of which can be seen as a fundamental issue if viewed purely from the stance of the other paradigm. Now let's revisit these challenges and see what benefits SoBI can bring.
Surfacing warehouse ETL functionality through SoBI enables interesting business events caught during the ETL process to be published as events. Let's consider some examples.
Referential integrity. SoBI can provide aggregations of the position "now." For example, data not in the source system may be added in the data warehouse as a placeholder to maintain the integrity of the data in the database. This placeholder may occur when disparate source systems provide data at different times (for example, transactional values supplied before associated reference information has been received). This data will be stored as private placeholder data within the BI service. A notification may be sent to the source system to ensure that an error has not occurred, and when the reference data becomes available the data warehouse will be brought back in line with the system of record.
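The placeholder behavior can be sketched as follows. The `Warehouse` class, its dictionaries, and the notification list are assumptions made for illustration; a real warehouse would hold this in dimension and fact tables and send the notification through the service bus.

```python
class Warehouse:
    """Toy warehouse that keeps facts consistent with reference data."""

    def __init__(self):
        self.products = {}        # product_key -> {"name", "placeholder"}
        self.sales = []           # (product_key, amount) fact rows
        self.notifications = []   # keys queried back to the source system

    def load_sale(self, product_key, amount):
        if product_key not in self.products:
            # Reference data has not arrived yet: insert a private
            # placeholder so the fact row still has a valid parent.
            self.products[product_key] = {"name": None, "placeholder": True}
            self.notifications.append(product_key)
        self.sales.append((product_key, amount))

    def load_product(self, product_key, name):
        # Late-arriving reference data replaces the placeholder and brings
        # the warehouse back in line with the system of record.
        self.products[product_key] = {"name": name, "placeholder": False}


warehouse = Warehouse()
warehouse.load_sale("P1", 9.99)           # transaction arrives first
placeholder_created = warehouse.products["P1"]["placeholder"]
warehouse.load_product("P1", "Widget")    # reference data arrives later
print(warehouse.products["P1"])
```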
Single-validation service. The functionality required for validation during the ETL stage can be exposed through the SoBI framework for use within other parts of the business.
Timely updates. The key advantage here for BI is that adoption of ETL from an SO perspective can enable real-enough-time feeds into the data warehouse and therefore potentially real-time data warehousing. In addition, the better support for events and event-driven integration can provide BI with a much better mechanism for invoking ETL than the traditional methods, such as timed scheduling or persistent scanning of a known directory for a flag file.
Common business schema. Within any organization there can be multiple systems that hold information about the same entities and that hold this information in different formats. There are clear synergies between these scenarios:
- Building the logical data model for a data warehouse is critical to any BI initiative. The data model is the end product of the efforts to consolidate the data from the disparate source systems. The structure of the data model drives the transformation exercise that occurs as part of the population of the data warehouse and, hence, enables the data warehouse to provide the single version of the truth for management information.
- For any entity aggregation service within an SO design it is important to reach agreement on common meanings for the entities that the services will operate on. This agreement is referred to as schema consolidation and is the process of creating master data schemas that contain a superset of the information to describe the entities in the system in sufficient detail that the different services can each locate the data they require.
Another entity service that aligns with this approach is the ability to expose reference data. According to Easwaran G. Nadhan, Principal, EDS, "It is clear from the experiences in the (relatively small) number of organizations that have moved aggressively into SOAs that coordinating reference data is the required first step toward service orientation" (see Resources).
Table 2. SoBI principles
| ||Business Intelligence (BI)||Service Orientation (SO)|
|It is…||…the single version of the truth for BI data.||…the architectural approach for application integration.|
|It will…||…provide open access to data services, support ad hoc analysis, support precanned management reporting, and consolidate data.||…provide application-to-application integration, provide some event feeds to the DW, describe the services provided and the messages passed, fulfill operational requirements, and provide the infrastructure services for all applications.|
|It will not…||…become a dumping ground for all data, become the data owner, be the default data source to other applications, or support operational reporting.||…be used in every circumstance or replace data import interfaces.|
One version of the truth. By looking at the functionality offered by the two architectures holistically, SoBI gives us the opportunity to consolidate operational and BI data without the need to physically move all operational data into the BI platform, which is a common approach to providing the "single version of the truth" in a BI project. By adopting the most appropriate architectural choice, the operational data can be left in place but still be available to the SoBI framework as a service, should the need arise to access or view it.
For example, in a BI environment interested in the analysis of health and safety incidents, SoBI would allow factual detail of each incident to be moved to the data warehouse—that is, the fact that an incident happened, where it happened, and the classification of that incident—which would allow all appropriate aggregation and analysis to be performed as expected of the BI platform.
Where SoBI wins is that it provides a means to still access the transactional detail in the system of record (for example, the free-format text that accompanies the incident, which describes the circumstances surrounding the event in detail). This access is commonly referred to as "drill through" or "drill to detail" in BI, and to accomplish it all data relevant to the requirement is traditionally moved into the data warehouse.
In fact, this may not be allowed (such as for data protection reasons) or preferred (it increases the ETL burden and the storage requirements of the data warehouse to hold data, which by definition, is operational in nature), and it has the potential to add additional pressure on the BI platform by becoming the de facto source for all incident data even though it is never up to date.
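The health and safety example can be sketched as below, assuming the warehouse holds only the incident facts while the narrative text stays in the system of record. All names and sample rows (`WAREHOUSE_FACTS`, `incident_detail_service`, `drill_through`) are hypothetical.

```python
from collections import Counter

# The warehouse holds only the facts needed for aggregation.
WAREHOUSE_FACTS = [
    {"incident_id": 101, "site": "Plant A", "category": "slip"},
    {"incident_id": 102, "site": "Plant A", "category": "burn"},
    {"incident_id": 103, "site": "Plant B", "category": "slip"},
]

# The free-format narrative never leaves the system of record.
_SYSTEM_OF_RECORD = {
    101: "Operator slipped on a wet floor near the loading bay.",
    102: "Minor burn while handling a heated component.",
    103: "Visitor slipped on an icy walkway.",
}


def incident_detail_service(incident_id):
    """Stand-in for an operational service owned by the system of record."""
    return {"incident_id": incident_id,
            "narrative": _SYSTEM_OF_RECORD[incident_id]}


def incidents_by_category(facts):
    """Typical BI aggregation, answered from the warehouse alone."""
    return Counter(fact["category"] for fact in facts)


def drill_through(incident_id, facts, detail_service):
    """Combine the warehouse fact with detail fetched on demand."""
    fact = next(f for f in facts if f["incident_id"] == incident_id)
    detail = detail_service(incident_id)
    return {**fact, "narrative": detail["narrative"]}


summary = incidents_by_category(WAREHOUSE_FACTS)
detail_view = drill_through(101, WAREHOUSE_FACTS, incident_detail_service)
print(summary, detail_view)
```

Because the detail is fetched at request time, it is as current as the system of record and adds no ETL or storage burden to the warehouse.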
Business services. The combination of the enablement of real-enough-time BI and the ability to leave operational data in place gives us the opportunity to build a rich service around the SoBI framework that would not be possible if working within only one of the component architectures. Consider a system detecting retail fraud. In BI we have the tools to build an analytical engine capable of mining the potentially huge amounts of transactional data for patterns resulting in a list of suspicious credit cards. The adoption of SO principles allows us to offer a service providing the details of credit cards in use in our store as they are swiped. The SoBI architecture allows this service to be consumed by the BI component of the platform, which analyzes each swipe against the known list of suspicious cards and can therefore respond immediately should a potentially suspicious transaction be detected.
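A deliberately simplified sketch of the fraud scenario follows. The mining step is reduced to counting chargebacks per card, which is an assumption made only to keep the example short; real pattern mining would be far richer. `mine_suspicious_cards`, `SwipeMonitor`, and the card numbers are hypothetical.

```python
from collections import Counter


def mine_suspicious_cards(history, min_chargebacks=2):
    """Toy stand-in for BI mining: flag cards with repeated chargebacks."""
    chargebacks = Counter(card for card, was_chargeback in history
                          if was_chargeback)
    return {card for card, count in chargebacks.items()
            if count >= min_chargebacks}


class SwipeMonitor:
    """Consumes swipe events and checks them against the mined list."""

    def __init__(self, suspicious_cards):
        self._suspicious = frozenset(suspicious_cards)

    def on_swipe(self, card_number):
        return card_number in self._suspicious


history = [
    ("4000-0001", True),
    ("4000-0001", True),
    ("4000-0002", False),
    ("4000-0003", True),
]
monitor = SwipeMonitor(mine_suspicious_cards(history))
alerts = [card for card in ["4000-0002", "4000-0001"]
          if monitor.on_swipe(card)]
print(alerts)
```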
Aggregation of transactional and historical data as a service. Given a service exposed through the SoBI framework that can now seamlessly aggregate current transactional data and historical warehouse data, a new breed of business services can be supported. An example would be slowly changing dimensions, which is where a value such as a customer name changes over time. Obviously this information is available within the data warehouse, so if there is a requirement to expose an entity service that gives a single view of the customer, this can be achieved more accurately and easily.
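As an illustration, assume the warehouse keeps one row per version of a customer with validity dates, a common way of storing slowly changing dimensions. The row layout and the `customer_name_as_of` function are hypothetical.

```python
from datetime import date

# One row per version of the customer; valid_to of None means "current".
CUSTOMER_HISTORY = [
    {"customer_id": 7, "name": "A. Smith",
     "valid_from": date(2001, 1, 1), "valid_to": date(2004, 6, 30)},
    {"customer_id": 7, "name": "A. Jones",
     "valid_from": date(2004, 7, 1), "valid_to": None},
]


def customer_name_as_of(history, customer_id, as_of):
    """Return the customer's name on a given date, or None if unknown."""
    for row in history:
        if row["customer_id"] != customer_id:
            continue
        starts, ends = row["valid_from"], row["valid_to"]
        if starts <= as_of and (ends is None or as_of <= ends):
            return row["name"]
    return None


print(customer_name_as_of(CUSTOMER_HISTORY, 7, date(2003, 3, 1)))
print(customer_name_as_of(CUSTOMER_HISTORY, 7, date(2005, 1, 1)))
```

An entity service built this way can answer both "who is this customer now" and "who was this customer when the transaction occurred".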
Brings interface abstraction patterns to BI. The ability to use the interface abstraction pattern over BI functionality makes the functionality and data more accessible to line-of-business applications and provides the capability to expose complex business rules usually buried in the ETL layer.
Figure 8. Ideal scalability and flexibility
Cleansing and consolidation. Data will be changed for purposes of consistency and integrity. Where this change involves a mapping operation, that mapping will be made available to the architecture as a service. Where this change involves a correction to data, details of that correction will be fed back to the system of record through a request for change to the owning service, that is, the ETL process cannot change the data as it is not the owner. In turn, this process obviously drives improvement in the quality of data.
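The request-for-change flow might be sketched like this. The acceptance rule (accept when the observed value still matches the current value) and the names `ChangeRequest`, `CustomerService`, and `cleanse_postcode` are assumptions for illustration; the owning service would apply its own rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    entity_id: int
    field: str
    observed: str   # value the ETL process saw
    proposed: str   # corrected value it suggests


class CustomerService:
    """Owning service: the only component allowed to change its data."""

    def __init__(self, records):
        self._records = records
        self.audit_log = []

    def get(self, entity_id):
        return dict(self._records[entity_id])

    def request_change(self, request):
        current = self._records[request.entity_id][request.field]
        accepted = current == request.observed
        if accepted:
            self._records[request.entity_id][request.field] = request.proposed
        self.audit_log.append((request, accepted))
        return accepted


def cleanse_postcode(service, entity_id):
    """ETL step: use the cleaned value in the warehouse and ask the owner
    to correct the source instead of changing it directly."""
    observed = service.get(entity_id)["postcode"]
    cleaned = " ".join(observed.upper().split())
    if cleaned != observed:
        service.request_change(
            ChangeRequest(entity_id, "postcode", observed, cleaned))
    return cleaned


service = CustomerService({1: {"postcode": " ab1  2cd "}})
warehouse_value = cleanse_postcode(service, 1)
print(warehouse_value, service.get(1))
```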
Obtain a cross-system, consistent view of the product. During the ETL stage, data about the same product (or entity) may require transforming so that it can be stored in a consistent manner. By service-enabling access to the data, the organization can expose a single common view of a product.
Mappings available as a service. As previously stated, mapping is a fundamental requirement within the ETL stage. The SoBI framework enables the mapping functionality to be exposed as a service for other uses within the organization. Such uses include EAI and enterprise reference scenarios. The availability of this service can also be used to promote best-of-breed transformations.
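A shared mapping service could be as simple as the following sketch, in which the same lookup is used by an ETL transform and is equally callable from an operational application. The system names, product codes, and `MappingService` class are hypothetical.

```python
class MappingService:
    """Maps source-system codes onto one canonical code."""

    def __init__(self):
        self._to_canonical = {}

    def register(self, source_system, source_code, canonical_code):
        self._to_canonical[(source_system, source_code)] = canonical_code

    def to_canonical(self, source_system, source_code):
        try:
            return self._to_canonical[(source_system, source_code)]
        except KeyError:
            raise LookupError(
                f"no mapping for {source_system}:{source_code}") from None


mappings = MappingService()
mappings.register("billing", "PRD-0042", "WIDGET-BLUE")
mappings.register("web_shop", "sku_blue_widget", "WIDGET-BLUE")


def etl_transform(rows):
    """ETL use: rewrite source codes before loading the warehouse."""
    return [(mappings.to_canonical(system, code), quantity)
            for system, code, quantity in rows]


staged = etl_transform([
    ("billing", "PRD-0042", 3),
    ("web_shop", "sku_blue_widget", 2),
])
print(staged)
```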
Calculation. The data warehouse is often used to store precalculated values to support the requirements of BI. For instance, sales and forecasting data may be held in different physical systems. The consolidation of the data from these systems into the data warehouse allows us to calculate and store actual versus forecast figures to support more performant analysis and reporting. The business logic used to define such calculations is often interesting to other parts of the business, so the calculation that produces this derived data in the data warehouse will be made available to the SoBI architecture as a service.
Provides a road map for integration. It is believed that one of the outcomes for the SoBI framework is an ability, at an architectural level, to provide a framework for future integration scenarios.
Compliance/audit. Application of the SoBI framework requires adherence to a formal governance process. Examples include the identification of the system of record or operational data owner, and the definition of the messages that describe the data and functional requirements. Given that only the owner of the data can make a change to that data, other systems simply make a request for change, and auditing can be carried out at a single point.
Aggregation. To support fast response times, data in the data warehouse is often preaggregated. For instance, the data warehouse may contain data relating to sales at an individual transaction level, but the majority of management reporting may require seeing the totals at the month level. In this case, it is cost effective to roll the (potentially millions of) individual transactions up to a level more appropriate for the known queries and to store the results in an aggregation, avoiding the need for known queries to perform the aggregation action at query time. Where such aggregations are created, they will be made available to the architecture as a service.
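The month-level rollup can be sketched as follows, assuming transactions arrive as ISO-dated amounts. `build_monthly_totals` and `MonthlySalesService` are hypothetical names; a real warehouse would typically persist the aggregate in a summary table or cube.

```python
from collections import defaultdict


def build_monthly_totals(transactions):
    """Roll individual transactions up to month level once, at load time."""
    totals = defaultdict(float)
    for iso_date, amount in transactions:
        totals[iso_date[:7]] += amount   # "YYYY-MM-DD" -> "YYYY-MM"
    return dict(totals)


class MonthlySalesService:
    """Exposes the precomputed aggregate; no summing at query time."""

    def __init__(self, transactions):
        self._totals = build_monthly_totals(transactions)

    def total_for(self, month):
        return self._totals.get(month, 0.0)


transactions = [
    ("2005-01-03", 10.0),
    ("2005-01-17", 5.5),
    ("2005-02-01", 4.5),
]
sales = MonthlySalesService(transactions)
print(sales.total_for("2005-01"), sales.total_for("2005-02"))
```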
Most organizations suffer from the issue of compatibility between their IT systems. This issue is even more apparent where mergers have occurred. No reason exists why systems from completely different enterprises should be consistent, aligned, or satisfy a unified design.
Figure 9. Collating events for integration
A BI initiative has to address the problems of disparate systems, data silos, and data incompatibility through data quality initiatives. If the business users do not ultimately trust the information they are presented, they will not use the system.
Improving data quality is not simply a process of fixing problems with individual data elements; it is about designing a business process that improves the management and quality of data as an enterprise resource. In an SO application the data provided by a service is controlled through encapsulation by that service and only published in a specific form that matches the data schema defined for the service.
Figure 10. Upgrading nonenterprise data sources
Access to the data is achieved through the messages that the service publishes. The service is solely responsible for maintaining data integrity since it is the only mechanism that directly manipulates the data. The schema that defines entities in the service messages and the definition of these messages themselves strongly enhance the data quality aspect of any solution because messages can be tested automatically for compliance with the message schema or contract supported by a service.
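Automatic compliance testing can be illustrated with a small hand-rolled check. The schema here is simply a mapping from field name to Python type, which is an assumption for the sake of a self-contained example; in practice the contract would more likely be expressed in a schema language such as XML Schema.

```python
# Hypothetical contract for a customer message: field name -> required type.
CUSTOMER_MESSAGE_SCHEMA = {"customer_id": int, "name": str, "postcode": str}


def validate_message(message, schema):
    """Return a list of contract violations (empty when compliant)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    for field in message:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


good = {"customer_id": 7, "name": "A. Jones", "postcode": "AB1 2CD"}
bad = {"customer_id": "7", "name": "A. Jones", "region": "north"}
print(validate_message(good, CUSTOMER_MESSAGE_SCHEMA))
print(validate_message(bad, CUSTOMER_MESSAGE_SCHEMA))
```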
Let's take a look at a SoBI framework example. It is important that the SoBI framework clearly defines and distinguishes between different types of data and the associated owners of this data, which ensures integrity as the data can be amended only in one place. It also enables the data to be shaped for its appropriate use. Figure 7 shows the high-level data architecture for the SoBI framework. The purpose of this architecture is to highlight how the different types of data will be handled in the solution.
As you can see, the systems of record are integrated into the data warehouse through traditional ETL methods, with the exception that the transformation services normally contained within the ETL process are exposed for reuse by the operational services. This reuse enables information to be integrated into the data warehouse and exposed through service façades in a common schema as the transformation services are shared by both mechanisms.
Figure 11. Views on information
It is also recognized that not all information within the data warehouse can be exposed through the service interface and that there may still be a small population of users within the organization who will require direct access to the data warehouse to carry out complex or ad hoc analysis. Let's take a look in greater detail at the factors influencing the successful implementation of the SoBI framework.
At a high level, the architectural guidance for the application of the constituent parts of SoBI can be summarized as shown in Table 2. Success factors for the SoBI project include:
- Governance. An SoBI project is unlikely to be successful without an associated organizational change management process in place to support it.
- Enterprise data and SOA strategy. SoBI relies on the existence of a strategic plan for SO applications within the organization and on the recognition of the importance of storing system of record data in enterprise-class stores or applications. For example, the organization doesn't store the master of key business information in spreadsheets.
- Operational versus management reporting. Clear delineation must be made between the operational and management types of reporting that will be required within the SoBI framework. Within SoBI, BI will be the single version of the truth for management reporting. It will be scoped to meet the specific requirements of the business for BI. The supporting data warehouse will contain only the data necessary to support those requirements.
- Data ownership. Although the data warehouse will contain the data gathered from multiple data sources and systems, from a service-oriented perspective the data warehouse should not be considered as the master version or the default store for all data requirements. The owner of this data remains the system of record.
Let's look at these success factors in more detail. Governance is a key success factor in the development of a successful service-oriented solution. It is important that an organization can control and publicize the services, message formats, and structures that are supported to prevent an uncontrolled proliferation of services, messages, and entity definitions in the environment. This is closely tied to the schema definition activities and requires active management by a governance body to ensure that the system adheres to the service-oriented principles.
The key is finding a balance in the level of governance and having enough control to provide a framework for the successful development and deployment of services, but not to the extent of crippling the system's ability to respond in an agile manner to the needs of the business.
In any BI project, a lack of clarity on the role of the data warehouse and the scope of the data to be contained within it can lead to problems. If the scope of the data warehouse is not managed, the design can become a never-ending exercise that grows as potential data sources are discovered and the data within those sources has to be consolidated within the data warehouse data model.
The data warehouse model needs to be designed to be extensible to ensure that the data from different assets can be added as a business case can be made for the data. It is not practical to assume that data from every application in the organizational landscape can be included in the design of the first data model. An incremental approach to the design and build of the warehouse is more likely to prove successful.
For enterprise SOA and data strategy, the successful application of SoBI relies on the existence of an organizational strategic plan for SO applications and for system of record data to be held in enterprise stores or applications. Where systems and data don't have the capability to support direct integration into the SOA proposed by the SoBI framework, it is assumed that the eventual plan of the enterprise is to migrate these systems and data sources to a platform that would support this level of integration. Examples of the types of systems referred to in this assumption are systems that are heavily loaded and cannot handle the extra burden of exposing services, and systems that do not support online querying and rely on periodic batch export.
Some business-critical information is held in stores or applications that are not appropriate for data of this commercial value. For example, Microsoft Excel spreadsheets may hold master versions of data, making the spreadsheet the system of record. There must be a strategic direction to move toward the point where such data is held in an enterprise-strength store or application and the documents become views of this data rather than the master sources. The goal is to move from documents as systems of record to documents as views on data held in systems of record.
Clear delineation must be made between the operational and management types of reporting that will be required within the SoBI framework. The traditional view of the data warehouse as the single source for all corporate information needs does not hold within the SoBI framework. The objective of the SoBI framework is to uphold the principle of BI as the "single version of the truth" but also to maintain the integrity of the systems of record as owners of the corporate data.
An operational report is likely to meet one or more of these criteria: it requires live access to data; it is required for the operational management of the business (for example, information on an individual transaction); it doesn't require historical data for comparisons; or it doesn't require summarized data (that is, aggregated data beyond a basic total). A management report is usually defined as:
- Requiring access to timely but not immediate data, and typically requiring summarized data to be presented over a predetermined historical time frame (for example, month-on-month metric comparisons by asset).
- Presenting the user with the capability to explore the presented data at will to quickly research potential performance problems (for example, drilling into the data). The report can highlight areas of failure based on conditional formatting.
- Being defined to only notify the user when a problem occurs (exception reporting).
- Assisting in the forecasting of business performance.
- Assisting in the underlying goals of the individual and company (for example, production efficiency, up selling, and product profit maximization).
Although the data warehouse will contain the data gathered from multiple data sources and systems, from a service-oriented perspective the data warehouse should not be considered as the master version. The owner of this data remains the system of record.
SO clearly discriminates between two different types of data, namely, the data that is held within the service and the data that the service exposes to the applications or other services. Data inside is the data that the service utilizes to perform the operations that it provides. This information is entirely private to the service and is never exposed directly. Data outside is the data that is published by the service and used to exchange information and requests with the clients of the service. This data is defined explicitly in terms of an organizational schema.
The application that is the owner of the data (also referred to as the system of record) is ultimately responsible for maintaining its own data; therefore, to enable other applications to request changes to the data the owning application has to expose the request update functionality through its service interface. (See Resources for more information on these two types of data.)
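The distinction between data inside and data outside, and the owner's role in accepting change requests, can be illustrated with a minimal Python sketch. Everything named here (`CustomerService`, `CustomerMessage`, `request_name_change`) is hypothetical and invented for illustration; it is not part of any SoBI implementation.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CustomerMessage:
    """Data outside: the published shape, standing in for an agreed schema."""
    customer_id: str
    name: str
    status: str


class CustomerService:
    """Hypothetical system of record for customer data."""

    def __init__(self):
        # Data inside: private working state, never handed out directly.
        self._rows = {}

    def register(self, customer_id, name):
        self._rows[customer_id] = {"name": name, "active": True, "audit": ["created"]}

    def get_customer(self, customer_id):
        """Build a published message from the private state."""
        row = self._rows[customer_id]
        status = "active" if row["active"] else "inactive"
        return CustomerMessage(customer_id, row["name"], status)

    def request_name_change(self, customer_id, new_name):
        """Other applications request a change; only the owner applies it."""
        cleaned = new_name.strip()
        if not cleaned:
            return False  # the owner is free to reject a request
        row = self._rows[customer_id]
        row["name"] = cleaned
        row["audit"].append("renamed")
        return True


service = CustomerService()
service.register("c-001", "Acme")
service.request_name_change("c-001", "Acme Ltd")
print(asdict(service.get_customer("c-001")))
```

Note that the private audit trail and the internal `active` flag never appear in the published message; consumers see only the fields that the schema defines.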
When working with data integration, SoBI will be flexible enough to work with these key integration scenarios:
- Data volumes, which are ad-hoc single transactions or large-volume, bulk data loads
- Integration with third-party packaged applications and proprietary database schemas
- Database consolidation
- Integration of legacy systems, relational, nonrelational, structured, and semistructured data sources
- Support for Web services and integration with messaging middleware
When working with BI, SoBI will aim to satisfy several requirements. Businesses need to collect and aggregate information from disparate sources within the organization and to be able to share this information in an open manner with a diverse estate of applications in different parts of the organization, without first knowing in detail how the information will be consumed. Businesses also need to isolate users from the underlying data format and structure, instead focusing on the business meaning of the data within the organization. The consumers of the information are concerned primarily with the semantics of the data rather than the syntax of the data. Different parts of the business need to be able to share a common language when describing themselves, and businesses need to publish data more widely within the organization, differentiating between data publication and data for analysis.
The publication of defined pieces of information, such as KPIs or metrics, through open mechanisms such as Web services enables this information to be consumed easily within the organization without recourse to specialized applications. The publication of data for analysis will still be a key part of the services provided by the data warehouse, but this tends to target a more limited audience in the organization with access to the specialized applications necessary for data analysis. One of the goals of SoBI is to enable the open publication of information to a wider audience of users, applications, and other services in the organization.
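As a rough illustration of publishing a defined piece of information, the sketch below serializes a single KPI reading to JSON, a format that a Web service could return to any consumer. The `KpiReading` type, its field names, and `to_published_json` are assumptions made for this example only; the article does not prescribe a message format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class KpiReading:
    """Hypothetical published KPI; fields carry business meaning, not table layout."""
    name: str
    value: float
    unit: str
    period_start: date
    period_end: date


def to_published_json(reading):
    """Serialize a KPI reading so that it needs no specialized client to read."""
    body = asdict(reading)
    # Dates are not JSON types, so publish them as ISO 8601 strings.
    body["period_start"] = reading.period_start.isoformat()
    body["period_end"] = reading.period_end.isoformat()
    return json.dumps(body, sort_keys=True)


reading = KpiReading("production_efficiency", 0.87, "ratio",
                     date(2005, 1, 1), date(2005, 1, 31))
print(to_published_json(reading))
```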
Another goal for the SoBI framework is to take a pragmatic approach to working with systems that cannot be integrated directly into the architecture—for example, those that cannot support extra load or those that are not scalable enough to support direct integration.
Figure 12. Unsuitable service-interface approach
With this in mind, the framework defines a set of scenarios (or patterns) describing categories of typical source systems that the authors have encountered, along with a recommended outline approach for integrating each category of system. It is envisaged that this collection of integration patterns will evolve and grow as more diverse systems are integrated into a SoBI solution. These patterns help address one of the key service-oriented priorities, namely, the ability to be able to support replacement of applications with minimal impact to the enterprise. The SoBI framework describes how these systems can be abstracted in a way that ensures that there is minimum disruption when the systems are finally replaced.
Figure 13. BI implementation replacement
We've so far framed SoBI in terms of an ideal-world scenario in which the various systems of record are able to participate in the SoBI architecture in a service-oriented manner. In a real project environment, it is likely that there will be a number of constraints that impact our ability to implement pure SoBI. We have identified several categories of constraint, discussed below, and for each category we have identified a pattern that, when used in conjunction with the SoBI principle of enterprise SOA and data strategy, ensures that the pragmatic SoBI implementation stays within the guiding principles of the SoBI framework.
An ideal situation is one in which the application is scalable and flexible enough to support the exposing of services and events to the service bus, either directly or through a thin service façade (see Figure 8). This category of application is also capable of supporting the export of data to the data warehouse, as required.
One source system-type constraint is real-time/transactional processing. Some applications will contain information that is interesting from an analysis perspective and from a real-time perspective. Such an application can be service enabled by creating a service façade that exposes the services of the application to the organization, and that fires events when "interesting" changes or updates are made. In this scenario, the applications that are interested in the events from this application can subscribe to the published events. The subscribers will include a data warehouse agent that collates the events for integration with the data warehouse (see Figure 9).
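The event-driven arrangement described above can be sketched in a few lines of Python. This is an in-process stand-in only: `ServiceBus`, `OrderFacade`, `WarehouseAgent`, and the `order_placed` event name are all hypothetical, and a real deployment would use messaging middleware rather than direct function calls.

```python
class ServiceBus:
    """Minimal in-process stand-in for a service bus (hypothetical)."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_type, handler):
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers.get(event_type, []):
            handler(payload)


class OrderFacade:
    """Service facade over a hypothetical order application."""

    def __init__(self, bus):
        self._bus = bus
        self._orders = {}

    def place_order(self, order_id, amount):
        self._orders[order_id] = amount
        # Fire an event for an "interesting" change.
        self._bus.publish("order_placed", {"order_id": order_id, "amount": amount})


class WarehouseAgent:
    """Subscriber that collates events for a later data warehouse load."""

    def __init__(self, bus):
        self._pending = []
        bus.subscribe("order_placed", self._on_event)

    def _on_event(self, payload):
        self._pending.append(payload)

    def drain(self):
        """Hand over the collected batch and start a new one."""
        batch, self._pending = self._pending, []
        return batch
```

The facade does not know who is listening: other applications can subscribe to the same events without any change to the source application or to the warehouse agent.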
Non- and semistructured systems are another constraint. In any complex environment there are likely to be a number of data sources that contain information that must be consumed by the solution but that are held in semistructured or unstructured formats, such as spreadsheets and document-management systems. With these kinds of systems it is important to structure the information prior to integration into the data warehouse, which can be achieved in one of several ways. One way is to apply structure to the data store, for example, by providing structured and change-controlled templates for spreadsheets and documents so that the information can be accurately and reliably extracted from the document. This approach will involve business process change. Another way is to impose structure on the data read from the store, which is inherently difficult because it relies on making assumptions about the semantics of the existing structure of the data source and on that structure never changing. If these assumptions hold, programmatic extraction can be achieved.
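A minimal sketch of programmatic extraction from a change-controlled template might look like the following. It assumes the spreadsheet has been exported as CSV and that the template defines exactly three columns; the column names, `extract_rows`, and `TemplateChangedError` are hypothetical. The point is that the extractor checks its assumptions about structure explicitly and fails loudly when they no longer hold.

```python
import csv
import io

# Hypothetical columns defined by the change-controlled template.
EXPECTED_COLUMNS = ["asset", "month", "output_units"]


class TemplateChangedError(ValueError):
    """Raised when the document no longer matches the agreed template."""


def extract_rows(text):
    """Read CSV text that is assumed to follow the template, validating as it goes."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise TemplateChangedError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        try:
            units = int(row["output_units"])
        except (TypeError, ValueError):
            raise TemplateChangedError(
                f"line {line_no}: output_units is not an integer"
            ) from None
        rows.append({"asset": row["asset"], "month": row["month"], "output_units": units})
    return rows
```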
We are assuming that nonenterprise data sources will eventually be upgraded to more directly support the services that they provide (see Figure 10). It is important to change documents from information sources into views on information (see Figure 11).
Now let's look at source system limitation constraints. There will be occasions when data sources or systems contain data that cannot be interrogated in real time because of operational or technology constraints. Consider these examples: a heavily utilized system may not be able to support the addition of a service interface that processes a new set of queries on a frequent basis; an information source, such as data held in a spreadsheet, may not support concurrent access; or ad hoc or real-time access to data within a production system may be considered a risk to the effective day-to-day running of a business-critical system.
In these scenarios the tactical solution is to cache the data in a system that can then provide a defined and published interface to the data or service. This cache allows applications and services to have access to the most up-to-date information possible through an exposed service. This approach provides a means of scaling a data source or application to meet the requirements of the business in a way that gives a clear decoupling between the data or application and the service provided. This decoupling is important for future development when the application or data can be moved to a more scalable solution and provide the service directly, or when the application can be enhanced to support the service directly. One caveat is there will be occasions when the type of data that is required from the system is purely of a BI nature, such as large-scale data export. In these scenarios the service-interface approach will not be suitable (see Figure 12).
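The caching pattern can be sketched as a small service that answers queries from a local copy and goes back to the source only when that copy is older than an agreed age. The class name `CachedSourceService`, the `load_snapshot` callable, and the `max_age_seconds` parameter are assumptions for this example; how the snapshot is actually obtained (for instance, from a periodic batch export) is left to the caller.

```python
import time


class CachedSourceService:
    """Published interface over a cached copy of a constrained source (hypothetical)."""

    def __init__(self, load_snapshot, max_age_seconds, clock=time.monotonic):
        self._load_snapshot = load_snapshot  # callable returning a dict of current data
        self._max_age = max_age_seconds
        self._clock = clock                  # injectable so the sketch can be tested
        self._data = {}
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            # The only point at which the source system is touched.
            self._data = dict(self._load_snapshot())
            self._loaded_at = now

    def get(self, key):
        """Answer from the cache; consumers never query the source directly."""
        self._refresh_if_stale()
        return self._data.get(key)
```

Because consumers depend only on `get`, the cache can later be removed and the same interface served directly by an upgraded or replacement system.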
We are assuming that the heavily loaded systems will eventually be upgraded to more directly support the services that they provide.
One of the principles of SoBI is that the organization must put in place a strategic plan for service-oriented applications and for system of record data to be held in enterprise stores or applications. This plan will inevitably mean the replacement of some of the systems of record in the existing landscape, and in some cases these are likely to be replaced after the implementation of the BI project. Therefore, it is vital that an approach is defined that can support these systems in the twilight years of their lives and ease the process of replacement when the successor system goes live (see Figure 13). This approach will need to produce an architecture that gives balance between enabling an easy transition to the new data source while not committing the project team to a large development effort that is ultimately thrown away.
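One way to keep the throwaway effort small is to put a thin adapter between each source and the warehouse load, so that only the adapter is discarded when the successor goes live. The sketch below assumes a legacy system that exports `region|amount` text lines and a successor that returns records with a `total` field; both formats, and all of the names used, are invented for illustration.

```python
from typing import Iterable, Protocol


class SalesSource(Protocol):
    """The shape the warehouse load depends on, whatever system sits behind it."""

    def fetch_sales(self) -> Iterable[dict]: ...


class LegacyBatchAdapter:
    """Wraps a hypothetical legacy export of 'region|amount' lines."""

    def __init__(self, export_lines):
        self._lines = export_lines

    def fetch_sales(self):
        for line in self._lines:
            region, amount = line.split("|")
            yield {"region": region, "amount": float(amount)}


class SuccessorServiceAdapter:
    """Wraps a hypothetical successor system that already returns records."""

    def __init__(self, records):
        self._records = records

    def fetch_sales(self):
        for record in self._records:
            yield {"region": record["region"], "amount": float(record["total"])}


def load_warehouse(source: SalesSource):
    """Warehouse-side logic: unchanged when the source system is replaced."""
    totals = {}
    for sale in source.fetch_sales():
        totals[sale["region"]] = totals.get(sale["region"], 0.0) + sale["amount"]
    return totals
```

The investment that survives replacement is `load_warehouse` and the record shape it consumes; the legacy adapter is deliberately small because it is expected to be thrown away.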
About the Authors
Sean Gordon is an architect in Microsoft's enterprise systems strategy and architecture consulting practice and is based in the company's Scotland office.
Robert Grigg is a managing consultant and enterprise architect at Conchango, U.K.
Michael Horne is a managing consultant and business intelligence architect at Conchango, U.K.
Simon Thurman is an architect evangelist in the developer group at Microsoft Developer Network, U.K.
Resources
"Data on the Outside vs. Data on the Inside,"
Microsoft Developer Network (MSDN) White Paper, Pat Helland, Microsoft Corporation
"Data Warehousing Lessons Learned: Trends in Data Quality,"
Lou Agosta, column published in DM Review Magazine (February 2005)
"Information as a Service: Service-Oriented Information Integration,"
Ronald Schmelzer, ZapThink
"Information Bridge Framework: Bringing SOA to the Desktop in Office Applications,"
Ricard Roma i Dalfo, The Architecture Journal 4 (Microsoft Corporation, 2004)
"Over Half of Data Warehouse Projects Doomed,"
Robert Jaques, Gartner (February 2005)
"Service Orientation and Its Role in Your Connected Systems Strategy,"
A paper providing an overview of Microsoft's vision for service orientation and service-oriented architecture in enterprise computing. (Microsoft Corporation, 2004)
"Service-Oriented Architecture: Considerations for Agile Systems,"
Lawrence Wilkes and Richard Veryard, CBDI Forum, The Architecture Journal 2 (Microsoft Corporation, 2004)
"Service-Oriented Architecture: Implementation Challenges,"
Easwaran G. Nadhan, Principal, EDS, The Architecture Journal 2 (Microsoft Corporation, 2004)
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal website.