Data Replication as an Enterprise SOA Antipattern
by Tom Fuller and Shawn Morgan
Summary: As applications are envisioned and delivered throughout any organization, the desire to use data replication as a "quick-fix" integration strategy often wins favor. This path of least resistance breeds redundancy and inconsistency. Regularly applying this antipattern within an enterprise dilutes the original long-term goals of a service-oriented architecture (SOA). However, depending on the context in which it surfaces, data replication can be viewed either positively or negatively. To deliver on a service-oriented strategy successfully, enterprise architects need to draw from the successes and failures of other architectures. Architecture patterns, when discovered and documented, can provide the best technique for managing the onerous job of influencing change in your enterprise. We'll use both an antipattern and a pattern to describe how data replication can affect your enterprise architecture.
Data Replication: An SOA Architectural Antipattern
Blurring Lines of Control
Cognizant of Change
Data Replication as a Pattern
Document Best Practices
About the Authors
The drive toward service-orientation is still in its infancy. For SOA to achieve the lofty goals visualized by the industry, architects will need to rely on documented best practices. This is where patterns and antipatterns will help bridge the gap between conceptualization and reality. Patterns represent a proven repeatable strategy for achieving expected results. Software architecture patterns and antipatterns provide documented examples of successes and failures in attempting to apply those strategies in IT. Through successful identification of architecture abstractions, SOA will begin to find implementation strategies that ensure successful delivery.
We'll describe why data replication, when used as a foundational pattern in your enterprise SOA, can lead to a complicated and costly architecture and provide you with a refactoring strategy to move from data replication (antipattern) to direct services (pattern). Then we'll offer a brief explanation of where data replication can be used to successfully solve some specific problem domains (replication as a pattern) without compromising other enterprise architecture initiatives. Architects will take away from this discussion a strategy for facilitating the shift away from architectural malfeasance.
To promote consistency, the software industry has settled on a template for documenting design patterns. This template, however, seemed insufficient to describe architectural antipatterns and their eventual refactored solutions. A hybrid collection of pattern sections from various resources was pulled together to help define a structure for the antipattern and pattern described here (see Resources). Table 1 shows the superset of sections and whether each applies to a pattern, an antipattern, or both.
Let's begin with a detailed description of how data replication is an antipattern when applied in an SOA, using the antipattern template to explain the context specifics and forces that validate this architecture. An example illustrates the problems when applying this antipattern. In closing, you will see the refactored solution and the benefits that can be seen when the pattern is applied in place of the antipattern.
Context. This antipattern describes a common scenario that architects face when attempting to build enterprise SOA solutions in a distributed environment. In the move from the mainframe environment to the common distributed environment, a shift has occurred in the management of our important master data (the data that drives our enterprise on a daily basis). We decide to build our new distributed applications using a silo model, where an application is defined as a set of business functionality exposed through a user interface (UI), and that business functionality is built within its own context. This decision encourages architectures that favor extreme isolation and unnecessary entity aggregation.
Forces. This antipattern occurs in an SOA because of the comfort that architects and designers have with this model. Even when it is explained as an antipattern within the context of SOA, architects and designers continue to use this antipattern for these reasons:
- The architect is familiar with the data replication strategy. The techniques used for replicating data have been fine-tuned since the earliest days of batch-oriented development. This fine-tuning creates a comfort level for the architect who has been designing systems with data replication for years, using either file-based transfers or extract, transform, and load (ETL) tools. It seems simple enough to continue to use this strategy going forward.
- The architect is concerned about the performance of a service exposed by the master data system. For example, accessing a data store remotely through services may make the new system slower than a solution that uses a local database, which is the one-to-many data scenario.
- Separate services may each have slightly different views of the entity. Combining the different views into an aggregated entity and maintaining service autonomy is seen as too difficult, so the data from the services is replicated into a store, which is the many-to-one data scenario.
- The architect is concerned that the new system's durability could be compromised. For example, if the new system will access the master data through a service that exists on the network, network failures or master data system failures could cause downtime for the new system. Having the data available locally seems to mitigate this concern.
- Applications are built using a silo model. An application is defined as a set of business functionality exposed through a user interface, and that business functionality is built within its own context without an enterprise architect helping to guide the application developers toward the reuse of available services.
- Resources for delivering a solution using data replication are abundant. The skills required to implement a solution using data replication are typically easy to find. Building applications that leverage new or existing services takes a rarer skill set, requiring someone with a strong knowledge of distributed systems, Web services, object orientation, and other modern mechanisms for delivering solutions to your business.
Table 1. Correlating sections with patterns and antipatterns
|Section||Description||Pattern||Antipattern|
|Name||A useful short name that quickly describes the pattern/antipattern.||x||x|
|Context||The interesting background information where the pattern/antipattern will be applied.||x||x|
|Forces||The set of reasons this solution is being used. Very often in an antipattern, these reasons are made up of misconceptions or lack of knowledge.||x||x|
|Solution||This section describes the proposed solution to the problem, based on the context and forces documented, and represents the strategy or intelligence of the pattern.||x|| |
|Poor Solution||This section describes the antipattern: a solution to a problem that appears to have benefits but in fact generates negative consequences when employed.|| ||x|
|Consequences||Consequences are the side effects of the implemented solution. In the case of an antipattern, consequences are always going to be made up of negative impacts that outweigh the benefit of the pattern itself.||x||x|
|Refactored Solution||Every antipattern should have an inverse that is in fact a solution or pattern, and that inverse may itself be documented as a pattern.|| ||x|
|Benefits||The benefits will be the same for a documented pattern and the refactored solution of an antipattern.||x||x|
|Example||A real-world example of where the pattern/antipattern is applied to highlight/solve a problem.||x||x|
Poor Solution (Antipattern). To accomplish this antipattern, master-data information is replicated from the master-data source to a new database created to expose the functionality of the new application (see Figure 1). This exposure may be accomplished through many different mechanisms, such as the use of an ETL tool or through more basic mechanisms such as a file transfer of the data from one system to the other. Master-data information may also be augmented by the new systems, adding additional entity attributes to the original entity.
Consequences. The consequences of this pattern applied incorrectly within an SOA are not immediately visible. When data is replicated from the owning master-data system, in a fractal way, to many different application databases, your service orientation becomes very hard to manage. Replicated data becomes pervasive across the enterprise. Unclear responsibility for the data blurs the lines of control. Multiple services exist to manipulate the same (or slightly augmented) data. The "business view" of the data becomes disparate.
Figure 1. Proliferation of master data leads to distributed ownership of various permutations
This unclear responsibility for the data and loss of control over the master data is disconcerting to any architect. As data proliferates and applications begin to add their own spin to it, the complexity begins to mount. The new fields that are added to the data, which may be important to the enterprise as well, may not get added back to the master-data system. Of course, with the compressed timelines and budgets of most enterprise projects, a project team rarely will take on the extra effort of creating new data elements within the master-data system, exposing those new elements through services extended off of the master-data system, or creating an aggregation services layer (thus replacing the initial master-data service).
They may in fact extend services off of their system that will expose this new master data. Also, they may create new services that mimic some of the functionality of the master-data system to expose the data that they have replicated to their new system. This exposure creates a large headache for the enterprise architects to manage. The architect now needs to prevent new systems from consuming services exposed using replicated data, which leads to confusion within the enterprise architecture of the true ownership of the business object.
Replication also carries with it a huge responsibility on the replicating system to duplicate the business logic of the master-data system. Many times data is massaged by the master-data system before being exposed as a service to other applications. In these cases, the quick-hit method of replicating the data to the new system carries with it the burden of also replicating the business logic used to massage the data. Unless good analysis is done on the master-data system, this logic may never get implemented, or may get implemented incorrectly. An even more detrimental scenario is one in which no business logic surrounding the retrieval of the master data initially exists. The logic is instead added later in a project that needs to upgrade the master-data system. This scenario can be painful for an enterprise that has replicated data many times. Each system must now be analyzed for the changes required to the system to meet the new demands of the enterprise data.
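To make the duplicated-logic risk concrete, here is a minimal Python sketch. The table layout, the status flag, and every function name are hypothetical and exist only for illustration; they are not taken from any real master-data system. The master service trims names and hides inactive customers, while a raw copy carries the rows but none of those rules.

```python
from dataclasses import dataclass


@dataclass
class CustomerRow:
    """Raw row as stored in the (hypothetical) master-data table."""
    customer_id: int
    name: str
    status: str  # "A" = active, "I" = inactive (assumed encoding)


MASTER_TABLE = [
    CustomerRow(1, "  acme corp ", "A"),
    CustomerRow(2, "Globex", "I"),
]


def master_get_customers() -> list[dict]:
    """Master-data service: massages rows before exposing them."""
    return [
        {"customer_id": r.customer_id, "name": r.name.strip().title()}
        for r in MASTER_TABLE
        if r.status == "A"  # business rule: inactive customers are hidden
    ]


def replicate_raw() -> list[CustomerRow]:
    """File- or ETL-style copy: the rows move, the rules above do not."""
    return [CustomerRow(r.customer_id, r.name, r.status) for r in MASTER_TABLE]


def replica_get_customers(replica: list[CustomerRow]) -> list[dict]:
    """Replica service written without analysis of the master's rules."""
    return [{"customer_id": r.customer_id, "name": r.name} for r in replica]
```

Under these assumptions, `master_get_customers()` returns one cleaned-up customer, while `replica_get_customers(replicate_raw())` returns both rows with the unformatted name. The two services only agree if someone re-implements the master's rules in the replica and keeps them in step.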
It is often easy to dismiss the consequences of using data replication because, on the surface, they seem superficial. Issues like disk space, reusable assets, and labor reduction seem to be manageable with or without moving toward service orientation. The consequence that overshadows them all is the loss of the ability to minimize complexity and deliver solutions that can quickly adapt to change. Data replication adds complexity, brittleness, and inflexibility because of the widespread redundancies it creates in your enterprise architecture.
Let's look at an example. A new application is being built to handle the management of a purchase order. The architecture demands that business services be created for the new system and that the UI, whether a portal or Web-based implementation, consumes those business services.
The business requirements dictate that the system should perform the functions of creation, modification, and viewing of purchase orders. Part of the purchase order is customer information, which is stored in a customer relationship management (CRM) system. The CRM system exposes services to retrieve customer information, but the architect decides to replicate the customer master data to the purchase order (PO) system based on the forces discussed previously.
A requirement of the new purchase order system is to create an account number for each customer. This new data is attached to the replicated customer data in the PO system's local database. The UI displays customer information through the use of a new Web service created by the narrow vertical purchase order system. Other business services are developed for creation, modification, and viewing of purchase orders that are then consumed by the PO system's UI. Any data manipulation or massaging that the master-data system does when a user accesses its data services must be replicated into the new system. Any changes of that data going forward must also be replicated to the new system, as well as any changes to any business logic to access that data.
Figure 2. The refactored system
The diagram shown in Figure 2 makes it clear that the data replication results in duplicate services. Each application now has its own view of the customer entity, with its own way to access the data through a service that was created for that purpose. The purchase order system copies and extends the customer data through its own set of services, which effectively creates redundancy and confusion for any future systems looking to consume customer services.
Now, the next application comes along, a system that manages invoicing/accounting. This system also needs the customer data, including the account number. The system designers now have several choices to make concerning customer data. Do they get the customer master data from the CRM system and the customer account number from the PO system, or, since all of the data exists in the PO system, do they just go to the PO system for all of the customer information? In fact, because durability is a big concern (a force in this antipattern), a decision is made to replicate the customer data from the PO system to the invoicing system. Now, three systems have a view of customer data in varying degrees of completeness. Three systems are forced to build their own business logic around access to that data. Each of them exposes customer data through services (because we are building a "service-oriented architecture"), and the ability of the enterprise to manage customer data in an agile way continues to break down.
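The replication chain in this example can be sketched in a few lines of Python. The in-memory dictionaries, the `replicate` helper, and the sample values are all assumptions made for illustration; a real implementation would more likely use ETL jobs or file transfers between databases.

```python
import copy

# System of record: the CRM owns customer master data.
crm = {101: {"name": "Acme Corp", "city": "Tampa"}}


def replicate(source: dict) -> dict:
    """Batch-style copy of every record from one store to another."""
    return copy.deepcopy(source)


# The PO system copies CRM data and augments it with an account number.
po = replicate(crm)
po[101]["account_number"] = "ACCT-0001"

# Invoicing copies from the PO system rather than from the CRM.
invoicing = replicate(po)

# The customer moves; only the system of record hears about it.
crm[101]["city"] = "Orlando"


def views_of(customer_id: int) -> dict:
    """Collect what each system currently believes about a customer."""
    return {
        "crm": crm[customer_id],
        "po": po[customer_id],
        "invoicing": invoicing[customer_id],
    }
```

Calling `views_of(101)` at this point shows the CRM with the new city but no account number, and both replicas with the account number but the old city. Until the next copy runs, no single system holds the complete, current customer.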
Refactored solution. The benefits of an SOA will only be fully realized by an organization that is cognizant of the mind-set change required to deliver solutions using new architectural patterns. SOA will not deliver the kinds of benefits that are possible until architects realize that only by providing a single point of control over the business services and data can the organization truly become the responsive, flexible, and scalable enterprise that SOA promises. To successfully refactor the solution, you must address the forces that create the antipattern:
- The architect is familiar with the data replication strategy. Use the key performance indicators (KPIs) that drive an SOA implementation to build a case against replicating data unnecessarily. Training on SOA implementations that reduce the need for data replication can also mitigate this force.
- The architect is concerned about the performance of a service exposed by the master-data system. Test to validate the performance of an exposed service. Through testing and an understanding of the service-level agreement (SLA) for the new application, many will find that the use of a new or existing service that exposes master data will perform within the SLA demands.
- Separate services may each have slightly different views of the entity. The skills and technologies to combine the different views into an aggregated entity and maintain service autonomy are already well established and can usually be done at run time.
- The architect is concerned that the new system's durability could be compromised. Address the durability of the exposed business services through available infrastructure mechanisms such as clustering.
- Applications are built using a silo model. Many projects require the involvement of an architect with an eye toward new and existing services and the functionality they provide. In fact, applications as we know them today must morph from a simple box on a diagram to an assembly of new and existing services.
- Resources for delivering a solution using data replication are abundant. Currently available patterns, templates, standards, and tools allow true SOA with "mid-level" designers and developers.
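The advice above to test a service against its SLA can be sketched as follows. This is one possible approach rather than a prescribed method: the 95th-percentile criterion, the sample count, and the `fake_customer_service` stand-in are assumptions chosen for illustration.

```python
import statistics
import time
from typing import Callable


def measure_latencies_ms(call: Callable[[], object], samples: int = 50) -> list[float]:
    """Invoke the service call repeatedly and record each latency in ms."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_sla(latencies_ms: list[float], sla_ms: float, percentile: int = 95) -> bool:
    """True when the chosen percentile latency is within the SLA budget."""
    # With n=100, statistics.quantiles returns the 1st..99th percentile cut points.
    cut_points = statistics.quantiles(latencies_ms, n=100)
    return cut_points[percentile - 1] <= sla_ms


def fake_customer_service() -> dict:
    """Hypothetical stand-in for a remote master-data service call."""
    time.sleep(0.002)  # simulated network and processing delay
    return {"customer_id": 101, "name": "Acme Corp"}
```

A check such as `meets_sla(measure_latencies_ms(fake_customer_service), sla_ms=50.0)` gives the architect a measured answer in place of an assumed one. In practice the call would target the real service from the network location where the new application will run.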
Figure 3. Asynchronous communication
Following a detailed analysis of these issues, you can decide your architectural strategy. The shift toward service orientation requires us to think differently about which questions are critical when making these types of decisions. Instead of spending so much time on the architecture of an individual "application," architects should focus on the architecture of the enterprise services. Services should be built on the system that maintains and manages the data, or aggregating services should be built to provide the data, eliminating the need to replicate the data to another application to provide a service.
The enterprise view of an application needs to fundamentally change. To make SOA work, we must shift from building applications—a stand-alone system that provides user functionality—to building products: an array of features to be delivered to the customer. We should be building services that provide this business functionality. The product is nothing more than a composition or orchestration of one or more services and provides one or more pieces of business functionality. It is the aggregation of solved use cases.
Once this mind-set becomes clear, and these forces have been mitigated, direct access to the line-of-business (LOB) services becomes a reality. The UIs of new systems can call existing services as needed, without any data indirection.
Following a refactoring of the architecture to direct services, the ambiguity of data ownership is eliminated. New applications like the Invoice system (see Figure 3) are able to retrieve data from the master systems of record.
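A minimal Python sketch of the direct services arrangement follows. The class and method names are hypothetical, the services are plain in-process objects standing in for remote calls, and the choice to let the PO service own the account number is an assumption made for illustration.

```python
class CrmCustomerService:
    """System of record for customer master data (hypothetical)."""

    def __init__(self) -> None:
        self._customers = {101: {"name": "Acme Corp", "city": "Tampa"}}

    def get_customer(self, customer_id: int) -> dict:
        # Return a copy so callers cannot mutate the master record.
        return dict(self._customers[customer_id])

    def update_city(self, customer_id: int, city: str) -> None:
        self._customers[customer_id]["city"] = city


class PurchaseOrderService:
    """Owns only the data it created; keeps no copy of customer data."""

    def __init__(self) -> None:
        self._account_numbers = {101: "ACCT-0001"}

    def get_account_number(self, customer_id: int) -> str:
        return self._account_numbers[customer_id]


class InvoiceService:
    """Composes its view from the owning services at request time."""

    def __init__(self, crm: CrmCustomerService, po: PurchaseOrderService) -> None:
        self._crm = crm
        self._po = po

    def get_billing_view(self, customer_id: int) -> dict:
        customer = self._crm.get_customer(customer_id)  # direct call, no replica
        customer["account_number"] = self._po.get_account_number(customer_id)
        return customer
```

Because `InvoiceService` holds no customer data of its own, a change made through `CrmCustomerService.update_city` is visible in the very next `get_billing_view` call, and each attribute has exactly one owning service.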
Benefits. Architects are constantly called on to evaluate the trade-offs of using a certain pattern within a certain solution. When the enterprise architectural environment attempts to move toward an SOA, for all of the reasons that SOA is an important architecture, the trade-offs of replicating data to enhance the speed of the application or the durability of the application must be evaluated against the benefits of the clarity and simplicity gained from the single-point ownership and governance that a service architecture can bring to your organization.
Additional benefits of using the direct services pattern can be found in the KPIs that push us toward an SOA initiative in the first place. Service orientation is the latest attempt to create renewable application deliverables. Extreme cost savings can be found through the reuse of services, both in storage space and labor costs. While data replication can appear to satisfy the functional requirements, the goals of the enterprise architecture vision are not being met.
Another benefit can be found in the efficiencies gained by retrieving only the data you need. Architectures that leverage event-driven data retrieval are sure to be more efficient than antiquated batch-oriented processes. Replicating all of the data during a time-constrained batch window creates costly overhead. Without a mechanism for retrieving the right data at the right time using services, you use additional processing cycles to move data that may have little or no use depending on the activities that depend on that data. The direct services pattern will ensure that you do not perform any wasted processing.
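The difference in data moved can be illustrated with a small Python sketch. The record counts and names are invented for the example, and the sketch counts records only; it does not model per-call overhead, which a real comparison would also need to weigh.

```python
def batch_replicate(master: dict) -> tuple[dict, int]:
    """Copy every record; return the copy and the number of records moved."""
    replica = dict(master)
    return replica, len(replica)


class OnDemandClient:
    """Fetches a record from the master only when an activity needs it."""

    def __init__(self, master: dict) -> None:
        self._master = master
        self.records_moved = 0

    def get(self, key: int) -> dict:
        self.records_moved += 1
        return self._master[key]


# Hypothetical master store of 10,000 customers.
master = {i: {"customer_id": i} for i in range(10_000)}

# Nightly batch: everything moves, whether or not it is used tomorrow.
_, batch_moved = batch_replicate(master)

# Direct services: today's activity touches only three customers.
client = OnDemandClient(master)
for customer_id in (5, 42, 9001):
    client.get(customer_id)
```

Here `batch_moved` is 10,000 while `client.records_moved` is 3. Note that this simple client fetches a record again on every request, so the advantage narrows as the share of records actually used grows.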
Ultimately, the goal of any enterprise data service is to give as complete a view of an entity as possible without sacrificing the benefits of the enterprise architecture initiatives. In cases where core pieces of that entity are fragmented across multiple systems, the design of the service may require some level of aggregation. However, this doesn't eliminate the option of using the direct services pattern to retrieve those subcomponents of our entity.
An architecture that effectively implements and consumes direct services will minimize the brittleness of the enterprise solution portfolio by eliminating redundancy. Through centralization and reuse, the services built using this pattern will provide the best possible approach for remaining agile in a climate of change.
All enterprises have complicated systems and requirements that require architects to remain pragmatic about the patterns they endorse. In these scenarios, an architect must control the widespread adoption of replication patterns through good enterprise governance. If that doesn't happen, the enterprise is at risk of overusing this pattern. Because of its perceived low barrier to entry for developers, designers, and architects, the data replication strategy should only be considered a pattern in rare situations.
Figure 4. Data movement building block
Context. The context where data replication is a pattern is slightly different from the context where it is an antipattern. It still stems from a distributed model where you wish to have a copy of the master data maintained in a system external to the master-data system. In some situations, you may not be able to avoid replication. Examples include cases where network latency is a problem, such as in a WAN; where the master-data system is unreliable or has incompatible maintenance windows; where offline capabilities are a key requirement of the application; and where the risks of not having services are mitigated through the expected usage of the application.
Forces. The forces that influence the decision to make a copy of the master data and may justify the complexity involved in duplicating the data are:
- Data availability on the master-data system doesn't match the requirements for the new system. For example, the master-data system must be taken down for maintenance during hours when the new system must be available (and the data is critical to the use cases delivered by the new system).
- The network is unreliable or too slow. For example, on a WAN where the network frequently loses its connection to the master-data system, copying the data may alleviate this problem. Another scenario is that the network is too slow to support direct consumption of the data when required.
- The system is expected to have offline capabilities. This force is often a requirement found in systems that will have very little or no connectivity to the enterprise services that are providing the data.
- The data will be copied without any reusable business logic for the purpose of analysis only. There are scenarios where data would need to be copied to a data warehouse for use in analytical reports. In most cases, this type of data transfer is most effective when an ETL tool is used to simply copy the data once it is deemed ready for archiving.
Solution. One solution to these forces is data replication. Data replication can be performed using an architectural building block known as a data movement building block. The data movement building block consists of a source, movement link, and destination.
Figure 5. A master-master replication scenario
The diagram shown in Figure 4 displays the basic building block of many other data movement patterns (see Resources). Depending on the solution, you may need multiple data movement building blocks to handle a master-master scenario, where the data can be updated in both the source and destination (see Figure 5).
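The building block can be sketched in Python as a source, a movement link, and a destination. The class names and the per-key version counter are assumptions introduced for this illustration; they are not taken from the data movement patterns cited in Resources.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """A source or destination; each key maps to (version, value)."""
    name: str
    rows: dict = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        # Each local write bumps the version of that key by one.
        version = self.rows.get(key, (0, None))[0] + 1
        self.rows[key] = (version, value)


@dataclass
class MovementLink:
    """Moves rows one way, from source to destination."""
    source: DataStore
    destination: DataStore

    def move(self) -> int:
        """Copy rows that are newer at the source; return how many moved."""
        moved = 0
        for key, (version, value) in self.source.rows.items():
            current_version = self.destination.rows.get(key, (0, None))[0]
            if version > current_version:
                self.destination.rows[key] = (version, value)
                moved += 1
        return moved


# Master-master: two building blocks, one per direction.
hq = DataStore("hq")
branch = DataStore("branch")
hq_to_branch = MovementLink(hq, branch)
branch_to_hq = MovementLink(branch, hq)
```

An update written at `hq` reaches `branch` through `hq_to_branch.move()`, and a later update at `branch` flows back through `branch_to_hq.move()`. This sketch does not resolve conflicts: if both stores change the same key between moves, the versions tie and neither change propagates, which hints at the extra complexity a master-master scenario brings.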
Benefits. Much of the work described in the patterns discussed previously can be done with tools and nondevelopment resources. This feature is one of the primary benefits of this approach because you will often see a lower initial total cost of ownership (TCO). This benefit doesn't tell the whole story, though. Often the long-term maintenance and versioning of these types of tools and strategies are very costly. If the primary focus is time to market and the road map for the solution is relatively short, then this may be a viable pattern with a high degree of upside. Keep in mind that you are trading those short-term gains for brittleness and possible refactoring costs down the road. Sometimes such refactoring can be done by wrapping a legacy system that doesn't meet the tenets of SOA with an interface that is SOA compliant. In this way, the enterprise architecture can move toward SOA as a whole, without having to refactor every application in its domain.
An unavoidable aspect of innovation in technology is change. Application architects must remain focused on mitigating the risk of change. Using patterns and antipatterns to document best practices has proven to be successful since the earliest OO patterns. When applied in a timely fashion, architectural patterns are the most viable weapon in the battle against change.
The pattern and antipattern discussed here focused on data replication and its role in an enterprise with an SOA initiative. By using a template for pattern description, you can see why this technique can have a positive or negative impact, depending on the context in which it is applied. Using patterns to solve this architecture mismatch illustrates exactly how successful adopting an architecture-driven methodology can be.
As the software industry continues to evolve, the role of architecture and patterns will become even more critical. Patterns like data replication that are common within current architectures may now be detrimental to your enterprise's architectural progression. Organizations that can see the shift from applications to services will be the ones who will make the move much more quickly and see the full potential of SOA.
The seemingly unattainable goals of software industrialization are quickly becoming a reality. Patterns are the building blocks of consistency and agility in enterprise architecture. Treating them as first-class architecture artifacts will help to accelerate their discovery and adoption. In time, an enterprise that creates a reusable library of these patterns will have increased its responsiveness to its business sponsors and gained an edge over any competition.
Tom Fuller is CTO and senior SOA consultant at Blue Arch Solutions Inc., an architectural consulting, training, and solution provider in Tampa, FL. Tom was recently given the Microsoft Most Valuable Professional (MVP) award in the Visual Developer Solution Architect discipline. He is also the current president of the Tampa Bay chapter of the International Association of Software Architects (IASA), holds a MCSD.NET certification, and manages a community site dedicated to SOA, Web services, and Windows Communication Foundation (formerly "Indigo"). He has a series of articles published recently on the Active Software Professional Alliance and in SQL Server Standard magazine. Tom speaks at many of the user groups in the southeastern U.S. Visit www.SOApitstop.com for more information, and contact Tom at firstname.lastname@example.org.
Shawn Morgan is CEO and senior architect at Blue Arch Solutions, Inc. Shawn has architected solutions for many Fortune 500 companies, including Federal Express, Beverly Enterprises, and Publix Super Markets. Shawn focuses on enabling enterprises to deliver IT solutions through architecture. Contact Shawn at email@example.com.
Design Patterns: Elements of Reusable Object-Oriented Software, Erich Gamma, et al. (Addison-Wesley Professional, 1995)
Microsoft Developer Network: "Data Patterns: Microsoft Patterns & Practices," Philip Teale, Christopher Etz, Michael Kiel, and Carsten Zeitz (Microsoft Corporation, 2003)
"Principles of Service Design: Service Patterns and Anti-Patterns" (Microsoft Corporation, 2005)
"SOA Antipatterns" (IBM, November, 2005)
Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools, Jack Greenfield, et al. (Wiley & Sons, 2004)
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal website.