
Guidelines for Application Integration


Retired Content

This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This page may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

patterns & practices Developer Center

Defining Your Application Integration Environment

Microsoft Corporation

December 2003

Summary: This chapter outlines some of the important challenges to understanding your business problems and illustrates how your application integration environment can meet those challenges.


Levels of Application Integration

Important Considerations for Application Integration


At a simple level, application integration means helping a series of applications communicate with one another better. To ensure that the integration is both beneficial and feasible, however, you should closely examine the business processes that must be integrated before focusing on the applications themselves. If you understand the business processes that your organization uses, you can map those processes to your application integration requirements. This chapter discusses the different levels at which application integration can occur, and shows which capabilities your application integration environment may require at each level. It also examines some of the issues you are likely to face when creating your application integration environment.

Levels of Application Integration

Application integration can occur at many different levels, including:

  • Presentation level
  • Business process level
  • Data level
  • Communications level

Presentation-level integration can be a very effective way of integrating applications in your environment. It allows multiple applications to be presented as a single cohesive application, often by reusing the applications' existing user interfaces. This guide does not cover presentation-level integration in any detail.

The following sections discuss the other levels of integration in more detail.

Business Process–Level Integration

Business process–level integration often forms the starting point for defining your application integration environment, because it is most closely related to the business requirements of your organization. Business process–level integration starts with defining your business processes and then specifying the logical integration that corresponds to those processes.

Defining Your Business Processes

Before you can design your application integration environment, it is important to understand your organization in logical terms, removed from the technology that underlies it. At this stage, you are not interested in whether information needs to travel from one computer to another using messaging, using file transfer, or using request/reply. Nor are you particularly interested in what protocols are used, or whether it is necessary to perform semantic mapping between two systems. Those considerations, and others, are necessary later, but first you need to define your business processes.

Business processes consist of a group of logical activities that model the activities in your organization. These activities typically represent the smallest units of logical work that can be done in your organization. A typical example of an activity is Update Inventory, which is invoked when new inventory arrives, or when a customer order is placed.

Business processes usually have a beginning and an ending state, and generally have action-oriented names. They perform work and can be measured while doing so. For example, the business process Process Order might represent a series of activities, including Enter Order, Accept Payment, Schedule Production, and Update Inventory. When the order is processed, it goes from an unprocessed state to a processed state. The order-processing rate can be measured to show the efficiency of the process.
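The idea of a process that moves work from a beginning state to an ending state through named activities can be sketched in code. The following Python example is illustrative only; the activity names are taken from the Process Order example above, and the data structures are invented for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class BusinessProcess:
    """A business process: a named sequence of activities that moves
    an item from a beginning state to an ending state."""
    name: str
    activities: list
    completed: list = field(default_factory=list)

    def run(self, order):
        order["state"] = "unprocessed"          # beginning state
        for activity in self.activities:
            activity(order)                     # smallest unit of logical work
            self.completed.append(activity.__name__)
        order["state"] = "processed"            # ending state
        return order

# Each activity represents one unit of work in the organization.
def enter_order(order): order["entered"] = True
def accept_payment(order): order["paid"] = True
def schedule_production(order): order["scheduled"] = True
def update_inventory(order): order["inventory_updated"] = True

process_order = BusinessProcess(
    "Process Order",
    [enter_order, accept_payment, schedule_production, update_inventory],
)
result = process_order.run({"id": 42})
print(result["state"])          # processed
```

Because each activity is recorded as it completes, the process can also be measured, for example, counting completed activities per hour gives the order-processing rate mentioned above.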

Any applications that are related in a given business process are candidates for integration. The reason for integrating the applications is to help the business process in some way; by making it faster, less error-prone, or less expensive, for example.

As you define your business processes, you should consider the following factors:

  • Place. Can the business process be completed in one location, either for the client or the business? If a change of location is required, the business process cannot be completed in one time interval, and its state must be stored.
  • Time. Can the business process be completed in one interaction with the business, or does it need to wait for some events, meaning that it occurs over multiple time intervals? If it cannot be completed in one interval, for example due to legislation, then state must be stored.

These dimensions define the complexity of the business process. They also drive the complexity of the integration of IT services in terms of whether the IT service is available at that place and time, and whether the service has the necessary business process state information. The complexity of the business process can also be affected by the nature of the applications involved in integration; for example, their maturity, degree of integration, and so on.

Business Process Modeling

Business process modeling consists of defining the logical processes in your environment, identifying the start and end points, and determining the logical path between those points. The models can be generated in text or in flowchart form.

You should use business process modeling to define the details of each of the business processes that occur in your organization. Some of the steps may currently be automated; others may represent manual processes. The more complete your models, the easier it is to design an application integration environment that closely matches the needs of your organization.

Modeling Simple Business Processes

Some business processes are very simple. They consist of a single interaction with a business and are completed in a single time interval. Someone (for example, a customer or partner) or something (an automated request) contacts the business and makes a request. This request may be made through a member of staff, or through a self-service facility. For a typical simple business process, the organization has a process that can handle the request immediately, and the process closes. There is no need to save the state of the business process (although there may be a need to record the occurrence of the event and the result). Examples of simple business processes include:

  • Registering to use services on a Web site
  • Requesting an inventory of goods of a particular type
  • Asking for a quotation for a service

Modeling Complex Business Processes

Business processes can be quite complex. They can occur over multiple time intervals and in multiple places, and they can require state information about the business process to be managed. Often the steps include a combination of automated and manual steps. Figure 2.1 shows a model of a sample complex business process.


Figure 2.1. Model of a complex business process
Note   The following description does not match the key in Figure 2.1, because the numbers in the figure define order processing steps that humans perform, while the following steps explain the entire business process.

The business process shown in Figure 2.1 consists of the following steps.

  1. Someone (for example, a customer or partner) or something (an automated request) contacts the business and initiates an order for goods either through a self-service mechanism or through an order-entry representative of the business.
  2. The Order Management system captures the order details. The system recognizes that this customer has placed enough orders with the company to qualify for a discount on the goods. The system includes this fact with the order information that it has gathered. The system marks the order in the database as 'order in process'.
  3. The event originator is told that the order is under way and that progress updates will be coming.
  4. A business staff member is aware of the order status (either because he or she is handling the order, or because he or she has received a notification from the external event handler IT service). This staff member initiates a request to check for inventory or delivery. This request is a person-to-person communication that may be initiated by e-mail, written note, or phone. Key details are passed to a staff member in the Inventory Management department.
  5. The Inventory Management staff member checks that enough stock exists to fulfill the order. In this case, it does not, so the staff member issues a service request to a supplier for more stock and a promise of delivery date. This service request may be initiated by e-mail, paper form, or phone. The order details are not changed.
  6. At some point, the supplier replies to Inventory Management. A staff member in the department updates the stock position in the supporting IT system. The staff member checks the orders waiting for inventory, finds the appropriate order, and then notifies the Order Management department that this order can now be fulfilled from stock on the date that the supplier has promised.
  7. A staff member in Order Management updates the department database with this information, and issues a notice to the event originator confirming the order and the terms. Order Management passes the order information to the Billing department and asks Billing to issue the bill, using the new special terms.
  8. Billing recalculates the bill, updates the billing database, and sends the bill to the customer. The bill notifies the customer that the goods will be shipped when payment has been received.
  9. When Billing receives payment, a staff member updates the billing database and then notifies the Logistics department that the order can be shipped.
  10. A staff member in Logistics schedules delivery and notifies the customer of when to expect the goods.

As this example suggests, a complex business process can be relatively difficult to model. Where you encounter problems in modeling a complex business process, you should break down any complex constituent processes into a number of simpler ones. Occasionally, however, the processes are so complex that it is impossible to model them accurately.

Business Processing Standards

One advantage of business processes being entirely logical is that they have no dependence on the underlying technology. This enables you to define business processes according to predefined standards, which has the following benefits:

  • You can interchange business processes more easily between organizations, with trading partners interacting at the process level rather than at the data level. This results in simpler business-to-business interaction.
  • You can recruit individuals who are already familiar with the standards you have adopted.
  • You can deploy software that has built-in support for those standards.

Existing business process standards cover both a language for defining processes and a visual notation for representing them. This allows you to define your business processes textually or visually, and then use tools to translate them into the corresponding language representation.

Business processes are most commonly represented by XML-based languages. The two most commonly used languages are Business Process Execution Language for Web Services (BPEL4WS) version 1.1 and Business Process Modeling Language (BPML) version 1.0.

The Business Process Modeling Notation (BPMN) version 1.0 specification, released by the Business Process Management Initiative (BPMI) Notation Working Group, allows both BPEL4WS 1.1 and BPML 1.0 to be represented using common elements. BPMN supports process management by providing a notation that is intuitive to business users and yet is able to represent complex process semantics.

Mapping Business Requirements to Your Application Integration Requirements

After you have determined the business processes that your organization uses, you can begin to examine how they affect your application integration requirements. It is important to realize that a simple business process does not always correspond to simple requirements for application integration. In fact, a simple business process may require complex interaction between multiple applications.

Figure 2.2 shows a simple business process that requires the integration of multiple applications.


Figure 2.2. Simple business process requiring application integration

By modeling each of your business processes, you can define which applications must integrate with one another, and in what way. This enables you to make intelligent decisions about the specifics of your application integration environment.

Business Processing Integration Capabilities

Table 2.1 shows a number of capabilities that are required for effective business processing integration.

Table 2.1: Business Processing Capabilities

Rules Processing: Allows you to define rules and process them at run time.
Business Transaction Management: Ensures that if a business activity fails, related activities are canceled.
Workflow: Controls the interaction between people and automated processes.
Orchestration: Coordinates the execution of the different activities that make up a business process.
State Management: Maintains state in the context of pending applications.
Event Processing: Recognizes when business events have occurred, logs those events, and determines which business process capability receives the event.
Schedule: Generates events when a defined time period has elapsed or at a particular time of day. Also tracks processes in which an external event should occur within a specified time frame or by a specified date or time.
Contract Management: Monitors and processes contractual obligations at run time.

For more information about each of these capabilities, see the Appendix, "Application Integration Capabilities."

Data-Level Integration

You can define how applications communicate with each other at a business process level, but if they cannot understand the data they exchange, they cannot integrate successfully. Because different applications often handle data in different ways, a number of capabilities are required to establish integration at the data layer.

There are two general ways to enable data-level integration:

  • Add logic to enable each application to understand incoming data from other applications.
  • Add logic to enable each application to interpret outgoing data to an intermediate data format and interpret incoming data from that format into a form the application understands.
Note   In each case, the logic may modify the application itself or, more commonly, may place an interface in front of each application.

The problem with the first approach is one that is common to most areas of application integration: a lack of scalability. As the number of applications grows, so does the number of point-to-point translations that must be built and maintained. In most enterprise situations, the application integration architecture should instead adopt the second approach, using an intermediate data format at the data level.
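The intermediate-format approach can be sketched briefly. In this hypothetical Python example, a CRM system and a billing system store customer records in different shapes; each application supplies only translators to and from a canonical format, so integrating N applications needs 2N translators instead of the N×(N−1) point-to-point ones the first approach would require. All field names here are invented for illustration.

```python
# Translator from the (hypothetical) CRM format to the canonical format.
def crm_to_canonical(rec):
    return {"customer_id": rec["CustID"], "full_name": rec["Name"]}

# Translator from the canonical format to the (hypothetical) billing format.
def canonical_to_billing(rec):
    # The billing system splits the name; assume a "First Last" layout.
    first, _, last = rec["full_name"].partition(" ")
    return {"id": rec["customer_id"], "first": first, "last": last}

crm_record = {"CustID": "C-1001", "Name": "Ada Lovelace"}

# Data always flows through the canonical intermediate representation.
billing_record = canonical_to_billing(crm_to_canonical(crm_record))
print(billing_record)   # {'id': 'C-1001', 'first': 'Ada', 'last': 'Lovelace'}
```

In practice, the translators would be driven by schema definitions and mapping rules rather than hand-written functions, which is exactly what the Schema Definition and Mapping capabilities in Table 2.2 provide.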

Different integration scenarios have different data-level requirements. For example, if you synchronize data between two applications that store customer information, the data may need to be formatted to fit the data format of the target application. However, if the data is being moved into a data warehouse or data mart, the data may not only have to be formatted, but it may also have to be sorted, filtered, or otherwise processed so that it suits the needs of the users.

The nature of the applications you integrate may alter the way in which your data capabilities can function. For example, if you are moving data between applications in real time, the data-level capabilities generally work on one message at a time, and yet they may still need to support a very high volume of transactions per hour. To meet these needs, the data capabilities must normally be small, quick, and multithreaded. Other applications may only need to transfer data in batches (for example, loading a data warehouse every night at midnight). In this case, the data capabilities may not have to be multithreaded, but they may need to be particularly strong at sorting, summarizing, and filtering sets of data.

To support the various complex data integration requirements, your application integration solution must often contain a considerable amount of logic that supports the access, interpretation, and transformation of data. You also need schematic models of enterprise data to describe data attributes. The definition and recognition of schemas enables the validation of data. Descriptions of how data elements may be related or mapped to each other allow for the transformation of data.

Data Integration Capabilities

Table 2.2 shows a number of capabilities that are required for effective data integration.

Table 2.2: Data Integration Capabilities

Data Transformation: Modifies the appearance or format of the data (without altering its content) so that applications can use it.
Data Validation: Determines whether data meets the predefined technical criteria.
Data Access: Determines how data is collected and presented to applications.
Schema Definition: Maintains predefined schemas.
Mapping: Manages the relationship between applications and determines how the data must be transformed from the source to the target application.
Schema Recognition: Checks that the schema can be read and understood, and that it is well-formed.

For more information about each of these capabilities, see the Appendix, "Application Integration Capabilities."

Communications-Level Integration

Although the forms of integration already discussed are very important, they all depend on integration at the communications level. Unless applications can communicate with each other, you cannot integrate them.

Not all applications are designed to communicate in the same way. Some communicate synchronously, others asynchronously. Some use file transfer to communicate; others interact through messaging or request/reply.

The nature of the applications you are integrating often determines how communications occur between them. Older applications may use file transfer for communications, and some may not be designed to communicate with other applications at all. Newer applications often use message-based communication and may use predefined standards to facilitate communication. Before determining the requirements for your communications-level integration, you need to examine the communication needs for your environment and the existing capabilities of your applications.

Enabling Communication Between Applications

Because many applications are not designed to communicate directly with each other, it is likely that you will need to do some development work to ensure that your applications can be called by other applications.

There are two possible approaches to this problem:

  • Rewrite your application to provide it with an API that other applications can call.
  • Create a communications adapter that acts as an intermediary between the application and other applications.

Although rewriting an application to add APIs may seem more elegant than creating a series of bolted-on connectors or adapters, the time, effort, and risk associated with rewriting older applications make the adapter approach the preferred solution in most circumstances. However, for widely distributed and diverse applications, the distribution and management of adapters can generate significant management overhead.
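A communications adapter can be sketched as a thin wrapper that presents a structured, message-style interface while translating to whatever the legacy application actually understands. The class names and record formats below are invented for illustration.

```python
import json

class LegacyInventoryApp:
    """Stand-in for an older application that only understands
    pipe-delimited text records, not structured calls."""
    def process_line(self, line):
        sku, qty = line.split("|")
        return f"OK|{sku}|{int(qty)}"

class InventoryAdapter:
    """Communications adapter: other applications send it structured
    messages; it translates to and from the legacy format underneath."""
    def __init__(self, legacy):
        self.legacy = legacy

    def reserve(self, message):
        req = json.loads(message)
        raw = self.legacy.process_line(f"{req['sku']}|{req['quantity']}")
        _, sku, qty = raw.split("|")
        return {"status": "reserved", "sku": sku, "quantity": int(qty)}

adapter = InventoryAdapter(LegacyInventoryApp())
reply = adapter.reserve(json.dumps({"sku": "W-100", "quantity": 5}))
print(reply)   # {'status': 'reserved', 'sku': 'W-100', 'quantity': 5}
```

The legacy code is untouched; only the adapter needs to change if the message format or the legacy record layout evolves.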

You also need to determine whether the communication between applications will occur directly on a point-to-point basis, or through an intermediary hub. For more information, see "Making Application Integration Scalable" in Chapter 1, "Introduction."

Communications-Level Capabilities

Table 2.3 shows a number of capabilities that are required for effective communications integration.

Table 2.3: Communications Capabilities

Message Routing: Controls the routing of messages from the source to the target application.
Message Delivery: Determines how messages pass from the source application to the target application.
Message Queuing: Implements reliable messaging.
Message Forwarding: Provides serial message handling.
Message Correlation: Ensures that reply messages are matched to request messages.
Message Instrumentation: Ensures that the messages are processed in the order that was originally intended.
Addressing: Determines where messages should be sent.
Transactional Delivery: Groups together messages, sending and receiving them within a transaction.
File Transfer: Moves files between applications.
Serialization/Deserialization: Converts data to and from a flat structure to send it over the network.
Request/Reply: Waits on a request sent to the target until a response is received.
Batching/Unbatching: Collects messages together for transmission and separates them at the destination.
Encode/Decode: Ensures that applications using different encoding methods or code pages can communicate.
Connection Management: Establishes a logical connection between applications in a request/reply scenario.

Not all of these capabilities are required for every application integration scenario. This guide discusses how to determine which capabilities are required in your environment.

Note   For a more detailed description of each capability, see the Appendix, "Application Integration Capabilities."

Important Considerations for Application Integration

The specifics of your application integration environment depend on many things, including the complexity of both your business processes and your IT environment. Several considerations, however, are common to many application integration environments. This section discusses these issues.

Note   For more details about the specific capabilities that are required for an application integration environment, see the Appendix, "Application Integration Capabilities."

Using Synchronous and Asynchronous Communication

One important consideration in an application integration environment is how the applications communicate with each other. They may use synchronous or asynchronous communication, or most commonly, a combination of the two.

Often the applications themselves determine the best forms of communication to use. In situations where the application must wait for a reply before it continues processing, synchronous communication is often most appropriate. Asynchronous communication is normally appropriate in situations where the application can continue processing after it sends a message to the target application.

However, considering only the needs of the application may be too simplistic a way of determining whether to use synchronous or asynchronous communication. Communication may be synchronous at the application level, for example, but the underlying architecture that enables this connectivity may be asynchronous.

Generally, a number of capabilities are required to support synchronous and asynchronous communication. Table 2.4 shows these capabilities.

Table 2.4: Capabilities Used in Synchronous and Asynchronous Communication

Synchronous communication requires: Request/Reply, Connection Management.
Asynchronous communication requires: Message Routing, Message Forwarding, Message Delivery, Message Queuing, Message Instrumentation, Message Correlation.

Asynchronous communication offers a potential performance advantage, because applications are not continually waiting for a response and then timing out when they do not receive one in a timely fashion. Asynchronous communication offers other performance benefits as well. Because asynchronous communication is connectionless, a continuous link is not maintained between each side of a data exchange. This arrangement can provide a small performance advantage over connection-based communications, and may also reduce the load on the network. However, any advantage may be offset, at least in part, by a need to match replies to initial messages at the application level. Using the Message Queuing capability to ensure message delivery can also diminish performance.
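The need to match replies to initial messages at the application level is usually met with correlation identifiers. The following minimal Python sketch uses in-process queues to stand in for a real messaging system; in practice the service would run in another process or on another machine.

```python
import queue
import uuid

requests, replies = queue.Queue(), queue.Queue()

def send_request(payload):
    """Send asynchronously; return a correlation ID for matching later."""
    correlation_id = str(uuid.uuid4())
    requests.put({"correlation_id": correlation_id, "payload": payload})
    return correlation_id   # the caller is free to keep working

def service():
    """Target application: processes one queued request and replies,
    echoing the correlation ID so the reply can be matched."""
    msg = requests.get()
    replies.put({"correlation_id": msg["correlation_id"],
                 "result": msg["payload"].upper()})

cid = send_request("order 42")
service()                       # would normally run concurrently
reply = replies.get()
assert reply["correlation_id"] == cid   # Message Correlation capability
print(reply["result"])          # ORDER 42
```

This is the work the Message Correlation capability performs for you; without it, every application would have to implement this matching logic itself.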

Although applications traditionally rely on synchronous communications, applications can maintain the same functionality while taking advantage of the benefits of asynchronous communications. In most cases, you should design your application integration environment to support both synchronous and asynchronous communication, because more often than not, both will be required. However, you should consider adopting an asynchronous model for communications whenever possible.

Increasing Automation

One of the most significant aims of an application integration environment is to reduce the amount of human involvement in integrating applications, and therefore to move increasingly towards fully automated application integration. Of course, in many cases full automation is not possible. However, it is likely that some of your organization's business processes can be automated further.

In particular, you may have many applications that enable you to perform business processes, but that require a series of manual steps. Manual steps might include interacting with a number of applications consecutively. As an example, if a staff member named Bob orders 100 widgets from the Manufacturing department, the business process might resemble the following.

  1. Bob chooses the widgets in an online catalog.
  2. Bob creates a purchase order for the widgets.
  3. Bob sends the purchase order to his manager for approval.
  4. Bob's manager approves the purchase and sends a copy of the approved purchase order to the Finance department. Finance stores the purchase order as a pending purchase.
  5. Bob sends the purchase order to the Manufacturing department, copying Finance so they know the purchase has been made. Finance stores the purchase order as an outstanding purchase.
  6. Manufacturing sends the widgets, updates its inventory, and invoices Finance for the purchase amount.
  7. Finance transfers the money to the Manufacturing department.

The previous example uses a series of applications in a specific order. Much of this processing can be automated so that the applications interact with each other directly. For example, by integrating the online catalog with the application that creates the purchase order, you can save Bob a step and save the company some money. In fact, this type of process can be automated completely (with pauses for human approval). Some of the steps are sequential, but others can be performed in parallel (for example, Finance can store the purchase order as an outstanding purchase while Manufacturing sends the widgets). The order of events depends on the overall nature of the business process.
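The widget-ordering steps above can be sketched as a single orchestration with one remaining pause for human approval. Everything in this Python example is hypothetical: the step names mirror the numbered list, and the `approve` callback stands in for the manager's decision.

```python
def purchase_widgets(order, approve):
    """Sketch of automating the manual purchase-order flow above.
    `approve` is the one remaining human step (steps 3-4)."""
    log = []
    log.append("catalog: widgets selected")                    # step 1
    log.append("purchase order created")                       # step 2
    if not approve(order):                                     # steps 3-4
        return log + ["rejected by manager"]
    # Steps 5-6 could run in parallel: Finance records the purchase
    # while Manufacturing ships and invoices.
    log.append("finance: recorded as outstanding purchase")    # step 5
    log.append("manufacturing: shipped, inventory updated, invoice sent")
    log.append("finance: payment transferred")                 # step 7
    return log

trail = purchase_widgets({"item": "widget", "qty": 100},
                         approve=lambda o: o["qty"] <= 500)
print(trail[-1])   # finance: payment transferred
```

The point is not the code itself but the shape: once each step is an application interaction rather than an e-mail or paper form, the orchestration capability can sequence the steps, run independent ones in parallel, and pause only where a human decision is genuinely required.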

Straight-Through Processing (STP)

Some business transactions do not require any human intervention. The processing for such transactions is known as straight-through processing (STP). The goals of using STP include:

  • Improving customer service by reducing the opportunity for human errors and by speeding up the business process handoffs.
  • Reducing the cost of the business process through automation that results in lower staff costs.
  • Reducing batch processing.

A single business transaction often involves multiple applications working in conjunction with each other, and in this case your application integration environment must provide the logic that would otherwise be provided by people. This is achieved by defining business rules for the transaction to follow.
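Those business rules can be as simple as a table of predicates that every transaction must satisfy to flow straight through; anything that fails a rule drops out to manual review. The rules below are invented for illustration.

```python
# Hypothetical business rules that replace human judgment in an STP flow.
RULES = [
    lambda t: t["amount"] <= 10_000,        # small enough to auto-approve
    lambda t: t["counterparty_known"],      # verified, existing counterparty
    lambda t: t["currency"] in {"USD", "EUR"},
]

def route(transaction):
    """Apply every rule; any failure drops the transaction out of
    straight-through processing into a manual-review queue."""
    if all(rule(transaction) for rule in RULES):
        return "straight-through"
    return "manual review"

print(route({"amount": 500, "counterparty_known": True, "currency": "USD"}))
# straight-through
print(route({"amount": 50_000, "counterparty_known": True, "currency": "USD"}))
# manual review
```

A real Rules Processing capability would load these rules from configuration so the business can change them without redeploying code, but the routing decision is the same.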

Figure 2.3 shows straight-through processing occurring with multiple applications.


Figure 2.3 . Straight-through processing with multiple applications

Your decision on how extensively you use STP normally involves a tradeoff between initial costs and benefits. Dramatically reducing the amount of human interaction in your business processes can involve the creation of some very complex business logic and similarly complex IT solutions. You need to determine at what point the development costs exceed the benefits of extra automation.

Business Processing Capabilities Used in STP

Business processing capabilities are frequently used to provide straight-through processing. The only capability that is not generally used is the Workflow capability, because it normally applies to situations that require human interaction with the system. Table 2.5 summarizes the business processing capabilities used in STP.

Table 2.5: Business Processing Capabilities Used in STP

Rules Processing: Normally used
Business Transaction Management: Normally used
Workflow: Not used
Orchestration: Normally used
State Management: Normally used
Event Processing: Normally used
Schedule: Normally used
Contract Management: Normally used

Data Capabilities Used in STP

Some data capabilities are essential for STP, but others are used only in specific situations. When the structure and format of data differs across the applications involved in STP, you need the Data Transformation and/or Mapping capabilities. If the flow is between different business domains, but within the same application working with the same data formats, you generally do not need these capabilities.

You may need the Data Access capability if the end-to-end response time of STP is unacceptable. By deploying the Data Access capability, you can ensure that data is staged for faster access, which should improve the overall performance of STP. If the overall response time is acceptable as is, you do not need the Data Access capability.

Table 2.6 summarizes the data capabilities used in STP.

Table 2.6: Data Capabilities Used in STP

Data Transformation: Sometimes used
Data Validation: Normally used
Data Access: Sometimes used
Schema Definition: Normally used
Mapping: Sometimes used
Schema Recognition: Normally used

Communications Capabilities Used in STP

Some communications capabilities are essential for STP. Others are used only in specific situations, and some should not be used at all. STP usually uses a messaging style of communication throughout the business process. This means that the messaging capabilities are generally required, but capabilities such as Request/Reply, Connection Management, Batching/Unbatching, and File Transfer are not. However, Request/Reply and Connection Management are useful for implementing some work-step invocation and response. File Transfer, while it is not the optimal solution, can be used as an alternative to messaging if your communications environment is based on file transfer.

The Message Forwarding capability enables indirect access to applications. This capability is normally not required in STP, which involves a concrete set of applications that are directly accessed. If you do need access to additional applications, you can make these applications part of the STP flow so that they do not require Message Forwarding.

Transactional Delivery is not always required for STP. Transactional Delivery is necessary in situations where multiple messages must be processed together or not processed at all through the constituent steps of a given STP flow. However, many STP flows do not necessarily involve such grouping of messages. Therefore, you should examine the specifics of your STP environment to determine if you need Transactional Delivery.

Serialization/Deserialization may be required if the structure of the data is very different across the applications that are part of the overall flow. If the flow is between different business domains within the same application working with the same data structures, Serialization/Deserialization is not necessary.

If the applications in the STP flow exist on different operating systems or represent data in different languages, you may require the Encode/Decode capability. Otherwise it is not normally needed.

Table 2.7 summarizes the communications capabilities used in STP.

Table 2.7: Communications Capabilities Used in STP

Capability                      Usage
Message Routing                 Normally used
Message Delivery                Normally used
Message Queuing                 Normally used
Message Forwarding              Not used
Message Correlation             Normally used
Message Instrumentation         Normally used
Addressing                      Normally used
Transactional Delivery          Sometimes used
File Transfer                   Sometimes used
Serialization/Deserialization   Sometimes used
Request/Reply                   Sometimes used
Batching/Unbatching             Not used
Encode/Decode                   Sometimes used
Connection Management           Sometimes used

Ensuring Data Integrity

In any application integration environment, data is generally passed from one location to another. In some cases, a master writable copy of the data exists, and other applications read from and write to that location. In such cases, your application integration environment needs to support only basic data replication. In other cases, however, multiple writable copies of the data exist. This data may exist in multiple databases, formats, and locations, and on multiple platforms. However, you still must maintain data integrity in each location.

To maintain data integrity in your environment, you should consider the amount and complexity of data that you are storing. You may have multiple writable copies of data in situations where a single master copy is more appropriate. Or maybe the format of your data can be standardized across your organization. Examining the data integrity requirements of your organization before building your application integration environment can save you a lot of money.

As an example of a data integrity problem, imagine that your company maintains employee information in multiple databases. The number of vacation days allowed for your employees exists in a human resources database, but salary information resides in a financial database. Both of these databases contain a record for each employee, but the records are separate. If you want to examine all of the records for an employee named Smith, you must retrieve records from both databases. If Smith marries and changes her name to Jones, you must modify records in each database to ensure that both are correct.

Data Synchronization

You can ensure that data integrity is maintained in your organization by using the data synchronization capabilities of your application integration environment.

Data synchronization is the act of reconciling the contents of two or more databases to bring them into an agreed-upon state of consistency, according to agreed-upon rules. Data synchronization is (at any one time) one-way, because data is changed in one location and is then copied out to other locations that require it. Depending on the requirements, the data may be copied when a particular event occurs, or at a particular time.

In a data synchronization environment, it is not generally practical for each program that changes one instance of common data to change all other instances. Instead, the data synchronization logic normally propagates the changes to the other copies at some later time. This behavior can lead to data concurrency issues. For more information about concurrency, see "Managing Latency" later in this chapter.

Note   Data synchronization between two applications is often required in both directions. However, in these cases synchronization actually consists of two one-way processes rather than one two-way process.

Figure 2.4 illustrates data synchronization between two applications.


Figure 2.4. Data synchronization between two applications
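As an illustration only (the record shapes and the validation hook are hypothetical, not from the guide), one of the one-way processes that make up a synchronization might be sketched as:

```python
def synchronize(source: dict, target: dict, validate=lambda record: True) -> list:
    """Copy new or changed source records into the target; return changed keys."""
    changed = []
    for key, record in source.items():
        # Propagate only records that differ and pass the validation rules.
        if target.get(key) != record and validate(record):
            target[key] = record      # one-way: the target is never written back
            changed.append(key)
    return changed
```

Running the same function in the opposite direction at another time gives the second one-way process of an apparently bidirectional synchronization.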

Business Processing Capabilities Used in Data Synchronization

The Rules Processing and Business Transaction Management capabilities are normally required for data synchronization. Rules Processing is required to validate the data. Business Transaction Management is required because data must be synchronized across all databases or none of them. All of the other business processing capabilities, however, are either sometimes used or not used at all.

The Workflow, Event Processing, and Schedule capabilities may be required, depending on the specifics of your environment. The Workflow capability is used when business users have to choose between duplicate instances of the same entity (for example, multiple addresses for the same person) during the synchronization process. Event Processing is required when synchronization depends on a given event (for example, the sale of a product registered on a point-of-sale device, which initiates a purchase order on the manufacturer's order entry system). The Schedule capability is used when synchronization must occur at a given time.

Table 2.8 summarizes the business processing capabilities used in data synchronization.

Table 2.8: Business Processing Capabilities Used in Data Synchronization

Capability                        Usage
Rules Processing                  Normally used
Business Transaction Management   Normally used
Workflow                          Sometimes used
Orchestration                     Not used
State Management                  Not used
Event Processing                  Sometimes used
Schedule                          Sometimes used
Contract Management               Not used

Data Capabilities Used in Data Synchronization

As you would expect, data synchronization uses many of the data capabilities. The only capabilities that you may not need to implement are Mapping and Data Transformation. These capabilities are often unnecessary, for example when data is synchronized across relational databases on a single platform that stores enterprise-level data in a consistent structure and format. You should examine the applications, platforms, and data involved in synchronization to determine the need for data transformation and mapping in your environment.

Table 2.9 summarizes the data capabilities used in data synchronization.

Table 2.9: Data Capabilities Used in Data Synchronization

Capability            Usage
Data Transformation   Sometimes used
Data Validation       Normally used
Data Access           Normally used
Schema Definition     Normally used
Mapping               Sometimes used
Schema Recognition    Normally used

Communications Capabilities Used in Data Synchronization

Some communications capabilities are essential for data synchronization. Others are used only in specific situations, and some should not be used at all.

Data synchronization is, by definition, one-way. Even in situations where it appears to be bidirectional, you can think of it as two one-way processes. For this reason, Request/Reply, Connection Management, and Message Correlation normally are not used in data synchronization. Message Instrumentation is not used, because messages do not need to be processed in a predefined sequence. Message Forwarding is not used either, because data synchronization occurs directly with each database instance.

Other messaging capabilities are typically used as a means of communication for data synchronization. File Transfer may also be used for data synchronization. You will often decide whether to use File Transfer based on the capabilities of the applications involved and on the availability of other communications capabilities.

Serialization/Deserialization may be required if the structure of the data is very different in the databases that are being synchronized.

Synchronization usually applies to a given instance of a business entity. However, you may use the Batching/Unbatching capability to update a logical set of data records (for example, customer records and all the associated address records). Or you may use it to batch the records together for performance reasons.

You may require the Encode/Decode capability if the databases to be synchronized exist on different operating systems or represent data in different languages. Otherwise, it is not normally needed.

Table 2.10 summarizes the communications capabilities used in data synchronization.

Table 2.10: Communications Capabilities Used in Data Synchronization

Capability                      Usage
Message Routing                 Normally used
Message Delivery                Normally used
Message Queuing                 Normally used
Message Forwarding              Not used
Message Correlation             Not used
Message Instrumentation         Not used
Addressing                      Normally used
Transactional Delivery          Normally used
File Transfer                   Sometimes used
Serialization/Deserialization   Sometimes used
Request/Reply                   Not used
Batching/Unbatching             Sometimes used
Encode/Decode                   Sometimes used
Connection Management           Not used

Managing Latency

If you maintain multiple copies of data, the data will not be identical in all locations at all times. There will be a delay between the time that information is written in one location and the time that the change is reflected in other locations. Your business requirements determine what kind of time delay is acceptable.

As an example, imagine that your business provides services to your customers for a monthly fee. However, a customer has a complaint, and as a courtesy, you agree to reduce the cost of your services by 10 percent. This information is fed into your customer relationship management (CRM) database. However, in order for it to be reflected in the bill, the information must be synchronized with the billing database. In this case, the information must be in the billing database before the bills are issued, which could be in several days. However, if billing information is available immediately to the customer on the Web, you may need to ensure that the updates occur within minutes.

Concurrency Issues

Latency in the updating of data can cause significant problems, especially if the same record is updated in multiple locations. You must ensure that when these delays occur the resultant data is consistent and correct.

Consider the following approaches to maintaining data concurrency:

  • Pessimistic concurrency
  • Optimistic concurrency
  • Business actions and journals

The following paragraphs discuss each approach in turn.

Pessimistic Concurrency

Pessimistic concurrency assumes that if one instance of data is updated from multiple sources, the desired result will not occur; therefore, it prevents such attempts. Each time a user attempts to modify the data, the affected records are locked until the change is complete. Pessimistic concurrency can be problematic, because a single user, or client, may hold a lock for a significant period of time. Holding a lock may tie up resources such as database rows and files, or consume resources such as memory and connections. However, pessimistic concurrency may be appropriate where you have complete control over how long resources will be locked.
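A minimal sketch of the pessimistic approach, using an in-memory record and a thread lock as a stand-in for a database row lock (the class and its granularity are illustrative, not from the guide):

```python
import threading

class LockedRecord:
    """A record protected by pessimistic concurrency: writers block one another."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()   # stands in for a database row lock

    def update(self, fn):
        # The lock is acquired before the read and released only after the
        # write completes, so no conflicting update can interleave.
        with self._lock:
            self.value = fn(self.value)
            return self.value
```

The cost is visible in the design: any other writer must wait for the full duration of `update`, which is why long-held locks make this approach problematic.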

Optimistic Concurrency

Optimistic concurrency assumes that any particular update will produce the desired result, but provides information to verify the result. When an update request is submitted, the original data is sent along with the changed data. The original data is checked against the current data in the store (normally the database). If it still matches, the update is executed; otherwise, the request is denied, producing an optimistic failure. To optimize the process, you can include a time stamp or an update counter in the data, in which case only the time stamp or counter needs to be checked. Optimistic concurrency is a good mechanism for updating master data that does not change very often, such as a customer's phone number. It allows everyone to read, and because updates are rare, the chance of an optimistic failure is minimal. Optimistic concurrency is not an efficient method to use in situations where updates are likely to fail often.
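The update-counter variant described above can be sketched as follows (the store and its API are hypothetical; a real implementation would live in the database layer):

```python
class OptimisticFailure(Exception):
    """Raised when the submitted version no longer matches the stored version."""

class VersionedStore:
    def __init__(self):
        self._rows = {}   # key -> (version, value)

    def read(self, key):
        # The caller keeps the version it read and submits it with the update.
        return self._rows[key]

    def write(self, key, expected_version, value):
        current_version, _ = self._rows.get(key, (0, None))
        if current_version != expected_version:
            raise OptimisticFailure(f"{key}: stale version {expected_version}")
        self._rows[key] = (current_version + 1, value)
```

Only the counter is compared on write, so readers are never blocked; a conflicting writer simply receives an optimistic failure and must re-read and retry.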

Business Actions and Journals

For situations in which data changes frequently, one of the most useful methods for achieving concurrency is to use business actions and journals. This approach relies on standard bookkeeping techniques. You simply record changes to the data in separate debit and credit fields.

For example, imagine that your company provides widgets for customers. The company receives two requests that alter the field containing the number of widgets. One request is a return of 20 widgets, and the other is a sale of 20 widgets. If you use business actions and journals, both requests can be received, and the field does not need to be locked.

In fact, this approach does not require the locking of fields even for an order change. For example, suppose that the customer changes an order from 20 widgets to 11 widgets. This could be thought of as a single change, but it could also be thought of as two separate changes—first an order of 20 widgets, and then a return of 9 widgets.

This type of data concurrency approach may involve updating each record on the fly any time a change occurs, or it may involve posting the information into a journal, which is checked for consistency and completeness before all of the postings are executed.

Business actions and journals do not prevent optimistic failures entirely. For example, if an update request uses previously retrieved information (for example, the price of a widget), the retrieved information may be outdated and may require validation within the update operation. Your design should specify what must happen when such an optimistic exception occurs, because you may not always want to notify the user of such an exception.

One other problem with business actions and journals is the potential for a modification to be processed twice. Two orders for 20 widgets are obviously different from one order for 20 widgets. When the master data is modified directly, setting the widget inventory field from 200 widgets to 180 widgets is idempotent: applying that change multiple times makes no difference. Posting a journal action that subtracts 20 widgets twice, however, does change the result. Therefore, if your application integration environment uses business actions and journals, it must also ensure that each action occurs once and only once.
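A sketch of the journal approach, including the once-and-only-once guard (the journal class and action identifiers are illustrative, not from the guide):

```python
class InventoryJournal:
    """Changes are posted as debit/credit actions rather than direct overwrites."""

    def __init__(self, opening_balance):
        self.opening_balance = opening_balance
        self._entries = {}   # action_id -> signed quantity

    def post(self, action_id, quantity):
        if action_id in self._entries:
            return False     # duplicate: each action must apply exactly once
        self._entries[action_id] = quantity
        return True

    @property
    def balance(self):
        # The current value is derived from the journal, so no field lock is
        # needed while actions are being posted.
        return self.opening_balance + sum(self._entries.values())
```

Because a sale of 20 and a return of 20 are independent postings, both can be accepted concurrently, and replaying an action identifier is silently ignored.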

Data Aggregation

Some data must be merged together before it can be sent to other applications. For example, data may need to be combined from multiple applications, or from a single application over time.

The precise challenges of data aggregation depend on the number of records you are aggregating. Aggregation may take place one record at a time, or you may batch multiple records together to create one output message. Scaling up further, some batch integration solutions require the aggregation of thousands or even millions of records from dozens of applications. The processing requirements of data aggregation increase unpredictably as the number of input records grows to these levels. Consequently, a solution that works effectively for transactional data aggregation may not be able to keep up with batch data aggregation requirements. Therefore, in some cases you may need two separate data aggregation services—one optimized for smaller payloads and high transaction rates and a second service optimized for the batch processing of very large data sets.

Data aggregation can be particularly complex when different types of data arrive at the transformation service at different and unpredictable rates. An example is the merging of video and audio input, which requires synchronization over time. The solution often involves the use of temporary storage for data that must be synchronized with data that has not arrived yet.

Combining Application Functionality

Many IT applications require user input from presentation devices. The interactions are designed to suit human needs and often to support conversations with customers. However, as you move to a more service-oriented architecture, you will find that applications have different requirements, particularly to support semantically rich messages.

In most cases, it does not make economic sense to completely create your new applications from scratch. Often someone has already built some of the functionality of the application for you, and if you know that the functionality works, you can use it to build reliable applications cheaper and more quickly. You can build the new application by adding new programming logic to existing building blocks of functionality.

Application Composition

Application composition describes the functionality required to interpret messages from applications, make calls to other applications, extract relevant information from their replies, and pass the information back to the application that made the original request.

Composed applications have the following attributes:

  • Interaction between components is synchronous and two-way.
  • Interaction between components is fast, because a client application or user is waiting for a response.
  • As data is updated in one component, it is updated in other dependent components.

Figure 2.5 shows an example of a composed application.


Figure 2.5. A composed application

A composed application often provides the same functionality from a business perspective as STP. Which option you choose depends on how costly the implementation is in the short term, the medium term, and the long term.
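At its simplest, the composition pattern is an entry point that makes synchronous calls to existing components and merges their replies into one response. The component services below are hypothetical placeholders:

```python
def customer_service(customer_id):
    # Hypothetical existing component: returns profile data.
    return {"id": customer_id, "name": "Smith"}

def billing_service(customer_id):
    # Hypothetical existing component: returns billing data.
    return {"balance": 42.50}

def compose_customer_view(customer_id):
    """Entry point: call both components synchronously and combine the replies."""
    profile = customer_service(customer_id)   # synchronous, two-way call
    billing = billing_service(customer_id)
    return {**profile, **billing}             # one cohesive result for the client
```

The client waits on the composed call, which is why the interactions between components must be fast, per the attributes listed above.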

Business Processing Capabilities Used in Application Composition

The Rules Processing capability is often used in application composition, because composed applications usually consist of reusable application components that contain business rules. Other capabilities may be used depending on the specifics of your environment.

The Business Transaction Management capability is often used to ensure that a complete transaction is successfully executed across all of the components in a composed application, but it is not needed in situations where a composed application does not have direct transactional responsibility (for example, in rules engines). The Workflow capability is required only in situations that involve human interaction. The State Management capability is used when application components have to resume processing after being in a suspended state.

Composed applications do not ordinarily rely on the Orchestration capability for multistep processing because orchestration logic is inherently embedded in the composed application. The Event Processing capability may also be built into composed applications; in fact, it is required for composed applications built out of silo applications, because it provides a mechanism for loosely coupled integration.

Composed applications do not use the Schedule capability because the communication is synchronous and two-way. Nor do they use the Contract Management capability. While contracts can exist between the application components that constitute a composed application, the management of those contracts is not related to the composed applications themselves.

Table 2.11 summarizes the capabilities needed to provide application composition.

Table 2.11: Business Processing Capabilities Used in Application Composition

Capability                        Usage
Rules Processing                  Normally used
Business Transaction Management   Sometimes used
Workflow                          Sometimes used
Orchestration                     Sometimes used
State Management                  Sometimes used
Event Processing                  Sometimes used
Schedule                          Not used
Contract Management               Not used

Data Capabilities Used in Application Composition

In a composed application, there is no guarantee of the quality and state of the inputs received by the individual application components. Therefore, the composed application components must use the Data Validation and Schema Recognition capabilities to validate the inputs they receive.

Other capabilities may be required, depending on the specifics of your environment. Usually, composed applications are built from modular, reusable building blocks of functionality with generic inputs and outputs that conform to the same semantic definition. In such scenarios, the Data Mapping and Data Transformation capabilities are not required. However, if applications are composed across platforms, or if the components are extended into application domains for which they were not originally designed, Data Mapping and/or Data Transformation are likely to be needed.

Schema Definition is required at the entry point of a composed application, and in some cases the individual components need to use the Schema Definition capability. In other cases, the composed applications may be built across multiple reusable components that recognize a single schema. Finally, in situations where you require better end-to-end response times in your composed application, you may need the Data Access capability to stage data for faster access.

Table 2.12 summarizes the data capabilities used in application composition.

Table 2.12: Data Capabilities Used in Application Composition

Capability            Usage
Data Transformation   Sometimes used
Data Validation       Normally used
Data Access           Sometimes used
Schema Definition     Sometimes used
Mapping               Sometimes used
Schema Recognition    Normally used

Communications Capabilities Used in Application Composition

Because composed applications use synchronous two-way communication, the capabilities used to support this communication are generally required, including Request/Reply and Connection Management. However, as mentioned earlier in this chapter, communication that is synchronous from the perspective of the application may actually be asynchronous at the communications level. In these situations, the Addressing capability is required to directly address the constituent components of the application, and the Message Delivery capability is used to provide a single-cast delivery mechanism between the different components of the composed application. The Message Correlation capability is also required to match messages to replies.
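The Message Correlation capability mentioned above can be sketched as follows: each outgoing request carries a correlation ID, and replies are matched back to their pending requests by that ID even when they arrive out of order. The message shapes are illustrative, not from the guide:

```python
import uuid

class Correlator:
    """Matches asynchronous replies to the requests that produced them."""

    def __init__(self):
        self._pending = {}   # correlation_id -> original request body

    def send(self, body):
        correlation_id = str(uuid.uuid4())
        self._pending[correlation_id] = body
        return {"correlation_id": correlation_id, "body": body}

    def on_reply(self, reply):
        # Look up and retire the request this reply answers.
        return self._pending.pop(reply["correlation_id"], None)
```

This is what lets a composed application present a synchronous Request/Reply face to the client while the underlying transport remains asynchronous.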

Several capabilities are not used at all for application composition. Given the real-time nature of the interactions between the components of a composed application, File Transfer, Batching/Unbatching, and Message Queuing are not used. With no Message Queuing, there is no need for the Transactional Delivery capability. Message Instrumentation is not used, because the entry-point application that invokes the components of a composed application ensures that messages are processed in the proper sequence.

Other capabilities are used only in certain circumstances. For example, Message Routing is usually not required because composed applications ordinarily follow a predefined sequence of steps that use the functionality already available within the application. However, the capability is required in cases where the business needs to modify this sequence in a way that is not supported by the application.

In situations where components of a composed application represent multiple applications, you can use the Message Forwarding capability to ensure that each application receives the information it requires.

The Serialization/Deserialization capability may be required if the structure of the data is very different across the composed application components. Similarly, the Encode/Decode capability may be needed if the composed application consists of components that reside on different operating systems.

Table 2.13 summarizes the communications capabilities used in application composition.

Table 2.13: Communications Capabilities Used in Application Composition

Capability                      Usage
Message Routing                 Sometimes used
Message Delivery                Normally used
Message Queuing                 Not used
Message Forwarding              Sometimes used
Message Correlation             Normally used
Message Instrumentation         Not used
Addressing                      Normally used
Transactional Delivery          Not used
File Transfer                   Not used
Serialization/Deserialization   Sometimes used
Request/Reply                   Normally used
Batching/Unbatching             Not used
Encode/Decode                   Sometimes used
Connection Management           Normally used

Managing Transactions

Managing transactions is challenging in any complex business environment because business processes do not consist of isolated activities. Instead, they consist of a series of related, interdependent activities. As you define your application integration environment, you need to include capabilities that help you manage transactions effectively.

Business applications are frequently required to coordinate multiple pieces of work as part of a single business transaction. For example, when you order goods online, your credit card is verified, you are billed, the goods are selected, boxed, and shipped, and inventory must be managed. All of these steps are interrelated, and if one step fails, all corresponding steps must be canceled as well.

Not only do you need to provide Business Transaction Management capabilities at the business process level, you also need to consider what happens at other levels of integration. Messages that pass between applications may also need to be passed as transactions to ensure that the system can roll back, in case one of a group of interrelated messages is not received properly.

Your application integration environment will normally include the following two capabilities to support transaction management:

  • Transactional Delivery. A communications capability that ensures that transactions satisfy Atomicity, Consistency, Isolation, and Durability (ACID) requirements.
  • Business Transaction Management. A business processing capability that ensures that when a transaction fails, the business can react appropriately, either by rolling back to the original state, or by issuing additional compensating transactions.

Providing ACID Transactions

Traditionally, transaction management has been thought of in terms of ensuring that transactions meet the Atomicity, Consistency, Isolation, and Durability (ACID) requirements, which are defined as follows:

  • Atomicity refers to the need to complete all the parts of the transaction or none of them. For example, when a pick list is generated to take a book from the shelf, the inventory quantity for that book should be decreased.
  • Consistency refers to the need to maintain internal consistency among related pieces of data. For example, in a funds transfer application in a bank, a $100 withdrawal from a customer's savings account should be completed successfully only if a corresponding $100 deposit is made in that customer's checking account.
  • Isolation refers to the need to ensure that the activities within one transaction are not mixed with the actions of parallel transactions. For example, if five simultaneous requests for a particular book arrive and the warehouse has only four copies of the book in stock, the five transactions must each run in isolation so that the orders are processed properly. Otherwise, it might be possible for each of the five business transactions to see an inventory level of five books and then try to reduce the number to four.
  • Durability refers to the need to ensure that the results of the transaction are changed only by another valid transaction. The results of the transaction should not be compromised by a power outage or system failure.

When an atomic transaction runs, either all of its steps must be completed successfully or none of them should be completed. In other words, when a transaction failure occurs, the underlying applications and data should be returned to the exact state they were in before the failed transaction started.

Developing an application that satisfies these ACID requirements can be extremely complex, and the resulting application can be hard to scale. To streamline application development and to ensure scalability, software vendors offer transaction management software that is designed to satisfy the ACID requirements. This software generally takes one of two forms:

  • Database management systems. Most enterprise-level database management systems provide transaction-processing features. Multiple applications can update multiple database tables or segments within a single, automatically managed transaction.
  • Transaction processing managers. When data has to be updated on several databases or file systems on several diverse hardware or software platforms, a transaction processing manager can coordinate all of these actions within a single business transaction. This coordination saves you the task of writing an application to manage and coordinate the updates to each of these data sources.

Transaction processing managers generally conform to a set of standards called XA standards, which are governed by the Open Group (www.opengroup.org). In an XA-compliant distributed transaction management environment, a single transaction-processing manager coordinates the actions of multiple XA-compliant resource managers. A database, messaging system, file management system, or any runtime application environment can act as an XA-compliant resource manager as long as it is able to perform the required transaction management tasks.

Transaction-processing managers can require significant processing power, high network bandwidth, and good management tools. Performance issues with transaction management are mostly associated with resource locking and communications overhead (for all of the extra communications traffic required to coordinate the highly dispersed transaction components).

Transactional Messaging

Transactional messaging is used when you want to perform several tasks, including non-Message Queuing operations, so that they either all succeed or all fail. When you use transactional messaging, the sending or receiving application can commit the transaction (when all the operations have succeeded) or abort the transaction (when at least one operation has failed), in which case all changes are rolled back.

In Message Queuing, a single transaction can encompass multiple sending and receiving operations. However, the same message cannot be sent and received within a single transaction. In a case where a transaction consists of sending multiple messages, either all of the messages are transmitted in the order in which they were sent (if the transaction is committed), or none of the messages are transmitted (if the transaction is aborted). If multiple messages are sent to the same queue, Message Queuing guarantees that they arrive in the destination queue exactly once and in the order in which they were sent within the transaction.
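The all-or-nothing sending behavior can be sketched as follows (the queue API is illustrative; a Python list stands in for the destination queue):

```python
class TransactionalSender:
    """Messages sent inside the transaction reach the queue only on commit."""

    def __init__(self, queue):
        self.queue = queue       # destination queue (a list stands in here)
        self._staged = []

    def send(self, message):
        self._staged.append(message)     # buffered, not yet visible to readers

    def commit(self):
        self.queue.extend(self._staged)  # all messages, in the order sent
        self._staged = []

    def abort(self):
        self._staged = []                # nothing is transmitted
```

Staging the messages until commit is what preserves both the atomicity and the send order that Message Queuing guarantees within a transaction.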

Messages that are to be delivered in the order in which they were sent belong to the same message stream. Each message in a stream carries a sequence number, identifying its order in the stream, and a previous number, which equals the sequence number of the previous message in the stream. The message whose sequence number equals the previous number of another message must be present in the destination queue before the message can be delivered. The first message in a stream always carries a previous number equal to zero.
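The sequence-number scheme described above can be sketched as a reassembly function (the tuple layout is illustrative, not the actual wire format):

```python
def deliver_in_order(arrived):
    """arrived: list of (sequence, previous, body) tuples in arrival order.

    A message is delivered only once the message whose sequence number equals
    its previous number has been delivered; previous number 0 starts the stream.
    """
    by_previous = {prev: (seq, body) for seq, prev, body in arrived}
    delivered, last_seq = [], 0
    while last_seq in by_previous:
        last_seq, body = by_previous[last_seq]
        delivered.append(body)
    return delivered
```

A gap in the stream holds back every later message, which is how in-order delivery is preserved even when the transport reorders messages.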

A transaction can also consist of removing multiple messages from a queue. The messages are removed from the queue only if and when the transaction is committed; otherwise, they are returned to the queue and can be subsequently read during another transaction.

Transactional messaging is slower than nontransactional messaging because messages are written to disk more than once and greater computer resources are required. Transactional messages can be sent only to transactional queues on the local computer or a remote computer. When messages are received within a transaction, they can be retrieved only from a local transactional queue. However, messages can be retrieved in nontransactional operations from local and remote transactional queues.

Two-Phase Commit

Satisfying ACID requirements can be problematic if the components of the transaction are dispersed over some combination of multiple databases and transaction processing managers. ACID requirements are even harder to maintain when parts of the transaction are executed under the control of separate operational entities (for example, separate business units of a single enterprise or separate enterprises).

To satisfy ACID requirements in these environments, XA-compliant resource managers can control a two-phase commit. In a two-phase commit, the transaction processing manager first instructs each resource manager to prepare to commit. At that point, each resource manager writes the transaction's data to the database, file, or queue, but retains the data in memory. After all of the participating resource managers have successfully prepared to commit, the transaction processing manager issues a commit instruction. This is the second phase of the two-phase commit.
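The two phases can be sketched with a minimal coordinator. The resource manager interface here is hypothetical; a real XA resource manager exposes a richer protocol:

```python
class ResourceManager:
    """Hypothetical participant: can prepare (vote), then commit or roll back."""

    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.state = "idle"

    def prepare(self):
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit           # vote yes or no

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def two_phase_commit(resource_managers):
    # Phase 1: every participant must successfully prepare.
    if all(rm.prepare() for rm in resource_managers):
        for rm in resource_managers:     # Phase 2: commit everywhere
            rm.commit()
        return True
    for rm in resource_managers:         # any "no" vote aborts the whole group
        rm.rollback()
    return False
```

The resource cost discussed below follows directly from this shape: every participant holds its prepared state, and its locks, until the coordinator's second-phase instruction arrives.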

A two-phase commit is an extremely effective approach to satisfying the ACID requirements when different data stores and messaging systems must be coordinated. However, it is resource-intensive, generating significant network traffic and holding interdependent locks on pending local transactions. If an application or service is waiting for a reply and cannot continue until the reply is received, this approach can block business processes. A two-phase commit is therefore appropriate for environments where the participating systems are close to one another; more widely distributed environments, particularly those using asynchronous transactions, require a different approach.


Transaction Compensation

When applications are loosely coupled through asynchronous transactions, a two-phase commit usually becomes too problematic to implement because the atomicity of the transaction is difficult to maintain. In these situations, you can instead adopt an approach where compensating transactions are used to restore order when only part of a transaction is completed successfully.

As an example, consider a customer who is planning a vacation. When arranging the vacation, the customer must ensure availability of air travel, hotel, and rental car for the same week in the same location. If any one of these travel services cannot be arranged for that week in that place, then the entire transaction should be canceled.

For a tightly coupled transaction, the composite application has to contact the airline, hotel, and car rental reservation systems and keep each of them pending until all three can agree on a particular week. Because those activities may tie up transactions in the three reservation systems to an unacceptable degree, a better approach may be to handle this process as three individual transactions. In this case, if one or more actions cannot be completed, the other actions are canceled by executing a compensating transaction. Therefore, if the airline and car rental transactions are completed successfully, but the hotel reservation fails, then the composite application sends messages to the airline and car reservation systems to cancel those reservations. The composite application can then start over and look at availability for a different week or at a different resort.
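The cancel-and-compensate pattern in this example can be sketched as follows. This is illustrative Python; the booking and compensation callables are stand-ins for calls to the three reservation systems.

```python
def book_trip(reservations):
    """Run each reservation as its own transaction; if any step fails,
    run the compensations for the steps that already succeeded.
    Each entry is an (action, compensation) pair; an action returns
    True on success."""
    completed = []
    for action, compensation in reservations:
        if action():
            completed.append(compensation)
        else:
            for undo in reversed(completed):   # cancel in reverse order
                undo()
            return False
    return True

booked, canceled = [], []
trip = [
    (lambda: booked.append("flight") or True, lambda: canceled.append("flight")),
    (lambda: booked.append("car") or True,    lambda: canceled.append("car")),
    (lambda: False,                           lambda: canceled.append("hotel")),
]
book_trip(trip)      # the hotel step fails, so flight and car are canceled
# canceled → ["car", "flight"]
```

Running the compensations in reverse order mirrors the way a rollback would unwind work, but note that each compensation is itself a new transaction against the remote system, not an undo of a pending one.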

The management of the logic required to recognize complete and incomplete distributed transactions and to issue compensating transactions is difficult to automate because each scenario can require different compensation strategies. Consequently, the process management capabilities of this integration architecture must readily enable developers to design and implement compensation logic that is matched to the business needs of each transaction.

With transaction compensation, the developer necessarily takes an optimistic approach to transaction management, assuming that the individual parts of the multisystem transaction will all complete successfully. That optimism is often warranted. However, a loosely coupled transaction cannot satisfy the ACID requirements of a distributed transaction as well as a transaction processing manager can. For example, isolation is problematic, because multiple applications may each try to update the same data element within a very short time frame. In the vacation example, it is entirely possible for each of five travelers to believe that they have reserved the last rental car at the location that they are all visiting.

Also, if a transaction is canceled before it is completed, it is not possible to have each system simply roll back the parts of the transaction that succeeded. Instead, a compensating transaction must be run. For example, if the car is no longer available, a credit transaction must be sent to the traveler's credit card company to compensate for the deposit that was just processed.

In addition to isolation, consistency is also compromised by transaction compensation, because changes made to one system are often not immediately reflected in related systems.

Note   Standards associated with transaction compensation include WS-Transactions, WS-Orchestration, and BPEL4WS. For more information about these standards, see "Understanding Web Services Specifications," in the MSDN Web Services Developer Center (http://msdn.microsoft.com/en-us/library/ms996535.aspx).

Long-Running Transactions

Some business processes, particularly those that involve human interaction, run over an indefinite time period. Traditional transaction management techniques are not practical for these processes, because each open transaction holds database locks and other resources. With thousands of business processes potentially running on a computer at any one time, the number of locks and resources held quickly becomes unmanageable.

To get around the problem, these business processes are treated as long-running transactions. Long-running transactions are specifically designed to group collections of actions into atomic units of work that can exist across time, organizations, and applications. A long-running transaction usually contains several nested short-running transactions, the overall grouping (composition) of which is controlled by the long-running transaction.

In a long-running distributed business process, records in a database cannot be locked for extended periods of time, nor can records be locked in databases distributed across organizations. Leaving records unlocked enables other transactions to see the data being used by the long-running transaction.

A long-running transaction satisfies all of the conventional ACID requirements except isolation. However, where appropriate, a long-running transaction may consist of a series of short-lived transactions which do ensure isolation.

As an example, consider a business process that is initiated when a purchase order request is received. The request is logged to a database and is then sent to the request approver. It might take several days for the approval response to be received, at which point the response is also logged to a database and the purchase order is sent to the supplier. Handling the initial request and handling the response are each composed of multiple actions (receiving a message and logging it).

In this scenario, you can use short-lived transactions to group related actions into a single atomic transaction (receiving a message and logging it to the database). However, you cannot group the receipt of the purchase request message and the receipt of the approval message within a single short-lived transaction, because that would lock rows in the database for indefinite periods. Instead, you use a long-running transaction to group together the two short-lived transactions, which might be separated by a significant time period. Figure 2.6 illustrates this approach.
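The grouping illustrated in Figure 2.6 can be sketched as follows. This is an illustrative Python sketch; the function names, and the list standing in for the database, are invented for the example.

```python
def receive_po(db):
    db.append("po")          # first short-lived transaction: log the request
    return True

def receive_approval(db, approved):
    if not approved:
        return False
    db.append("approval")    # second short-lived transaction, possibly days later
    return True

def compensate_receive_po(db):
    db.remove("po")          # committed work must be undone explicitly

def purchase_order_process(approved):
    """Long-running transaction sketch: two short-lived transactions
    separated by an arbitrary wait, with compensation instead of rollback
    if the second transaction fails."""
    db = []
    if not receive_po(db):
        return db
    # ... the process may wait days for the approval message here ...
    if not receive_approval(db, approved):
        compensate_receive_po(db)   # Receive PO is already committed
    return db

# purchase_order_process(True)  → ["po", "approval"]
# purchase_order_process(False) → []
```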


Figure 2.6. Grouping short-lived transactions into a long-running transaction

When the business process in Figure 2.6 is executed, the purchase request is received and the database is updated in a short-lived transaction. If anything goes wrong at this stage, the Receive PO transaction is aborted and no changes are made; otherwise, the transaction is committed. The schedule then waits for the arrival of the approval message. When the message arrives, the database is again updated transactionally. However, if anything goes wrong at this stage, the To Supplier transaction aborts automatically. The Receive PO transaction cannot be aborted, though, because it has already been committed. In this event, the first transaction must supply code that performs transaction compensation.

Timed Transactions

Timed transactions trigger an abort of a long-running transaction if it has not been completed in a specified amount of time. It is often very difficult to decide in advance how long a business process should take. However, it is possible to make a reasonable estimate of how long a specific action within a business process should take (for example, the arrival of a message).

A timed transaction can group short-lived transactions and wait for the arrival of a message within a specified time period. If the message arrives in time, the timed transaction is committed; otherwise, the timed transaction aborts and causes the short-lived transactions to execute compensation code.
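A timed transaction can be sketched as a polling loop with a deadline. This is illustrative Python; a real implementation would wait on the messaging system rather than poll, and the function and parameter names are invented for the example.

```python
import time

def timed_receive(poll, timeout, compensations):
    """Sketch of a timed transaction: wait for a message for up to
    `timeout` seconds; on expiry, run the compensation code of the
    short-lived transactions that have already committed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        message = poll()
        if message is not None:
            return ("committed", message)   # message arrived in time
        time.sleep(0.01)
    for undo in reversed(compensations):    # timed out: compensate
        undo()
    return ("aborted", None)

# A message that never arrives causes the timed transaction to abort:
undone = []
result = timed_receive(lambda: None, 0.05, [lambda: undone.append("step1")])
# result → ("aborted", None), with undone == ["step1"]
```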


Effective application integration depends on having a good understanding of your environment. After you define the business requirements of your organization and understand how your applications resolve business problems, you can determine the characteristics and capabilities that your application integration environment requires. You must also understand the major challenges that are inherent in effective application integration and develop strategies to deal with those challenges.
