Building Distributed Applications with Message Queuing Middleware

 

Peter Houston
Microsoft Corporation

March 1998

Summary: Discusses the challenges encountered in building distributed applications and describes the benefits offered by message queuing middleware (MQM) products such as Microsoft® Message Queue Server (MSMQ).

Contents

Introduction
The Fundamental Challenges
Profound Change Is Coming
MQM and Distributed Applications
What Are the Downsides?
Conclusions
For More Information

Introduction

The vast majority of online business-critical applications today are monolithic. While there are many complicated ways in which you can define "monolithic," in a simple sense an application is monolithic when changes to any part of the application can be made and managed in a small number of places. For example, a terminal-based CICS application is monolithic because a developer can change the look and behavior of the application by making all updates on the mainframe.

Surprisingly, most client/server applications—in particular, those based on any of the popular two-tiered remote data access paradigms, such as SQL*Net or ODBC—are also monolithic, in the sense that:

  • The applications that run on the client are usually kept on a shared network file system.
  • Stored procedures, when used, must run within the shared database.
  • Administrators update applications by replacing the shared executable files on the network, or by updating a stored procedure in the central database.
  • Users see the changes the next time they access the application.

Applications only cease to be monolithic when changes require administrators to make updates in multiple locations and keep changes synchronized. Examples of "distributed" applications, therefore, include:

  • Single applications with individual components that run on many different machines.
  • "Virtual" applications that are created by integrating multiple monolithic applications.

That said, it is easier to define distributed applications than to create them. In reality, distributed architectures are still used only in a small fraction of all deployed applications. Why? Monolithic mainframe-based applications scale well enough to meet the needs of most centralized MIS/IT organizations; equally monolithic two-tiered client/server architectures also scale well enough to meet the needs of most departments and divisions. More important, the two primary reasons why more distributed applications have not been (successfully) deployed are that:

  • Such applications are hard to develop and maintain.
  • There have been reasonable alternatives.

The Increasing Opportunity Costs

One consequence of creating mostly monolithic applications is that companies have been forced to adapt to the limitations of their computing systems as opposed to the other way around. Avoiding distributed architectures, therefore, incurs an opportunity cost. And, the excuse that "distributed applications are too hard to build and deploy" is becoming less and less acceptable to management, even when one considers the challenges that companies face.

The Fundamental Challenges

Most conventional communication technologies require sending and receiving applications to be:

  • Online at the same time.
  • Able to communicate with each other at the same time over a network.

Senders and receivers also need to be aware of each other's program-to-program calling interfaces. This forces any interface changes made to one application to be propagated to the other application(s).

Most important, in order to preserve data integrity, sending and receiving applications usually have to use "distributed transactions" to ensure that changes to either application's data are acknowledged by both applications or rolled back. Yet, the reality is that:

  • Applications do not always run at the same time.
  • Networks, especially wide-area networks, are not always available and reliable.
  • Changes to applications in one domain of ownership that require changes to applications in other domains are frequently impractical for the technical (and political) reasons just described.
  • Distributed transactions that span domains (and WAN connections) can have a significant impact on application availability and performance.

To consider the last point in more detail, distributed transactions work very well in LAN environments and when there are small numbers of machines included in a unit of work. Yet, there can be significant challenges when there are more than two or three machines and/or a WAN involved. With distributed transactions, the need to protect data integrity means that receivers typically grant locks on data to the calling application; they only release those locks when the caller commits the transaction. Until the caller releases the locks, the owning application cannot allow access to the locked data—even to its own users. If the network connection to the calling application is lost, or the caller experiences other troubles and cannot resolve the outcome of the transaction promptly, delays in releasing locks will occur. Furthermore, the issues just identified compound and multiply. The more machines and applications there are participating in a distributed transaction, the greater the chance that failures will occur.

In most cases, "low tech" approaches (for example, file transfer, batch processing, and so on) have historically worked well enough to satisfy business needs. The compelling reasons to endure the cost and complexity of installing fast, reliable networks, or to engineer applications that can deal with unreliable communications lines and periodic application failures, have—until recently—been too few to warrant the investment.

Profound Change Is Coming

Many profound changes are forcing organizations to rethink and broaden their views on distributed applications:

  • The once-per-night frequency of batch/file-transfer approaches is no longer timely enough for many applications. This is particularly apparent in supply chain management, where competitive advantages increasingly come from near-real-time data collection and propagation.
  • A new style of business event-based application is emerging where activities in one domain—such as a debit to inventory—must cause some number of other applications in other domains (from replenishment applications to modeling spreadsheets) to perform a related action.
  • Mobile computing is quickly becoming a way of life. Unfortunately, its fundamental properties are incompatible with centralized architectures and tightly coupled communication techniques.

In fact, the ability to deliver reliable distributed applications has become a significant competitive differentiator in many industries and an operating requirement in others. Simply to maintain parity in their industries, businesses need to move beyond their comfortably familiar monolithic applications.

The Role of Message Queuing Middleware

Many developers are finding solutions in Message Queuing Middleware (MQM) products. These are emerging from several major vendors including Microsoft (Microsoft Message Queue Server) and IBM (MQSeries). Technically, MQM provides reliable, asynchronous, and loosely coupled communication services. Philosophically, MQM represents the realization by major software vendors of the need for ubiquitous message queue-based communication services.

MQM can be successful where other forms of communication could not because it satisfies four important conditions:

  • No simultaneous connection is required between sender and receiver.
  • There are extremely strong request and response delivery guarantees even when communication does not occur simultaneously between sender and receiver.
  • Requests and responses can be translated and reformatted en route between senders and receivers.
  • The business models behind major MQM products are designed to promote adoption by the independent software vendors (ISVs) that build the majority of packaged applications.

In fact, for any alternative to MQM to work, it must satisfy the same conditions.

Communicating via MQM

With MQM, applications communicate with each other as a series of messages. While in transit between senders and receivers, MQM providers keep messages in holding areas called queues—hence the name "message queuing middleware." Queues protect messages from being lost in transit and provide a place for receivers to look for messages when they are ready.

Applications make requests by sending messages to queues associated with the intended receiver. If senders expect responses in return, they usually include the name of a response queue (which the sender must create in advance) in all requests that they make to the receiver.
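The request and response pattern just described can be sketched in a few lines. This is a minimal in-process illustration, not the API of MSMQ or any other MQM product: the queue names, the `send` helper, and the message layout are hypothetical, and Python's standard `queue.Queue` stands in for queues that a real provider would manage durably across machines.

```python
import queue
import threading

# Hypothetical in-process stand-ins for named queues managed by an MQM provider.
# The sender's response queue ("orders.replies") is created in advance.
queues = {"orders": queue.Queue(), "orders.replies": queue.Queue()}

def send(queue_name, body, reply_to=None):
    """Place a message on the named queue and return immediately."""
    queues[queue_name].put({"body": body, "reply_to": reply_to})

def receiver():
    """Take one request from 'orders' and answer on the queue it names."""
    request = queues["orders"].get()
    if request["reply_to"] is not None:
        send(request["reply_to"], "accepted: " + request["body"])

# The request names the response queue. No receiver is running yet.
send("orders", "10 widgets", reply_to="orders.replies")

# The receiver starts later and finds the request waiting in its queue.
worker = threading.Thread(target=receiver)
worker.start()
worker.join()

reply = queues["orders.replies"].get(timeout=1)
print(reply["body"])  # accepted: 10 widgets
```

Note that the request is sent before the receiver starts. The queue holds it until the receiver is ready to look for it.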

MQM offers a number of benefits to developers:

  • Applications can use MQM providers to send messages and continue processing regardless of whether the receiving application is running or reachable over the network. (This is one of the two primary reasons that MQM is loosely coupled—connections are not required to communicate.) The receiver may be unreachable because of a network problem, or be naturally disconnected, as in the case of mobile users who only connect periodically to the network.
  • Applications may be unavailable because they have failed, or because they only run during certain hours. When the network becomes available (or the receiving application is ready to process requests) MQM providers will deliver any waiting messages.
  • MQM providers use well-understood techniques such as disk-based logging and error detection/correction protocols to make sure that messages are not lost in transit, delivered out of order, or delivered more than once. In other words, MQM provides the level of reliability required by mission-critical applications. (Arguably, this makes MQM the "reliable, loosely coupled" approach to communication.)
  • MQM providers can also route messages efficiently around failed machines and network bottlenecks; administrators can configure redundant communications paths to ensure availability.
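To make the ordering and once-only guarantees above more concrete, the sketch below shows one common way a receiving side can enforce them: sequence numbers. This is an assumption made for illustration. The paper does not say which protocol any MQM product uses, and the `OrderedOnceReceiver` class and its fields are hypothetical.

```python
class OrderedOnceReceiver:
    """Delivers each sequence-numbered message once, in order."""

    def __init__(self):
        self.next_seq = 1      # next sequence number to hand to the application
        self.pending = {}      # messages that arrived early, keyed by sequence number
        self.delivered = []    # what the application actually saw

    def on_arrival(self, seq, body):
        if seq < self.next_seq or seq in self.pending:
            return             # duplicate: already delivered or already buffered
        self.pending[seq] = body
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

receiver = OrderedOnceReceiver()
# Simulated network behavior: message 2 arrives first, and 1 and 2 are retransmitted.
for seq, body in [(2, "b"), (1, "a"), (1, "a"), (3, "c"), (2, "b")]:
    receiver.on_arrival(seq, body)
print(receiver.delivered)  # ['a', 'b', 'c']
```

A real provider would also log messages to disk so that this state survives a restart. The sketch keeps everything in memory.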

Perhaps most important, messages typically encapsulate requests fully and do not require shared state between sender and receiver. (This is the other primary reason why MQM is loosely coupled.) Developers can use MQM providers, along with protocol and message translators, to bridge between dissimilar application architectures. As long as the sending application can produce a message using one MQM provider—and the receiver can accept a message with another MQM provider—it is a straightforward process (for the first time) to convert between wire protocols and message formats.

MQM and Distributed Applications

The best way to understand the benefits of MQM may be to examine MQM in the context of a series of scenarios:

  • Store-and-forward communication
  • Defensive communication
  • Concurrent execution
  • Journaled communication
  • Connectionless communication

Store-and-Forward Communication

MQM enables applications to send requests to other applications that are not expected to be running or reachable at the same time. Many applications run only at night, but must receive requests from applications that run only during the day. Equally, mobile users may work with their applications all day on laptop computers (while disconnected from networks) but wish to dial in only in the evening.

By sending requests via MQM products, applications are assured that messages will be delivered as soon as:

  • Network connections become available.
  • Receiving applications begin processing.
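The store-and-forward behavior described above can be sketched as an outbox that holds messages until a connection exists. The `StoreAndForwardLink` class is hypothetical and purely in-memory. A real MQM provider would keep the stored messages on disk and handle the network transfer itself.

```python
from collections import deque

class StoreAndForwardLink:
    """Holds outgoing messages until the (simulated) network link is up."""

    def __init__(self):
        self.connected = False
        self.outbox = deque()   # messages waiting for a connection
        self.delivered = []     # messages that reached the receiver's queue

    def send(self, body):
        # Sending never blocks or fails just because the link is down.
        self.outbox.append(body)
        self.flush()

    def connect(self):
        self.connected = True
        self.flush()

    def disconnect(self):
        self.connected = False

    def flush(self):
        # Forward stored messages in their original order while connected.
        while self.connected and self.outbox:
            self.delivered.append(self.outbox.popleft())

link = StoreAndForwardLink()
link.send("expense report")     # laptop is offline: message is stored
link.send("sales order")
stored_while_offline = len(link.outbox)
link.connect()                  # evening dial-in: stored messages are forwarded
print(stored_while_offline, link.delivered)  # 2 ['expense report', 'sales order']
```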

Defensive Communication

In environments connected by local area networks, communication between applications is usually reliable. Nevertheless, any communications failure, no matter how infrequent, can cause serious problems (and WANs are notoriously more prone to failure than LANs). For example, stockbrokers can face millions of dollars in losses if an order entry application loses (or duplicates) even a single order. Shop floor automation can experience serious problems if data from collection points fail to reach processing applications.

By sending requests as MQM messages, applications will:

  • Be protected against communication losses when networks fail.
  • Be tolerant of normal peaks and valleys of demand.
  • Demonstrate excellent performance when networks are working properly.

Concurrent Execution

One of the challenges of using tightly coupled communication technology is making requests to more than one receiving application at a time. By definition, requestors using tightly coupled mechanisms must wait for the receiver to return a response before they can make a request to a different receiver. Developers can issue multiple synchronous calls at once, but doing so requires sophisticated (and expensive) programming techniques, such as using threads.

With MQM, applications are able to:

  • Send requests to many different receivers without waiting for responses.
  • Wait for the receivers to process the requests in parallel.
  • Process results when all of the response messages have arrived, or whenever is convenient.
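The three steps above can be sketched with in-process queues. In this illustration the threads only simulate the separate receiving applications. The sender itself stays single-threaded, which is the point of the pattern. The receiver names and the `start_receiver` helper are hypothetical.

```python
import queue
import threading

def start_receiver(name, requests, responses):
    """Start a hypothetical receiver that answers every request on its queue."""
    def run():
        while True:
            body = requests.get()
            if body is None:          # sentinel: stop the receiver
                break
            responses.put((name, body.upper()))
    thread = threading.Thread(target=run)
    thread.start()
    return thread

responses = queue.Queue()             # the sender's single response queue
request_queues = {name: queue.Queue() for name in ("credit", "inventory", "shipping")}
threads = [start_receiver(n, q, responses) for n, q in request_queues.items()]

# Step 1: send to every receiver without waiting for any response.
for q in request_queues.values():
    q.put("order 42")

# Steps 2 and 3: the receivers work in parallel; collect responses as they arrive.
results = dict(responses.get(timeout=1) for _ in request_queues)

# Shut the simulated receivers down.
for q in request_queues.values():
    q.put(None)
for t in threads:
    t.join()
print(sorted(results))  # ['credit', 'inventory', 'shipping']
```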

A variation on the concurrent execution theme is often found when an application needs to make one or more requests and then immediately move on to other work. The sending application may not require immediate responses—because it may process responses at a later time or even delegate response checking to an entirely different application.

Using MQM, applications can send their requests as messages and move on immediately to other tasks. This style of communication—often called "fire-and-forget" message queuing—is truly difficult to implement with communications technologies that require the sending thread to wait for a response. With MQM, no special programming techniques are required. Development costs—and complexity—are contained.

Journaled Communication

Most mission critical environments require the ability to create journals of all communications activity within an application. Journals contain precise records that can be used by administrators for logging and audit purposes. Journals are also useful for error recovery; the journals are used to restore system state by replaying all events that occurred after a given point in time against a known starting point.

In distributed, network-based applications, journaling is particularly difficult because most communications mechanisms do not save any record of their activity after they complete a request. State is kept only for the duration of the round trip between sender and receiver. If an error occurs later in a component of a distributed application, recovery procedures rapidly become complex.

Because MQM messages represent encapsulated requests for services, MQM products are able to offer message journaling as a selectable option. When journaling is enabled for a given queue or message, MQM products automatically make a copy of each message for the associated journal queue.

Having such a precise record of each request facilitates logging, auditing, and recovery in network-based applications. It adds one dimension that previously was singularly lacking.
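A minimal sketch of journaling and replay follows. It assumes that the copy is made when the message is accepted by the queue. Actual products may copy at a different point (for example, when a message is removed), and the `JournaledQueue` class, the `replay` function, and the message layout are all hypothetical.

```python
from collections import deque

class JournaledQueue:
    """A queue that copies every message it accepts into a journal."""

    def __init__(self, journaling=True):
        self.journaling = journaling   # journaling is a selectable option
        self.messages = deque()
        self.journal = []              # stand-in for the associated journal queue

    def send(self, body):
        self.messages.append(body)
        if self.journaling:
            self.journal.append(body)

    def receive(self):
        return self.messages.popleft()

def replay(journal, starting_balance):
    """Rebuild state by reapplying journaled messages to a known starting point."""
    balance = starting_balance
    for kind, amount in journal:
        balance += amount if kind == "credit" else -amount
    return balance

q = JournaledQueue()
for message in [("credit", 100), ("debit", 30), ("credit", 5)]:
    q.send(message)
while q.messages:
    q.receive()                        # normal processing empties the queue

# The journal still holds every request, so state can be reconstructed.
print(replay(q.journal, starting_balance=1000))  # 1075
```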

Connectionless Communication

Communication is said to be connection-oriented when an application must:

  • Direct a request at a given instance of a receiver.
  • Wait for a response.
  • Share state information between the two applications for the duration of a call.

For example, when a request causes a receiver to update a particular piece of data (such as a bank account balance) the receiver shares state with the sender:

  • The receiver uses data in the request to access the desired account.
  • The sender must know the outcome of the request to act accordingly.

For many reasons, connection-oriented communication is not always practical. For example, the ultimate receiver of a message may not be known in advance. This may be because the initial receiver decides that it cannot process the request and forwards it on to another application. In other cases, such as in publish and subscribe environments, it may not be possible for senders to know the identities of all interested receivers in advance. Some receivers may also be offline at the time a message is sent and no connection is possible.

With MQM, receiving applications can forward messages to other applications simply by resending the same message to a different destination (queue), or by sending the same message several times to any number of other receivers. This works regardless of whether those receivers are currently running.
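Forwarding and fan-out can be sketched as follows. The queue names and the routing rule are hypothetical. The point is only that the original sender addresses one queue and never needs to know where the message finally goes, and that the destination queues accept messages whether or not anything is reading them.

```python
from collections import deque

# Hypothetical named queues; a deque accepts messages even if nobody is reading it.
queues = {name: deque() for name in ("front_desk", "billing", "audit", "archive")}

def send(queue_name, message):
    queues[queue_name].append(message)

def front_desk_receiver():
    """Decide that this request belongs elsewhere and pass it on unchanged."""
    message = queues["front_desk"].popleft()
    if message["type"] == "invoice":
        send("billing", message)               # forward to a different queue
        for subscriber in ("audit", "archive"):
            send(subscriber, message)          # resend to any number of receivers

# The sender only knows about the front desk queue.
send("front_desk", {"type": "invoice", "id": 7})
front_desk_receiver()
print({name: len(q) for name, q in queues.items()})
# {'front_desk': 0, 'billing': 1, 'audit': 1, 'archive': 1}
```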

Connection-oriented communication can also create locking and serialization problems in multi-user environments. For example, most connection-oriented mechanisms require the receiver to process all concurrent requests in parallel. Because state is being shared with requestors, receivers must implement logic to protect data from being accessed or modified on behalf of more than one request at a time.

With MQM, all requests are encapsulated as messages. Receivers can choose to process messages one at a time, or in ways that do not create access conflicts.
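The sketch below illustrates the one-at-a-time option. Several simulated senders submit requests concurrently, but a single consumer drains the queue, so the shared data needs no locking logic of its own. The names are hypothetical, and `queue.Queue` stands in for a real message queue.

```python
import queue
import threading

requests = queue.Queue()
balance = 0   # touched only by the single consumer thread, so no lock is needed

def consumer():
    """Apply queued requests to the balance strictly one at a time."""
    global balance
    while True:
        amount = requests.get()
        if amount is None:     # sentinel: no more messages
            break
        balance += amount

def producer(count):
    """Simulate one sender submitting 'count' deposit requests."""
    for _ in range(count):
        requests.put(1)

worker = threading.Thread(target=consumer)
worker.start()
producers = [threading.Thread(target=producer, args=(1000,)) for _ in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()
requests.put(None)             # all senders are done; stop the consumer
worker.join()
print(balance)  # 4000
```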

What Are the Downsides?

MQM is not a panacea and needs to be used along with other relevant communication technologies. To illustrate the point, note that distributed communication always falls into one of three categories:

  • A response is required immediately and lack of a response will prevent the application from continuing. For example, a developer may want to perform a credit check (and get the result) before placing the order with the warehouse via a message queue.
  • A response is needed within some period of time where the actual time period is usually "business-policy related" and where the maximum time allowable is a business decision as opposed to a technical decision. For example, some applications may be able to tolerate delays of 10 seconds, others may tolerate one-hour or even one-day delays.
  • Assuming that the sending application can trust that the request will be delivered (despite reasonable types of failures), no response is needed at all. An example of this case could be messages containing audit events. As long as the messages will reach the audit application, no confirmation needs to go back to the sender.

When the outcome of a request is needed before moving on, using MQM may represent needless overhead and programming complexity. A synchronous, tightly coupled approach adds no constraints that are not already there. That said, MQM may still be valuable when doing request-response communication between dissimilar but message-based applications because of MQM's inherent protocol and state neutrality.

MQM also is not suitable for applications that need distributed, synchronous transactions. Banking systems, for example, typically do not update account information unless all components of an operation can synchronously complete successfully. If an all-or-nothing data update guarantee is required, a tightly coupled, two-phase commit approach is still necessary.

Is Message Queuing New?

The simple answer is: No. In certain industries—most notably telecommunications, airlines, and financial services—distributed systems have existed for years. But these solutions were largely developed in-house—regardless of cost and complexity—because of a specific need.

These industries have, therefore, been aware of the benefits of MQM for some time. The downside was that most wrote their own MQM because of a dearth of affordable, industrial strength products. In addition, the sheer cost to develop all the needed critical features—such as guaranteed delivery, message routing, once-only delivery, and so on—meant that most custom-built solutions:

  • Were somewhat primitive or highly specialized.
  • Were (and remain) expensive to maintain.
  • Demand specific internal expertise.

MQM Products Go Mainstream

That is now changing as the software industry has understood the potential and moved to create MQM products. Rather than being organization or application specific, what distinguishes the new generation of MQM is that these products are designed (and priced) to appeal to volume markets. MSMQ, for example, is being included with every copy of Windows NT.

Most important, this makes MQM available to ISV developers—in an economic sense—for the first time. This makes it possible for ISVs to invest in building MQM-enabled applications. In turn, this makes it possible for companies to purchase off-the-shelf applications that provide the benefits of MQM. Because more than 70 percent of all corporate applications are purchased from ISVs, companies need MQM-enabled applications from ISVs if they are to build the online-coupled applications that encapsulate many important business functions which management demands.

What About DCOM?

Some will, no doubt, ask the question "What about DCOM?" DCOM is a set of communications technologies layered over the MS-RPC transport, and as such it qualifies as tightly coupled and synchronous.

This does not mean that DCOM cannot be used as a programming paradigm for message queue-based communication. Microsoft has already announced that work is underway to build implementations of DCOM that work over Microsoft Message Queue Server as a selectable transport.

When finished, it will be possible to use DCOM as a programming paradigm for tightly coupled synchronous communication as well as loosely coupled asynchronous communication.

Conclusions

The bottom line is that until there is a movement toward MQM-enabled applications, the goal of cost-effective distributed applications will remain beyond reach. For companies that want to build or buy distributed applications now, there are several steps to take:

  • Demand that application developers add MQM-enabled interfaces to stovepipe applications to facilitate integration.
  • Require new application development to exploit MQM as a fundamental infrastructure (which, in turn, provides MQM-enabled interfaces).
  • Purchase ISV application products that have some form of MQM interface. (Make the assumption that no application should exist in isolation forever.)

For More Information

For the latest information on Windows NT Server, check out our Web site at https://www.microsoft.com/ntserver/ or the Windows NT Server Forum on MSN™, The Microsoft Network (GO WORD: MSNTS).

© 1998 Microsoft Corporation. All rights reserved.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Microsoft, the BackOffice logo, MSN, Windows, and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Other product or company names mentioned herein may be the trademarks of their respective owners.