Taming the Beast: Integrating with Legacy Mainframe Applications
Summary: Distributed midrange technology must not only coexist with, but must also integrate with and leverage, mainframe assets. (9 printed pages)
"Program—A set of instructions, given to the
computer, describing the sequence of steps the computer performs in order to
accomplish a specific task. The task must be specific, such as balancing your
checkbook or editing your text. A general task, such as working for world
peace, is something we can all do, but not something we can currently write
programs to do."
–Unix User's Manual, Supplementary Documents
A long, long time ago, I was working in the R & D division of a major global financial institution. Microsoft Internet Explorer 4.0 had just been released, and we were discovering cool things that could be done with DHTML, ActiveX Controls, and DirectX. My group was working on creating a kiosk-based banking solution, in which people could walk in; chat with a banker over video; fill out a form; open an account; make a deposit in the account; and leave with a valid, working ATM card—all within 15 minutes. We did mock-ups and trial runs. We built the entire front end, stored data that was collected from forms into a Microsoft SQL Server database, and used that to emboss test ATM cards. We wowed the business with our demos.
Everything was going great, until someone asked, "How are you going to get this thing working for real?" At that point, all conversation died. Jaws dropped. An uncomfortable silence took over. The problem was that the account-opening, customer-management, and accounts-receivable platforms were all on IBM 390 mainframe systems. No one on our team knew anything about mainframes. Our prototypes were built without any requirements around sending data to mainframe applications. We had spent a significant amount of money building this prototype, and we knew that if we could not get this working, we were at serious risk of losing our jobs.
Finally, our manager stepped in. He ventured, "This is a proof-of-concept. The plumbing is not all hooked up. As R&D, we have proved that this concept can be done. Now, we will have to work with the project teams to hook up the plumbing, and get this thing talking to the mainframe systems. That is a lot of grunt work, but not rocket science. You will have our estimates by the end of the week."
Some adroit verbal footwork saved the day. However, we had a lot of work to do. After the demo, we had a team meeting. For the next three months, all leaves were cancelled; we were told that we would have to work evenings and weekends.
For our team, the big problem was a total lack of knowledge and awareness of mainframes. To remedy that, a flurry of activity began. Action teams were formed. Consultants were hired. Experts from different parts of the country flew in. The overall design was divided and carved out to different teams composed of people with different backgrounds: distributed, mainframe, and enterprise application integration (EAI).
Slowly, a picture of the problem emerged. We (midrange systems) knew when to send data to mainframes. The mainframe experts knew what to do with that data. The missing piece was how data would get to the mainframe applications, and how that data would come back to the midrange systems. Adding to the problem was the fact that large chunks of activity on the mainframe systems happened in end-of-day batches, while our requirement was to issue an ATM card in 15 minutes.
It is surprising how many otherwise well-planned projects end up less than successful because integration issues were never considered. It is even more surprising how many of those issues involve integrating mainframe systems with midrange environments. Given the large mainframe asset base that enterprises maintain, every architect at some point has to develop solutions that involve mainframe systems.
Given the delivery pressure that project teams—and, by association, architects—feel, mainframe integration often gets short shrift during project conceptualization and design, which results in costly rework (or even redesign) during integration testing. It need not be this way. Mainframe integration is like world peace in the preceding quote: a general task that does not happen automatically; it must be planned for. With a little consideration up front, mainframe integration can be made as streamlined as the integration of distributed components—without rework or headaches.
Architecting solutions to enterprise-level requirements in current times requires engaging a variety of components that are deployed in a wide spectrum of environments. In most enterprises, mainframes are workhorse systems that run the majority of business transactions. On the other hand, customer interfaces through Web sites and IVRs (telephone systems), customer-service applications, enterprise resource planning (ERP), customer-relations management (CRM), supply-chain management (SCM) applications, and business-to-business (B2B) interactions are usually on distributed systems.
Any (conscious) activity that is performed to allow data transfer and logic invocation between distributed and mainframe systems is mainframe integration. This article will discuss only IBM mainframes that run a derivative of the MVS operating system.
Many companies have built and sold mainframe machines. However, IBM mainframes are currently dominant, with approximately 90 percent of the market. The operating system most prevalent on IBM mainframes is MVS or a descendant (OS/390 or z/OS). For a good introductory article on mainframes, see the Further Study section.
There is some ambiguity over whether IBM systems that run an OS like AS/400 or TPF are mainframe systems, or whether they should be considered midrange systems. These systems will not be explicitly discussed, although techniques that we discuss in this article might be applicable to these systems, in some cases.
In the hype and debate about mainframe and distributed computing, it is easy to forget that mainframe computers are just another computing environment. They are neither easy nor difficult, compared to distributed systems; they just follow a different philosophy. It is also important to remember that mainframe machines are not dinosaurs. Almost every technology that is available on midrange systems is available on mainframes (including XML parsers and Web services). Each mainframe deployment is unique, and features that are available in one environment might not be available in another. That is why this article will focus mostly on basic technology.
Overall Solution Architecture
To begin the process of integration with mainframe applications, it is necessary to look at the integration from the viewpoint of the overall solution architecture. The following process might be adopted for analysis, design, and delivery of integration interfaces.
Identifying Interaction Points and Interfaces
In the overall solution architecture, you should identify the interaction points or interfaces between distributed systems and mainframes. This can be performed at a very early stage, when you are evolving the conceptual solution architecture. This will provide an understanding of the extent of the integration activity that is involved, and will enable you to estimate and focus appropriately on the integration process early on.
It is important to keep these interfaces unidirectional, as much as possible. If an interface is bidirectional in nature, it might be split into two unidirectional interfaces.
For each interface that is identified, assign attributes as early as possible. In some cases, the values of these attributes might not yet be known, because those decisions have not yet been made. That is okay; this exercise might provide input into that decision process. Some of these attributes can be:
· Direction of interaction (that is, the originating and recipient systems).
· Originating-system OS.
· Originating-system application platform.
· Originating-system development environment.
· Originating application type (online or batch).
· Recipient-system OS.
· Recipient-system application platform.
· Recipient-system development environment.
· Recipient application type (online or batch).
· Nature of data that is interchanged:
· Request (non-transactional) or request (transactional).
· Data volume and usage (high volume at specific times, low volume spread over the day).
· Constraints, if any (such as, this interface has to be MQ-based; or, files must be sent once an hour, on the hour).
· Boundary conditions, if any.
Additional attributes that are important from the perspective of the project can be added to the preceding list. It is important to consult all stakeholders and subject-matter experts (mainframe and distributed), to address appropriate concerns:
· Based on the preceding, it is beneficial to create validation criteria to ensure successful interface development and execution. Special care must be taken to factor in constraints and boundary conditions.
· During design, for each previously listed interface, and based on the attributes that are captured, appropriate decisions must be taken on how to realize that interface. Again, constraints and boundary conditions must be carefully considered before arriving at the interface design.
· Based on design specs and validation criteria, the interface must be built and tested. Because it usually takes time to build applications, interfaces are often built and unit-tested last (sometimes on the day before testing begins), with the result that interfaces fail and deadlines slip. It is beneficial to perform mock testing of interfaces by using handcrafted artifacts (files, messages, and so on), to ensure that the interface will work long before the build finishes.
· The validation criteria should be used to drive integration testing, to ensure that the interfaces are effective and robust.
The other piece of the puzzle is determining how each interface should be designed. The following is not intended to be a comprehensive list—only a starter kit for the thought process. Of course, in certain cases, constraints might automatically define the interaction method. If that is not the case, you might use the following.
If the interface requirement is to send high-volume data at specific points of time, and no response is immediately or synchronously required, this data can be sent as a file. There are different ways to transmit files to mainframe systems. The simplest way (if possible) is to send the file via FTP either programmatically or by using shell scripts (provided that the mainframe has the FTP subsystem and that there are no security concerns). There are additional factors of which to be aware when sending data to mainframe systems, such as record length and type (fixed or variable block).
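The FTP path described above can be sketched in a few lines. This is a hypothetical example: the host, credentials, and dataset name are placeholders, and it assumes the mainframe FTP server accepts SITE commands for record format (RECFM) and logical record length (LRECL), as most do.

```python
from ftplib import FTP


def site_command(recfm="FB", lrecl=80):
    """Build the SITE command that tells a mainframe FTP server how to
    allocate the target dataset: record format (fixed or variable
    block) and logical record length."""
    return f"SITE RECFM={recfm} LRECL={lrecl}"


def put_dataset(host, user, password, local_path, dataset,
                recfm="FB", lrecl=80):
    """Send a local text file to a mainframe dataset over FTP.

    All connection details are placeholders. Quoting the dataset name
    in STOR makes it a fully qualified name rather than one relative
    to the login prefix.
    """
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.sendcmd(site_command(recfm, lrecl))
        with open(local_path, "rb") as fh:
            # storlines maps local line endings to dataset records.
            ftp.storlines(f"STOR '{dataset}'", fh)
```

The SITE parameters matter precisely because of the record-length and block-type factors mentioned above; sending a file without them often produces a dataset the receiving job cannot read.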
If FTP is not an option, tools such as XCOM or NDM might be used. Both facilitate transfer of files between midrange and mainframe environments. (Expert note: These tools can also work in Systems Network Architecture (SNA) or SNA/IP environments.) A hybrid approach also works well, if available: the midrange system sends the file via FTP to an intermediate system (for example, a Windows Server or Stratus system) that is configured to transmit the file to the mainframe by using an appropriate protocol.
There are some issues with transferring files to mainframe systems. It is notoriously difficult to get reliable status notifications on the success state of the file transfer. Additionally, mainframe systems offer the option to trigger jobs when a file is received, to process the file. If such triggers are set up, aborted file transfers or multiple file transfers into generation data groups (GDGs) might cause issues with unnecessary job scheduling or with the same file being processed multiple times. (GDGs provide the ability on mainframes to store multiple files with the same base file name. A version number is automatically added to the file name as it gets stored. Each file can then be referenced by using the base file name or the individual file name.)
If file-based triggers are used, care must be taken to design and develop explicitly, so as to prevent these situations. Regardless, transferring files to mainframe systems remains a reliable, robust interface mechanism.
In case the mainframe-application interaction cannot be file-based, due to technical or other considerations, high-volume data can also be interchanged by using message queuing (MQ). Tools are available that will perform file-to-queue and queue-to-file operations on both midrange and mainframe systems. Even if tools are not an option, moderately skilled development teams on midrange and mainframe systems should be able to author their own tools or libraries to perform file-to-queue and queue-to-file operations at either end. It is a good idea to split the file or data stream record by record into individual MQ messages, to ensure responsiveness. (Expert note: On the mainframe side, unit-of-work considerations will have to be addressed carefully, to ensure robustness.)
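The record-by-record split just described is straightforward to sketch. In the example below, the `put` callable is a stand-in for a real MQ put (for example, a pymqi queue's put method), which is not shown; everything else is plain Python.

```python
def split_records(data, lrecl=80):
    """Split a fixed-block byte buffer into lrecl-byte records, one
    per future MQ message."""
    return [data[i:i + lrecl] for i in range(0, len(data), lrecl)]


def file_to_queue(data, put, lrecl=80):
    """Send one MQ message per record.

    `put` is a placeholder for a real queue put operation. Returns the
    number of messages sent, so the receiving side can verify that the
    transfer is complete.
    """
    recs = split_records(data, lrecl)
    for rec in recs:
        put(rec)
    return len(recs)
```

On the receiving end, a queue-to-file tool reassembles the records; as noted above, unit-of-work boundaries on the mainframe side determine how many of these messages commit together.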
If the requirement is to Web-enable existing green-screen applications or to reuse transactions that have been built with green-screen front ends, screen mapping and/or screen scraping might be a viable option. HACL and (E)HLLAPI application programming interfaces (APIs) and tools allow development of these interfaces in C/C++, Java, Visual Basic, and other languages. HACL and HLLAPI provide a reliable way to reuse existing green-screen applications without a major reengineering effort. They provide the protocol for screen scraping. There are numerous tools that facilitate screen scraping by handling the low-level protocols—allowing the developer to focus on business logic. In such cases, Web forms capture data and send that data to the Web server. At the server side, this data is sent to mainframes by using these APIs.
There are issues in using screen-scraping technologies. These transactions cannot be changed without changing the underlying screens; if existing screen-based behavior must be modified, but the screens themselves cannot be, due to constraints, screen scraping might not be a viable option. Also of concern are scalability (the number of concurrent connections) and extensibility.
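Whatever API is used, most screen-scraping code reduces to reading and writing fields at known positions in a linear presentation-space buffer. The helpers below sketch that arithmetic for a standard 24x80 screen; the captured screen string is a stand-in for what an EHLLAPI copy-presentation-space call would return, and the field positions are hypothetical.

```python
ROWS, COLS = 24, 80  # standard 3270 Model 2 presentation space


def offset(row, col):
    """Map a 1-based row/column screen position to an index into the
    linear presentation-space buffer."""
    return (row - 1) * COLS + (col - 1)


def read_field(screen, row, col, length):
    """Extract a field of `length` characters starting at row/col from
    a captured screen image (a ROWS * COLS character string), trimming
    trailing pad blanks."""
    start = offset(row, col)
    return screen[start:start + length].rstrip()
```

Tools that sit on top of HACL or (E)HLLAPI hide exactly this kind of bookkeeping, which is why they let developers focus on business logic instead.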
Connectors and Bridges
Mainframes have myriad transaction-processing systems. CICS (Customer Information Control System) is a transaction server that runs primarily on mainframes and supports thousands of transactions per second or more. CICS applications can be written in COBOL, C, C++, REXX, Java, IBM assembly language, and more. (For more information on CICS, please see the second reference in the Further Study section.)
IBM Information Management System (IMS) is a combined hierarchical database and information-management system, with extensive transaction-processing capability, that was originally developed for the Apollo space program. The database-related IMS capabilities are called IMS DB and have a legacy of four decades of evolution and growth. IMS is also a robust transaction manager (known as IMS TM or IMS DC). IMS TM typically uses IMS DB or DB2 as the back-end database. (For more information on IMS, please see the Further Study section.) The vast majority of mainframe systems use IMS TM or CICS as the transaction-processing environment of choice.
If the requirements call for the interface to be transactional (that is, to execute transactions at the other end), there are myriad options, based on the nature of the mainframe transactions to be called. IMS or CICS transactions are relatively easily exposed over MQ by using the Open Transaction Manager Access (OTMA) bridge for IMS transactions, or the MQ/CICS bridge for CICS transactions.
In such cases, data can be interchanged in copybook format. Later versions of CICS and IMS also support Web services. (For IMS, Web services support is provided by using an IMS connector). Issues that are involved in using connectors and bridges for transactions concern the time that is required to configure and set up the initial environment, and the effort that is required to ensure data-element mapping from the source to destination formats. IBM's MQSI/WMQI servers provide an easy way to map between source and destination formats, as well as message integration services.
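Copybook-format interchange means laying fields out at fixed offsets and encoding them in EBCDIC. The sketch below assumes a hypothetical three-field copybook and treats every field as character data (PIC X); real numeric and packed-decimal fields need more careful handling. cp037 is a common US EBCDIC code page.

```python
# Hypothetical copybook, as field name and PIC X length:
#   05 CUST-ID    PIC X(8).
#   05 CUST-NAME  PIC X(20).
#   05 BRANCH    PIC X(4).
LAYOUT = [("CUST-ID", 8), ("CUST-NAME", 20), ("BRANCH", 4)]


def to_record(values, layout=LAYOUT, codec="cp037"):
    """Pack a dict of strings into one fixed-width EBCDIC record,
    blank-padding each field to its copybook length."""
    text = "".join(
        str(values.get(name, ""))[:length].ljust(length)
        for name, length in layout
    )
    return text.encode(codec)


def from_record(record, layout=LAYOUT, codec="cp037"):
    """Unpack an EBCDIC record back into a dict, trimming pad blanks."""
    text = record.decode(codec)
    out, pos = {}, 0
    for name, length in layout:
        out[name] = text[pos:pos + length].rstrip()
        pos += length
    return out
```

This field-by-field mapping between source and destination formats is exactly the work that MQSI/WMQI-style message brokers automate.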
If the interface is not to IMS/CICS transactions but to custom-built programs, there are myriad options—from creating wrapper transactions in IMS or CICS, to writing helper programs that accept requests over an interface, pass the requests to the target program, and then pass the results to the calling system back over the interface.
In the case of packaged solutions on distributed systems (for example, SAP, Siebel, and PeopleSoft), adapters are available to integrate seamlessly with mainframe systems. Adapters simplify the work that is required on midrange systems to interface with mainframes. However, adapters are susceptible to configuration problems and, in some cases, issues of scalability and robustness.
Important considerations that should govern the choice among interface options are:
· Fire and forget versus request/response. That is, is a response expected back from the interface?
· Synchronous versus asynchronous. That is, is the response expected back in the same sequence flow as the request, or can the response arrive outside of the request sequence flow?
· Interface characteristics:
· Is the interface transactional?
· Is it a two-phase transaction?
· What are the transaction-recovery requirements?
· What are the consequences of a failed transaction?
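The synchronous-versus-asynchronous distinction above often comes down to correlation: the transport is asynchronous (a reply queue), but the caller blocks until the reply carrying its correlation ID arrives. A minimal sketch, with plain Python queues standing in for MQ request and reply queues:

```python
import queue
import uuid


def request_response(send, replies, corr_id=None, timeout=5.0):
    """Block until the reply matching our correlation ID arrives.

    `send` puts a request (here, just the correlation ID) on the
    request channel; `replies` is a queue of (corr_id, body) tuples
    that can arrive out of order. Replies for other callers are simply
    requeued; a real dispatcher would route them instead.
    """
    corr_id = corr_id or str(uuid.uuid4())
    send(corr_id)
    while True:
        got_id, body = replies.get(timeout=timeout)
        if got_id == corr_id:
            return body
        replies.put((got_id, body))  # not ours: put it back
```

The timeout is what turns a hung mainframe transaction into a recoverable error on the midrange side, which feeds directly into the transaction-recovery questions above.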
Back to our story: After about two months of rework, we finally came up with a working design that involved a combination of these techniques: file transfers, MQ messages, screen scraping, and creation of online transactions from existing batch jobs. It took us six more months to build a fully working system in the test environment. We got the kinks out, and everything finally worked. We built a couple of functional prototypes and rolled the system out for pilot testing with a sample population. The results from the sampling studies were encouraging. However, due to cost overruns and additional rollout expenses, the project was stopped.
To sum up this article, let's look at some of the key takeaways from our discussion.
If you are new to mainframes, the one thing to remember is that mainframes are just computers, albeit with a different philosophy. Mainframe computing is neither hard nor easy; it is just different. Mainframes are not legacy dinosaurs; almost every midrange technology is available on mainframes.
Almost every architect will have to work with mainframes and/or mainframe integration, at some point. There is a huge repository of mainframe code that is not going to be ported to midrange anytime soon. You must ensure that you keep integration in mind as you plan your work. Successful integration is neither automatic nor assured; it is the result of hard work.
There are many different ways to integrate with mainframes. You must analyze your requirements and design judiciously—balancing cost, environment, and performance considerations. Other things to consider are interaction type, synchronization requirements, and transaction type. Ensure that you adequately assess risks and have a risk-mitigation plan in place. Remember:
"The most likely way for the world to be destroyed,
most experts agree, is by accident. That's where we come in; we're computer
professionals. We cause accidents."
–Nathaniel Borenstein (1957–)
Critical-Thinking Questions
· Mainframe environments (operating systems, transaction processors, databases, and so on) have been evolving for over four decades and are still going strong. On the other hand, midrange systems usually go through significant paradigm shifts every three to five years. Why do you think that is the case?
· If you were architecting a mission-critical system (say, NASA's moon-base system), would you go for a full-distributed system, a mainframe-based system, or a hybrid approach? Why? If you selected a hybrid approach, what functionality would you keep on the distributed side, and what would you put on the mainframes? What interfacing challenges do you foresee in architecting a three-component messaging system between earth, the International Space Station, and the moon base?
· Imagine that you are architecting a network of ATMs for a bank that will run on Microsoft Windows XP. The core consumer-banking system, however, is on the mainframe and uses IMS. How would you design the ATMs to interface with the mainframe system? What if the network were disrupted—say, by floods, broken cables, or some other disaster? Would you still be able to function? What if one ATM (or even a range of ATMs) is compromised?
Further Study
· Crigler, Rob. "Screen Mapping: Web-Enabling Legacy Systems." eAI Journal. January 2002. [Accessed January 24, 2007.]
· Gilliam, Robert. "IMS and IMS Tools: Integrating Your Infrastructure." IBM Web site. [Accessed January 24, 2007.]
· IBM Corporation. "IMS Connect Strategy: Providing IMS Connectivity for a Millennium." IBM Web site. [Accessed January 24, 2007.]
· Lotter, Ron. "Using JMS and WebSphere Application Server to Interact with CICS over the MQ/CICS Bridge." IBM Web site. November 2005. [Accessed January 24, 2007.]
· Various. "IBM Information Management System." Wikipedia: The Free Encyclopedia. January 4, 2007. [Accessed January 24, 2007.]
· Various. "CICS." Wikipedia: The Free Encyclopedia. January 24, 2007. [Accessed January 24, 2007.]
· Various. "Mainframe computer." Wikipedia: The Free Encyclopedia. January 23, 2007. [Accessed January 24, 2007.]
Subhajit Bhattacherjee is a software architect with 12 years of experience. He currently works as a principal architect for a world leader in express and logistics in the Phoenix, AZ, region.
This article was published in Skyscrapr, an online resource provided by Microsoft. To learn more about architecture and the architectural perspective, please visit skyscrapr.net.