Dealing with the "Melted Cheese Effect"
Summary: This is the first in a collaborative series of articles that focus on the design of Web services in such a way that they can be more easily changed in the future. (9 printed pages)
These thoughts are the result of a collaboration, first with Beat Schwegler and our presentation of "The Grey Area of Implementing Services Using Object-Oriented Technologies" at last year's TechEd in Amsterdam; with Michael Regan, who does a lot of fieldwork with large customers; and finally with Christian Weyer, co-founder of thinktecture, who has been providing a tool environment called Web Services Contract First, or WSCF. We differ only in that Christian focuses more on audiences that understand schema, whereas our original effort focused on audiences that eschew schema.
In the following articles, we will gradually move from high-level considerations to concrete recommendations. In parallel, we will provide a set of tools to support our proposals on creating, offering, and consuming contracts. We expect these tools to become part of WSCF or a completely new set of tools.
- The second article, on contracts, will provide a view on conversations, contracts, and agreements. I will give my opinion on the contract-first, code-first, or schema-first discussion.
- In the third article, I take a closer look at building a solution. I will show some of the unexpected problems you will encounter when reusing schemas and provide some initial suggestions.
- In the fourth article, I will present a proposal for the project and design the structure of a service that is resilient to change.
- The fifth article in the series proposes the matching project and design structure for its counterpart, the client.
After these articles, the plan is less clear. Much will depend on our progress in providing tools and on the feedback we hope to receive.
We do not want to focus on an Internet-scale service interaction, where arbitrary groups may change contracts for arbitrary purposes. Most people and most medium or large organizations we work with want to know how to write solutions within their own organizations that can evolve over time. In many cases, organizations that design services own both endpoints; they own, or at least have influence over, both the client and the service. We want to help these people, the architects and designers in these organizations, to reduce and manage the complexity of change.
In this series of articles, we address the problem of changing complex systems in a service-oriented environment. We will provide guidance on designing and building services in medium and large organizations. We will not attempt to address the complexities of designing schemas and contracts for worldwide consumption. We do not pretend to have the silver bullet, but we think the series provides some useful guidance for designing solutions that are more resilient to change. We will take a high-level look at the problem and drill down to concrete schemas and code. In conjunction with the series, we will publish a few tools that will support our recommendations and we expect these to flow into WSCF.
In the not too distant past, we moved from the "Goto era" with its infamous spaghetti programming, into the object-oriented world of what I call "ravioli programming." One of the great advantages of ravioli programming is that you can change the inside of the ravioli cushions without affecting the outside. However, as soon as the functionality of a ravioli changes, it affects the other raviolis, too. This is because they are drenched in a sauce that keeps them together. I have this image of melted cheese in the sauce that glues them together.
DLL Hell is a good example of the melted cheese effect. Multiple applications depend on the same DLL. With each new version of an application comes a new version of the DLL. All of a sudden, all applications that use the DLL have to be upgraded, as well.
DLL Hell is caused by the inability to deal with changes in ever more complex systems. We can make changes to part of the system, but we cannot manage all the interdependencies between all the parts of the system. All of these parts are changing at different rates and they are deployed at different times on different systems. These other systems don't fare any better, and in the meantime their requirements grow ever more complex.
Let's take a closer look at one of the causes of the melted cheese effect.
In using a class, we keep a reference to an instance of that class. We know the methods, the fields, and the properties of the class: the behavior as well as the data. We cannot simply replace that class with another, because we would have to change the code that references it as well. Even if we change the original class without changing the signatures of its methods, fields, and properties, we still have to recompile.
If we leave it at that, the effect of such ravioli programming is that when we change one class, we have to change the implementation of many other classes, or at least recompile them, to get the ravioli reconnected properly. This is even worse if we have different components with classes that refer to one another. Reducing that cross-component dependency is what concerns us most. It doesn't really matter how many classes have to be recompiled within a component; it matters how many components we have to recompile.
In the early days of object-oriented programming, inheritance and abstract classes were often used to reduce dependencies. Although this works to a certain extent, inheritance is not an optimal solution for this problem. Inheritance was intended to enable reuse of implementation. It exposes the internal structure of the class and, unless you want to use multiple inheritance, restricts you to a single hierarchy and thus a single use. COM, and later Java and C# (or better: the CLR), introduced the concept of interfaces into the mainstream. Interfaces offer a flexible binding mechanism for use.
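To make this concrete, here is a minimal sketch in Java (rather than C#) of binding to an interface instead of an implementation. All names here, such as PriceCalculator and Checkout, are invented for the example; the point is that the implementation class can be swapped without touching or recompiling any code written against the interface.

```java
// The client depends only on this interface, not on any concrete class.
interface PriceCalculator {
    double priceFor(String sku, int quantity);
}

// One possible implementation; it can be replaced by another without
// affecting any code that is written against PriceCalculator.
class FlatPriceCalculator implements PriceCalculator {
    public double priceFor(String sku, int quantity) {
        return 10.0 * quantity;  // every item costs 10 in this stub
    }
}

// The consuming code holds a reference to the interface only.
class Checkout {
    private final PriceCalculator calculator;

    Checkout(PriceCalculator calculator) {
        this.calculator = calculator;
    }

    double total(String sku, int quantity) {
        return calculator.priceFor(sku, quantity);
    }
}
```

Replacing FlatPriceCalculator with, say, a tiered calculator requires no change to Checkout, which is exactly the decoupling the interface buys us.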
Years ago, in a previous job, a marketing manager held a presentation for the management of an "Office" product. He praised the functionality of the product and he praised how great it was compared to the competition. He praised the progress we had made. Just when everybody started glowing under this abundant praise, he remarked that there was only one tiny problem. The product couldn't print.
Nowadays, this problem is unthinkable. Printer drivers behave the same across the industry. One may have a few more features than the other, but essentially, the application doesn't need to know the details of the printer. The printer drivers do not share a common base class. That might have reduced the amount of coding required, but it wouldn't have helped the standardization much. They offer common interfaces. They can be accessed through a common set of methods or messages. The clients are loosely coupled to the printers. The client does not bind to the printer driver directly either. Instead, it uses an object factory. It asks the operating system for the right driver or it goes through the operating system to that driver.
The combination of interfaces and factories facilitates loose coupling and offers a remedy for the melted cheese effect.
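The printer-driver arrangement can be sketched along these lines. This is an illustrative Java factory; the names PrinterDriver and DriverFactory are invented for the example. The client obtains a driver by name and never references a concrete driver class, so new drivers can be registered without recompiling any client.

```java
import java.util.HashMap;
import java.util.Map;

// The common interface every driver offers, regardless of vendor.
interface PrinterDriver {
    String print(String document);
}

// A simple object factory: drivers register under a name, and clients
// look them up by that name instead of instantiating a concrete class.
class DriverFactory {
    private static final Map<String, PrinterDriver> drivers = new HashMap<>();

    static void register(String name, PrinterDriver driver) {
        drivers.put(name, driver);
    }

    static PrinterDriver lookup(String name) {
        PrinterDriver driver = drivers.get(name);
        if (driver == null) {
            throw new IllegalArgumentException("no driver registered: " + name);
        }
        return driver;
    }
}
```

In a real operating system the factory role is played by the OS itself, which hands the application the right driver; the sketch only shows the shape of that indirection.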
Does Service Orientation Cut the Cheese?
Service orientation is gaining momentum for good reasons. Service-oriented architecture and service-oriented design try to define black boxes, with the argument that you can change any of the services without affecting the others, thus eliminating DLL Hell. This works in theory, but in practice it is a little tricky to get all of these black boxes to communicate. The industry is working hard on the interoperability problem: systems from different vendors, using a variety of technologies, can now communicate, and it is becoming ever easier, especially now that services, and more specifically Web services, are gaining momentum as the way to address the problem.
However, the one thing we forgot to do, or at least did not give enough attention to, was dealing with the changing parts of a solution. Organizations, departments, and even individuals create and deploy services; the tools make this very easy to do. Often these services are even designed for interoperability. However, only seldom has enough consideration been given to future change and, where it has, there was no guidance, no examples, and no standards to support it. This part of the problem did not really change while moving from object orientation and DLL Hell to service orientation. Service orientation has just given us more complex and larger systems, and with that, the focus changes. New services will be introduced, existing services will change, and those services will be used differently than the designers ever planned for. As soon as the behavior of one of the services changes, other services or other clients have to change as well. They become linked in ways that were not in the original design.
So services, like objects and components, suffer from the melted cheese effect. Although I am unaware of metaphors linking SOA and pasta, the melted cheese effect is the same.
So, while service orientation is good, we can make it even better if we design with the evolution of our solutions in mind. It's not so important whether I need to change one, two, or ten classes. It is important whether I need to change one, two, or ten services, or change one, two, or ten clients.
My point is that it's not so important how much code I need to change as it is how distributed that code is. What is important is how many systems need to be updated and how easy or difficult such an update is. It is important how many combinations of versions have to be supported concurrently and tested together. It is important how many moving parts, like different versions of services, contracts, protocols, and operating systems, there are in a solution, and how easy it is to find the combination that breaks the solution.
This is what I mean when I say that compared to DLL Hell, the basic problem has not changed; only the scale has changed. If we could make and deploy all changes at once, we would not even discuss the problem. At the current scale, however, we cannot make such global changes.
A complex solution in a reasonably sized business consists of a set of interacting services, portals, and other client applications. Each of these will change over time; they will be replaced or new parts will be added to the solution. Change happens: solutions are not static; they evolve, and eventually new solutions take their place. We can identify a number of moving parts, each of which has its own change rhythm. Each change has its own impact, and sometimes this impact is indirect.
Suppose a company wants to change the way it deals with its customers. Instead of one-size-fits-all, the customers are divided into preferred customers, gold customers, and platinum customers. Different prices and conditions apply to each. Following the classical approach, we model the customer as an entity. The systems managing our customer interactions understand and use the customer entity. The change means that the customer entity has, at least, one new field and it means that we have to change the code of some of the services, as well as the code of some of the clients. We have to introduce a method for determining the customer categorization, add this to the user interface, and we have to change the pricing and rebate algorithms. All this implies a change of some of the interactions. The sales as well as the customer relations management (CRM) processes will have to change, perhaps all at once, perhaps in a phased approach.
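As an illustration of how far such an entity change ripples, here is a Java sketch under assumed names (Customer, Category, and Pricing are all invented for the example). Adding the categorization field touches not only the entity but every algorithm that shares its definition, so each of those pieces must be changed, redeployed, and retested.

```java
// The new classification introduced by the business change.
enum Category { PREFERRED, GOLD, PLATINUM }

// The shared customer entity: the change means at least one new field.
class Customer {
    String name;
    Category category;  // new field; every consumer of this entity is affected

    Customer(String name, Category category) {
        this.name = name;
        this.category = category;
    }
}

class Pricing {
    // The rebate algorithm now depends on the categorization, so it has to
    // change along with the entity, in every service and client that prices.
    static double rebate(Customer customer) {
        switch (customer.category) {
            case PLATINUM: return 0.15;
            case GOLD:     return 0.10;
            default:       return 0.05;
        }
    }
}
```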
We have to deal with changes in code, both on the client side and on the server side. In this series of articles, we will address some of the aspects that may minimize the amount and distribution of the code changes, but that is only part of the challenge. The deployment of changes poses challenges of its own. The altered code needs to be tested. What's more, the altered configuration needs to be tested, too.
Installation is also an issue. On the server side, this may not be a huge problem. However, clients are often distributed over many laptops that may or may not be online or, even worse, reside outside of the organization and still need upgrading. Strategies applied by ClickOnce and the Updater Application Block help us enhance the client application so that it checks for updates and installs the new code. This allows us to deploy the code centrally. Still, deploying code is only part of what's needed. Configurations need to change, and solutions may need to clear their caches if the schemas have changed or, worse, upgrade various databases. While the Updater Application Block provides hooks to execute this on the client, it still needs to be done.
Infrastructure changes, as well. New authentication mechanisms, new firewalls, newer, more reliable, or faster infrastructure become available, locations change. Mergers and acquisitions, sourcing—all of these influence the infrastructure and most tend to make the infrastructure more diverse and more complex. Many of the changes such as these have an influence on the software, as well. Often, changed functionality has an impact on both software and infrastructure.
With any new or changed functionality, interactions between the moving parts will change, as well. The schemas will change, the interfaces will change, the contracts will change, and the processes that use them will change.
The operational environment will change as hardware is being replaced, but also due to new requirements: both business requirements and operational requirements such as security. The solutions must support this change. The solutions must be loosely coupled to the operational environment if such changes are to be manageable. Currently, operational requirements are designed into, and thus become part of, the solution. For example, when authorization is an important aspect of the solution, intimate knowledge of the authentication mechanisms is used inside of the application to accomplish this. Developers explicitly or implicitly use other aspects and knowledge of the operational environment, as well, to create the solution. These examples tend to make change difficult and expensive. They lock down the solution and tightly couple it to the current operational environment.
The physical hardware, and whether all components of the solution are located on the same or on different servers, greatly impact most solutions today. Firewalls, load-balancing techniques, switches, routers, SANs, NASes, clusters, and so forth create a much more dynamic and complex topography, one that may be impacted by modifications in the requirements of the solution.
Some of these changes will occur more frequently than others do, and they will likely not occur at the same time, nor is that desired. The upshot is that we have to prepare for independently moving parts. We have to be prepared to change one service and one of its consumers without needing to affect the other parts in the system.
We want to make the parts as autonomous as possible so we can treat all parts as black boxes and make changes locally without a ripple effect into the rest of the system. Absolute loose coupling, however, would mean no coupling at all: no dependency and thus no interaction (none at all). To build a system that does anything other than work in isolation, you need coupling. Loose coupling is not inherently better than tight coupling. Architecture is design, and design is about making trade-offs: between the cost of change, the ease of building, the ease of maintenance, the required performance, and probably many other important aspects. This is not about making one bowl of pasta; it's about providing a good recipe to make many bowls of pasta, a recipe with a sauce that tastes good, has just the right stickiness to it, and changes depending on who is coming to dinner.
When designing systems, we have to think about the tightness of the coupling, about how much coupling we need to make things work. For instance, good design keeps a clear distinction between the inside of a component or service and the outside. A large portion of dealing with change is dealing with the transition between information on the outside and information on the inside, as well as with the expected and offered behavior or interaction. We need a clear interface as well as clearly described behavior. We call this description of behavior and interface a contract. We will go deeper into this in the next article. Now, there will be changes that influence the behavior but do not affect the interface, such as bug fixes. Often they will change behavior, but within the boundaries of the published behavior. Sometimes even that is reason enough for clients not to want to communicate with the previous version anymore.
The whole SOA concept and many of the design guidelines are around preparing for a change of software, but they are incomplete and we often treat them as dogmas. Loose coupling as a simple mantra cannot be the answer. Loose coupling makes no sense without the anticipation of change.
The whole discussion on loose versus tight coupling is a discussion about bracing for change. When change is more likely, looser coupling becomes more important. When things will not change, looseness of the coupling is less, if at all, important.
Looser coupling means that:
- Change causes less pain.
- Dependencies are known, so that they can be managed.
- Change is not too expensive or time-consuming.
- Change on both (all) sides can be managed independently of each other.
- Impacts are known and anticipated.
Loose coupling is an investment. It is important to understand and appreciate the effects of change on the systems and to have a measure of the cost to the organization of such change. This will help determine how much investment is acceptable and this in turn helps set the boundaries for the design. The cost of change is made up of the cost per change, as well as the frequency of change. So, for things that change frequently, it is even more important to think about the coupling aspects. In some cases, the changes and effects percolating into the surrounding, dependent software are the larger, and often hidden, costs.
We can think of different aspects of preparing for change:
- Can I modify the software to provide the changed functionality?
- Can I modify some parts of the software without having to change all parts? How local can I keep the changes?
- Can I deploy the changes?
- Can I deploy the changes without taking the system down?
- Can I deploy parts of the changes without breaking the system?
- Can I allow users to determine the update schedule so critical business functions are not impacted by the update?
When you want to change the functionality of a service, you will have to change the consumers, clients, and other interacting services, as well. You may have the fortunate circumstance that you can deploy the changed service as well as all changed consumers at the same time, but that isn't always possible, and there may be various reasons why not:
- You cannot influence all consumers (so, you may be able to change some, but you cannot change all).
- You don't have the time to get all consumers/clients ready in time and you need to deploy the ones that are.
- You don't want to give the functionality to all users at once. You want to run with a "test" group first.
- You have to roll out the changes gradually for whatever reason.
In all of these cases, the new version of the consumers has to run side-by-side with the old version. The reverse may also be true, although less common. One consumer may have to communicate with multiple versions of a service, for instance when consolidating information from various subsidiaries.
One of the major goals of versioning is to transition from one stable situation to another. Understanding the effect of versions on the contracts enables us to differentiate the combinations that do and the combinations that do not work together.
I argue that one of the causes of DLL Hell is that clients take a dependency on both the interface and the implementation of the DLLs they use. COM provided us with a clean separation of interface and implementation. The client can take a dependency on the interface, without taking a dependency on the implementation. However, it is much easier for the client programmer to instantiate an object using the implementation than it is using an object factory. For the service (COM object) programmer it is much easier to make an incompatible change and keep using the same globally unique identifier (GUID) of the implementation. In COM, the GUID (endpoint) is often tied to the implementation. Not many developers will provide separate classes for the various interfaces, allowing upgrades by providing additional endpoints without breaking the existing applications.
So, binding to the implementation is the problem, not only for DLLs, but also for services (we've shown that the problem is the same after all). We believe that the solution lies in properly designed contracts, and in using what we call "contract first design."
To avoid these effects happening in our projects, we will have to take the following measures at the various levels of the design.

The client:

- Binds to the contract and
- Uses configuration (a factory) to gain access to the endpoint.

The service:

- Provides new contracts and endpoints for breaking changes, ideally while leaving the old ones in place and
- Replaces the implementation without affecting the contract and the endpoint.
Looking at the service as a black box:
- The service offers multiple endpoints, one per contract. An incompatible change can then be offered by adding a new endpoint, offering access to a new contract, while, optionally, leaving the old endpoint in place.
- The client binds to a contract.
However, the client's binding to such an endpoint does not have to be hard-coded. Configuration is a better option. This allows us to move the endpoint and it even allows us to search for clients using a specific endpoint.
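A minimal sketch of that configuration-based binding, in Java and under assumed names (the EndpointConfig class and the contract and endpoint strings are invented for illustration): the client code names the contract it wants, and the actual address comes from configuration, so moving the service means editing one entry rather than recompiling clients.

```java
import java.util.Properties;

// The client binds to the contract in code but resolves the endpoint
// address from configuration, so the service can move without a rebuild.
class EndpointConfig {
    private final Properties props = new Properties();

    void set(String contract, String url) {
        props.setProperty(contract, url);
    }

    // Look up the endpoint by contract name instead of hard-coding the URL.
    String endpointFor(String contract) {
        String url = props.getProperty(contract);
        if (url == null) {
            throw new IllegalStateException("no endpoint configured for " + contract);
        }
        return url;
    }
}
```

Because each contract has its own configuration entry, it also becomes possible to scan the configuration of deployed clients and find out which of them still use a specific endpoint.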
Looking closer at the service itself, we can identify components that we may want to replace as we replace the external contracts:
- A component offers multiple interfaces or contracts. We will not often want to keep adding interfaces to the same component when offering new functionality. Instead, we may want to offer a new version of the component with some interfaces that are new or changed and some interfaces that have remained the same.
- A client of such a component binds to the contract, that is, to the combination of the interface and the data classes used by that interface.
- Such a client does not have to bind to both the interface and the implementation. It may bind to the interface in code and to the implementation through configuration. It may use an object factory to decouple from the implementation.
When a service or service component is changed, the clients using the original interfaces do not have to be changed. Only those clients that are affected by the changed functionality have to be changed and recompiled. This reduces the dependencies, reduces the need for recompilation, and most importantly, reduces the number of components that need to be redeployed.
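The side-by-side versioning described above might look like this in Java; QuoteServiceV1, QuoteServiceV2, and QuoteComponent are hypothetical names. The new component version implements both the old and the new contract, so clients of the original interface keep working unchanged while new clients use the richer one.

```java
// The original contract, kept in place for existing clients.
interface QuoteServiceV1 {
    double quote(String sku);
}

// The new contract, introduced for the changed functionality.
interface QuoteServiceV2 {
    double quote(String sku, String customerCategory);
}

// One component version implements both contracts side by side.
class QuoteComponent implements QuoteServiceV1, QuoteServiceV2 {
    public double quote(String sku) {
        return 100.0;  // old behavior, preserved for V1 clients
    }

    public double quote(String sku, String customerCategory) {
        // new behavior: category-dependent pricing, visible only to V2 clients
        return customerCategory.equals("platinum") ? 85.0 : 100.0;
    }
}
```

Old clients that bound to QuoteServiceV1 need neither a change nor a recompile; only the clients that want the new functionality move to QuoteServiceV2.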
This advice clearly leads to an increase in the number of endpoints. What used to be a problem in the code shifts to being a problem of managing the proliferation of endpoints. I don't believe that is a bad thing; on the contrary, it is desirable. However, you have to prepare yourself for it. (Companies that specialize in this are Systinet and Actional.)
Contracts and their design are obviously key to building a system comprised of loosely coupled services. I will get back to contracts in the next article. However, as we have seen in DLL Hell, we have to be careful how to offer the contract in the service and how to consume it in the client. On the server side, you have to consider offering multiple versions of a contract.
On the client side, you have to make sure to bind to a contract and not to an implementation. Factories offer a good mechanism to do just that.