September 2011

Volume 26 Number 09

Cutting Edge - Software Disasters: Recovery and Prevention Strategies

By Dino Esposito | September 2011

As emphatic as it may sound, our lives depend on software. As users of various services ourselves, we know very well that software working as expected may really save our day. But software is getting more and more complex because it has to model the complexity of real-world processes. IT managers, architects and developers must cope with that.

In this article, I’ll review practices that help fix a deteriorated system and show you patterns that may prevent a system from growing in an uncontrolled fashion. The term “big ball of mud” (BBM) was created years ago to refer to a system that’s largely unstructured and padded with hidden dependencies between parts, with a lot of data and code duplication and an unclear identification of layers and concerns—a spaghetti code jungle. To read more about the BBM, I recommend an excellent piece of work by Brian Foote and Joseph Yoder from the University of Illinois at Urbana-Champaign, which you can download from https://bit.ly/nfPe2Y. In this paper, the authors don’t condemn a BBM as the worst-ever problem; they just recommend that architects and developers be ready to face the BBM risk and learn how to keep it under control.

The Dynamics of Software Systems

Some 30 years ago, in the beginning of the software era, applications were far easier to write than today. We had no need for GUIs. We didn’t have to worry much about aesthetics, distribution needs, scalability issues or deployment concerns. And, last but certainly not least, we had much less business logic to implement.

At the time, software development was taken seriously and involved the best minds. Quite amazingly, in the early 1990s we already had most of the software development and architecture principles and patterns laid out. Arguably, very little has been “invented” since. Principles such as separation of concerns and dependency inversion, paradigms such as object-oriented programming (OOP) and aspect-orientation, and practices such as design for testability weren’t developed in recent years. Many of them are being rediscovered and applied by today’s architects and developers, but they have existed for decades.

But why have those principles and practices been lying in a sort of limbo for years? There may be many reasons for this, but a common developer refrain is, “It works, so why should I improve it (and pay the cost of wasting that extra time)?”

The Different Dynamics of the Java and Microsoft Worlds

In the mid-1990s, the advent of object-orientation and languages such as Java (more than C++) prompted many companies to restructure their software, moving larger and larger blocks of business logic out of databases and mainframes. Java developers, specifically in the enterprise space, had to deal with more complexity than that found in the rapid application development (RAD) typically associated with Visual Basic programming. It isn’t surprising that aspect-orientation, dependency-injection frameworks (such as Spring) and object/relational mappers (such as Hibernate), not to mention a long list of programming tools (such as JUnit, Ant, Javadoc, Maven and so on), originated in the Java world. They were tools created by developers for developers to smooth the impact of complexity.

At the same time, in the Visual Basic world, RAD was all the rage. Visual Basic helped companies build nice front ends on top of stored procedures when not using mainframe code. The advent of Web services created an extra façade and gave mainframe code the much nicer name of “legacy services.”

The OOP versus RAD debate over the past 15 years was a reasonable contention. When Microsoft introduced object-oriented features in Visual Basic, it was really hard to explain to developers why on earth those weird features were so important. And they actually weren’t, given the relatively low level of complexity of those thin applications.

The .NET Revolution

The release of the Microsoft .NET Framework happened at a crucial time. It happened when some of the big companies that earlier opted for a RAD approach had seen enough of the Internet and had back-end systems old enough to justify a rebuild in light of new Internet-related business processes. At the same time, these companies found in the Microsoft platform a reliable, powerful and extensible framework. Of course, it took only a few years for .NET developers to be swamped with the same huge amount of complexity that Java colleagues experienced a decade earlier. Most .NET developers, though, grew up with RAD principles and the RAD approach to development.

In a RAD world, you tend to prefer code that just works and build applications by adding blocks that are mostly self-contained. (And when they’re not really self-contained, you make them that way by abusing code and data duplication.)

The problem isn’t with RAD as a programming paradigm. The real problem that’s most likely to lead to the notorious BBM is applying RAD (or any other paradigm) without corrections that can keep the growth of the application, and subsequent multiplication of complexity, under control.

Clear Symptoms of a BBM

It seems to be a natural law of programming that every unit of software of any size (be it a class, a layer or an entire system) degrades. In their aforementioned paper, Foote and Yoder call it the “structural erosion” of software. Personally, I like to call it the biodegradability of software. As software units are maintained or changed, they become harder to further maintain or change. Regardless of the name, nowadays it’s just part of the deal; you can’t ignore it, and trying to do so will just cause damage.

Software biodegradability is tightly connected to piecemeal growth of projects. Incorporating a new requirement into an existing system—which was architected without that particular requirement—is always problematic. It doesn’t mean that it can’t, or shouldn’t, be done—it just means that by adding a new requirement, you’re changing the context. A single change doesn’t usually have a dramatic impact on architecture, but when individual changes occur frequently, the context of the system changes over time and morphs into something that probably requires a different architecture. This process is also referred to as requirements churn.

Likewise, piecemeal growth is part of the deal, and not being ready to cope with that is one of the deadly sins of modern software architecture. By adding new requirements one at a time without reconsidering the system as a whole at each step, you create the ideal conditions for a BBM.

What common symptoms unequivocally tell you that you’re going to have a bit too much mud in the gears of your system? A BBM doesn’t get formed overnight and isn’t that big in the beginning. Addressing these symptoms, which I’ll describe, will help prevent it from growing too big.

The first alarm bell rings when you make a change in a class and end up breaking the code in another, apparently unrelated class. This is the nasty effect of rigid software: The code resists change, and every change you force through risks a regression somewhere else.

A second alarm bell rings when you fail in trying to reuse an apparently reusable piece of code. If the same code doesn’t work once it’s moved to another project, the most likely causes are hard-to-find dependencies or tightly coupled classes. These are also the primary causes of software rigidity.

Tight coupling is beneficial because it helps you write code faster, and that code will likely run faster. It doesn’t, however, make the code maintainable. In a project that grows piecemeal (like the vast majority of today’s projects), maintainability is by far the primary software attribute you want to take into account.
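
To make the first two alarm bells concrete, here is a minimal Python sketch of a hidden dependency. All names in it (SETTINGS, InvoiceCalculator, apply_regional_override) are hypothetical and invented for illustration; they don't come from any real system.

```python
# Hypothetical illustration of a hidden dependency; none of these names
# belong to a real library or to a system discussed in the article.

# A module-level setting that other code silently relies on.
SETTINGS = {"tax_rate": 0.20}


class InvoiceCalculator:
    """Tightly coupled: reads a global that it never declares as an input."""

    def total(self, net_amount: float) -> float:
        # Hidden dependency: nothing in the signature reveals SETTINGS.
        return round(net_amount * (1 + SETTINGS["tax_rate"]), 2)


def apply_regional_override() -> None:
    """Apparently unrelated code that changes the shared setting."""
    SETTINGS["tax_rate"] = 0.10


# The same call gives different results before and after the "unrelated" change.
before = InvoiceCalculator().total(100.0)
apply_regional_override()
after = InvoiceCalculator().total(100.0)
```

Writing the class this way is quick, which is the appeal of tight coupling. The price is that InvoiceCalculator can't be moved to another project without dragging SETTINGS along, and any code that touches SETTINGS can change its results without touching the class itself.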

Finally, a third alarm bell rings when you need to apply a fix to a function but don’t feel confident in applying an ideal fix (or are unable to do so), so you resort to yet another workaround.

Any system can happily survive one occurrence (or even a few) of these symptoms. Their underlying mechanics are quite perverse, though. If you overlook one of these symptoms, you may incur the risk of making the system more convoluted.

For example, let’s consider the second symptom (sometimes referred to as immobility). You have to add a new feature and you feel quite confident you can slightly adapt and reuse an existing function. You try and it doesn’t work because the class you hoped would be reusable is, in reality, tightly connected to others. The alternative you have is importing a much larger set of functions in multiple modules. In such cases, a common (but naïve) solution is duplicating some code without ever attempting a cleaner reuse of classes. Duplication of code fixes the problem temporarily, but just grows the BBM.

Possible Causes of a BBM

It’s rare for a single developer to create a BBM. Rather, a BBM has many causes, and often the primary cause has to be sought outside the current level of development. Demanding managers, as well as customers who don’t really know what they want, convey unclear and ambiguous information to developers. Unless the team has plenty of experience in general software development and in the specific domain, this almost inevitably leads to arbitrary choices that are later patched through compromises and workarounds. The net effect is that the overall architecture is weakened at its foundation.

Everybody involved in a software project can do a lot to avoid a BBM and to smooth its dramatic impact. Let’s examine the fundamental steps to take when you find yourself immersed in a BBM.

A Disaster Recovery Strategy

When you face a BBM, the idealistic thing to do is simply rewrite the application based on reviewed requirements and new architectural choices. But a complete rewrite is never an option you’re allowed to consider. How would you survive the mud?

Foote and Yoder use an evocative image to introduce the only reasonable option you have in these cases: Sweep rubbish under the rug and go on with your own life. But I think I can summarize a recovery strategy for a software disaster in three steps. The first step is stopping any new development. The second step is isolating sore points to layers (when not tiered) and arranging tools to measure any regression on those points. Finally, you pick up each of those isolated problem layers and, with extreme care, try to refactor them to a cleaner and more loosely coupled architecture.

The second after you stop development on a muddy project, you start thinking of a bunch of relevant tests. Immersed in a BBM, the last thing you think of is classic unit tests. Relevant tests in this context are sort of integration tests that span multiple layers and sometimes tiers. These tests are slow to run in the first place, but, much more importantly, they may be slow to write. You need to isolate (probably large) chunks of behavior and hide them behind a façade that can be commanded through tests. Writing a façade may not be easy, but it’s usually at least possible. It’s mostly a matter of time and work. These tests are a big help for your next steps because they provide you with an automated tool to measure regression as you proceed with the final step. The final step consists of painstaking refactoring work where the first issue addressed is tight coupling.

Planning a Strategy to Prevent a BBM

Prevention is always preferable to treatment. There are three cardinal virtues of a team that may prevent a BBM stalemate, which I’ll discuss later. Given that most projects these days suffer from a high level of requirements churn, piecemeal growth is a fact, not merely a possibility. To prevent a BBM, you must have an effective strategy to cope with piecemeal growth of functionality.

The mantra of maintainable software claims that perfect modern software serves the needs of today and is flexible enough to accommodate future requirements. The first virtue of the three mentioned earlier is domain experience. When you can’t have a real domain expert on your team, at least you need someone with a deep understanding of the domain—which likely makes you the new domain expert. Domain experience takes you to sensible judgment about features that most likely will be required and requested. So, you can easily plan ahead while keeping the YAGNI (You Ain’t Gonna Need It) principle clearly in mind.

The class-responsibility-collaboration (CRC) cards approach helped me to better understand the mechanics of domains I encountered for the first time. I see CRC cards more as a way to teach yourself the domain than as a way to come up with a proper design. Design, on the other hand, is proper only if validated by the actual customer. Use cases or, even better, a prototype are smarter options for that validation.

The prototype can be a problem because customers may like it so much that they force you to extend it instead of starting from scratch with a new design. Prototypes are usually made with throwaway code deliberately written to be quick and, as such, devoid of principles and patterns. You just don’t want to start with throwaway code.

The second virtue is becoming a better developer/architect by learning sane principles of software development and common best practices. The challenge here is all in learning how to do simple things correctly by default. Don’t be afraid of interface-based programming and dependency injection. Validate each class against common OOP pitfalls. Ensure correctness via code and static analysis. If you’re not sure about how a feature works, make it testable and write tests. Keep your code lean and mean, with most methods no longer than a few lines (with due exceptions, of course). If these practices become part of your tool chest, the BBM stays a bit farther away from your projects.
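
The Python sketch below shows one possible reading of interface-based programming and dependency injection. RateProvider, FixedRates and PriceConverter are hypothetical names chosen for the example, and typing.Protocol stands in for what would be an interface in C# or Java.

```python
from typing import Dict, Protocol


class RateProvider(Protocol):
    """The 'interface': anything with a matching rate_for method qualifies."""

    def rate_for(self, currency: str) -> float:
        ...


class FixedRates:
    """A simple in-memory implementation, handy for tests."""

    def __init__(self, rates: Dict[str, float]) -> None:
        self._rates = rates

    def rate_for(self, currency: str) -> float:
        return self._rates[currency]


class PriceConverter:
    """Depends only on the RateProvider abstraction, injected via the constructor."""

    def __init__(self, rates: RateProvider) -> None:
        self._rates = rates

    def convert(self, amount: float, currency: str) -> float:
        return round(amount * self._rates.rate_for(currency), 2)


# Usage: inject whichever provider fits the context (here, a fixed table).
converter = PriceConverter(FixedRates({"EUR": 0.5}))
converted = converter.convert(10.0, "EUR")
```

Because PriceConverter receives its collaborator instead of creating it, a test can pass in FixedRates while production code passes in a provider backed by a service or a database. The method also stays only a few lines long, in the spirit of the advice above.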

The third virtue is about understanding the lifespan of the project. Not every project has to build a system that lasts for decades. Short-lived projects don’t require the same design care. You can go faster on them and put care into making them work before making them right. In software, complexity must be controlled where it really exists, not created where it doesn’t exist and isn’t supposed to be. The danger, however, is when the initial belief that the project had a short time span is invalidated by surprising success and demand, and the creation of a market. If the subsequent release happens too slowly, you run the risk of a competitor snatching the market—or you run the risk of creating a BBM due to really tight (and desperate) business demands.

Software Project Disaster Recovery

The BBM is a common anti-pattern, but, to some extent, it’s an attribute that can be applied to nearly every software project. It originates from using the wrong tools to tame complexity. In this article, I used an expression—disaster recovery—that’s popular in IT. The mantra of disaster recovery experts, however, can be applied to software projects too: Dollars spent in prevention are worth more than dollars spent in recovery. Rush to your manager’s office today!


Dino Esposito is the author of “Programming Microsoft ASP.NET MVC” (Microsoft Press, 2010) and coauthor of “Microsoft .NET: Architecting Applications for the Enterprise” (Microsoft Press, 2008). Based in Italy, Esposito is a frequent speaker at industry events worldwide. Follow him on Twitter at twitter.com/despos.

Thanks to the following technical expert for reviewing this article: Mircea Trofin