In last month’s column, I mused about how, in software as in life itself, there’s usually a time for an action and also a time for its opposite. This month, I want to discuss the opposite of that thought, by which I mean events for which there is never a time.
I often compare our software industry to the medical industry. A doctor can’t completely control a patient’s outcome. Medical situations vary, risks always exist, stuff happens. But certain patient-harming events should never, ever occur. We know what causes them, we know how to prevent them; therefore, their occurrence always constitutes malpractice. Reading the list of these “never events” (bit.ly/h9RMl8) makes you wince: operating on the wrong patient, or on the wrong part of the correct patient; leaving surgical instruments inside the patient and so on. My all-time favorite good news/bad news joke—“The bad news is that we amputated the wrong leg. The good news is that your other leg is getting better after all”—brutally illustrates the unacceptability of these events. (I know: “Plattski, you are one sick puppy.” It’s been said before.)
We need to adopt this same idea for our software: that certain occurrences are never, ever, acceptable. We need to define these events, publicize them and educate developers about what they are and how to avoid them. And we need to explain to users that they should never have to tolerate this behavior from their software and shouldn’t be asked to.
Here’s my first proposed never event for software. We wish our programs wouldn’t crash, as doctors wish their patients wouldn’t die (and they envy our reset buttons), but neither is going to happen anytime soon. Because we know that our programs will occasionally crash, I say that losing a user’s work in a crash is a never event. Remember how you’d work in Word or Excel for two hours, then up would pop the dreaded Unrecoverable Application Error box and it was all gone? Not acceptable. Ever. No matter what.
I hear lazy geeks objecting. “That’s not our problem, it’s a matter of education. Users just have to save their work every 10 seconds, then they’ll never lose anything.” Balderdash. That’s not the user’s job, any more than it’s the patient’s job to tell the surgeon: “No, you dimwit, it’s my other arm. Are you sure you remember which end of the scalpel to hold?” It’s the surgeon’s job to get the operation right, as it is ours to get the software right.
Because these events should never occur, it’s a big story when they do. Consider respected surgeon David Ring, who performed the wrong procedure on one of his patients at Massachusetts General Hospital. Rather than cover it up, or discuss it only in a closed mortality and morbidity conference, he published his own case in the prestigious New England Journal of Medicine (bit.ly/gzWN9q). The surgical team performed a full failure analysis to find the root cause. (It’s far more complicated than you might think; read the article.) They reviewed protocols and changed some of them: for example, the alcohol prep that washed away the supposedly indelible surgical site markings was discontinued. The world is a better place for this intolerance of unacceptable events, and for the openness in dealing with those that manage to occur.
Our industry needs the same thing. Why was data lost? It shouldn’t have been. Was the disk full? That’s a capacity problem—we know how to solve that. Because some dimwit yanked the plug out of the wall? That’s a durability problem—we know how to solve that, in several different ways at different price points. Because we forgot to check a null pointer? Easily solvable. And so on.
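The yanked-plug durability problem, for instance, has a textbook solution: never overwrite the user's file in place. Write the new contents to a temporary file in the same directory, flush it to disk, then atomically rename it over the original, so a power loss mid-save leaves either the old file or the new one, never a torn half of each. A sketch of the cheapest price point (the function name is mine, not the column's):

```python
import os
import tempfile


def durable_save(path, text):
    """Write `text` to `path` so that a crash or power loss mid-save
    leaves either the old contents or the new ones, never a torn file."""
    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory: rename is only
    # atomic within a single filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())          # force the bytes out of the OS cache
        os.replace(tmp, path)             # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)                    # clean up the partial temp file
        raise
```

Higher price points (journaling, replication, battery-backed write caches) buy stronger guarantees, but even this few-line pattern makes "the plug got pulled" an excuse rather than an explanation.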
If our profession is ever to take its rightful place as a pillar of society, we need to adopt this idea from another pillar.
What do you think are the never events in software, and how should we prevent them? Use the link at the end of my bio to tell me. As always, readers will be identified only by first names, unless they request otherwise.
David S. Platt teaches Programming .NET at Harvard University Extension School and at companies all over the world. He’s the author of 11 programming books, including “Why Software Sucks” (Addison-Wesley Professional, 2006) and “Introducing Microsoft .NET” (Microsoft Press, 2002). Microsoft named him a Software Legend in 2002. He wonders whether he should tape down two of his daughter’s fingers so she learns how to count in octal. You can contact him at rollthunder.com.
This article has been updated. Thank you!
cbowen, thank you for pointing out that error. It's on my "seldom" list, but not necessarily my "never" list. A correction is in process and will be posted. Hey, it could be worse. Check the Los Angeles Times's story about USC University Hospital transplanting a kidney into the wrong patient, at http://www.latimes.com/news/local/la-me-usc-kidney-20110218,0,5603801.story
I completely agree that these types of things should never happen under _normal operating circumstances_. However, I think that is a very important qualification. For example, I am not aware of any surgeries performed while the hospital or operating room was under attack. Malware can put good software in exactly this situation. Sure, computers have their versions of security guards in the operating system and anti-virus software, which help alleviate the problem, but they can be outsmarted and fairly easily side-stepped through user error. Under those circumstances, I would afford similar leniency to a surgeon and to software in the event of an error.

It is also possible for a patient to inaccurately report symptoms, which could result in a misdiagnosis. Once again, the problem can be mitigated in some cases by using measuring equipment, like a stethoscope to be sure the patient really does have trouble breathing, but this is not always possible. Without a reliable means to determine operating conditions, software and doctors can both find themselves with no option but to trust the patient. If the patient's story is inaccurate, the result can be undefined (possibly terrible) behavior.

I couldn't hold a surgeon responsible for performing the wrong surgery if a malicious person was moving patients from room to room or exchanging their charts, any more than I can hold software responsible when its memory is being sabotaged. To state it more mathematically: if an algorithm's preconditions are not met, and making that determination cannot be done reliably or is prohibitively expensive, the algorithm's postconditions are not guaranteed.
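The precondition/postcondition point in the comment above has a concrete shape in code: when checking a precondition would destroy the very property the algorithm exists to provide, the contract necessarily shifts to the caller. A classic sketch of this trade-off (my example, not the commenter's):

```python
def binary_search(sorted_items, target):
    """Return an index of `target` in `sorted_items`, or -1 if absent.

    Precondition: `sorted_items` is sorted in ascending order.
    Verifying that precondition costs O(n), which would defeat the
    O(log n) point of the algorithm, so the function trusts its
    caller. Violate the precondition and the postcondition (a correct
    index, or -1) is no longer guaranteed; the result is undefined,
    exactly as the comment describes.
    """
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1                  # target, if present, lies to the right
        else:
            hi = mid - 1                  # target, if present, lies to the left
    return -1
```

Where the check is cheap (a null pointer, an index bound), refusing to make it is the software equivalent of malpractice; where it is prohibitively expensive or impossible, the failure belongs to whoever broke the contract.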
David Ring did not operate on the wrong hand, as the article states. This is a mistake that has been made in several articles and sometimes corrected (MSNBC has corrected its article, for example). The wrong operation was performed at the wrong "site": specifically, the wrong place on the hand, but on the correct hand. It was the wrong operation, but it was on the right (actually left) hand. A careful reading of the NEJM article will confirm this.
First, how about the annoyance of re-prompting users multiple times to confirm an action? Once is enough to back out, or provide an undo, but nothing is more annoying than multiple popup boxes asking to confirm every selection. Second, pop-up boxes in general. If you need to relay some critical data, sure, pop one up, but don't pop up a box for every little notification. Use a status bar, a color change or some other non-obtrusive method for relaying info. Pop-up boxes need to die, or be programmed to disappear after five seconds or so.
Yeah, it's a better idea to approach our users without expecting anything, any soft corner, from them. That helps increase the quality of the product, as well as our own professional growth. Once again, a really inspiring article from David. Nice :)
> Remember how you’d work in Word or Excel for two hours, then up would pop the dreaded Unrecoverable Application Error box and it was all gone? Not acceptable. Ever. No matter what.

"No matter what"? In the bad old days, CPU and disk (remember floppies?) speeds were slower than a snail's. Going off to save the document in a single-threaded environment every two minutes while you were trying to type furiously would be a pain in several parts of your body.

Software quality (and technology in general) has been, is now, and will forever be determined by balancing economic, usability, safety, reliability, risk-tolerance and other interests. You can have a $100K computer/Word package that is 99.99999999% reliable or a $1K package that is 99.99% reliable. I can tolerate losing one document every few months rather than pay $100K for a computer/Word package. But I would pay a much higher price for avionics software (through ticket prices), because I value my life more than I value a document.

What's acceptable or not (software quality, workplace safety and so on) evolves over time. As economics and technology improve, our standards and expectations rise. It was once acceptable to work in a dirty mine and get lung cancer (or not eat). That's no longer acceptable today in rich industrialized nations.

Software is an art, and you can be a critic and criticize subjectively. However, there is no objective measure to declare that "this software sucks." The market will decide what sucks (unless it's an ethical issue where bad software kills people; in that case, "Not acceptable. Ever. No matter what." applies). Ken.
"I hear lazy geeks objecting" - LOL Another fun article! Great!1! (!1! intentional misspelled to convey enthusiasm)