October 2015

Volume 30 Number 10

Editor's Note - Chain of Disaster

By Michael Desmond | October 2015

The discipline of computer security has more than a little in common with airline flight safety. Both are fraught with high stakes and increasing complexity over time. When something goes wrong, remediation is often built on painstaking forensics and paid for with funds that become available only after a high-profile failure. Another common thread: Catastrophic failure often springs from mundane causes.

Take the infamous data breach at Target in 2013. Hackers entered the network the old-fashioned way: they stole credentials from an HVAC contractor with login rights to the Target network, and from there gained access to payment systems. The malware they installed was actually detected and flagged by the FireEye security software Target had deployed months earlier, yet when security staff in Bangalore forwarded the alerts to Minneapolis, the security team there declined to act. In the weeks that followed, some 40 million debit and credit cards, along with gigabytes of customer data, were exfiltrated from Target systems, leading to more than $100 million in losses for the retail giant.

A similar “chain of disaster” pattern is evident in many airliner accidents. Air France 447 crashed into the Atlantic Ocean in 2009 after encountering thunderstorms at night near the equator while en route from Rio de Janeiro to Paris. Investigators later determined that ice had likely blocked the aircraft’s pitot tubes, the small forward-facing probes on the fuselage that measure airspeed. The result was incorrect and conflicting data that caused the autopilot to disconnect and apparently disoriented the first officer flying the craft through the storm. Possibly convinced that his plane was flying dangerously fast, he commanded a constant nose-up attitude that, in fact, sharply reduced airspeed and produced a high-altitude stall. The aircraft ultimately fell belly-first into the sea, killing all on board.

In both cases, systems that performed as designed in the face of negative events were misinterpreted by the people managing them. Both the security team at Target and the pilot on Air France 447 struggled to comprehend the data presented to them and took actions that made a bad situation worse. Part of the blame rests with the human operators, but part lies also with the systems themselves.

For the crew of Air France 447, confusing audible warnings played a role. By design, the stall warning on the Airbus A330 goes silent when measured airspeed falls below a threshold at which the data is considered invalid. So when the first officer lowered the nose of the jet to gain much-needed airspeed, the stall warning actually sounded again, while pulling back to raise the nose (and resume the deep stall) silenced the alarm. Faced with conflicting data inputs and confusing feedback from the stall horn, the pilots very likely didn’t know which instruments to trust.

If there is one overriding parallel between air accident investigations and software security incidents, it’s that each presents an invaluable opportunity to better understand the complex interaction of human factors, environmental stresses, and system behavior and automation.


Michael Desmond is the Editor-in-Chief of MSDN Magazine.