Test Run

Software Testing Paradoxes

James McCaffrey

Code download available at:TestRun0512.exe(120 KB)

Paradoxes are fun. In this month's column I show you three interesting cases that can occur when you are performing software testing. They're fundamentally mathematical in nature, and they can be a useful addition to your troubleshooting arsenal.

In the first part of this column, I explain Simpson's Paradox. This is the situation when Software System A is worse in every area of comparison than Software System B, yet System A can still be shown to be the better system. In the second part, I demonstrate Braess's Paradox, which is when adding a perfectly good load-balancing server actually reduces network performance in a most surprising way. I finish up with a look at Parrondo's Paradox, where two completely independent but faulty systems can, incredibly, produce a correct system.

Simpson's Paradox occurs quite frequently. Although the chances of you actually encountering Braess's or Parrondo's Paradoxes are slim, I think you'll find reading about them quite entertaining.

Simpson's Paradox

Imagine you have two different software system prototypes: Prototype A and Prototype B. You want to evaluate which prototype is better based on user feedback. Suppose that for ease-of-use, Prototype A received an excellent evaluation by 50 out of 200 people, while Prototype B received excellent from 15 out of 100 people (see **Figure 1**). From this data, Prototype A is clearly superior to Prototype B in terms of ease-of-use, with Prototype A getting 25 percent excellent and Prototype B getting just 15 percent.

**Figure 1 Comparing Prototype A and Prototype B**

Ease of Use Data | Prototype A | Prototype B |
---|

Number excellent scores | 50 | 15 |

Total users | 200 | 100 |

Score | 25% | 15% |

Security Data | Prototype A | Prototype B |
---|

Number excellent scores | 85 | 300 |

Total users | 100 | 400 |

Score | 85% | 75% |

Now, suppose that for security features, Prototype A receives a rating of excellent from 85 out of 100 people, while Prototype B receives excellent from 300 out of 400 people. From this data, you can see that Prototype A (with 85 percent excellent) is again clearly superior to Prototype B (with just 75 percent excellent). So, which prototype do you conclude is better? Before you answer, let's combine the data from **Figure 1** into a single table as shown in **Figure 2**.

**Figure 2 Combined Data for Security and Ease-Of-Use**

Combined Data | Prototype A | Prototype B |
---|

Number excellent scores | 135 | 315 |

Total users | 300 | 500 |

Score | 45% | 63% |

If you combine the number of excellent evaluations, you see that Prototype A receives 135 out of 300, which is just 45 percent excellent. Meanwhile, Prototype B dominates with 63 percent excellent, based on its 315 out of 500. So Prototype B is superior, right? Or is Prototype A superior? And if it is, how can you explain away the combined data?

This is an example of Simpson's Paradox and it is a real phenomenon. There is no trick involved. The correct answer is that Prototype A is the better system; you should not combine the data together into a single table, as I've done in **Figure 2**. A brief, informal description of Simpson's Paradox is: a situation in which two or more sets of data lead to one conclusion when evaluated individually, but lead to an opposite conclusion when the sets are combined. If you look back at the data in **Figure 1**, you'll notice that the total number of evaluations for each prototype is different in each column. This is a necessary condition for Simpson's Paradox.

The preceding example is a bit artificial, but it illustrates Simpson's Paradox quite clearly. The important point is that Simpson's Paradox does actually happen, even with real-world data, and it is an occurrence you should be on the alert for. The paradox can easily sneak by you, especially if you are only presented with summary data and don't get a chance to see the original, uncombined data. Here is an example where the numbers are made up, but the scenario is based on an actual incident. The actual example appeared in "Sex Bias in Graduate Admissions: Data from Berkeley" by P.J. Bickel, E.A. Hammel, and J.W. O'Connell (1975).

Consider the graduate school admission data for a university, shown in **Figure 3**. The data shows that 4000 of 9000 males (44.4 percent) who applied to the university were admitted for graduate study, but only 1500 of 4500 females (33.3 percent) were admitted. Is this a case of some sort of gender inequity? Not necessarily. The data in **Figure 3** represents combined admissions for four departments in the university. Take a look at **Figure 4**, which represents the data for each individual department. With these figures, you find that women were admitted at a higher rate than men in every case! Clearly the uncombined data in **Figure 4** presents a better picture of admissions than the combined data in **Figure 3**. Even though this data is significantly simplified from the real incident, you can see that Simpson's Paradox can be quite difficult to spot.

**Figure 4 Uncombined Graduate School Admissions Data**

| Men | Women |
---|

A | 2000 | 2400 | 45% | 400 | 450 | 47% |

B | 1200 | 1000 | 55% | 100 | 80 | 56% |

C | 700 | 900 | 44% | 600 | 730 | 45% |

D | 100 | 700 | 13% | 400 | 1740 | 19% |

Total | 4000 | 5000 | | 1500 | 3000 | |

**Figure 3 Combined Graduate School Admissions Data**

| Admitted | Rejected | Acceptance Rate |
---|

Men | 4000 | 5000 | 44.4% |

Women | 1500 | 3000 | 33.3% |

Simpson's Paradox is not a true paradox in the mathematical sense. A mathematical paradox produces a logically inconsistent result, whereas examples of Simpson's Paradox are unexpected and surprising, but fully explainable. There is a lot of information available on the Web about Simpson's Paradox if you want to investigate further. The moral is that when you are looking at test result data, you should first question whether the data is aggregated from other sources. If it is, you need to look at the original uncombined data. Additionally, if you are generating a test result data report, be very careful about combining the data in an attempt to simplify your presentation. When people say that "statistics can be made to say whatever you want," this is exactly the sort of situation they're referring to.

Braess's Paradox

Now let's take a look at Braess's Paradox. My example uses a traditional scenario—automobile traffic—but I also describe how the paradox maps more broadly to software testing and other situations. The original work for Braess's Paradox is "Über ein Paradox der Verkerhsplannung," by Dietrich Braess (1968). My example, however, is based upon several examples I found that were, in turn, based on examples attributed to Mark Wainwright. I was unable personally to locate Wainwright's original work.

Braess's Paradox is quite complex so I present a simplified, discrete approximation rather than the full problem. Suppose there are four towns—Town A, Town B, Town C, and Town D—as shown in **Figure 5**. Each road connecting any two of the towns has an associated cost given by the equation shown next to the road in the figure. The cost is a function of the number of cars (n) on the road. You can imagine that the cost represents the time it takes to travel the road, or the amount of gasoline required, or some other factor that you'd like to minimize. Now suppose that each morning there are six cars that leave Town A, one at a time, and all are headed to Town D. Car 1 leaves and has a completely open road. The car has two routes to choose from: A-B-D or A-C-D. The cost of A-B-D is [4(1) + 1] + [1 + 16] = 22. Because of the symmetry in the map, the cost of route A-C-D is also 22. Let's assume that Car 1 selects route A-B-D and starts on its way.

Figure 5** Road Network: Braess's Paradox **

Now Car 2 prepares to leave. He sees Car 1 on route A-B-D and knows that route A-B-D will now cost [4(2) + 1] + [2 + 16] = 27, so he selects route A-C-D with a cost of 22. Car 3 sees that there is one car on each route and selects route A-B-D with a cost of 27. Car 4 then selects route A-C-D, which has a cost of 27. Car 5 sees the four cars evenly distributed and he decides to take route A-B-D bringing its cost to [4(3) + 1] + [3 + 16] = 32. Finally, Car 6 selects route A-C-D, which also costs 32. All six cars are now en route from Town A to Town D. Because there are three cars on each path and the two paths are symmetric, the cost for each car will be 32.

Here's where Braess's Paradox comes into play. What do you suppose will be the effect of adding a new, efficient shortcut road between Town B and Town C? Common sense says that adding more road capacity should decrease the cost to our drivers. But since this phenomenon is called "Braess's Paradox" and not "Braess's Common Sense," you can guess that's not what happens.

Suppose that the map in **Figure 5** is modified with a highly efficient shortcut between Town B and Town C, with a cost function equal to a constant 1. On the first morning with the new shortcut road in place, Car 1 prepares to leave Town A. He has four reasonable routes, each with an associated cost: