October 2009

Volume 24 Number 10

Test Run - Partial Antirandom String Testing

By James McCaffrey | October 2009

Antirandom (AR) testing is a variation of pure random testing, which is the process of generating random input and sending that input to a system under test. In most situations, random test input does not have an associated explicit expected return value or system state. In such situations, the purpose of random testing is to try to generate a system failure of some sort, such as a hang or crash. Research studies have shown that pure random testing is relatively less effective at discovering bugs than other testing paradigms, such as equivalence partition testing and boundary value analysis testing. However, random testing is appealing because it is typically quick and easy to implement. The idea of AR testing appears to have been introduced by the field of hardware testing. Essentially, AR testing generates an input set of random values, where the values are as different as possible from each other. The assumption is that similar input to a system will expose similar types of bugs, and therefore an input set that contains values that are very different from each other has a higher likelihood of exposing errors in a software system.


Figure 1 Generating Partial AR Strings

Here's an example. Suppose that a hardware system has a parameter that accepts 3-bit values. The complete input domain, therefore, has eight values: {000, 001, 010, 011, 100, 101, 110, 111}. Further, suppose you want to generate a test set that contains four AR inputs because exhaustive testing of all eight values is not feasible for some reason. Your initial AR test set is empty, and value {0,0,0} can be arbitrarily selected from the input domain and added to the test set. The next input to be added to the AR test set is the value from the domain space that is most different from the current values in the test set. In this case, value {1,1,1} is the most different from {0,0,0} using virtually any rational definition of different, so the AR test set now contains {0,0,0} and {1,1,1}. Next, assume that value {0,0,1} is selected from the input domain using the procedure described below and added to the AR test set so that your set now contains {0,0,0}, {1,1,1}, and {0,0,1}. Any of the remaining five values in the input domain are candidates for the fourth and final member of the test set. However, exactly which value is chosen depends upon the difference function used. One candidate function is the Hamming distance between two values, which is just the number of positions where the bits in the two values differ. The sum of Hamming distances between each candidate value and the three current members of the AR test set are:

010: 1 + 2 + 2 = 5
011: 2 + 1 + 1 = 4
100: 1 + 2 + 2 = 5
101: 2 + 1 + 1 = 4
110: 2 + 1 + 3 = 6

Because {1,1,0} is the most different from the existing members of the AR test set, it is added to the test set. This example points out that pure AR test input generation requires a complete enumeration of the entire input domain. This may be possible for certain types of hardware testing with a relatively small input bit size or for software systems with a small input domain. However, pure AR testing is rarely possible for complex software systems. This column presents an approach I call partial AR string testing, which can be used to test a wide range of software systems.

The best way to explain where I'm headed is with two screenshots. Figure 1 shows an example of how a partial AR string test set can be generated.

The example shown in Figure 1 is based on a seven-card poker system of some sort. Each string represents seven cards where 'A' represents an ace; 'T' represents a 10; and 'c', 'd', 'h' and 's' represent clubs, diamonds, hearts and spades. If duplicates are allowed, the total number of input strings is 52 raised to the seventh power = 1,028,071,702,528, which is likely to be unmanageable in most testing situations. AR test set generation begins by creating a surrogate input domain, which is a randomly selected subset of the entire input domain. Here I use an artificially small surrogate domain size of 10 strings to keep my example short. The AR  test set size is set to three, again artificially small for demonstration purposes. The AR test set is initially empty; the first string value from the surrogate domain, "Ts Th 3h Qd Kd 4d 9d", is added to the AR set. Next, the demo program examines each of the strings in the surrogate domain to find the string that is most different from the single string in the AR set. In this case, the string "3c 9h 9s 3s As 9c 2c" is selected as the most different. I will explain how the difference computation is performed shortly. Next, the demo program rescans the surrogate domain to find the string value that is most different from both of the two strings currently in the AR test set. After that final value, "9d Js Js Kd 7h 5d Jc", is determined and added to the AR set, the demo program displays the final set. The result is a set of test inputs that contains strings that are more different from each other than would be the case if the strings had been chosen randomly.


Figure 2 Testing with Partial AR Strings

Figure 3 Structure of Partial AR Test Set Generation

using System;
namespace Demo
{
class Program
{
static Random r = new Random(0);
static void Main(string[] args)
{
try
{
Console.WriteLine(
"\nBegin partial antirandom string generation demo”);
long domainSize = (long)Math.Pow(52, 7); // 1,028,071,702,528
int surrogateDomainSize = 8;
int antiRandomTestSetSize = 3;
// display sizes of input domain, surrogate domain,
antirandom test set
// initialize and display surrogate domain
// create partial antirandom test set
// display partial antirandom test set
}
catch (Exception ex)
{
Console.WriteLine("Fatal: " + ex.Message);
Console.ReadLine();
}
} // Main()
} // class
} // ns

The screenshot in Figure 2 shows one way in which partial AR strings can be used to test a system. I am testing a hypothetical PokerLib.dll module that accepts a string representing a seven-card poker hand as described earlier. The idea is to repeatedly generate partial AR test sets and send all the string values in each test set to the system under test. In this situation, I impose an arbitrary maximum number of cycles of 1,000,000. The demo shown in Figure 2 uses a surrogate input size of 100 and a partial AR test set size of 10. I coded the hypothetical PokerLib.dll system under test to quickly produce an Exception to demonstrate what you hope happens when performing partial AR testing, which is to produce an error.

In this column, I use the terms AR testing, partial AR testing, and partial AR string testing more or less interchangeably. However, in academic research literature, AR testing usually refers to the process I described where the input values are binary and the entire input domain is scanned to determine the values in the AR test set. I use the term "partial" to distinguish the technique described here, which scans a surrogate domain, from "pure" AR testing.

In the sections that follow, I will present and explain the code that generated the screenshots shown in Figures 1 and 2. In particular, I will describe different techniques that you can use to determine the difference between two strings and when each technique is appropriate. I will also present some guidelines on how to select the surrogate domain size and the AR test set size. I assume you have beginning- to intermediate-level programming skills. My examples are coded with C#, but you should be able to easily recast my code into other languages, including VB.NET, Java, Perl and PowerShell. I'm confident that you will find partial AR testing an interesting topic and that the ability to test systems using partial AR strings can be a valuable addition to your testing tool kit.

Generating Partial AR Strings

The code in Figure 3 is the overall structure of the program that generated the screenshot shown in Figure 1.

I create a C# Console Application project named Demo using Visual Studio 2008. My partial AR generation does not use any specialized .NET Framework namespaces, so I deleted the autogenerated using statements except for the one that references the root System namespace. I instantiate a Random object using a seed value of 0:

static Random r = new Random(0);

I supply a seed value to the Random constructor rather than use the default constructor without a seed, so that my output will be reproducible. For clarity and simplicity, throughout this column I use the term "random" rather than the more accurate "pseudorandom." After displaying a start message, I use the Math.Pow() method to compute the total number of strings in the input domain for the test scenario. Technically, I don't need this information, but it serves to point out that partial AR string testing is typically useful in situations where there are too many inputs to test exhaustively. In this situation, an input string represents a hand of seven cards and has the form "Rs Rs Rs Rs Rs Rs Rs", where R is one of 13 rank values ("A", "T", "2 thru 9", "J", "Q", "K") and "s" is one of four suit values ("c", "d", "h", "s"). Each card can be one of 13 * 4 = 52 possibilities, and so with duplicates allowed, the total number of possible strings is 52 raised to the seventh power.

Figure 4 Generating a Random Card String

public static string MakeARandomCard()
{
string answer = "";
int rank = r.Next(1, 14); // 1-13
int suit = r.Next(1, 5); // 1-4
if (rank == 1) answer += "A";
else if (rank == 10) answer += "T";
else if (rank == 11) answer += "J";
else if (rank == 12) answer += "Q";
else if (rank == 13) answer += "K";
else answer += rank.ToString(); // 2 thru 9
if (suit == 1) answer += "c";
else if (suit == 2) answer += "d";
else if (suit == 3) answer += "h";
else answer += "s";
return answer;
}

Next, I set the sizes of the surrogate domain and the test set to eight and three, respectively. These sizes are much smaller than ones you would normally use in a production environment. I will discuss the issues related to choosing these sizes shortly. I initialize the surrogate domain by calling a helper method named MakeARandomDomainString():

Console.WriteLine("\nGenerating surrogate domain");
for (int i = 0; i < surrogateDomain.Length; ++i)
  surrogateDomain[i] = MakeARandomDomainString();

The code for the helper method is:

public static string MakeARandomDomainString()
{
  string answer = "";
  for (int c = 0; c < 6; ++c) // first 6 cards
    answer += MakeARandomCard() + " ";
  
  answer += MakeARandomCard(); // last card, no trailing space
  return answer;
}

When using partial AR testing, each testing scenario will have a different input string format. At one extreme, you may have no constraints on input string format other than length. In such situations, you will need to simply generate completely random strings. In my example, I have a well-structured format. In general, the usefulness of partial AR string testing increases as the structure of the input strings increases. The MakeARandomDomainString() method simply generates a string representing seven cards by iteratively calling a helper method MakeARandomCard(), as shown in Figure 4.

I use the Random object I instantiated earlier to produce two pseudorandom integers in the range [1,13] and [1,4], respectively. Note that the two parameters to the Random.Next() method represent the inclusive lower bound and the exclusive upper bound, so r.Next(1,14) generates a random integer between one and 13 inclusive rather than between one and 14. I use the overloaded += operator to build up my card string, rather than the more efficient StringBuilder class of the System.Text namespace. Throughout this article, I use generic programming techniques so that you can more easily refactor my example code into other programming languages. Additionally, I have coded my example using a traditional non-OOP style so that you can easily refactor to scripting languages with limited OOP support, such as JavaScript and Perl.

After creating the surrogate domain, I display it:

Console.WriteLine("\nThe surrogate domain is: \n");
for (int i = 0; i < surrogateDomain.Length; ++i)
  Console.WriteLine("[" + i + "] " + surrogateDomain[i]);

In this example, I do not check for duplicate values in the surrogate domain. In general, the effort required to prevent duplicates in the surrogate domain is too great to justify the benefit gained. Recall that the main appeal of random testing techniques is simplicity. Furthermore, as we'll see shortly, duplicate values in the surrogate domain are highly unlikely to be placed into the AR test set. Now I am ready to generate the partial AR test set. The code in Figure 5 shows how I do this. I begin by selecting the first value in the surrogate domain and placing it into the AR test set:

Console.WriteLine("Adding " + surrogateDomain[0] + " to AR test set");
antiRandomTestSet[0] = surrogateDomain[0];
int numberValuesInTestSet = 1;

Because the strings in the surrogate domain are generated randomly, in most situations there is no advantage in selecting a particular string. I use a variable numberValuesInTestSet to keep track of how many strings are currently in the AR test set.

Figure 5 Code to Generate a Partial AR Test Set

for (int k = 1; k < antiRandomTestSet.Length; ++k) {
int largestTotalDeltaForAllI = 0;
int indexOfBestCandidate = 0;
int totalDeltaForCurrentI = 0;
for (int i = 0; i < surrogateDomain.Length; ++i) {
totalDeltaForCurrentI = 0;
for (int j = 0; j < numberValuesInTestSet; ++j) {
totalDeltaForCurrentI +=
RotateDistance(surrogateDomain[i], antiRandomTestSet[j]);
} // j
if (totalDeltaForCurrentI > largestTotalDeltaForAllI) {
largestTotalDeltaForAllI = totalDeltaForCurrentI;
indexOfBestCandidate = i;
}
} // i
Console.WriteLine("\nDetermining antirandom value: [" + k + "]");
Console.WriteLine(
"String at [" + indexOfBestCandidate + "] in surrogate domain");
Console.WriteLine(
"is most different from strings curently in AR test set");
Console.WriteLine(“Adding " + surrogateDomain[indexOfBestCandidate] +
" to AR test set");
antiRandomTestSet[numberValuesInTestSet++] =
surrogateDomain[indexOfBestCandidate];
} // k

The outermost for loop with index variable k is used to point into and populate each cell in the AR test set. I start at value k = 1 because I seeded the first cell in the AR test set at index 0 with the first value in the surrogate domain. Because the goal is to find the string in the surrogate domain that is most different from all strings currently in the AR test set, I create variable largestTotalDeltaForAllI to track a value that represents the largest difference found. I also initialize a variable indexOfBestCandidate to record where in the surrogate domain the string with the largest difference value was found.

Figure 6 Levenshtein Distance

public static int LevenshteinDistance(string s, string t) // assume s, t
not null
{
int[,] dist = new int[s.Length + 1, t.Length + 1]; // distance
int subCost; // substitution cost
if (s.Length == 0) return t.Length;
if (t.Length == 0) return s.Length;
for (int i = 0; i <= s.Length; ++i)
dist[i,0] = i;
for (int j = 0; j <= t.Length; ++j)
dist[0,j] = j;
for (int i = 1; i <= s.Length; i++) {
for (int j = 1; j <= t.Length; j++) {
if (t[j-1] == s[i-1])
subCost = 0;
else
subCost = 1;
int temp = Math.Min(dist[i - 1, j] + 1, dist[i, j - 1] + 1);
dist[i, j] = Math.Min(temp, dist[i - 1, j - 1] + subCost);
}
}
return dist[s.Length, t.Length];
}

The next for loop with index variable i iterates through each candidate string in the surrogate domain. After initializing a variable totalDeltaForCurrentI to 0, I use the nested for loop with index variable j to fetch each of the strings currently in the AR test set. So, at this point, index k (and variable numberValuesInTestSet) point to the next empty cell in the AR test set, index i points to a candidate string in the surrogate domain, and index j points to a string currently in the AR test set. Now I determine the difference between the two strings pointed to by i and j and accumulate that difference to the running total for the current candidate string in the surrogate domain:

totalDeltaForCurrentI +=
RotateDistance(surrogateDomain[i], antiRandomTestSet[j]);

The call to the RotateDistance() helper method is important, and I will discuss it in just a moment. If the sum of distances between the candidate string and all strings in the AR test set is larger than my current largest sum-of-distances value, I update my tracking variables:

if (totalDeltaForCurrentI > largestTotalDeltaForAllI) {
largestTotalDeltaForAllI = totalDeltaForCurrentI;
indexOfBestCandidate = i;
}

Recall that the whole point of partial AR string testing is to generate strings that are as different as possible from each other. After printing some progress messages to the command shell, I add the best candidate found in the surrogate domain to the AR test set and update the number of values in the test set:

antiRandomTestSet[numberValuesInTestSet++] =
  surrogateDomain[indexOfBestCandidate];

Because both index k and variable numberValuesInTestSet hold the same value, I could have substituted k here. However, the use of the more descriptive variable is a bit more readable in my opinion.

Computing the Difference Between Two Strings

When performing partial AR string testing, you need a method that computes the difference between two strings. Exactly how you compare two strings will depend upon the format of the strings used in your testing scenario. If the strings that are being compared always have the same length, one simple possibility is to use the character-based Hamming distance, which is just the number of positions in either string where characters differ. For example, if s = "car" and t = "bag", the Hamming distance between s and t is two. If your two strings have different lengths, you can use a modified Hamming distance that adds the difference in size between the longer string and the shorter string. For example if s = "car" and t = "cable", the modified Hamming distance is one (for the 'r' vs. 'b') + 2 (for the 'l' and 'e') = 3. Here is one way to implement the Hamming distance for same-size strings:

public static int HammingDistance(string s, string t)
{
  if (s.Length != t.Length)
    throw new Exception("s and t must have same length in
      HammingDistance()");
  int ct = 0;
  for (int i = 0; i < s.Length; ++i)
    if (s[i] != t[i])
      ++ct;
  return ct;
}

Another simple measure of the difference between two strings is the character-based Cartesian distance. This is similar to the geometric distance between two points, which you may remember from high-school algebra. Suppose string s = "bat" and t = "car". The ASCII value of 'b' is 98 and the ASCII value of 'c' is 99, and so the difference between the two characters is one. The difference between 't' and 'r' is two. The character-based Cartesian distance between strings s and t is sqrt(1^2 + 0^2 + 2^2) = 2.24. A more sophisticated way to compute the difference between two strings is the Levenshtein distance. The Levenshtein distance between two strings is the smallest number of character insertions, deletions and substitutions required to transform one string to the other. The Levenshtein distance is a fascinating topic in its own right, but a thorough discussion of it is outside the scope of this article. Figure 6 lists a simple implementation of the Levenshtein distance that you can use for partial AR string testing.

The more structured your input strings are, the more likely it is that a custom difference method will be more useful than generic differences methods, such as the Hamming distance or the Levenshtein distance. For example, in my poker-card scenario, consider these two strings:

s = "Ac Kd Qh Js Tc 9d 8h"
t = "Kd Qh Js Tc 9d 8h 7c"

Notice that the two strings represent poker hands that are very similar, with six out of seven cards being identical. However, the Hamming distance between s and t is 14, and as it turns out, the Levenshtein distance is six. In my scenario, I implemented a custom difference method named RotateDistance(). The RotateDistance() method takes advantage of the structure of the poker hand strings and compares based on card values. The code for RotateDistance() is listed in Figure 7. My custom difference method starts by computing the Hamming distance between the method input arguments:

s = "Ac Kd Qh Js Tc 9d 8h"
t = "Kd Qh Js Tc 9d 8h 7c"
distance = 14

Next, RotateDistance(s,t) rotates string s to the left by one card position and computes the Hamming distance again:

s = "Kd Qh Js Tc 9d 8h Ac"
t = "Kd Qh Js Tc 9d 8h 7c"
distance = 1

The method performs this rotate-and-compare process through all seven card positions and returns the smallest distance found, which in this example is one. The code for the RotateLeft() helper method uses String.Substring() to break the input argument into two parts and then string concatenation to recombine the two parts:

public static string RotateLeft(string s)
{
  string firstPart = s.Substring(0, 3);
  string remainder = s.Substring(3, s.Length - 3);
  string answer = remainder + " " + firstPart.Trim();
  return answer;
}

Based on my experience, when test input strings have a fair amount of structure, some variation of the rotate-and-compare algorithm I've described is often highly effective. Let me point out that even if your testing scenario has highly structured input strings, you are under no obligation to write a custom difference method as I've done here. Using a custom difference method will help to generate AR strings that are more different from each other than those produced by using the Hamming distance or Levenshtein distance, but you must gauge the benefit against the effort required to create a custom difference method.

Figure 7 Custom RotateDistance() Method

public static int RotateDistance(string s, string t)
{
int d = HammingDistance(s, t);
for (int compare = 1; compare <= s.Length / 3; ++compare)
{
s = RotateLeft(s);
int temp = HammingDistance(s, t);
if (temp < d)
d = temp;
}
return d;
}

Testing with Partial AR Strings

Partial AR string testing is really a meta-technique, meaning the method is more of a general set of guidelines rather than a specific cookbook-style approach that can be applied in all testing situations. One of many possible ways to perform partial AR string testing is illustrated in the screenshot shown in Figure 2. The listing in Figure 8 is the key code that generated the screenshot in Figure 2.

First, I display some logging messages to the command shell, including the name of the system under test, the surrogate domain size and the AR test set size. Using the approach I described in the previous section, if the surrogate domain has size m, and the AR test size has size n, generating the AR test set requires (n * (n-1) * m) / 2 iterations. Furthermore, a string comparison is performed in each iteration, and each comparison requires at least min(length(s),length(t)) character comparisons. The point is generating a partial AR test set where you specify a large surrogate domain size, and a large AR test set size can require a significant amount of processing time. The larger these two sizes are, the better your AR test set will be in terms of test string input diversity, but the time required to generate your AR test set reduces the number of inputs actually sent to the system under test.

Figure 8 Example of Testing with Partial AR Strings

int sdSize = 100;
int arSize = 10;
Console.WriteLine("\nBegin antirandom testing demo");
Console.WriteLine("System under test = PokerLib.dll module");
Console.WriteLine("Maximum number of cycles = 1,000,000");
Console.WriteLine("Surrogate domain size = " + sdSize);
Console.WriteLine("Antirandom test set size = " + arSize);
long maxNumberCycles = 100000;
int cycle = 0;
while (cycle < maxNumberCycles)
{
++cycle;
Console.WriteLine("=============================");
Console.WriteLine("Current antirandom cycle = " + cycle);
Console.WriteLine("Generating antirandom test set");
string[] arTestSet = MakeAntiRandomTestSet(sdSize, arSize);
Console.WriteLine("\nThe antirandom test set is: \n");
Console.WriteLine("[0] " + arTestSet[0]);
Console.WriteLine("[1] " + arTestSet[1]);
Console.WriteLine(" . . . . . . .");
Console.WriteLine("[9] " + arTestSet[9]);
Console.WriteLine(
"\nSending each antirandom input to system under test\n");
for (int i = 0; i < arTestSet.Length; ++i) {
PerformTest(arTestSet[i]);
}
Console.WriteLine("All antirandom input accepted by SUT\n");
}

Next, I specify the maximum number of cycles to generate AR test sets. In many testing scenarios, you want to run AR testing continuously; in such situations, you can just replace the controlling while loop with while(true). Inside the main AR control loop, I generate an AR test set, and then send each string from the test set as input to the system under test. I wrote two small helper methods. The first method, MakeAntiRandomTestSet(), simply wraps the code presented in the previous section:

public static string[] MakeAntiRandomTestSet(int surrogateDomainSize,
  int antirandomTestSetSize)
{
  string[] result = new string[antirandomTestSetSize];

  string[] sd = new string[surrogateDomainSize];
  // generate surrogate domain, sd
  // fill antirandom test set, result
  return result;
}

The second helper method, PerformTest(), accepts a string and simulates testing the hypothetical PokerLib.dll system under test:

public static void PerformTest(string s)
{
  if (s[1] == 'c' && s[4] == 'c') {
    throw new Exception("Exception thrown in system
      under test\n" + "for input = " + s);
  }
}

Here I create a situation where the system under test will fail if the first two cards of the test case input are clubs. Notice that I remember to display the test case input that generates the system failure so that the problem can be reproduced and ultimately fixed by the development team. It would be unfortunate if, after several days or even weeks or months of running extended AR testing, you induce a catastrophic failure in your system under test and find that you forgot to record the input that generated the error.

There are many other patterns you can use for partial AR testing. Instead of using a small surrogate domain and a small AR test set combined with a large number of cycles, you may want to generate one large AR test set from a large surrogate domain and perform fewer test cycles. Notice that the approach described in this section generates an AR test set and uses the strings in the set, but does not save any of the AR test sets. An alternative is to write the strings in each AR test set to external storage, such as a text file, an Excel spreadsheet, or a SQL table, as the strings are generated. Then, if you wish to perform regression testing, you will already have the test input ready and your test run will be much quicker than if you had to regenerate the input.

Wrapping Up

Let me summarize the ideas presented in this article. Partial AR string testing is a technique that is useful in situations where a system under test accepts string input and the size of the input domain is too large to test exhaustively. To perform partial AR string testing, you first create a large surrogate domain that is a random subset of the input domain. Next, you generate an AR test set that is a subset of the surrogate domain where the strings in the test set are maximally different from each other. Each string in the AR test set is then sent to the system under test. In most situations, you do not determine an explicit expected value or state for each AR string input. Instead, you are attempting to cause the system under test to fail in some way. The Hamming distance and the Levenshtein distance are two generic techniques to compute a difference value between two strings. If your test scenario has strings that have a high degree of structure, you can create your own custom string difference method. Such custom methods often involve string rotation.

Academic research in the field of software testing consistently produces results that suggest that there is no single best testing technique. When appropriate, adding partial AR string testing to your test effort can help you produce software that has fewer bugs, is more reliable and is more secure.


Dr. James McCaffreyworks for Volt Information Sciences Inc., where he manages technical training for software engineers based at Microsoft’s Redmond, Wash., campus. He has worked on several Microsoft products, including Internet Explorer and MSN Search, and is the author of “.NET Test Automation: A Problem-Solution Approach” (Apress, 2006). Dr. McCaffrey can be reached at jmccaffrey@volt.com or v-jammc@microsoft.com.