February 2015

Volume 30 Number 2


Visual Studio 2015 - Build Better Software with Smart Unit Tests

By Pratap Lakshman | February 2015

Editor’s note: “Smart Unit Tests” has been renamed “IntelliTest” with the release of the Visual Studio 2015 Release Candidate.

 

The world of software development is hurtling toward ever-shortening release cycles. The time when software development teams could strictly sequence the functions of specification, implementation and testing in a waterfall model is long past. Developing high-quality software is hard in such a hectic world, and calls for a reevaluation of existing development methodologies.

To reduce the number of bugs in a software product, all team members must agree on what the software system is supposed to do, and that’s a key challenge. Specification, implementation and testing have typically happened in silos, with no common medium of communication. The different languages and artifacts used for each make it difficult for them to co-evolve as implementation progresses, so although a specification document ought to connect the work of all the team members, that’s rarely the case in reality. The original specification and the actual implementation can diverge, and the only thing holding everything together eventually is the code, which ends up embodying the ultimate specification and the various design decisions made en route. Testing attempts to reconcile this divergence, but typically only by exercising a few well-understood end-to-end scenarios.

This situation can be improved. What’s needed is a common medium for specifying the intended behavior of the software system, one that can be shared across design, implementation and testing, and that’s easy to evolve. The specification must be directly related to the code, and the medium should be codified as an exhaustive suite of tests. Tool-based techniques enabled by Smart Unit Tests can help fulfill this need.

Smart Unit Tests

Smart Unit Tests, a feature of Visual Studio 2015 Preview (see Figure 1), is an intelligent assistant for software development, helping dev teams find bugs early and reduce test maintenance costs. It’s based on previous Microsoft Research work called “Pex.” Its engine uses white-box code analyses and constraint solving to synthesize precise test input values to exercise all code paths in the code under test, persist these as a compact suite of traditional unit tests with high coverage, and automatically evolve the test suite as the code evolves.

Figure 1 Smart Unit Tests Is Fully Integrated into Visual Studio 2015 Preview

Moreover, and this is strongly encouraged, correctness properties specified as assertions in code can be used to further guide test case generation.

By default, if you do nothing more than run Smart Unit Tests on a piece of code, the generated test cases capture the observed behavior of the code under test for each of the synthesized input values. At this stage, except for test cases causing runtime errors, the rest are deemed passing tests—after all, that’s the observed behavior.

Additionally, if you write assertions specifying the correctness properties of the code under test, Smart Unit Tests will also come up with test input values that cause those assertions to fail; each such input value uncovers a bug in the code and thereby yields a failing test case. Smart Unit Tests can’t come up with such correctness properties by itself; you write them based on your domain knowledge.
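For example, consider the following sketch (the method is hypothetical, not from the product), in which an assertion encodes the intended correctness property of an absolute-value routine:

int Abs(int value)
{
  int result = value < 0 ? -value : value;
  // Correctness property: the result is never negative. This holds for
  // every input except int.MinValue, whose negation overflows back to
  // int.MinValue; the engine can synthesize exactly that input and emit
  // a failing test for it.
  System.Diagnostics.Debug.Assert(result >= 0);
  return result;
}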

Test Case Generation

In general, program analysis techniques fall between the following two extremes:

  • Static analysis techniques verify that a property holds true on all execution paths. Because the goal is program verification, these techniques are usually overly conservative and flag possible violations as errors, leading to false positives.
  • Dynamic analysis techniques verify that a property holds true on some execution paths. Testing takes a dynamic analysis approach that aims at detecting bugs, but it usually can’t prove the absence of errors. Thus, these techniques often fail to detect all errors.

It might not be possible to detect bugs precisely when applying only static analysis or employing a testing technique that’s unaware of the structure of the code. For example, consider the following code:

int Complicated(int x, int y)
{
  if (x == Obfuscate(y))
    throw new RareException();
  return 0;
}

int Obfuscate(int y)
{
  // Non-linear integer arithmetic: hard for static analysis to reason
  // about precisely, and the exception above is triggered by only one
  // value of x per value of y.
  return (100 + y) * 567 % 2347;
}

Static analysis techniques tend to be conservative, so the non-linear integer arithmetic in Obfuscate causes most of them to issue a warning about a potential error in Complicated. Random testing techniques, meanwhile, have very little chance of finding a pair of x and y values that triggers the exception: for any given y, exactly one of the roughly four billion possible x values does so.

Smart Unit Tests implements an analysis technique that falls between these two extremes. Similar to static analysis techniques, it proves that a property holds for most feasible paths. Similar to dynamic analysis techniques, it reports only real errors and no false positives.

Test case generation involves the following:

  • Dynamically discovering all the branches (explicit and implicit) in the code under test.
  • Synthesizing precise test input values that exercise those branches.
  • Recording the output from the code under test for those inputs.
  • Persisting these as a compact test suite with high coverage.
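Note that implicit branches include potential runtime failures. As a brief illustration (hypothetical code):

int First(int[] values)
{
  // Only one explicit path exists here, but the engine also discovers
  // two implicit branches to cover:
  //   values == null      -> NullReferenceException
  //   values.Length == 0  -> IndexOutOfRangeException
  return values[0];
}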

Figure 2 shows how this works using runtime instrumentation and monitoring. Here are the steps involved:

  1. The code under test is first instrumented and callbacks are planted that will allow the testing engine to monitor execution. The code is then run with the simplest relevant concrete input value (based on the type of the parameter). This represents the initial test case.
  2. The testing engine monitors execution, computes coverage for each test case, and tracks how the input value flows through the code. All exceptional behaviors are considered branches, just like explicit branches in the code. If all paths are covered, the process stops; if not, the testing engine picks a test case that reaches a program point from which an uncovered branch leaves, and determines how the branching condition depends on the input value.
  3. The engine constructs a constraint system representing the condition under which control reaches that program point and would then continue along the previously uncovered branch. It then queries a constraint solver to synthesize a new concrete input value based on this constraint.
  4. If the constraint solver can determine a concrete input value for the constraint, the code under test is run with the new concrete input value.
  5. If coverage increases, a test case is emitted.

Figure 2 How Test Case Generation Works Under the Hood

Steps 2 through 5 are repeated until all branches are covered, or until preconfigured exploration bounds are exceeded.

This process is termed an “exploration.” Within an exploration, the code under test can be “run” several times. Some of those runs increase coverage, and only the runs that increase coverage emit test cases. Thus, all tests that are generated exercise feasible paths.
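Applied to the earlier Complicated method, the exploration might unfold as follows. The input values are real, but the emitted test shown is an MSTest-style sketch rather than the engine’s exact output, and ClassUnderTest is a hypothetical name:

// Run 1: simplest inputs, x = 0 and y = 0. Obfuscate(0) is
// (100 * 567) % 2347 == 372, so the branch condition is false and the
// "return 0" path is covered. Coverage increased: a test case is emitted.
// Run 2: the engine negates the recorded condition x == Obfuscate(y),
// and the solver answers with x = 372, y = 0; this run reaches the
// throw statement, increasing coverage again:
[TestMethod]
[ExpectedException(typeof(RareException))]
public void Complicated_ThrowsRareException()
{
  new ClassUnderTest().Complicated(372, 0);
}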

Bounded Exploration

If the code under test doesn’t contain loops or unbounded recursion, exploration typically stops quickly because there are only a (small) finite number of execution paths to analyze. However, most interesting programs do contain loops or unbounded recursion. In such cases, the number of execution paths is (practically) infinite, and it’s generally undecidable whether a statement is reachable. In other words, an exploration would take forever to analyze all execution paths of the program. Because test generation involves actually running the code under test, how do you protect against such runaway exploration? That’s where bounded exploration plays a key role, ensuring that explorations stop after a reasonable amount of time. Several tiered, configurable exploration bounds are used:

  • Constraint solver bounds limit the amount of time and memory the solver can use in searching for the next concrete input value.
  • Exploration path bounds limit the complexity of the execution path being analyzed in terms of the number of branches taken, the number of conditions over the inputs that need to be checked, and the depth of the execution path in terms of stack frames.
  • Exploration bounds limit the number of “runs” that don’t yield a test case and the total number of runs permitted, and set an overall time limit after which exploration stops.

Rapid feedback is essential for any tool-based testing approach to be effective, and all of these bounds have been preconfigured to enable rapid interactive use.

Furthermore, the testing engine uses heuristics to achieve high code coverage quickly by postponing solving hard constraint systems. You can let the engine quickly generate some tests for code on which you’re working. However, to tackle the remaining hard test input generation problems, you can dial up the thresholds to let the testing engine crunch further on the complicated constraint systems.
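As a hedged sketch of what dialing the thresholds up looks like, the underlying Pex engine exposed such knobs as properties on its PexMethod attribute, applied to a parameterized test (parameterized unit tests are covered in the next section; the exact names and defaults in the shipping feature may differ, and ClassUnderTest is hypothetical):

[PexMethod(
  MaxRuns = 200,                 // total runs permitted in the exploration
  MaxRunsWithoutNewTests = 40,   // stop after this many unproductive runs
  MaxBranches = 20000,           // bound on execution-path complexity
  MaxConstraintSolverTime = 2)]  // seconds per constraint solver query
public void TestComplicated(int x, int y)
{
  new ClassUnderTest().Complicated(x, y);
}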

Parameterized Unit Testing

All program analysis techniques try to validate or disprove certain specified properties of a given program. There are different techniques for specifying program properties:

  • API Contracts specify the behavior of individual API actions from the implementation’s perspective. Their goal is to guarantee robustness, in the sense that operations don’t crash and data invariants are preserved. A common problem of API contracts is their narrow view on individual API actions, which makes it difficult to describe system-wide protocols.
  • Unit Tests embody exemplary usage scenarios from the perspective of a client of the API. Their goal is to guarantee functional correctness, in the sense that the interplay of several operations behaves as intended. A common problem of unit tests is that they’re detached from the details of the API’s implementation.

Smart Unit Tests enables parameterized unit testing, which unites both techniques. Supported by a test-input generation engine, this methodology combines the client and the implementation perspectives: the functional correctness properties (parameterized unit tests) are checked against the many cases of the implementation (through test input generation).

A parameterized unit test (PUT) is the straightforward generalization of a unit test through the use of parameters. A PUT makes statements about the code’s behavior for an entire set of possible input values, instead of just a single exemplary input value. It expresses assumptions on test inputs, performs a sequence of actions, and asserts properties that should hold in the final state; that is, it serves as the specification. Such a specification doesn’t require or introduce any new language or artifact. It’s written at the level of the actual APIs implemented by the software product, and in the programming language of the software product. Designers can use them to express intended behavior of the software APIs, developers can use them to drive automated developer testing, and testers can leverage them for in-depth automatic test generation. For example, the following PUT asserts that after adding an element to a non-null list, the element is indeed contained in the list:

void TestAdd(ArrayList list, object element)
{
  // Assumption: inputs that violate it are pruned during exploration,
  // not reported as failures.
  PexAssume.IsNotNull(list);
  list.Add(element);
  // Property that must hold in the final state for all remaining inputs.
  PexAssert.IsTrue(list.Contains(element));
}

PUTs separate the following two concerns:

  1. The specification of the correctness properties of the code under test for all possible test arguments.
  2. The actual “closed” test cases with the concrete arguments.

The engine emits stubs for the first concern, and you’re encouraged to flesh them out based on your domain knowledge. Subsequent invocations of Smart Unit Tests will automatically generate and update individual closed test cases.
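For instance, a regenerated closed test case for a PUT like the TestAdd example shown earlier might look like the following sketch (generated names and attributes will differ):

[TestMethod]
public void TestAdd_EmptyList_NullElement()
{
  // Concrete arguments synthesized by the engine for one covered path;
  // the PUT body supplies the assumptions and assertions.
  TestAdd(new ArrayList(), null);
}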

Application

Software development teams may be entrenched in various methodologies already, and it’s unrealistic to expect them to embrace a new one overnight. Indeed, Smart Unit Tests is not meant as a replacement for any testing practice teams might be following; rather, it’s meant to augment any existing practices. Adoption is likely to begin with a gradual embrace, with teams leveraging the default automatic test generation and maintenance capabilities first, and then moving on to write the specifications in code.

Testing Observed Behavior Imagine having to make changes to a body of code with no test coverage. You might want to pin down its behavior in terms of a unit test suite before starting, but that’s easier said than done:

  • The code (product code) might not lend itself to being unit testable. It might have tight dependencies on the external environment that need to be isolated, and if you can’t spot them, you might not even know where to start.
  • The quality of the tests might also be an issue, and there are many measures of quality. There’s the measure of coverage: how many branches, code paths or other program artifacts in the product code do the tests touch? There’s the measure of assertions that express whether the code is doing the right thing. Neither of these measures is sufficient by itself. Instead, what you want is a high density of assertions being validated with high code coverage. But it’s not easy to do this kind of quality analysis in your head as you write the tests, and as a consequence you might end up with tests that exercise the same code paths repeatedly, perhaps testing just the “happy path,” and you’ll never know whether the product code can even cope with all those edge cases.
  • And, frustratingly, you might not even know what assertions to put in. Imagine being called upon to make changes to an unfamiliar code base!

The automatic test generation capability of Smart Unit Tests is especially useful in this situation. You can baseline the current observed behavior of your code as a suite of tests for use as a regression suite.
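As a sketch of what such a baseline can surface, consider a routine with an unnoticed overflow (the method and values here are hypothetical):

// Code under test:
public static int Average(int a, int b)
{
  return (a + b) / 2;  // overflows when a + b exceeds int.MaxValue
}

// Generated regression tests pin the observed behavior, surprises included:
[TestMethod]
public void Average_Observed1()
{
  Assert.AreEqual(2, Average(1, 3));
}

[TestMethod]
public void Average_Observed2()
{
  // int.MaxValue + 1 wraps around to int.MinValue, so the observed
  // result is negative; the baseline captures it, and any later change
  // in behavior will be flagged as a regression.
  Assert.AreEqual(-1073741824, Average(int.MaxValue, 1));
}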

Specification-Based Testing Software teams can use PUTs as the specification to drive exhaustive test case generation to uncover violations of test assertions. Being freed of much of the manual work necessary to write test cases that achieve high code coverage, the teams can concentrate on tasks that Smart Unit Tests can’t automate, such as writing more interesting scenarios as PUTs, and developing integration tests that go beyond the scope of PUTs.

Automatic Bug Finding Assertions expressing correctness properties can be stated in multiple ways: as assert statements, as code contracts and more. The nice thing is that these are all compiled down to branches—an if statement with a then branch and an else branch representing the outcome of the predicate being asserted. Because Smart Unit Tests computes inputs that exercise all branches, it becomes an effective bug-finding tool, as well—any input it comes up with that can trigger the else branch represents a bug in the code under test. Thus, all bugs that are reported are actual bugs.
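Conceptually, that compilation of an assertion into a branch looks like this (an illustrative expansion, not literal compiler output):

// The assertion...
System.Diagnostics.Debug.Assert(denominator != 0);

// ...is analyzed as if it were an ordinary branch:
if (denominator != 0)
{
  // then branch: the property holds and execution continues
}
else
{
  // else branch: any synthesized input that reaches here is a real bug
}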

Reduced Test Case Maintenance In the presence of PUTs, significantly fewer test cases need to be maintained. In a world where individual closed test cases were written manually, what would happen when the code under test evolved? You’d have to adapt the code of all the tests individually, which could represent a significant cost. By writing PUTs instead, only the PUTs need to be maintained; Smart Unit Tests can then automatically regenerate the individual test cases.

Challenges

Tool Limitations The technique of using white-box code analyses with constraint solving works very well on unit-level code that’s well isolated. However, the testing engine does have some limitations:

  • Language: In principle, the testing engine can analyze arbitrary .NET programs, written in any .NET language. However, the test code is generated only in C#.
  • Non-determinism: The testing engine assumes the code under test is deterministic. If not, it will prune non-deterministic execution paths, or it might go in cycles until it hits exploration bounds.
  • Concurrency: The testing engine does not handle multithreaded programs.
  • Native code or .NET code that’s not instrumented: The testing engine does not understand native code, that is, x86 instructions called through the Platform Invoke (P/Invoke) feature of the Microsoft .NET Framework. The testing engine doesn’t know how to translate such calls into constraints that can be solved by a constraint solver. And even for .NET code, the engine can only analyze code it instruments.
  • Floating point arithmetic: The testing engine uses an automatic constraint solver to determine which values are relevant for the test case and the code under test. However, the abilities of the constraint solver are limited. In particular, it can’t reason precisely about floating point arithmetic.

In these cases, the testing engine alerts the developer by emitting a warning, and its behavior in the presence of such limitations can be controlled using custom attributes.
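For example, a branch like the following is non-deterministic with respect to the test inputs, so no constraint over the parameters can force either outcome; the engine will prune such paths and warn (hypothetical code):

int Stamp(int id)
{
  // The branch condition depends on the clock, not on id, so the
  // constraint solver has nothing to solve for.
  if (DateTime.Now.Millisecond % 2 == 0)
    return id;
  return -id;
}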

Writing Good Parameterized Unit Tests Writing good PUTs can be challenging. There are two core questions to answer:

  • Coverage: What are good scenarios (sequences of method calls) to exercise the code under test?
  • Verification: What are good assertions that can be stated easily without reimplementing the algorithm?

A PUT is useful only if it provides answers for both questions.

  • Without sufficient coverage, that is, if the scenario is too narrow to reach all of the code under test, the reach of the PUT is limited.
  • Without sufficient verification of computed results, that is, if the PUT doesn’t contain enough assertions, it can’t check that the code is doing the right thing. All the PUT does then is check that the code under test doesn’t crash or hit runtime errors.

In traditional unit testing, the set of questions includes one more: What are relevant test inputs? With PUTs, this question is taken care of by the tooling. However, the problem of finding good assertions is easier in traditional unit testing: The assertions tend to be simpler, because they’re written for particular test inputs.
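To see the difference, contrast a weak PUT with a stronger one (a sketch; PexAssert.AreEqual is assumed to mirror the usual xUnit-style assertion shape):

// Weak: exercises the scenario but verifies almost nothing; it only
// checks that Add doesn't crash.
void TestAddWeak(ArrayList list, object element)
{
  PexAssume.IsNotNull(list);
  list.Add(element);
}

// Stronger: states observable properties without reimplementing Add.
void TestAddStrong(ArrayList list, object element)
{
  PexAssume.IsNotNull(list);
  int countBefore = list.Count;
  list.Add(element);
  PexAssert.IsTrue(list.Contains(element));
  PexAssert.AreEqual(countBefore + 1, list.Count);
}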

Wrapping Up

The Smart Unit Tests feature in Visual Studio 2015 Preview lets you specify the intended behavior of the software in terms of its source code, and it uses automated white-box code analysis in conjunction with a constraint solver to generate and maintain a compact suite of relevant tests with high coverage for your .NET code. The benefits span functions—designers can use them to specify the intended behavior of software APIs; developers can use them to drive automated developer testing; and testers can leverage them for in-depth automatic test generation.

Ever-shortening release cycles in software development are driving the activities of planning, specification, implementation and testing to happen continually rather than in sequence. This hectic world challenges us to reevaluate existing practices around those activities. Short, fast, iterative release cycles require taking the collaboration among these functions to a new level. Features such as Smart Unit Tests can help software development teams more easily reach such levels.


Pratap Lakshman works in the Developer Division at Microsoft where he is currently a senior program manager on the Visual Studio team, working on testing tools.

Thanks to the following Microsoft technical expert for reviewing this article: Nikolai Tillmann