Testing

Perform Code Coverage Analysis with .NET to Ensure Thorough Application Testing

James McCaffrey

This article discusses:

  • The importance of code coverage analysis
  • A custom code coverage tool
  • The role of code coverage in the development process
  • Profiling in .NET
This article uses the following technologies:
C++, C#, Microsoft .NET Framework

Code download available at: CodeCoverageAnalysis.exe (174 KB)

Contents

Overview of Code Coverage Analysis
The Profiling DLL
Building ProfilerFF.dll
Preparing for Code Coverage
Running Tests Under Code Coverage
Reporting Code Coverage
Before and After .NET
Conclusion

The basic idea behind code coverage is straightforward. During product development, a large number of test cases are created and run to ferret out bugs in the system. Code coverage analysis monitors which parts of the product's code are exercised by the collection of test cases. For it to be effective, 100 percent of the product's code should be executed by the test cases. If there are segments of product code that are never run during testing, then the product has not been thoroughly tested.

Although the idea of code coverage is simple enough, actually performing code coverage analysis in a non-.NET environment can be very time consuming, difficult, and expensive. However, the Microsoft® .NET environment provides software developers and testers with a simple and effective new way to perform code coverage analysis. In the days before .NET, code coverage was one of the most frustrating parts of my job as a tester. Recently, though, I discovered that the .NET environment provides a much improved process for performing code coverage. In this article, I will provide you with all the code you need to perform .NET code coverage analysis. I call this system Fundamental Function code coverage. Adding code coverage analysis skills to your toolset will help you keep a watchful eye on your code, whether you are a developer, tester, or program manager.

In the sections that follow, I will walk through an example of code coverage at a high level so you can understand the principles involved. Then I will discuss the details of the implementation. I will conclude with a discussion of how code coverage analysis fits into the software product development cycle.

Overview of Code Coverage Analysis

Because traditional code coverage analysis is so tricky, I'll walk through a concrete example at a high level. Code coverage analysis has three phases: preparing the coverage environment, enabling coverage profiling and running tests, and then disabling coverage profiling and generating a report.

The product I will analyze is a Windows®-based application that references a dummy class library that contains several dummy constructors and methods. Figure 1 illustrates how my code coverage environment is prepared.

Figure 1 Methods Under Coverage

First, you build a special profiling DLL which, when enabled later, will watch the execution of code through the common language runtime (CLR) and log all entries to specified methods in the product being tested. After creating the profiler, you identify which .NET assemblies you want to include under coverage. In this case, I'm looking at the DummyApp.exe and DummyLib.dll assemblies.

Next you determine which constructors, methods, and properties are in the assemblies under coverage. As you can see, there are 12 of these under coverage in Figure 1.

Figure 2 shows the second phase of code coverage: enabling the profiling DLL. Now any .NET managed code that runs in the environment will be monitored and recorded.

Figure 2 Enabling and Using a Profiler

Next, you run either a single test case or a suite of tests. In this example, I just clicked the Methods button of the dummy application to simulate testing it. When performing code coverage analysis, I'm usually not concerned with whether tests pass or fail, but rather with how much of the codebase of the product under test is touched by the tests. In other words, if every test fails but code coverage is 100 percent, I know I'll need to fix my code, but at least I know that all the methods in the product are being exercised. That said, you need to be careful about test failures and their effects on code coverage. If the tests don't run the way they are supposed to while under coverage, you can't be sure that you covered the right code. After running the test, close the dummy application to flush all the coverage information that the profiler gathered to a text file.

The third phase of code coverage is illustrated in Figure 3. I disabled coverage profiling so that it wouldn't continue gathering information. Then I ran the report generator, which collects the information about all methods that could potentially be touched (which was generated in Phase 1) and gathers the information about the methods that were actually touched (generated in Phase 2). It then compares these two lists to determine the resulting code coverage percentage. In this example, 8 of the potential 12 methods were exercised by the test, for a 67 percent coverage rate. I can use the report data to determine which methods have not been exercised and create new test cases.

Figure 3 Report Generation

In other systems, the preparation part of code coverage, in which you determine which methods you will monitor, and the reporting part, in which you analyze the results, are often performed by two different programs. In the Fundamental Function system presented here, I put both of these functions into a single program named FFcover.exe and control them with the -m and -r command-line arguments.
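As a rough sketch (not the article's download code), FFcover.exe's entry point might dispatch on those two switches as shown here; BuildMethodList and GenerateReport are hypothetical names standing in for the code discussed in the sections that follow:

using System;

class FFcover
{
    static void Main(string[] args)
    {
        if (args.Length == 1 && args[0] == "-m")
            BuildMethodList();   // phase 1: write ffcoverMethods.txt
        else if (args.Length == 1 && args[0] == "-r")
            GenerateReport();    // phase 3: compare potential vs. actual hits
        else
            Console.WriteLine("Usage: FFcover.exe [-m | -r]");
    }

    // Hypothetical stand-ins for the code shown later in this article.
    static void BuildMethodList() { /* see Preparing for Code Coverage */ }
    static void GenerateReport()  { /* see Reporting Code Coverage */ }
}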

The system described in this article monitors code coverage at the method level. With it I determine if a particular constructor, method, or property has been entered, but I do not learn about the path of execution inside the method. More granular code coverage systems operate at lower levels. For example, one system called Basic Block coverage operates at the block level (testing whether execution enters a particular block of statements).

The Profiling DLL

As mentioned in the previous section, at the foundation of code coverage is a profiling DLL that can monitor all the activity that passes through the CLR. As you might expect, this is very sophisticated code. Fortunately, the .NET Framework comes with two complete example profiler source code sets. I was able to use one of them with very few changes to create my profiler—ProfilerFF.dll.

In the example shown in Figure 2, the profiler works behind the scenes to capture method names that were entered when the dummy test application ran in the coverage-enabled environment. That data is logged to a text file named ffcoverXXX.log, where the Xs represent a time-stamp value. One line of the resulting log file looks like the following:

0x00000d38;void DummyLib.Dummy::Bar(int32&,float64&,String[])

Here the data following the thread ID indicates that the Bar method in the Dummy class, which is part of the DummyLib namespace and which accepts three parameters and returns void, was entered during the test.

Let's examine the profiler. The base profiler source code from which I derived my code coverage profiler is located by default in the C:\Program Files\Microsoft Visual Studio .NET\FrameworkSDK\Tool Developers Guide\Samples\profiler\hst_profiler folder. HST stands for hot spot tracker to indicate that a DLL built from that code can monitor how much time is spent in methods.

I first created a C:\Coverage root folder on my machine and then copied the hst_profiler and Include folders to it. The build process will require that these two folders have the same parent folder. Next, I renamed the hst_profiler folder to FFprofiler to reflect its new functionality.

Before editing, I renamed three files in the FFprofiler folder: ProfilerHST.cpp to ProfilerFF.cpp, ProfilerHST.def to ProfilerFF.def, and ProfilerHST.mak to ProfilerFF.mak. It's clear that these three files ending with HST are specific to the timing profiler, so renaming them makes sense. The files ProfilerInfo.cpp, ProfilerInfo.h, ProfilerCallback.cpp, and ProfilerCallback.h are, for the most part, generic to any profiling DLL, so it is reasonable to leave their names alone. I also renamed EnableProfiler.bat to enable.bat to make it easier to type on a shell command line.

The first file I edited is ProfilerCallback.h. At the beginning of the file, I made a few minor changes:

extern const GUID __declspec( selectany ) CLSID_PROFILER =
    { 0x5ac86959, 0x7927, 0x4177,
      { 0x98, 0x02, 0xa5, 0x40, 0xf1, 0x96, 0x78, 0x74 } };

#define THREADING_MODEL     "Both"
#define PROGID_PREFIX       "ProfilerFF"
#define COCLASS_DESCRIPTION "Fundamental Function Code Coverage"
#define PROFILER_GUID       "{5AC86959-7927-4177-9802-A540F1967874}"

Every profiler has a GUID identifier so that different profilers can be active in the same environment, monitoring different CLR activity. I used the Visual Studio® .NET Create GUID tool to generate a new GUID and pasted it at the two places shown. I also changed the values of PROGID_PREFIX and COCLASS_DESCRIPTION to reflect code coverage functionality.
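If you would rather generate the GUID programmatically than with the Create GUID tool, a short C# sketch will do the job; the class name MakeGuid is made up:

using System;

class MakeGuid
{
    static void Main()
    {
        // Prints something like {5AC86959-7927-4177-9802-A540F1967874};
        // use the value for PROFILER_GUID and break it into hex components
        // for CLSID_PROFILER.
        Console.WriteLine(Guid.NewGuid().ToString("B").ToUpper());
    }
}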

The second change I made was in the ProfilerFF.def definition file. Again, this was just a minor change to make it conform to the other edits, so that the first line below is changed to the second:

LIBRARY ProfilerHST
LIBRARY ProfilerFF

The third set of edits was in the ProfilerFF.mak makefile. I will cover those changes in the next section when I build the profiler DLL. The fourth and last file in folder ProfilerFF that I edited is ProfilerInfo.cpp. Most of the changes were in the FunctionTimingInfo::Dump function. The original Dump function outputs a semicolon-delimited text file containing the name of the function entered, the period of time the execution engine spent there, the number of times the function was entered, and other useful information. For most types of code coverage, you only need to know which methods were entered. In fact, additional information just makes parsing more difficult later. The portion of the code that records which methods were actually entered during a test in a coverage-enabled environment is shown in Figure 4.

Figure 4 Profiler Output Code

if ( pFunctionInfo->m_functionName[0] != NULL )
{
    if ( wcsstr(pFunctionInfo->m_functionName,L"DummyApp") != NULL ||
         wcsstr(pFunctionInfo->m_functionName,L"DummyLib") != NULL )
    {
        LOG_TO_FILE( ("0x%08x;", m_win32ThreadID) )

        if ( pFunctionInfo->m_returnTypeStr[0] != NULL )
            LOG_TO_FILE( ("%S ", pFunctionInfo->m_returnTypeStr) )
        else
            LOG_TO_FILE( ("No return value") )

        LOG_TO_FILE( ("%S(", pFunctionInfo->m_functionName) )

        if ( pFunctionInfo->m_argCount > 0 )
        {
            WCHAR *parameter;
            WCHAR *separator = L"+";

            parameter = wcstok( pFunctionInfo->m_functionParameters, separator );
            while ( parameter != NULL )
            {
                LOG_TO_FILE( ("%S", parameter) )
                parameter = wcstok( NULL, separator );
                if ( parameter != NULL )
                    LOG_TO_FILE( (",") )
            } // while
        } // if m_argCount > 0

        LOG_TO_FILE( (")") )
        LOG_TO_FILE( ("\n") )
    }
}
else
    LOG_TO_FILE( ("Unable to Retrieve Function Information \n") )

The Dump code first checks to make sure that the pointer to the function name is not NULL, or in other words that there is a function name to log:

if ( pFunctionInfo->m_functionName[0] != NULL )

The next if conditional acts as a filter to specify which methods to log. The following code returns true only for functions (such as .NET methods, constructors, and properties) whose names begin with DummyApp or DummyLib:

if ( wcsstr(pFunctionInfo->m_functionName,L"DummyApp") != NULL ||
     wcsstr(pFunctionInfo->m_functionName,L"DummyLib") != NULL )

Because the full name of a .NET method begins with its namespace, this filter will capture all methods in the DummyApp.exe and DummyLib.dll assemblies. If you don't add such a filter, the profiler will capture all system-related methods too, and there are a lot of them. This can, of course, be changed to suit your needs.

All actual logging is done using a LOG_TO_FILE macro defined in the Include\basehdr.h file. The following code writes an ID associated with the thread of execution to a text file specified in the file Include\basehlp.hpp (which I will examine in a moment):

LOG_TO_FILE( ("0x%08x;", m_win32ThreadID) )

Having a thread ID is useful when the application under test is multithreaded. It also gives you an easy way to determine if a line of output is actual data or other information such as comments.

It would be wonderful if you could identify a method using only its namespace prefix and its name, but because of overloading you need return type and parameter information to differentiate methods with the same name. Here's the code that logs the method return type and method name:

if ( pFunctionInfo->m_returnTypeStr[0] != NULL )
    LOG_TO_FILE( ("%S ", pFunctionInfo->m_returnTypeStr) )

LOG_TO_FILE( ("%S(", pFunctionInfo->m_functionName) )

Dealing with data type names is a major issue. The problem is that there are three sets of type names: those used by the profiler (for example, int32), those used by the CLR and the .NET Framework (System.Int32), and those used by C# and other .NET-compliant languages (int). For now it is enough for you to realize that the profiler has one set of type names that will later have to be mapped to .NET type names.

The code that logs the function parameters uses an old C/C++ parsing technique, as shown here:

parameter = wcstok( pFunctionInfo->m_functionParameters, separator );
while ( parameter != NULL )
{
    LOG_TO_FILE( ("%S", parameter) )
    parameter = wcstok( NULL, separator );
    if ( parameter != NULL )
        LOG_TO_FILE( (",") )
}

The wcstok function is the wide-character version of the venerable strtok function. The code logs each parameter to a file, separated by the comma character. There is nothing particularly special about the comma—any character could have been used, but commas are standard delimiters.

Of the 11 files in the Include folder, only one needs to be edited. The basehlp.hpp file specifies the path and file name of the text file that the profiler writes the log data to. The original line was:

strcpy( logfile, "output.log" );

I replaced it with this:

time_t ltime;
time( &ltime );
char buffer[20];
_i64toa( ltime, buffer, 10 );
strcpy( logfile, "C:\\Coverage\\FFdataFiles\\ffcover" );
strcat( logfile, buffer );
strcat( logfile, ".log" );

The new code gets a Unix-style timestamp and uses it to create a file name like ffcover0123456789.log so that multiple log files can reside in the same folder. The location of the files is hardcoded, which is usually a bad idea but in this case leads to a simpler design.

Building ProfilerFF.dll

Building a DLL of any size is a nontrivial task. The system in this article is predominantly command-line oriented, so I will build my profiling DLL from the command line. Because the Visual Studio .NET GUI environment is so powerful, it is quite possible that you have never built a program or DLL this way, so I will explain the process in detail. As you'll see, it is quite easy once you've seen an example.

Although it is possible to invoke the compiler (cl.exe) and linker (link.exe) directly from the command line, there are so many options available that developers frequently employ a utility program to manage all of them. The nmake.exe program uses a data file (usually with a .mak extension) to determine the recipe for creating your project using build tools. Again, it is possible to call the nmake.exe program directly but there are many user environment variables, such as PATH and INCLUDE, that need to be set, so it is common to write a .bat file to manage them. In short, this .bat file sets up environment variables and calls nmake.exe, which in turn reads compiler options from a .mak file then calls the compiler and linker to produce the resulting DLL.

I created the short .bat file named build.bat with the key statements shown in Figure 5. The first two SET statements tell the shell environment where required files such as mspdb70.dll and executables such as nmake.exe reside. You will probably have to modify the part of the .bat file that sets user environment variables, depending on the PATH variable your shell inherits from the system environment variables. Try to build the DLL, and if you get an error message about a missing file, find the file and add another SET path statement.

Figure 5 Build Script

rem ** build.bat
set path=%path%;C:\Program Files\Microsoft Visual Studio .NET\Common7\IDE
set path=%path%;C:\Program Files\Microsoft Visual Studio .NET\VC7\bin
set include=%include%;C:\Program Files\Microsoft Visual Studio .NET\VC7\include
set include=%include%;C:\Program Files\Microsoft Visual Studio .NET\VC7\PlatformSDK\Include
set include=%include%;C:\Program Files\Microsoft Visual Studio .NET\FrameworkSDK\include
set lib=%lib%;C:\Program Files\Microsoft Visual Studio .NET\VC7\PlatformSDK\lib
set lib=%lib%;C:\Program Files\Microsoft Visual Studio .NET\VC7\lib
nmake /nologo /a /f profilerFF.mak
echo Profiler DLL for code coverage has been built

The next three SET statements tell the shell where to look for files such as cor.h that are found in the various C++ #include statements. The final two SET statements tell the shell where library files used by the linker are located. Again, you may have to modify or add statements depending on your environment.

Finally, nmake.exe is called with the compiler and linker options specified in ProfilerFF.mak. The ProfilerFF.mak data file consists of 14 file references such as $(INTDIR)\ProfilerFF.obj, which specifies the location and name of the intermediate .obj file created during compilation.

With everything ready to go, you can build the code coverage profiling DLL from a command shell using this command:

C:\Coverage\FFprofiler>build.bat

You can see the shell command in Figure 1. Because nmake.exe looks for its makefile in the current directory, you need to invoke build.bat from the FFprofiler folder. It is unlikely that the build will work on your first attempt, but a few edits to the build.bat file as I've described will lead to a successful build.

Preparing for Code Coverage

Before you enable the code coverage profiler, you need to specify which .NET assemblies to include in the coverage monitoring, and determine which methods in those assemblies you want to watch. The profiling DLL will create a list of all methods that were actually entered, and you can compare it to the list you provided to determine your code coverage percentage.

My system manages these lists by storing data in simple text files. The first step is to manually create an ffcoverAssemblies.txt file and save it in the Coverage\FFdataFiles folder. For the example shown in Figure 1, Figure 2, and Figure 3, the contents of this file are:

C:\Coverage\AppsToTest\DummyApp\bin\Debug\DummyApp.exe
C:\Coverage\AppsToTest\DummyApp\DummyLib\bin\Debug\DummyLib.dll

These are the .NET assemblies that make up my system under test. Determining which assemblies are referenced in a large product can be a difficult task. You can use the .NET Framework Configuration tool, mscorcfg.msc, to determine assembly dependencies. You can also write a little utility program that uses the System.Reflection.Assembly.GetReferencedAssemblies method to find dependencies.
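Here is a minimal sketch of such a utility; the class name ListReferences is made up, and the assembly path is whatever root assembly you want to inspect:

using System;
using System.Reflection;

class ListReferences
{
    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: ListReferences <path to assembly>");
            return;
        }
        // Load the root assembly and print every assembly it references.
        Assembly root = Assembly.LoadFrom(args[0]);
        foreach (AssemblyName an in root.GetReferencedAssemblies())
            Console.WriteLine(an.FullName);
    }
}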

After you have created the ffcoverAssemblies.txt file, the next step is to find all the methods, constructors, and properties in the assemblies. I put this code, along with code to do the reporting, in one project named FFcover. Here's where the powerful methods in the System.Reflection namespace come into play. The algorithm is shown in Figure 6 in C#-like pseudocode.

Figure 6 Find Existing Classes, Methods, Properties

open file ffcoverAssemblies.txt for reading
open file ffcoverMethods.txt for writing

for each assembly in ffcoverAssemblies.txt loop
{
    get all classes in the assembly
    for each class loop
    {
        get all constructors
        for each constructor
        {
            get all parameters
            write assembly.class::constructor(params) to output
        }
        get all methods
        for each method
        {
            get the return type
            get all parameters
            write returnType assembly.class::method(params)
        }
        get all properties
        for each property
        {
            get the return type
            get all parameters
            write returnType assembly.class::property(params)
        }
    } // each class
} // each assembly

In principle this is easy, but there are a few tricks along the way. I decided to read all of the assemblies into an ArrayList container and work from it, rather than use a file-oriented approach:

ArrayList al = new ArrayList(); // assemblies under coverage
StreamReader sr = new StreamReader(
    "C:\\Coverage\\FFdataFiles\\ffcoverAssemblies.txt");
string line;
while ( (line = sr.ReadLine()) != null ) // read assemblies to test
{
    Assembly a = Assembly.LoadFrom(line.Trim());
    al.Add(a); // store assembly in array list al
}

Notice that information such as file paths is hardcoded and error checking has been omitted; this has been done for simplicity and clarity only. In a production system, plenty of error checking would need to be added and the hardcoded strings should instead be passed as parameters.

I get the types (classes, enumerations, and arrays) in each assembly using the following loop:

foreach (Assembly a in al)
{
    Type[] types = a.GetTypes();
    •••

Then for each class I get the constructors, methods, and properties that belong to the class. Because the code for getting constructor, method, and property information from a class is similar, let's look at the code for methods only. You start by setting up a BindingFlags variable that acts as a filter for the kinds of methods you want to catch, as you can see in the following code:

foreach (Type type in types)
{
    BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                         BindingFlags.Instance | BindingFlags.Static |
                         BindingFlags.DeclaredOnly;
    •••

Depending on your system, you may want to modify these flags to include or exclude different kinds of methods. Next, you get all methods in the current class into an array:

MethodInfo[] methodinfos = type.GetMethods(flags);

You then iterate through each method, building up its signature in the following format:

return-type Assembly.Class::MethodName(param1,param2,...)

Starting with an empty string, you append the return type, a blank space, the full name of the Type (for example, the class name preceded by namespace), and the method name:

foreach (MethodInfo method in methodinfos)
{
    string s = "" + method.ReturnType.FullName + " " +
               type.FullName + "::" + method.Name + "(";
    •••

After you get the method return type and name, you must get the parameter types. Remember that this is necessary because with overloading you can have multiple methods with the same name and return types, but with different parameter lists:

ParameterInfo[] parameterinfos = method.GetParameters();
foreach (ParameterInfo p in parameterinfos)
{
    if (p.ParameterType.FullName != "System.Decimal" &&
        p.ParameterType.FullName != "System.EventArgs")
    {
        s += p.ParameterType.FullName + ",";
    }
}

During the development of this code coverage system I noticed that the coverage profiler does not capture the System.Decimal data type or System.EventArgs parameters. This required a design decision: I could either modify the profiler source code to catch these or modify the C# analysis code. I chose the second approach, mostly because I wanted to change as little as possible in the profiler. However, given more time it would have been better to change the profiler source code itself.

After the method string has been built up, there is a critical processing step:

s = Mapped(s); // map C# type names to Profiler type names

The entire string is modified using a Mapped function that replaces the .NET return type representation (System.Int32, for example) with the return type representation used by the profiler (int32). The two representations must match because the profiler outputs its own representation of data types for methods actually hit during testing. Here's the Mapped function:

static string Mapped(string s)
{
    string t = s;
    while (t.IndexOf("System.Int32") >= 0)
        t = t.Replace("System.Int32", "int32");
    // etc.
    while (t.IndexOf("System.String") >= 0)
        t = t.Replace("System.String", "String");
    return t;
}

The mappings I used handle the C# data types because the application under test was written in C#. Depending on the system you're analyzing, you may have to add mapping statements to map other data types. The technique I used here—a while loop—isn't very elegant. If you're a fan of regular expressions, you can recast the Mapped function using them instead. A table lookup would also be appropriate.
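As a sketch of the table-lookup approach (only the two mappings from the article are shown; the table would grow with the types your assemblies actually use):

using System;
using System.Collections;

class TypeMapper
{
    // Maps the .NET type names that reflection reports to the names the
    // profiler writes to its log; extend the table as needed.
    static Hashtable map = new Hashtable();

    static TypeMapper()
    {
        map["System.Int32"]  = "int32";
        map["System.String"] = "String";
        // etc.
    }

    static string Mapped(string s)
    {
        foreach (DictionaryEntry e in map)
            s = s.Replace((string)e.Key, (string)e.Value);
        return s;
    }

    static void Main()
    {
        // prints: void DummyLib.Dummy::Foo(int32,String)
        Console.WriteLine(
            Mapped("void DummyLib.Dummy::Foo(System.Int32,System.String)"));
    }
}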

Finally, you strip off any trailing comma characters, append a closing parenthesis character, and write the result to the ffcoverMethods.txt file in the FFdataFiles folder:

if (s[s.Length-1] == ',')
    s = s.Remove(s.Length-1,1);
s += ")";
Console.WriteLine(s);
sw.WriteLine(s);
sw.Flush();

Running Tests Under Code Coverage

After the code coverage profiling DLL has been built, and information has been gathered about which methods in the product should be hit while testing, you can run a test or a suite of tests under code coverage. The steps are easy: enable the code coverage environment, run a test or a suite of tests, and disable the code coverage environment.

The code coverage environment is activated with a simple enable.bat file that registers the profiler and sets three shell variables. The key statements in enable.bat are:

@set DBG_PRF_LOG=0x1
@set Cor_Enable_Profiling=0x1
@regsvr32 /s debug\ProfilerFF.dll
@set COR_PROFILER={5AC86959-7927-4177-9802-A540F1967874}

The first set statement tells ProfilerFF.dll to send output to the log file specified in the Include\basehlp.hpp file. Recall that this file is named ffcoverXXX.log, where the Xs are a timestamp. The second statement sets the value of the environment variable that the profiler checks to determine whether it should profile or not (0x1 represents true). The third statement registers the profiling DLL. The last statement sets an ID value for the profiler that is running in the current environment. It is the GUID specified in the ProfilerCallback.h file.

After calling enable.bat, you simply run a test or a suite of tests just as you would normally. Your tests can be manual or automated. All .NET activity is monitored, and when the application terminates, the information about which methods were actually entered is saved to an ffcoverXXX.log file in the FFdataFiles folder. Here are a few lines of output from the example shown at the beginning of this article:

0x00000c58;void DummyApp.Form1::Main()
0x00000c58;void DummyApp.Form1::button2_Click(Object)
0x00000c58;void DummyLib.Dummy::.ctor(int32,int64)
0x00000c58;void DummyLib.Dummy::Bar(int32&,float64&,String[])

Each line of data is preceded by an arbitrary thread ID in case your application is multithreaded. Notice that parameters such as int32 are in profiler format and not in .NET format (System.Int32).

If you run a second test, a second log file with a different timestamp will be saved. This is a common scenario in code coverage—you run a sequence of tests to obtain the cumulative coverage ratio. After you are finished running your tests, you want to turn off code coverage. You do this with a disable.bat file that has just one important statement:

@set Cor_Enable_Profiling=0x0

This statement effectively instructs the profiler not to monitor activity anymore. The relationship between the various parts of the Fundamental Function code coverage system is illustrated in the diagram in Figure 7.

Figure 7 Code Coverage System

Reporting Code Coverage

After you have built a profiling DLL, created a list of all methods in the system under test that can potentially be entered during testing, and produced a list that contains all the methods that were actually entered, you are ready to create a results report. I organized my reporting code algorithm around three hash tables:

Hashtable ht1 = new Hashtable(); // holds all potential methods
Hashtable ht2 = new Hashtable(); // holds methods actually hit
Hashtable ht3 = new Hashtable(); // holds methods missed

In pseudocode, the algorithm looks like this:

load ht1 from ffcoverMethods.txt with methods that could be hit
load ht2 from all *.log files with methods actually hit
for each method in ht1 loop
{
    if method is not in ht2, add method to ht3
}
percent coverage = ht2.Count / ht1.Count

Loading the first hash table with names of methods that could be entered is easy because I already have this data in the ffcoverMethods.txt file. I simply open the file for reading, traverse it line by line, and add each method name like so:

if (!ht1.ContainsKey(line))
    ht1.Add(line, line);
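For context, here is a minimal sketch of that loading step wrapped in a method; the name LoadPotentialMethods is made up, and the hardcoded path matches the one used elsewhere in the article:

// Assumes using System.IO and using System.Collections directives.
static void LoadPotentialMethods(Hashtable ht1)
{
    StreamReader sr1 = new StreamReader(
        "C:\\Coverage\\FFdataFiles\\ffcoverMethods.txt");
    string line;
    while ((line = sr1.ReadLine()) != null)
    {
        if (!ht1.ContainsKey(line))
            ht1.Add(line, line); // one entry per potential method signature
    }
    sr1.Close();
}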

Loading the second hash table with the methods that were actually entered during the test requires two steps. First I examine the FFdataFiles folder to find all files with names of the form ffcoverXXX.log, as shown here:

DirectoryInfo dir = new DirectoryInfo("C:\\Coverage\\FFdataFiles\\");
FileInfo[] files = dir.GetFiles("ff*.log");
foreach (FileInfo file in files)
{
    al1.Add(file.FullName);
}

I then take each file, open it for reading, and add data lines to the hash table. The main processing loop is shown in Figure 8.

Figure 8 Adding Methods Touched

foreach (string fn in al1) // for each log file
{
    StreamReader sr2 = new StreamReader(fn);
    while ((line = sr2.ReadLine()) != null) // add methods hit to ht2
    {
        if (!line.StartsWith("0x"))
            continue;
        string[] tokens = line.Split(';');
        if (!ht2.ContainsKey(tokens[1]))
            ht2.Add(tokens[1], tokens[1]);
    }
    sr2.Close();
} // for each log file

Because I stored a thread ID along with method data, the StartsWith method makes it easy to tell when I've hit a line of data. When I'm at a data line, I use the String.Split method to separate the method data from the thread ID, check to make sure the method is not already in the hash table, and then store it.

Although not absolutely necessary, I like to determine which methods were not hit. This gives me feedback to help design new tests that should be added to increase coverage. To do this, I iterate through the first hash table of methods that could be hit and if any method is not in the hash table of methods actually hit, it must have been missed. Because hash tables are designed for quick data lookups, you might not have seen the code to traverse all data in a hash table before:

foreach (DictionaryEntry e in ht1)
{
    if (!ht2.ContainsValue(e.Value)) // if not in hit table
        ht3.Add(e.Value, e.Value);   // it was missed
}

Finally, I do a little math to determine the percentage of methods hit, as shown in the following code:

int numtotal = ht1.Count;
int numhit = ht2.Count;
int nummissed = ht3.Count;
double percent = (double)numhit/(double)numtotal * 100;
percent = Math.Round(percent, 0);

I now have all the information I need to produce a code coverage analysis report.
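The article stops short of showing the output step, but a minimal sketch of dumping the report to the console might look like the following; numhit, numtotal, percent, and ht3 come from the code above, and the format is just one possibility:

// Write a simple coverage summary followed by the list of missed methods.
Console.WriteLine("Fundamental Function code coverage = " + percent + "%");
Console.WriteLine(numhit + " of " + numtotal + " methods were hit by the tests");
Console.WriteLine("Methods not hit by any test:");
foreach (DictionaryEntry e in ht3)
    Console.WriteLine("  " + e.Value);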

Before and After .NET

If you know a little bit about code coverage in environments other than .NET, it's easier to understand how the Fundamental Function code coverage technique presented in this article fits into the overall production cycle. The main difficulty in traditional environments is that you cannot work with normal product code. You must first obtain a special build of the product you want to analyze. Then you must process that release, inserting code hooks to notify the profiler when methods or blocks of statements have been entered. This process is usually called instrumentation. This is not only a serious technical challenge, but on large products the management issues are difficult too.

Because all .NET code is executed in the CLR, you can trap method entries there using code built in the usual way, as demonstrated in this article. The overall result is that .NET code coverage is much easier to perform. In large projects before .NET, you would typically assign one full-time engineer just to manage the code coverage process, and even then results were inconsistent at best. The code coverage system presented in this article has been used successfully on several large .NET projects and is simple enough to be used by team members on an ad hoc basis.

In terms of the overall product development cycle, because Fundamental Function code coverage is so easy to perform, you can employ it earlier and more often in the development cycle. Pre-.NET code coverage is normally so time-consuming that it isn't feasible in early stages of development when the product is unstable (just when you need code coverage the most). Additionally, the simplicity allows daily code coverage analysis as part of the code check-in and build processes. In many product groups, before checking in new code, developers run a small set of developer regression tests to ensure that the new code has not broken old functionality. It is also very common for each new build of a product to be subjected to a set of build verification tests to ensure basic functionality. In both cases, these sets of tests can be analyzed with code coverage analysis to determine their effectiveness, even in a rapidly changing environment.

Compared with code coverage techniques before .NET, code coverage with Fundamental Function coverage does have some disadvantages. First, the coverage is not as granular as other techniques. I have found that this is not a serious drawback. By breaking up large methods into smaller methods (which is good design anyway), relatively little coverage information is lost. Additionally, I have noticed that the typical method in my .NET project tends to have approximately 25 percent fewer lines of code because of various efficiencies. Second, the ad hoc nature of the technique presented here means you don't get detailed reports. In my experience, this is a small price to pay for the greatly increased flexibility that this system provides. It's better to get very good code coverage today than very, very good code coverage tomorrow.

Conclusion

Even though code coverage analysis is a critical part of testing any significant software product, it is often so time-consuming in development environments before .NET that it is simply not done. Without a code coverage analysis, you can never be sure that your test cases touch all the product code. It is not unusual for a test team to believe they have full coverage, then run code coverage analysis and discover that their initial suite of tests hits less than 75 percent of the code base. In the old days, it may have been acceptable to push the product out the door and worry about additional testing later, but with the increased focus on security, this is no longer a viable option.

James McCaffrey works for Volt Information Sciences Inc., where he manages technical training for software engineers working at the Microsoft Redmond, WA campus. He has worked on several Microsoft products including Internet Explorer and MSN Search. James can be reached at jmccaffrey@volt.com or v-jammc@microsoft.com.