November 2015

Volume 30 Number 12

Test Run - The T-Test Using C#

By James McCaffrey

The t-test is one of the most fundamental forms of statistical analysis. Its goal is to determine whether the means (averages) of two sets of numbers are equal when you only have samples of the two sets. The idea is best explained by example. Suppose you’re investigating the mathematical ability of high school males and females in a large school district. The ability test is expensive and time-consuming so you can’t give the test to all the students. Instead, you randomly select a sample of 10 males and 10 females and give them the math test. From the sample results you can perform a t-test to infer whether the true average score of all the males is equal to the true average score of all the females.

There are many standalone tools, including Excel, which can perform a t-test. But if you want to integrate t-test functionality directly into a software system, using standalone tools can be awkward or impossible, and may involve copyright or other legal issues. This article explains how to perform a t-test using raw (no external libraries) C# code.

The best way to get a feel for what the t-test is and to see where this article is headed is to take a look at the demo program in Figure 1. The first data set is { 88, 77, 78, 85, 90, 82, 88, 98, 90 }. You can imagine these are the test scores of 10 males, where one of the males dropped out for some reason, leaving just nine scores.

Figure 1 Demo of the T-Test Using C#

The second data set is { 81, 72, 67, 81, 71, 70, 82, 81 }. You can imagine these are the test scores of 10 females, where two of the females dropped out for some reason, leaving just eight scores. The mean of the first data set is 86.22 and the mean of the second data set is 75.63, which suggests that the means of the two groups are not the same because there’s an almost 11-point difference. But even if the overall average scores of the two groups (all males and all females) were in fact the same, because only samples are being used, the difference in the sample averages could have happened by chance.

Using the two sample data sets, the demo program calculates a “t-statistic” (t) with value 3.4233 and a “degrees of freedom” (often abbreviated as df, or indicated by the lowercase Greek letter nu, ν) value of 14.937. Then, using the t and df values, a probability value (p-value) is calculated, with value 0.00379. There are several forms of the t-test. Perhaps the most common is called the Student t-test. The demo uses an improved variation called the Welch t-test.

The p-value is the probability of seeing a difference between the two sample averages at least as large as the observed difference of about 11 points, if the true averages of the two populations (all males and all females) are actually the same and the difference is due to chance alone. In this case, the p-value is very small so you’d conclude that the true averages of all males and all females are not equal. In most problems, the critical p-value for comparison with the calculated p-value is arbitrarily defined to be 0.01 or 0.05.

In concrete terms, if the true average scores for all males and females were the same, the probability that you’d see a difference of nearly 11 points or more in the two sample averages of size nine and eight is only 0.00379, which is extremely unlikely.

This article assumes you have at least intermediate programming skills but doesn’t assume you know anything about the t-test. The demo is coded using C#, but you shouldn’t have much trouble if you want to port the code to another language, such as Visual Basic .NET or JavaScript.

Understanding the T-Distribution

The t-test is based on the t-distribution, which is closely related to the normal (also called Gaussian, or bell-shaped) distribution. The shape of a normal distribution depends on both the mean and the standard deviation of the data. The standard deviation is a value that measures how spread out, or variable, the data is. A special case is when the mean (often indicated by Greek letter mu, µ) is 0 and the standard deviation (often abbreviated in English as sd, or indicated by Greek letter sigma, σ) is 1. The normal distribution with mean = 0 and sd = 1 is called the standard normal distribution. Its graph is shown in Figure 2.

Figure 2 The Standard Normal Distribution

In Figure 2, the equation that defines the standard normal distribution is called the probability density function. The t-distribution closely resembles the normal distribution. The shape of a t-distribution depends on a single value called the “degrees of freedom.” The t-distribution with df = 5 is shown in Figure 3.

In Figure 3, the equation that defines the t-distribution involves the Gamma function, which is indicated by the Greek capital letter gamma (Γ). In order to perform a t-test, you need to calculate and sum two identical areas under the curve of the t-distribution. This combined area is the p-value. For example, in Figure 3, if the value of t is 2.0, the combined areas under the curve you need are from -infinity to -2.0, and +2.0 to +infinity. In this case the combined area, which is the p-value, is 0.101939. For the demo program, when t = 3.4233, the combined area is 0.00379.

Figure 3 The T-Distribution
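
For reference, the two density functions discussed above can be written out in their standard textbook form. These are the usual definitions and are assumed here to match the equations shown in Figures 2 and 3; the symbol f and the integration variable x are notation chosen for this note, not taken from the figures.

```latex
% Standard normal probability density function (mean 0, sd 1)
\[ \phi(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^{2}/2} \]

% t-distribution density with \nu degrees of freedom
\[ f(t;\nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}
                   {\sqrt{\nu\pi}\;\Gamma\!\left(\frac{\nu}{2}\right)}
              \left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}} \]

% Two-tailed p-value: the two equal tail areas beyond -|t| and +|t|
\[ p = 2 \int_{|t|}^{\infty} f(x;\nu)\,dx \]
```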

OK, but how can the area under the t-distribution be calculated? There are several approaches to this problem, but the most common technique is to calculate a single associated area under the curve of the standard normal distribution and use it to calculate the p-value. For example, in Figure 2, if z (the normal equivalent of t) has value -2.0, you can calculate the area from -infinity to -2.0, which is 0.02275. This area under the normal curve can then be used to calculate the corresponding area under the t-distribution.

To summarize, to perform a t-test you must calculate and then sum two (equal) areas under a t-distribution. The combined area is the p-value. To do this, you can compute a single area under the standard normal distribution and then use that area to get the p-value.
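
The two example areas quoted above (0.101939 for t = 2.0 with df = 5, and 0.02275 for z = -2.0) can be cross-checked by brute-force numerical integration. The following is only a rough sketch, not the technique the demo uses: it applies Simpson's rule, handles the t-distribution for df = 5 only (with the Gamma-function constant worked out by hand), and uses -10 as a stand-in for -infinity. All names in it are hypothetical.

```csharp
using System;

public static class AreaCheckSketch
{
  // Standard normal probability density function (mean 0, sd 1).
  public static double NormalPdf(double z)
  {
    return Math.Exp(-0.5 * z * z) / Math.Sqrt(2.0 * Math.PI);
  }

  // t-distribution density for df = 5 ONLY (assumption of this sketch).
  // The constant 8 / (3 * pi * sqrt(5)) is
  // Gamma(3) / (sqrt(5 * pi) * Gamma(2.5)) simplified by hand.
  public static double T5Pdf(double t)
  {
    double c = 8.0 / (3.0 * Math.PI * Math.Sqrt(5.0));
    return c * Math.Pow(1.0 + t * t / 5.0, -3.0);
  }

  // Composite Simpson's rule on [lo, hi]; n must be positive and even.
  public static double Simpson(Func<double, double> f,
    double lo, double hi, int n)
  {
    if (n <= 0 || n % 2 != 0)
      throw new ArgumentException("n must be positive and even");
    double h = (hi - lo) / n;
    double sum = f(lo) + f(hi);
    for (int i = 1; i < n; ++i)
      sum += ((i % 2 == 1) ? 4.0 : 2.0) * f(lo + i * h);
    return sum * h / 3.0;
  }

  // Combined area of the df = 5 t-distribution outside [-|t|, +|t|],
  // computed as 1 minus the area between the two cut points.
  public static double T5TwoTail(double t)
  {
    t = Math.Abs(t);
    return 1.0 - 2.0 * Simpson(T5Pdf, 0.0, t, 1000);
  }

  // Area under the standard normal curve from "-infinity" to z.
  // Uses -10 as the lower limit, so z must be greater than -10.
  public static double NormalLeftTail(double z)
  {
    return Simpson(NormalPdf, -10.0, z, 2000);
  }
}
```

Calling AreaCheckSketch.T5TwoTail(2.0) should give a value close to 0.101939, and AreaCheckSketch.NormalLeftTail(-2.0) a value close to 0.02275. Numerical integration like this is fine for a sanity check but is slower and less general than the algorithms used in the demo.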

Calculating the Area Under the Standard Normal Distribution

There are many ways to calculate the area under the standard normal distribution curve. This is one of the oldest problems in computer science. My preferred method is to use what’s called ACM algorithm #209. The Association for Computing Machinery (ACM) has published many fundamental algorithms for numerical and statistical computing.

A C# implementation of algorithm #209 is presented in Figure 4 as function Gauss. The function accepts a value, z, between -infinity and +infinity and returns a close approximation to the area under the standard normal distribution from -infinity to z.

Figure 4 Calculating the Area under the Standard Normal Distribution

public static double Gauss(double z)
{
  // input = z-value (-inf to +inf)
  // output = p under Standard Normal curve from -inf to z
  // e.g., if z = 0.0, function returns 0.5000
  // ACM Algorithm #209
  double y; // 209 scratch variable
  double p; // result. called 'z' in 209
  double w; // 209 scratch variable
  if (z == 0.0)
    p = 0.0;
  else
  {
    y = Math.Abs(z) / 2;
    if (y >= 3.0)
    {
      p = 1.0;
    }
    else if (y < 1.0)
    {
      w = y * y;
      p = ((((((((0.000124818987 * w
        - 0.001075204047) * w + 0.005198775019) * w
        - 0.019198292004) * w + 0.059054035642) * w
        - 0.151968751364) * w + 0.319152932694) * w
        - 0.531923007300) * w + 0.797884560593) * y * 2.0;
    }
    else
    {
      y = y - 2.0;
      p = (((((((((((((-0.000045255659 * y
        + 0.000152529290) * y - 0.000019538132) * y
        - 0.000676904986) * y + 0.001390604284) * y
        - 0.000794620820) * y - 0.002034254874) * y
        + 0.006549791214) * y - 0.010557625006) * y
        + 0.011630447319) * y - 0.009279453341) * y
        + 0.005353579108) * y - 0.002141268741) * y
        + 0.000535310849) * y + 0.999936657524;
    }
  }
  if (z > 0.0)
    return (p + 1.0) / 2;
  else
    return (1.0 - p) / 2;
}

Even a quick glance at the code in Figure 4 should convince you that using an existing algorithm, such as ACM #209, is much easier than coding your own implementation from scratch. An alternative to ACM #209 is to use a slight modification of equation 7.1.26 from “Handbook of Mathematical Functions” by Milton Abramowitz and Irene A. Stegun (Dover Publications, 1965).
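
To give a feel for the Abramowitz and Stegun alternative, here is a minimal sketch. The article doesn't spell out the "slight modification," so this version makes its own assumptions: it uses equation 7.1.26 as published to approximate the error function erf for non-negative arguments, extends it to negative arguments by symmetry, and converts to the normal area with the standard identity area(z) = (1 + erf(z / sqrt(2))) / 2. The names NormalCdfSketch, Erf and GaussAlt are hypothetical.

```csharp
using System;

public static class NormalCdfSketch
{
  // Approximates erf(x) using Abramowitz and Stegun equation 7.1.26.
  // The published formula covers x >= 0; negative x is handled here
  // through the symmetry erf(-x) = -erf(x).
  public static double Erf(double x)
  {
    const double p = 0.3275911;
    const double a1 = 0.254829592;
    const double a2 = -0.284496736;
    const double a3 = 1.421413741;
    const double a4 = -1.453152027;
    const double a5 = 1.061405429;

    double sign = (x < 0.0) ? -1.0 : 1.0;
    x = Math.Abs(x);
    double t = 1.0 / (1.0 + p * x);
    // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5
    double poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t;
    return sign * (1.0 - poly * Math.Exp(-x * x));
  }

  // Area under the standard normal curve from -infinity to z.
  public static double GaussAlt(double z)
  {
    return 0.5 * (1.0 + Erf(z / Math.Sqrt(2.0)));
  }
}
```

With this sketch, NormalCdfSketch.GaussAlt(-2.0) should agree with the 0.02275 area mentioned earlier to several decimal places. The approximation is accurate to roughly seven decimal places, which is usually sufficient for a t-test.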

Calculating the Area Under the T-Distribution

With an implementation of the Gauss function in hand, the area under the t-distribution can be calculated using ACM algorithm #395. A C# implementation of algorithm #395 is presented in Figure 5 as function Student. The function accepts a t value and a df value and returns the combined area from -infinity to -|t| plus +|t| to +infinity.

Figure 5 Calculating the Area under the t-Distribution

public static double Student(double t, double df)
{
  // for large integer df or double df
  // adapted from ACM algorithm 395
  // returns 2-tail p-value
  double n = df; // to sync with ACM parameter name
  double a, b, y;
  t = t * t;
  y = t / n;
  b = y + 1.0;
  if (y > 1.0E-6) y = Math.Log(b);
  a = n - 0.5;
  b = 48.0 * a * a;
  y = a * y;
  y = (((((-0.4 * y - 3.3) * y - 24.0) * y - 85.5) /
    (0.8 * y * y + 100.0 + b) + y + 3.0) / b + 1.0) *
    Math.Sqrt(y);
  return 2.0 * Gauss(-y); // ACM algorithm 209
}

Algorithm #395 has two forms. One form accepts the df parameter as an integer value and the second form accepts df as a type double value. In most statistics problems, the degrees of freedom is an integer value, but the Welch t-test uses a type double value.

The Demo Program

To create the demo program, I launched Visual Studio and created a new C# console application named TTest. The demo has no significant .NET version dependencies, so any version of Visual Studio should work. After the template code loaded into the editor, I deleted all using statements except for the single reference to the top-level System namespace. In the Solution Explorer window I renamed file Program.cs to TTestProgram.cs and allowed Visual Studio to automatically rename class Program for me.

The demo program is a bit too long to present in its entirety here, but you can find the complete source code in the file download that accompanies this article. The Main method begins by setting up and displaying the two sample datasets:

Console.WriteLine("\nBegin Welch's t-test using C# demo\n");
var x = new double[] { 88, 77, 78, 85, 90, 82, 88, 98, 90 };
var y = new double[] { 81, 72, 67, 81, 71, 70, 82, 81 };
Console.WriteLine("\nThe first data set (x) is:\n");
ShowVector(x, 0);
Console.WriteLine("\nThe second data set (y) is:\n");
ShowVector(y, 0);

All of the work is performed in a method named TTest:

Console.WriteLine("\nStarting Welch's t-test using C#\n");
TTest(x, y);
Console.WriteLine("\nEnd t-test demo\n");
Console.ReadLine();

The definition of method TTest begins by summing the values in each dataset:

public static void TTest(double[] x, double[] y)
{
  double sumX = 0.0;
  double sumY = 0.0;
  for (int i = 0; i < x.Length; ++i)
    sumX += x[i];
  for (int i = 0; i < y.Length; ++i)
    sumY += y[i];
...

Next, the sums are used to calculate the two sample means:

int n1 = x.Length;
int n2 = y.Length;
double meanX = sumX / n1;
double meanY = sumY / n2;

Next, the two means are used to calculate the two sample variances:

double sumXminusMeanSquared = 0.0; // Calculate variances
double sumYminusMeanSquared = 0.0;
for (int i = 0; i < n1; ++i)
  sumXminusMeanSquared += (x[i] - meanX) * (x[i] - meanX);
for (int i = 0; i < n2; ++i)
  sumYminusMeanSquared += (y[i] - meanY) * (y[i] - meanY);
double varX = sumXminusMeanSquared / (n1 - 1);
double varY = sumYminusMeanSquared / (n2 - 1);

The variance of a set of data is the square of the standard deviation (equivalently, the standard deviation is the square root of the variance). The t-test works directly with variances. Next, the t statistic is calculated:

double top = (meanX - meanY);
double bot = Math.Sqrt((varX / n1) + (varY / n2));
double t = top / bot;

In words, the t statistic is the difference between the two sample means, divided by the square root of the sum of each sample variance divided by its sample size. Next, the degrees of freedom is calculated:

double num = ((varX / n1) + (varY / n2)) *
  ((varX / n1) + (varY / n2));
double denomLeft = ((varX / n1) * (varX / n1)) / (n1 - 1);
double denomRight = ((varY / n2) * (varY / n2)) / (n2 - 1);
double denom = denomLeft + denomRight;
double df = num / denom;
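
In mathematical notation, the t statistic and degrees of freedom computed by the code above correspond to the standard Welch formulas (the degrees of freedom expression is often called the Welch-Satterthwaite equation). The symbols are notation chosen for this note: x-bar and y-bar stand for meanX and meanY, and s_x^2 and s_y^2 stand for varX and varY. The numeric values are rounded hand calculations for the demo data.

```latex
\[ t = \frac{\bar{x} - \bar{y}}
            {\sqrt{\dfrac{s_x^{2}}{n_1} + \dfrac{s_y^{2}}{n_2}}}
   \qquad
   \nu = \frac{\left(\dfrac{s_x^{2}}{n_1} + \dfrac{s_y^{2}}{n_2}\right)^{2}}
              {\dfrac{\left(s_x^{2}/n_1\right)^{2}}{n_1 - 1}
             + \dfrac{\left(s_y^{2}/n_2\right)^{2}}{n_2 - 1}} \]

% Demo data: s_x^2 / n_1 \approx 43.1944 / 9 \approx 4.7994
%            s_y^2 / n_2 \approx 38.2679 / 8 \approx 4.7835
\[ t \approx \frac{86.2222 - 75.6250}{\sqrt{4.7994 + 4.7835}}
     \approx \frac{10.5972}{3.0956} \approx 3.4233 \]

\[ \nu \approx \frac{(9.5829)^{2}}
                    {\dfrac{(4.7994)^{2}}{8} + \dfrac{(4.7835)^{2}}{7}}
       \approx \frac{91.8313}{6.1481} \approx 14.937 \]
```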

The calculation of the degrees of freedom for the Welch t-test is somewhat tricky and the equation isn’t at all obvious. Fortunately, you’ll never have to modify this calculation. Method TTest concludes by computing the p-value and displaying all the calculated values:

...
  double p = Student(t, df); // Cumulative two-tail density
  Console.WriteLine("mean of x = " + meanX.ToString("F2"));
  Console.WriteLine("mean of y = " + meanY.ToString("F2"));
  Console.WriteLine("t = " + t.ToString("F4"));
  Console.WriteLine("df = " + df.ToString("F3"));
  Console.WriteLine("p-value = " + p.ToString("F5"));
  Explain();
}

The program-defined method named Explain displays information explaining the interpretation of the p-value, as shown in Figure 1.
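
If the goal is to call the t-test from other code rather than print results to the console, the statistics can be returned to the caller instead. The following is one possible sketch of that idea, not the demo's actual design. It performs the same mean, variance, t and df calculations as method TTest but hands the results back through out parameters. The names WelchSketch, Mean, Variance and Statistics are hypothetical.

```csharp
using System;

public static class WelchSketch
{
  // Sample mean of a non-empty array.
  public static double Mean(double[] v)
  {
    double sum = 0.0;
    for (int i = 0; i < v.Length; ++i)
      sum += v[i];
    return sum / v.Length;
  }

  // Sample variance (divides by n - 1), so at least two values are needed.
  public static double Variance(double[] v)
  {
    double mean = Mean(v);
    double sumSquares = 0.0;
    for (int i = 0; i < v.Length; ++i)
      sumSquares += (v[i] - mean) * (v[i] - mean);
    return sumSquares / (v.Length - 1);
  }

  // Computes the Welch t statistic and degrees of freedom.
  // Note: if both samples have zero variance the results are not finite.
  public static void Statistics(double[] x, double[] y,
    out double t, out double df)
  {
    if (x.Length < 2 || y.Length < 2)
      throw new ArgumentException("Each sample needs at least two values");
    double a = Variance(x) / x.Length; // varX / n1
    double b = Variance(y) / y.Length; // varY / n2
    t = (Mean(x) - Mean(y)) / Math.Sqrt(a + b);
    df = ((a + b) * (a + b)) /
      ((a * a) / (x.Length - 1) + (b * b) / (y.Length - 1));
  }
}
```

For the two demo data sets, Statistics should produce t and df values that round to the 3.4233 and 14.937 displayed by the demo. The resulting t and df can then be passed to the Student function to get the p-value.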

A Few Comments

There are actually several different kinds of statistics problems that involve the t-test. The type of problem described in this article is sometimes called an unpaired t-test because there’s no conceptual connection between the data values in each sample dataset. Another type of t-test is called a paired sample test, which might be used when you have some sort of before and after data, such as a test score before some instruction followed by a test score after the instruction. Here, each pair of scores is conceptually related.

The Welch t-test presented here is superior to the more common Student t-test in most scenarios. The Student t-test assumes that the variances of the two populations are approximately equal, and it is most reliable when the two sample datasets have a similar number of data points. The Welch t-test can work with unequal sample sizes and is robust even when sample variances differ.

The type of t-test explained in this article is called a two-tailed test. This is more or less synonymous with a problem where the goal is to determine whether two group means are the same. A one-tailed t-test can be used in situations where the goal is to determine if the mean of one group is greater than the mean of the second group. When performing a one-tailed t-test, you divide the two-tailed p-value by 2, provided the observed difference is in the hypothesized direction.
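
The conversion from a two-tailed to a one-tailed p-value can be sketched in a few lines. Halving applies when the sign of t matches the direction being tested; when it doesn't, the one-tailed value is 1 minus half of the two-tailed value. This helper is a hypothetical addition, not part of the demo program, and it assumes t was computed as the first mean minus the second mean, as in method TTest.

```csharp
using System;

public static class OneTailSketch
{
  // Converts a two-tailed p-value into a one-tailed p-value.
  // expectFirstGreater = true tests "mean of group 1 > mean of group 2";
  // false tests "mean of group 1 < mean of group 2".
  public static double FromTwoTailed(double twoTailedP, double t,
    bool expectFirstGreater)
  {
    if (twoTailedP < 0.0 || twoTailedP > 1.0)
      throw new ArgumentOutOfRangeException("twoTailedP");
    bool inExpectedDirection =
      expectFirstGreater ? (t > 0.0) : (t < 0.0);
    return inExpectedDirection
      ? twoTailedP / 2.0
      : 1.0 - twoTailedP / 2.0;
  }
}
```

For the demo values (t = 3.4233, two-tailed p-value = 0.00379), testing whether the first group's mean is greater gives 0.00379 / 2, or about 0.0019.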

You should be very conservative when interpreting the results of a t-test. A conclusion along the lines of, “Based on a calculated t-test p-value of 0.008 I conclude it is unlikely that the true population means of males and females are the same” is much better than, “The p-value of 0.008 means the average scores of males are greater than those of females.”

An alternative to the t-test is called the Mann-Whitney U test. Both techniques infer whether two population means are equal or not based on samples, but the Mann-Whitney U test makes fewer statistical assumptions, which leads to more conservative conclusions (you’re less likely to conclude the means under investigation are different).

The t-test is limited to situations where there are two groups. For problems examining the means of three or more groups, you’d use an analysis of variance (ANOVA), which is based on the F-test.


Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Internet Explorer and Bing. Dr. McCaffrey can be reached at jammc@microsoft.com.

Thanks to the following technical expert at Microsoft Research for reviewing this article: Kirk Olynyk