February 2014

Volume 29 Number 2

Data Points : Data, Meet My New Friend, F#

Julie Lerman

I’ve had some exposure to functional programming over the past few years. Some of this was implicit: Coding with lambdas in LINQ is functional programming. Some has been explicit: Using Entity Framework and the LINQ to SQL CompiledQuery type forced me to use a .NET Func delegate. That was always a little tricky because I did it so infrequently.

I’ve also been exposed to functional programming through the infectious enthusiasm of Rachel Reese, a Microsoft MVP, who not only participates in my local user group (VTdotNET) but also runs the VTFun group here in Vermont, which focuses on many aspects and languages of functional programming. The first time I went to a VTFun meeting, it was filled with mathematicians and rocket scientists. I’m not kidding. One of the regulars is a condensed matter theorist at University of Vermont. Swoon!

I was a little overwhelmed by the high-level discussions, but it was fun to actually feel like the dummy in the room. Most of what was said went right over my head, except for a single statement that caught my attention: “Functional languages don’t need no stinkin’ foreach loops.” Whoa! I wanted to know what that meant. Why don’t they need foreach loops?

What I’ve usually heard is that functional programming is great for mathematical operations. Not being a math geek, I interpreted that to mean “for demos that use the Fibonacci sequence to test out asynchronous behavior” and didn’t pay much more attention. That’s the problem with hearing just a short elevator pitch about functional programming.

But I finally started hearing a more accurate characterization—that functional programming is great for data science. That certainly appeals to a data geek. F#, the functional language of the Microsoft .NET Framework, brings all kinds of data science capabilities to .NET developers. It has entire libraries dedicated to charting, time manipulation and set operations. It has APIs with logic dedicated to 2D, 3D and 4D arrays. It comprehends units of measure and is able to constrain and validate based on specified units.
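To make the units-of-measure point concrete, here is a minimal sketch. The unit names `lb` and `inch` and the `psi` function are hypothetical, declared only for this illustration; they are not part of the calculator shown later in this column.

```fsharp
// Hypothetical units, declared only for this illustration.
[<Measure>] type lb      // pounds of load
[<Measure>] type inch    // inches

// Load divided by area carries the unit lb/inch^2, which is PSI.
let psi (load: float<lb>) (area: float<inch^2>) : float<lb/inch^2> =
    load / area

let area = 2.0<inch> * 2.0<inch>      // inferred as float<inch^2>
let result = psi 1000.0<lb> area      // inferred as float<lb/inch^2>

// The next line would not compile, because a length is not an area:
// let wrong = psi 1000.0<lb> 2.0<inch>

// Dividing by one unit of the same measure yields a plain float for printing.
printfn "%.1f" (result / 1.0<lb/inch^2>)
```

The compiler rejects the commented-out call at build time, which is the kind of constraint and validation the paragraph above describes.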

F# also enables interesting coding scenarios in Visual Studio. Rather than building up your logic in code files and then debugging, you can write and execute code line by line in an interactive window and then move the successful code into a class file. You can get a great feel for F# as a language with Tomas Petricek’s article, “Understanding the World with F#” (bit.ly/1cx3cGx). By adding F# into the .NET toolset, Visual Studio becomes a powerful tool for building applications that perform data science logic.

In this article, I’ll focus on one aspect of F# and functional programming that I’ve come to understand since hearing that comment about not needing foreach loops. Functional languages are really good at working with sets of data. In a procedural language, when you work with sets, you have to explicitly iterate through them to perform logic. A functional language, in contrast, comprehends sets on a different level, so you just need to ask it to perform a function on a set, rather than loop through the set and perform a function on each item. And that function can be defined with lots of logic, including math, if necessary.

LINQ provides shortcuts for this, and List<T> even offers a ForEach method into which you can pass a function. But in the background, your language (perhaps C# or Visual Basic) simply translates this into a loop.

A functional language such as F# has the ability to perform the set function at a lower level, and it can be much quicker because the work is so easy to run in parallel. Add to this other key benefits, such as a rich math-processing ability and an incredibly detailed typing system that even understands units of measure, and you’ve got a powerful tool for performing calculations on large sets of data.
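The difference between looping over a set and asking for a function to be applied to the set can be sketched in a few lines. The sample data and the `toKilograms` function are made up for this illustration; `Array.map` and `Array.Parallel.map` are standard F# core library functions.

```fsharp
// Hypothetical sample data: breaking loads in pounds.
let loads = [| 42000.0; 55500.0; 61250.0 |]

// A made-up function to apply to every item (pounds to kilograms).
let toKilograms (pounds: float) = pounds * 0.45359237

// Procedural style: iterate explicitly and store each result.
let looped = Array.zeroCreate<float> loads.Length
for i in 0 .. loads.Length - 1 do
    looped.[i] <- toKilograms loads.[i]

// Functional style: ask for the function to be applied to the whole set.
let mapped = Array.map toKilograms loads

// The same request, with the work spread across the available cores.
let mappedInParallel = Array.Parallel.map toKilograms loads

printfn "%A" mapped
```

All three arrays hold the same values. Moving from sequential to parallel is a one-word change here because `toKilograms` has no side effects, so the order in which items are processed doesn't matter.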

There’s much more to F# and other functional languages that’s still far beyond my reach. But what I want to do in this column is focus on a way to quickly benefit from one particular aspect of functional languages without making a huge investment: moving logic from the database into my application. This is the way I like to learn: find a few things I can understand and use them to take some baby steps into a new platform, language, framework or other type of tool.

I’ve heard Reese try to make it clear to developers that using F# doesn’t mean switching development languages. In the same way you might use a LINQ query or a stored procedure to solve a particular problem, you can create a library of F# methods to solve the kinds of problems in your app that functional languages are really good at.

What I’ll focus on here is extracting business logic that was built into my database, logic for handling large sets of data—something at which the database is excellent—and replacing it with functional methods.

And because F# is designed to work with sets and is really clever with mathematical functions, the code can be more efficient than it might be in SQL or in procedural languages such as C# or Visual Basic. It’s so easy to have F# execute the logic on the items in the set in parallel. Not only can this reduce the amount of code you’d likely need in a procedural language to emulate this behavior, the parallelization means the code can run much faster. You could design your C# code to run in parallel, but I’d rather not go through that effort, either.

A Real-World Problem

Many years ago, I wrote a Visual Basic 5 app that had to collect, maintain, and report a lot of scientific data and perform a lot of calculations. Some of those calculations were so complex I sent them to an Excel API.

One of the calculations involved determining the pounds per square inch (PSI) based on the amount of weight that caused a chunk of material to break. The chunk could be any one of a number of cylindrical shapes and sizes. The app would use the measurements of the cylinder and, depending on its shape and size, a specific formula to calculate its area. It would then apply a relevant tolerance factor and, finally, the amount of weight it took to break the cylinder. All of this together provided the PSI for the particular material being tested.

In 1997, leveraging the Excel API to evaluate the formula from within Visual Basic 5 and Visual Basic 6 felt like a pretty clever solution.

Moving the Eval

Years later, I revamped the application in .NET. At that time, I decided to take advantage of the power of SQL Server to execute the PSI calculation on large sets of cylinders after they were updated by a user, rather than having the user’s computer spend time on all of those calculations. That worked out pretty well.

More years passed and my ideas about business logic in the database changed. I wanted to pull that calculation back to the client side and, of course, the client machines were faster by then, anyway. It wasn’t too difficult to rewrite the logic in C#. After a user updated a series of cylinders with the weight it took to break them (the load, represented in pounds), the app would iterate through the updated cylinders and calculate the PSI. Then I could update cylinders in the database with their new loads and PSI values.

For the sake of comparing familiar C# to the final outcome in F# (which you’ll see shortly), I’ve provided the listing of the cylinder type, CylinderMeasurement, in Figure 1 and my C# CylinderCalculator class in Figure 2, so you can see how I derive the PSIs for a set of cylinders. It’s the CylinderCalculator.UpdateCylinders method that’s called to start the PSI calculation for a set of cylinders. It iterates through each cylinder in the set and performs the appropriate calculations. Note that one of the methods, GetAreaForCylinder, is dependent on the cylinder type because I calculate the area of the cylinder using the appropriate formula.

Figure 1 CylinderMeasurement Class

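using System;

// CylinderType is not listed in the original figure. Its members are inferred
// from the switch statement in Figure 2; their order here is an assumption.
public enum CylinderType
{
  FourFourEightCylinder,
  SixSixTwelveCylinder,
  ThreeThreeSixCylinder,
  TwoTwoTwoCylinder
}

public class CylinderMeasurement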
{
  public CylinderMeasurement(double widthA, double widthB,
    double height)
  {
    WidthA = widthA;
    WidthB = widthB;
    Height = height;
  }
  public int Id { get; private set; }
  public double Height { get; private set; }
  public double WidthB { get; private set; }
  public double WidthA { get; private set; }
  public int LoadPounds { get; private set; }
  public double Psi { get; set; }
  public CylinderType CylinderType { get; set; }
  public void UpdateLoadPoundsAndTypeEnum(int load, 
    CylinderType cylType) {
    LoadPounds = load; CylinderType = cylType;
  }
  private double? Ratio {
    get {
      if (Height > 0 && WidthA + WidthB > 0) {
        return Math.Round(Height / ((WidthA + WidthB) / 2), 2);
      }
      return null;
    }
  }
  public double ToleranceFactor {
    get {
      if (Ratio > 1.94 || Ratio < 1) {
        return 1;
      }
      return .979;
    }
  }
}

Figure 2 Calculator Class to Calculate PSI

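using System;
using System.Collections.Generic;

public static class CylinderCalculator
  {
    // Cylinder currently being processed. This is shared static state, so the
    // class is not safe to call from multiple threads at once.
    private static CylinderMeasurement _currentCyl;
    // Entry point: computes and stores the PSI for every cylinder in the set.
    public static void UpdateCylinders(IEnumerable<CylinderMeasurement> cyls) {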
      foreach (var cyl in cyls)
      {
        _currentCyl = cyl;
        cyl.Psi = GetPsi();
      }
    }
    private static double GetPsi() {
      var area = GetAreaForCylinder();
      return PsiCalculator(area);
    }
    private static double GetAreaForCylinder() {
      switch (_currentCyl.CylinderType)
      {
        case CylinderType.FourFourEightCylinder:
          return 3.14159*((_currentCyl.WidthA + _currentCyl.WidthB)/2)/2*
            ((_currentCyl.WidthA + _currentCyl.WidthB)/2/2);
        case CylinderType.SixSixTwelveCylinder:
          return 3.14159*((_currentCyl.WidthA + _currentCyl.WidthB)/2)/2*
            ((_currentCyl.WidthA + _currentCyl.WidthB)/2/2);
        case CylinderType.ThreeThreeSixCylinder:
          return _currentCyl.WidthA*_currentCyl.WidthB;
        case CylinderType.TwoTwoTwoCylinder:
          return ((_currentCyl.WidthA + _currentCyl.WidthB)/2)*
            ((_currentCyl.WidthA + _currentCyl.WidthB)/2);
        default:
          throw new ArgumentOutOfRangeException();
      }
    }
    private static int PsiCalculator(double area) {
      if (_currentCyl.LoadPounds > 0 && area > 0)
      {
        return (int) (Math.Round(_currentCyl.LoadPounds/area/1000*
          _currentCyl.ToleranceFactor, 2)*1000);
      }
      return 0;
    }
  }

Data Focus and Faster Processing with F#

Finally, I discovered that F#, thanks to its natural inclination for manipulating data, provides a much better solution than evaluating one formula at a time.

At the introductory session on F# given by Reese, I explained this problem, which had been nagging at me for so many years, and asked if a functional language could be used to solve it in a more satisfying way. She confirmed that I could apply my complete calculation logic on a full set and let F# derive the PSIs for many cylinders in parallel. I could get the client-side functionality and a performance boost at the same time.

The key for me was to realize I could use F# to solve a particular problem in much the same way I use a stored procedure—it’s simply another tool in my tool belt. It doesn’t require giving up my investment in C#. Perhaps there are some who are just the reverse—writing the bulk of their apps in F# and using C# to attack particular problems. In any case, using the C# CylinderCalculator as a guide, Reese created a small F# project that did the task and I was able to replace a call to my calculator with a call to hers in my tests, as shown in Figure 3.

Figure 3 The F# PSI Calculator

module calcPsi =
  open System // needed for Math.Round
  let fourFourEightFormula WA WB = 3.14159*((WA+WB)/2.)/2.*((WA+WB)/2./2.)
  let sixSixTwelveFormula WA WB = 3.14159*((WA+WB)/2.)/2.*((WA+WB)/2./2.)
  let threeThreeSixFormula (WA:float) (WB:float) = WA*WB
  let twoTwoTwoFormula WA WB = ((WA+WB)/2.)*((WA+WB)/2.)
  // Ratio function
  let ratioFormula height widthA widthB =
    if (height > 0. && (widthA + widthB > 0.)) then
      Some(Math.Round(height / ((widthA + widthB)/2.), 2))
    else
      None
  // Tolerance function
  let tolerance (ratioValue:float option) =
    match ratioValue with
    | Some r when r > 1.94 || r < 1. -> 1.
    | _ -> 0.979
  // Update the PSI, and return the original cylinder information.
  let calculatePsi (cyl:CylinderMeasurement) =
    let formula = match cyl.CylinderType with
      | CylinderType.FourFourEightCylinder -> fourFourEightFormula
      | CylinderType.SixSixTwelveCylinder -> sixSixTwelveFormula
      | CylinderType.ThreeThreeSixCylinder -> threeThreeSixFormula
      | CylinderType.TwoTwoTwoCylinder -> twoTwoTwoFormula
      | _ -> failwith "Unknown cylinder"
    let basearea = formula cyl.WidthA cyl.WidthB
    let ratio = ratioFormula cyl.Height cyl.WidthA cyl.WidthB
    let tFactor = tolerance ratio
    let PSI = Math.Round((float)cyl.LoadPounds/basearea/1000. * tFactor, 2)*1000.
    cyl.Psi <- PSI
    cyl
  // Apply calculatePsi to every cylinder in the array, in parallel.
  let getPsi (cylinders:CylinderMeasurement[]) =
    Array.Parallel.map calculatePsi cylinders

If, like me, you’re new to F#, you might look only at the amount of code and see no point in choosing this route over C#. Upon closer examination, however, you may appreciate the terseness of the language, the ability to define the formulas more elegantly and, in the end, the simple way I can apply the calculatePsi function Reese defined to the array of cylinders I passed to the method.
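For readers who want to run the calculation without the C# project, the following is a self-contained sketch of the same logic. It is not the code from Figure 3: the `CylinderShape` union, the `Cylinder` record, the function names and the sample measurements are all assumptions made for this illustration, standing in for the C# CylinderType enum and CylinderMeasurement class. It also returns new values rather than updating each cylinder in place.

```fsharp
open System

// Assumption: a union and a record stand in for the C# enum and class,
// so the sketch runs on its own.
type CylinderShape =
    | FourFourEight
    | SixSixTwelve
    | ThreeThreeSix
    | TwoTwoTwo

type Cylinder =
    { Shape: CylinderShape
      WidthA: float
      WidthB: float
      Height: float
      LoadPounds: int }

// Area formulas copied from Figure 3, selected by shape.
let baseArea (cyl: Cylinder) =
    let avg = (cyl.WidthA + cyl.WidthB) / 2.
    match cyl.Shape with
    | FourFourEight | SixSixTwelve -> 3.14159 * avg / 2. * (avg / 2.)
    | ThreeThreeSix -> cyl.WidthA * cyl.WidthB
    | TwoTwoTwo -> avg * avg

// Tolerance rule from Figure 3: 1.0 when the height-to-width ratio is
// above 1.94 or below 1, otherwise 0.979 (including when there is no ratio).
let tolerance (cyl: Cylinder) =
    if cyl.Height > 0. && cyl.WidthA + cyl.WidthB > 0. then
        let ratio = Math.Round(cyl.Height / ((cyl.WidthA + cyl.WidthB) / 2.), 2)
        if ratio > 1.94 || ratio < 1. then 1. else 0.979
    else
        0.979

// Computes the PSI for one cylinder without mutating it.
let psi (cyl: Cylinder) =
    Math.Round(float cyl.LoadPounds / baseArea cyl / 1000. * tolerance cyl, 2) * 1000.

// Hypothetical measurements, chosen only to exercise two branches.
let cylinders =
    [| { Shape = ThreeThreeSix; WidthA = 3.; WidthB = 3.; Height = 6.; LoadPounds = 36000 }
       { Shape = TwoTwoTwo; WidthA = 2.; WidthB = 2.; Height = 2.; LoadPounds = 12000 } |]

// One call applies the whole calculation to the set, in parallel.
let results = Array.Parallel.map psi cylinders
printfn "%A" results
```

Because `psi` neither reads nor writes shared state, handing it to `Array.Parallel.map` needs no locking. Like Figure 3, this sketch omits the zero-load and zero-area guards in the C# PsiCalculator method, so it assumes valid measurements.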

The terseness is thanks to the fact that F# is better designed to perform math functions than C# is, so defining those functions is more efficient. But besides the geek appeal of the language, I was interested in performance. When I increased the number of cylinders per set in my tests, initially I did not see a performance improvement over C#. Reese explained that the test environment is more expensive when using F#. So I then tested the performance in a console app using the Stopwatch to report the time passed. The app built up a list of 50,000 cylinders, started the Stopwatch, passed the cylinders to either the C# or F# calculator to update the PSI value for each cylinder, then stopped the Stopwatch when the calculations were complete.
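A timing harness along those lines might look like the sketch below. This is not the console app used for the measurements in this column: the `work` function is a hypothetical stand-in for the per-cylinder calculation, and the `time` helper is made up for the illustration. Only `Stopwatch` and the two `map` functions come from the .NET and F# libraries.

```fsharp
open System.Diagnostics

// Hypothetical stand-in for the per-cylinder PSI work.
let work (x: float) = sqrt (x * 3.14159) / 2.

// 50,000 inputs, matching the set size described above.
let inputs = Array.init 50000 float

// Runs a function under a Stopwatch; returns its result and elapsed milliseconds.
let time (f: unit -> 'a) =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    result, sw.Elapsed.TotalMilliseconds

let sequential, seqMs = time (fun () -> Array.map work inputs)
let parallelResult, parMs = time (fun () -> Array.Parallel.map work inputs)

printfn "sequential: %.3f ms, parallel: %.3f ms" seqMs parMs
```

A single run like this includes JIT compilation and thread start-up costs, and when each item needs very little work the overhead of parallelizing can outweigh the gain. Repeating each measurement several times and discarding the first run usually gives steadier numbers.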

In most of the cases, the C# process took about three times longer than the F# process, although about 20 percent of the time C# would beat F# by a small margin. I can’t account for the oddity, but it’s possible there’s more I need to understand to perform truer profiling.

Keeping an Eye Out for Logic That Begs for a Functional Language

So while I have to work at my F# skills, my new understanding is going to apply nicely to apps I already have in production as well as future apps. In my production apps, I can look at business logic I’ve relegated to the database and consider if an app would benefit from an F# replacement. With new apps, I now have a keener eye for spotting functionality I can code more efficiently with F#, performing data manipulation, leveraging strongly typed units of measurement and gaining performance. And there’s always the fun of learning a new language and finding just the right scenarios for which it was built!


Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other Microsoft .NET topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework” (2010) as well as a Code First edition (2011) and a DbContext edition (2012), all from O’Reilly Media. Follow her on Twitter at twitter.com/julielerman and see her Pluralsight courses at juliel.me/PS-Videos.

Thanks to the following technical expert for reviewing this article: Rachel Reese (Firefly Logic)
Rachel Reese is a long-time software engineer and math geek in Nashville, TN. Until recently, she ran the Burlington, VT functional programming user group, @VTFun, which was a constant source of inspiration to her, and at which she often spoke on F#. She's also an ASPInsider, an F# MVP, a community enthusiast, one of the founding @lambdaladies, and a Rachii. You can find her on Twitter, @rachelreese, or on her blog: rachelree.se.