August 2012

Volume 27 Number 08

C++ - Functional-Style Programming in C++

By David Cravey | August 2012

C++ is a multiparadigm, systems-level language that provides high-level abstractions with very low (often zero) runtime cost. The paradigms commonly associated with C++ include procedural, object-oriented and generic programming. Because C++ provides excellent tools for high-level programming, even functional-style programming is quite reasonable.

By functional-style programming, I don’t mean the programming is strictly functional, just that it’s easy to use many of the functional building blocks in C++. This article will focus on one of the most important functional programming constructs: working with values as opposed to identities. I’ll discuss the strong support C++ has always had for working with values, then show how the new C++ 11 standard expands this support with lambdas. Finally, I’ll introduce a method of working with immutable data structures that maintains the speed C++ is known for while providing the protection that functional languages have long enjoyed.

Values vs. Identities

Let me first explain what I mean by working with values rather than identities. Simple values such as 1, 2 and 3 are easy to identify. I could also say that 1, 2 and 3 are constant integer values. This would be redundant, however, because all values are actually constants and the values themselves never change (that is, 1 is always 1 and 1 will never be 2). On the other hand, the value associated with an identity may change (x might equal 1 now, but it could equal 2 later).

Unfortunately, it’s easy to confuse values and value types. Value types are passed around by value rather than by reference. Though I want to focus here on the values and not the mechanism involved in using or copying them, it’s useful to see how value types go part way in preserving the concept of values versus identities.

The code in Figure 1 demonstrates a simple use of a value type.

Figure 1 Using a Value Type

void Foo()
{
  for (int x = 0; x < 10; ++x)
  {
    // Call Bar, passing the value of x and not the identity
    Bar(x);
  }
}
void Bar(int y)
{
  // Display the value of y
  cout << y << " ";
  // Change the value that the identity y refers to
  // Note: This will not affect the value that the variable x refers to
  y = y + 1;
}
// Outputs:
// 0 1 2 3 4 5 6 7 8 9

With only a small change, the variable y can become a reference type—which drastically changes the relationship between x and y, as shown in Figure 2.

Figure 2 Using a Reference Type

void Foo()
{
  for (int x = 0; x < 10; ++x)
  {
    // Call Bar, passing the identity of x
    Bar(x);
  }
}
void Bar(int& y)
{
  // Display the value of y
  cout << y << " ";
  // Change the value that the identity y refers to
  // Note: This will affect the variable x
  y = y + 1;
}
// Outputs:
// 0 2 4 6 8

As Figure 3 shows, C++ also provides the const modifier, which prevents the programmer from making changes to a variable and thus further preserves the concept of a value. (As with most things in C++, however, there’s at least one way to defeat that protection. For more information, look up const_cast, which is intended for working with older code that isn’t “const correct.”)

Figure 3 The const Modifier

void Foo()
{
  for (int x = 0; x < 10; ++x)
  {
    // Call Bar, passing the identity of x,
    // yet the value of x will be protected by the const
    Bar(x);
  }
}
void Bar(const int& y)
{
  // Use the value of y
  cout << y << " ";
  // Attempt to change the value of what the identity y refers to
  y = y + 1; // This line will not compile because y is const!
}

Note in Figure 3 that though y is passed by reference, the value of y is protected at compile time by a const modifier. This gives C++ programmers an efficient method of passing large objects while working with their values as opposed to their identities.

With the const modifier, C++ has immutable data types that resemble those found in most functional programming languages. However, dealing with these immutable data types is difficult. Furthermore, making deep (full) copies of large objects for every small change isn’t efficient. Nonetheless, it should be clear that standard C++ has always had a concept of working with values (even if it’s not a very pure concept).

Note that the support for value types extends to user-defined types through copy constructors and assignment operators. C++ copy constructors and assignment operators allow user-defined types to make a deep copy of the object. Keep in mind that while C++ copy constructors can be implemented to make a shallow copy, you’ll have to make sure the value semantics are preserved.

C++ 11 Support for Functional-Style Programming

C++ 11 brings a number of new tools for functional-style programming. Perhaps most important, C++ now has support for lambdas (also known as closures or anonymous functions). Lambdas allow you to compose your code in ways that wouldn’t have been practical before. This functionality was previously available through functors, which are powerful but less practical to use. (Actually, C++ lambdas write anonymous functors behind the scenes.) Figure 4 shows how lambdas have improved our code with a simple example that uses the C++ standard library (STL).

Figure 4 Using Lambdas

void Foo()
{
  vector<int> v;
  v.push_back(1);
  v.push_back(2);
  v.push_back(3);
  for_each(begin(v), end(v), [](int i) {
    cout << i << " ";
  });
}
// Outputs:
// 1 2 3

In this case, the for_each function applies a lambda to each element of a vector. It’s important to note that C++ lambdas have been designed to be used inline when possible; thus lambdas can run as fast as handcrafted code.

While C++ is just one of the many imperative languages that now have lambdas, what makes C++ lambdas special is that (similar to functional programming languages) they can preserve the concept of working with values as opposed to identities. While functional programming languages accomplish this by making variables immutable, C++ does it by providing control over the capture. Consider the code in Figure 5.

Figure 5 Capturing by Reference

void Foo()
{
  int a[3] = { 11, 12, 13 };
  vector<function<void(void)>> vf;
  // Store lambdas to print each element of an array
  int ctr;
  for (ctr = 0; ctr < 3; ++ctr) {
    vf.push_back([&]() {
      cout << "a[" << ctr << "] = " << a[ctr] << endl;
    });   
  }
  // Run each lambda
  for_each(begin(vf), end(vf), [](function<void(void)> f) {
    f();
  });
}
// Outputs:
// a[3] = -858993460
// a[3] = -858993460
// a[3] = -858993460

In this code, everything is captured by reference, which is the standard behavior for lambdas in other languages. Yet capturing by reference complicates things unless the variables being captured are immutable. If you’re new to working with lambdas, you probably expect the following output from the code:

a[0] = 11
a[1] = 12
a[2] = 13

However, that’s not the output you get—and the program might even crash. This is because the variable ctr is captured by reference, so all of the lambdas use the final value of ctr (that is, 3, the value that made the for loop come to an end) and then access the array beyond its bounds.

It’s also worth noting that to keep the ctr variable alive to be used by the lambda outside of the for loop, the ctr variable’s declaration has to be lifted out of the for loop. While some languages eliminate the need to lift value type variables to an appropriate scope, that doesn’t really solve the problem, which is that the lambda needs to use the value of ctr as opposed to the identity of the variable ctr. (There are workarounds for other languages that involve making an explicit copy to a temporary variable. However, this makes it a bit unclear as to what’s going on, and it’s error-prone because the original variable is also captured and thus is still available for use.)

As Figure 6 shows, C++ provides a simple syntax to allow easy control of the lambda’s capture, which preserves the concept of working with values.

Figure 6 C++ Syntax for Controlling Lambda Capture

[]

Don’t capture anything

(exactly what I wanted in the first lambda example)

[&]

Capture everything by reference

(traditional lambda behavior, though not consistent with functional programming’s emphasis on values)

[=]

Capture everything by value

(while this preserves the concept of values, it limits the usefulness of the lambdas; also, it can be expensive to copy large objects)

[&ctr] Capture only ctr, and capture ctr by reference
[ctr] Capture only ctr, and capture ctr by value
[&,ctr] Capture ctr by value and everything else by reference
[=,&v] Capture v by reference and everything else by value
[&, ctr1, ctr2] Capture ctr1 and ctr2 by value and everything else by reference

It’s clear from Figure 6 that the programmer has complete control over how the lambda captures variables and values. However, while this preserves the concept of working with values, it does nothing to make working with complex data structures as values efficient.

Immutable Data Types

What’s missing are the efficient immutable data structures that some functional programming languages have. These languages facilitate immutable data structures that are efficient even when very large because they share common data. Creating data structures in C++ that share data is trivial—you just dynamically allocate data and each data structure has pointers to the data. Unfortunately, it’s more difficult to manage the lifetime of shared variables (for this reason, garbage collectors have become popular). Luckily, C++ 11 provides an elegant solution for working with shared variables through the std::shared_ptr template class, as shown in Figure 7.

Figure 7 Sharing Variables

void Foo()
{
  // Create a shared int
  // (dynamically allocates an integer
  //  and provides automatic reference counting)
  auto sharedInt = make_shared<int>(123);
  // Share the integer with a secondShare
  // (increments the reference count)
  shared_ptr<int> secondShare(sharedInt);
  // Release the pointer to the first integer
  // (decrements the reference count)
  sharedInt = nullptr;
  // Verify the shared int is still alive
  cout << "secondShare = " << *secondShare << endl;
  // Shared int is automatically de-allocated
  // as secondShare falls out of scope and the reference
  // count falls to zero
}
// Outputs:
// secondShare = 123

The code in Figure 7 illustrates a simple use of std::shared_ptr and its helper function std::make_shared. Using std::shared_ptr makes it easy to share data among data structures without fear of leaking memory (as long as circular references are avoided). Note that std::shared_ptr provides the basic thread-safety guarantees, and runs fast because it uses a lock-free design. However, keep in mind that the basic thread-safety guarantee that std::shared_ptr provides doesn’t automatically extend to the object to which it’s pointing. Still, std::shared_ptr guarantees it will not reduce the thread-safety guarantee of the object it points to. Immutable objects inherently provide a strong thread-safety guarantee because once they’re created they never change. (Actually, they never change in an observable manner, which includes, among other things, an appropriate thread-safety guarantee.) Therefore, when you use a std::shared_ptr with an immutable object, the combination maintains the immutable object’s strong thread-safety guarantee.

I can now easily create a simple immutable class that potentially shares data, as shown in Figure 8.

Figure 8 An Immutable Class for Sharing Data

class Immutable
{
private:
  // Use a normal double, copying is cheap
  double d_;
  // Use a shared string, because strings can be very large
  std::shared_ptr<std::string const> s_;
public:
  // Default constructor
  Immutable()
    : d_(0.0),
      s_(std::make_shared<std::string const>(""))
  {}
  // Constructor taking a string
  Immutable(const double d, const string& s)
    : d_(d),
    s_(std::make_shared<std::string const>(s))
  {}
  // Move constructor
  Immutable(Immutable&& other)
    : s_()
  {
    using std::swap;
    swap(d_, other.d_);
    swap(s_, other.s_);
  }
  // Move assignment operator
  Immutable& operator=(Immutable&& other)
  {
    swap(d_, other.d_);
    swap(s_, other.s_);
    return *this;
  }
  // Use default copy constructor and assignment operator
  // Getter Functions
  double GetD() const
  {
    // Return a copy because double is small (8 bytes)
    return d_;
  }
  const std::string& GetS() const
  {
    // Return a const reference because string can be very large
    return *s_;
  }
  // "Setter" Functions (always return a new object)
  Immutable SetD(double d) const
  {
    Immutable newObject(*this);
    newObject.d_ = d;
    return newObject;
  }
  Immutable SetS(const std::string& s) const
  {
    Immutable newObject(*this);
    newObject.s_ = std::make_shared<std::string const>(s);
    return newObject;
  }
};

The code in Figure 8 is a bit long, but most of it is boilerplate code for the constructors and assignment operators. The last two functions are the key to making the object immutable. Note that the SetS and SetD methods return a new object, which leaves the original object unchanged. (While including the SetS and SetD methods as members is convenient, it’s a bit of a lie, because they don’t actually change the original object. For a cleaner solution, see the ImmutableVector in Figures 9 and 10.) Figure 11 shows the Immutable class in action.

Figure 9 Using the Smart ImmutableVector Template Class

template <class ImmutableVector>
void DisplayImmutableVector(const char* name, const ImmutableVector& v)
{
  using namespace std;
  cout << name << ".Size() = " << v.Size()
     << ", " << name << "[] = { ";
  for (size_t ctr = 0; ctr < v.Size(); ++ctr) {
    cout << v[ctr] << " ";
  }
  cout << "}" << endl;
}
void ImmutableVectorTest1()
{
  // Create an ImmutableVector with a branching size of four
  ImmutableVector<int, 4> v;
  // Another ImmutableVector (we will take a copy of v at element 6)
  ImmutableVector<int, 4> aCopyOfV;
  // Push 16 values into the vector (this will create a two level tree).
  // Note that the vector is being assigned to itself. The
  // move constructor insures this is not very expensive, but
  // if a copy was made at any point the copy would remain
  // unchanged, but continue to share the applicable data with
  // the current version.
  for (int ctr = 0; ctr < 10; ++ctr) {
    v = AppendValue(v, ctr);
    if (ctr == 6) aCopyOfV = v;
  }
  // Display the contents of the vectors
  DisplayImmutableVector("v", v);
  DisplayImmutableVector("aCopyOfV", aCopyOfV);
}
// Outputs:
// v.Size() = 10, v[] = { 0 1 2 3 4 5 6 7 8 9 }
// aCopyOfV.Size() = 7, aCopyOfV[] = { 0 1 2 3 4 5 6 }

Figure 10 Methods for Operating on the ImmutableVector

void ImmutableVectorTest2()
{
  ImmutableVector<int, 4> v;
  v = AppendValue(v, 1);
  v = AppendValue(v, 2);
  v = AppendValue(v, 3);
  int oldValue = v.Back();
  auto v1 = TruncateValue(v);
  auto v2 = SubstituteValueAtIndex(v, 0, 3);
  auto v3 = GenerateFrom(v, [](ImmutableVector<int, 4>::MutableVector& v) {
    v[0] = 4;
    v[1] = 5;
    v[2] = 6;
    v.PushBack(7);
    v.PushBack(8);
  });
  auto v4 = GenerateFrom(v3, [](ImmutableVector<int, 4>::MutableVector& v4) {
    using namespace std;
    cout << "Change v4 by calling PopBack:" << endl;
    cout << "x = v4.PopBack()" << endl;
    int x = v4.PopBack();
    cout << "x == " << x << endl;
    cout << endl;
  });
  // Display the contents of the vectors
  DisplayImmutableVector("v", v);
  DisplayImmutableVector("v1", v1);
  DisplayImmutableVector("v2", v2);
  DisplayImmutableVector("v3", v3);
  DisplayImmutableVector("v4", v4);
}
// Outputs:
//    Change v4 by calling PopBack:
//    x = v4.PopBack()
//    x == 8
//   
//    Resulting ImmutableVectors:
//    v.Size() = 3, v[] = { 1 2 3 }
//    v1.Size() = 2, v1[] = { 1 2 }
//    v2.Size() = 3, v2[] = { 3 2 1 }
//    v3.Size() = 5, v3[] = { 4 5 6 7 8 }
//    v4.Size() = 4, v4[] = { 4 5 6 7 }

Figure 11 The Immutable Class in Action

using namespace std;
void Foo()
{
  // Create an immutable object
  double d1 = 1.1;
  string s1 = "Hello World";
  Immutable a(d1, s1);
  // Create a copy of the immutable object, share the data
  Immutable b(a);
  // Create a new immutable object
  // by changing an existing immutable object
  // (Note the new object is returned)
  string s2 = "Hello Other";
  Immutable c = a.SetS(s2);
  // Display the contents of each object
  cout << "a.GetD() = " << a.GetD() << ", "
    << "a.GetS() = " << a.GetS()
    << " [address = " << &(a.GetS()) << "]" << endl;
  cout << "b.GetD() = " << b.GetD() << ", "
    << "b.GetS() = " << b.GetS()
    << " [address = " << &(b.GetS()) << "]" << endl;
  cout << "c.GetD() = " << c.GetD() << ", "
    << "c.GetS() = " << c.GetS()
    << " [address = " << &(c.GetS()) << "]" << endl;
}
// Outputs:
// a.GetD() = 1.1, a.GetS() = Hello World [address = 008354BC]
// b.GetD() = 1.1, b.GetS() = Hello World [address = 008354BC]
// c.GetD() = 1.1, c.GetS() = Hello Other [address = 008355B4]

Note that object b shares the same string as object a (both strings are at the same address). Adding additional fields with associated getters and setters is trivial. Though this code is good, it’s a little more difficult to scale to containers when you’re being efficient. For example, a naïve ImmutableVector might maintain a list of shared pointers representing each element of the array. When the naïve Immutable-Vector is changed, the entire array of shared pointers would need to be duplicated, incurring additional cost as each shared_ptr element of the array would need its reference count to be incremented.

There is a technique, though, that allows the data structure to share most of its data and minimize the duplication. This technique uses a tree of some form to require duplication of only the nodes that are directly affected by a change. Figure 12shows a comparison of a naïve ImmutableVector and a smart ImmutableVector.

Comparing Naïve and Smart ImmutableVectors
Figure 12 Comparing Naïve and Smart ImmutableVectors

This tree technique scales nicely: as the number of elements grows, the percent of the tree that needs to be duplicated is minimized. Moreover, by adjusting the branching factor (so each node has more than two children), you can achieve a balance in memory overhead and node reuse.

I developed a smart ImmutableVector template class that can be downloaded from archive.msdn.microsoft.com/mag201208CPP. Figure 9 shows how you can use my ImmutableVector class. (As previously noted, to make the immutable nature of the ImmutableVector clearer to the users, ImmutableVector uses static member functions for all actions that generate new versions.)

For read-only actions, the vector can be used much like a regular vector. (Note that for this example I haven’t implemented iterators, but doing so should be fairly trivial.) For write actions, the AppendValue and TruncateValue static methods return a new ImmutableVector, thus preserving the original object. Unfortu-nately, this isn’t reasonable for the array subscript operator, so I made the array subscript operator read-only (that is, it returns a const reference) and provided a SubstituteValueAtIndex static method. It would be nice, however, to be able to make a large number of modifications using the array subscript operator in a single block of code. To facilitate this, ImmutableVector provides a GenerateFrom static method, which takes a lambda (or any other functor). The lambda in turn takes a reference to MutableVector as a parameter, which allows the lambda to work on a temporary MutableVector that can be changed freely like a normal std::vector. The example in Figure 10 shows the various methods for operating on the ImmutableVector.

The beauty of the GenerateFrom static method is that the code within it can be written in an imperative way, while resulting in an immutable object that can be safely shared. Note that the Generate-From static method prevents unauthorized access to the underlying ImmutableVector by disabling the MutableVector it passed into the lambda as soon as the lambda exited. Please note as well that while ImmutableVector provides a strong thread-safety guarantee, its helper class MutableVector does not (and is intended to be only used locally within the lambda, not passed around to other threads). My implementation also optimizes for multiple changes within the Change method such that there’s minimal restructuring occurring on the temporary tree, which gives a nice performance boost.

Wrapping Up

This article gives you just a taste of how you can use functional-style programming in your C++ code. Moreover, C++ 11 features such as lambdas bring a touch of functional-style programming regardless of the paradigm used.


David Cravey is a Visual C++ MVP who enjoys programming in C++ maybe a bit too much. You’ll find him presenting at local C++ user groups and universities. During the day he enjoys working at NCR, through TEKsystems in Fort Worth, Texas.

Thanks to the following technical experts for reviewing this article: Giovanni Dicanio, Stephan T. Lavavej, Angel Hernández Matos, Alf P. Steinbach and David Wilkinson