May 2015

Volume 30 Number 5


Windows with C++ - Adding Compile-Time Type Checking to Printf

By Kenny Kerr | May 2015

I explored some techniques for making printf more convenient to use with modern C++ in my March 2015 column (msdn.magazine.com/magazine/dn913181). I showed how to transform arguments using a variadic template in order to bridge the gap between the official C++ string class and the antiquated printf function. Why bother? Well, printf is very fast, and a solution for formatted output that can take advantage of that while allowing developers to write safer, higher-level code is certainly desirable. The result was a Print variadic function template that could take a simple program like this:

int main()
{
  std::string const hello = "Hello";
  std::wstring const world = L"World";
  Print("%d %s %ls\n", 123, hello, world);
}

And effectively expand the inner printf function as follows:

printf("%d %s %ls\n", Argument(123), Argument(hello), Argument(world));

The Argument function template would then also compile away and leave the necessary accessor functions:

printf("%d %s %ls\n", 123, hello.c_str(), world.c_str());

While this is a convenient bridge between modern C++ and the traditional printf function, it does nothing to resolve the difficulties of writing correct code using printf. printf is still printf and I’m relying on the not entirely omniscient compiler and library to sniff out any inconsistencies between conversion specifiers and the actual arguments provided by the caller.

Surely modern C++ can do better. Many developers have attempted to do better. The trouble is that different developers have different requirements or priorities. Some are happy to take a small performance hit and simply rely on <iostream> for type checking and extensibility. Others have devised elaborate libraries that provide interesting features that require additional structures and allocations to keep track of formatting state. Personally, I’m not satisfied with solutions that introduce performance and runtime overhead for what should be a fundamental and fast operation. If it’s small and fast in C, it should be small, fast and easy to use correctly in C++. “Slower” should not enter into that equation.

So what can be done? Well, a little abstraction is in order, just so long as the abstractions can be compiled away to leave something very close to a handwritten printf statement. The key is to realize that conversion specifiers such as %d and %s are really just placeholders for the arguments or values that follow. The trouble is that these placeholders make assumptions about the types of the corresponding arguments without having any way of knowing whether those types are correct. Rather than trying to add runtime type checking that will confirm these assumptions, let’s throw away this pseudo-type information and instead let the compiler deduce the types of arguments as they’re naturally resolved. So rather than writing:

printf("%s %d\n", "Hello", 2015);

I should instead restrict the format string to the actual characters to output and any placeholders to expand. I might even use the same metacharacter as a placeholder:

Write("% %\n", "Hello", 2015);

There’s really no less type information in this version than in the preceding printf example. The only reason printf needed to embed that extra type information was because the C programming language lacked variadic templates. The latter is also much cleaner. I would certainly be happy if I didn’t have to keep looking up various printf format specifiers to make sure I’ve got it just right.

I also don’t want to limit output to just the console. What about formatting into a string? What about some other target? One of the challenges of working with printf is that while it supports output to various targets, it does so through distinct functions that aren’t overloads and are difficult to use generically. To be fair, the C programming language doesn’t support overloads or generic programming. Still, let’s not repeat history. I’d like to be able to print to the console just as easily and generically as I can print to a string or a file. What I have in mind is illustrated in Figure 1.

Figure 1 Generic Output with Type Deduction

Write(printf, "Hello %\n", 2015);
FILE * file = // Open file
Write(file, "Hello %\n", 2015);
std::string text;
Write(text, "Hello %\n", 2015);

From a design or implementation perspective, there should be one Write function template that acts as a driver and any number of targets or target adapters that are bound generically based on the target identified by the caller. A developer should then be able to easily add additional targets as needed. So how would this work? One option is to make the target a template parameter. Something like this:

template <typename Target, typename ... Args>
void Write(Target & target,
  char const * const format, Args const & ... args)
{
  target(format, args ...);
}
int main()
{
  Write(printf, "Hello %d\n", 2015);
}

This works to a point. I can write other targets that conform to the signature expected of printf and it ought to work well enough. I could write a function object that conforms and appends output to a string:

struct StringAdapter
{
  std::string & Target;
  template <typename ... Args>
  void operator()(char const * const format, Args const & ... args)
  {
    // Append text
  }
};

I can then use it with the same Write function template:

std::string target;
Write(StringAdapter{ target }, "Hello %d", 2015);
assert(target == "Hello 2015");

At first this might seem quite elegant and even flexible. I can write all kinds of adapter functions or function objects. But in practice it quickly becomes rather tedious. It would be much more desirable to simply pass the string as the target directly and have the Write function template take care of adapting it based on its type. So, let’s have the Write function template match the requested target with a pair of overloaded functions for appending or formatting output. A little compile-time indirection goes a long way. Realizing that much of the output that may need to be written will simply be text without any placeholders, I’ll add not one, but a pair of overloads. The first would simply append text. Here’s an Append function for strings:

void Append(std::string & target,
  char const * const value, size_t const size)
{
  target.append(value, size);
}

I can then provide an overload of Append for printf:

template <typename P>
void Append(P target, char const * const value, size_t const size)
{
  // The '*' precision is read as an int, so convert the size_t explicitly
  target("%.*s", static_cast<int>(size), value);
}

I could have avoided the template by using a function pointer matching the printf signature, but this is a bit more flexible because other functions could conceivably be bound to this same implementation and the compiler isn’t hindered in any way by the pointer indirection. I can also provide an overload for file or stream output:

void Append(FILE * target,
  char const * const value, size_t const size)
{
  // The '*' precision is read as an int, so convert the size_t explicitly
  fprintf(target, "%.*s", static_cast<int>(size), value);
}
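
As a rough sketch of how a single generic caller ends up at either overload, the following pairs the string and file versions of Append with a small function template. AppendPrefix and FirstFive are hypothetical names used only for illustration. Because the '*' precision in "%.*s" is read as an int, the sketch converts the size_t value explicitly before handing it to fprintf.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdio>
#include <string>

// Target adapter for strings: append exactly 'size' characters.
void Append(std::string & target,
  char const * const value, std::size_t const size)
{
  target.append(value, size);
}

// Target adapter for files and streams.
void Append(std::FILE * target,
  char const * const value, std::size_t const size)
{
  // The precision argument must be an int, so guard and convert.
  assert(size <= static_cast<std::size_t>(INT_MAX));
  std::fprintf(target, "%.*s", static_cast<int>(size), value);
}

// Hypothetical generic caller: overload resolution picks the adapter
// that matches whatever target type the caller supplies.
template <typename Target>
void AppendPrefix(Target & target,
  char const * const text, std::size_t const count)
{
  Append(target, text, count);
}

// Hypothetical helper that exercises the string target.
std::string FirstFive()
{
  std::string text;
  AppendPrefix(text, "Hello, world", 5);
  return text;
}
```

The caller never names the target type; adding a new target only requires another Append overload that overload resolution can find.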

Of course, formatted output is still essential. Here’s an AppendFormat function for strings:

template <typename ... Args>
void AppendFormat(std::string & target,
  char const * const format, Args ... args)
{
  size_t const back = target.size();
  int const size = snprintf(nullptr, 0, format, args ...);
  target.resize(back + size);
  snprintf(&target[back], size + 1, format, args ...);
}

It first determines how much additional space is required before resizing the target and formatting the text directly into the string. It’s tempting to try and avoid calling snprintf twice by checking whether there’s enough room in the buffer. The reason I tend to always call snprintf twice is because informal testing has indicated that calling it twice is usually cheaper than resizing to capacity. Even though an allocation isn’t required, those extra characters are zeroed, which tends to be more expensive. This is, however, very subjective, dependent on data patterns and how frequently the target string is reused. And here’s one for printf:

template <typename P, typename ... Args>
void AppendFormat(P target, char const * const format, Args ... args)
{
  target(format, args ...);
}

An overload for file output is just as straightforward:

template <typename ... Args>
void AppendFormat(FILE * target,
  char const * const format, Args ... args)
{
  fprintf(target, format, args ...);
}
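
To make the two-pass approach concrete, here is a minimal, self-contained sketch of the string overload on its own. It follows the same measure-then-format pattern described above; the Describe function and the check on snprintf's return value are my additions for illustration and are not part of the original listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

template <typename ... Args>
void AppendFormat(std::string & target,
  char const * const format, Args ... args)
{
  std::size_t const back = target.size();

  // First pass: no buffer, so snprintf only reports the length needed
  // (not counting the null terminator).
  int const size = std::snprintf(nullptr, 0, format, args ...);
  assert(size >= 0); // a negative value signals an encoding error

  target.resize(back + static_cast<std::size_t>(size));

  // Second pass: format in place. The extra character is the terminator,
  // which lands on the string's own terminator.
  std::snprintf(&target[back], static_cast<std::size_t>(size) + 1,
    format, args ...);
}

// Hypothetical helper showing that existing content is preserved.
std::string Describe()
{
  std::string text = "Year: ";
  AppendFormat(text, "%d", 2015);
  AppendFormat(text, ", ratio: %.2f", 0.5);
  return text;
}
```

Each call appends after whatever the string already holds, which is what lets the driver build up output one argument at a time.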

I now have one of the building blocks in place for the Write driver function. The other necessary building block is a generic way of handling argument formatting. While the technique I illustrated in my March 2015 column was simple and elegant, it lacked the ability to deal with any values that didn’t map directly to types supported by printf. It also couldn’t deal with argument expansion or complex argument values such as user-defined types. Once again, a set of overloaded functions can solve the problem quite elegantly. Let’s assume the Write driver function will pass each argument to a WriteArgument function. Here’s one for strings:

template <typename Target>
void WriteArgument(Target & target, std::string const & value)
{
  Append(target, value.c_str(), value.size());
}

The various WriteArgument functions will always accept two arguments. The first represents the generic target while the second is the specific argument to write. Here, I’m relying on the existence of an Append function to match the target and take care of appending the value to the end of the target. The WriteArgument function doesn’t need to know what that target actually is. I could conceivably avoid the target adapter functions, but that would result in a quadratic growth in WriteArgument overloads. Here’s another WriteArgument function for integer arguments:

template <typename Target>
void WriteArgument(Target & target, int const value)
{
  AppendFormat(target, "%d", value);
}

In this case, the WriteArgument function expects an AppendFormat function to match the target. As with the Append and AppendFormat overloads, writing additional WriteArgument functions is straightforward. The beauty of this approach is that the argument adapters don’t need to return some value up the stack to the printf function, as they did in the March 2015 version. Instead, WriteArgument overloads actually scope the output such that the target is written immediately. This means complex types can be used as arguments and temporary storage can even be relied upon to format their text representation. Here’s a WriteArgument overload for GUIDs:

template <typename Target>
void WriteArgument(Target & target, GUID const & value)
{
  wchar_t buffer[39];
  StringFromGUID2(value, buffer, _countof(buffer));
  AppendFormat(target, "%.*ls", 36, buffer + 1);
}
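
The same pattern works away from Windows. As a sketch only, here is an argument adapter for a hypothetical Rgb type that formats into temporary stack storage and then appends the result, much as the GUID version does. The Rgb struct and the "#RRGGBB" representation are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical user-defined type.
struct Rgb
{
  unsigned char r, g, b;
};

// Target adapter for strings, as in the article.
void Append(std::string & target,
  char const * const value, std::size_t const size)
{
  target.append(value, size);
}

// Argument adapter: format into a local buffer, then append it.
// The buffer can be temporary because the target is written immediately.
template <typename Target>
void WriteArgument(Target & target, Rgb const & value)
{
  char buffer[8]; // '#' + six hex digits + terminator
  int const size = std::snprintf(buffer, sizeof(buffer), "#%02X%02X%02X",
    static_cast<unsigned>(value.r),
    static_cast<unsigned>(value.g),
    static_cast<unsigned>(value.b));
  assert(size == 7);
  Append(target, buffer, static_cast<std::size_t>(size));
}

// Hypothetical helper that exercises the adapter with a string target.
std::string Orange()
{
  std::string text;
  WriteArgument(text, Rgb{ 255, 128, 0 });
  return text;
}
```

Nothing here needs to know about the other argument adapters; the overload only depends on an Append function existing for the chosen target.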

I could even replace the Windows StringFromGUID2 function and format it directly, perhaps to improve performance or add portability, but this clearly shows the power and flexibility of this approach. User-defined types can easily be supported with the addition of a WriteArgument overload. I’ve called them overloads here, but strictly speaking they need not be. The output library can certainly provide a set of overloads for common targets and arguments, but the Write driver function shouldn’t assume the adapter functions are overloads and, instead, should treat them like the nonmember begin and end functions defined by the Standard C++ library. The nonmember begin and end functions are extensible and adaptable to all kinds of standard and non-standard containers precisely because they don’t need to reside in the std namespace, but should instead be local to the namespace of the type being matched. In the same way, these target and argument adapter functions should be able to reside in other namespaces to support the developer’s targets and user-defined arguments. So what does the Write driver function look like? For starters, there’s only one Write function:

template <typename Target, unsigned Count, typename ... Args>
void Write(Target & target,
  char const (&format)[Count], Args const & ... args)
{
  assert(Internal::CountPlaceholders(format) == sizeof ... (args));
  Internal::Write(target, format, Count - 1, args ...);
}

The first thing it needs to do is determine whether the number of placeholders in the format string equals the number of arguments in the variadic parameter pack. Here, I’m using a runtime assert, but this really ought to be a static_assert that checks the format string at compile time. Unfortunately, Visual C++ is not quite there yet. Still, I can write the code so that when the compiler catches up, the code can be easily updated to check the format string at compile time. As such, the internal CountPlaceholders function should be a constexpr:

constexpr unsigned CountPlaceholders(char const * const format)
{
  return (*format == '%') +
    (*format == '\0' ? 0 : CountPlaceholders(format + 1));
}
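
On a compiler with working constexpr support, the function can already be evaluated at compile time when it's given a string literal directly. The following sketch shows only that much: it checks literals at namespace scope and doesn't attempt to move the check inside Write, where the format string arrives as a function parameter. CheckAtRunTime is a hypothetical helper added for illustration.

```cpp
#include <cassert>

// Single-return recursive form, valid as a C++11 constexpr function.
constexpr unsigned CountPlaceholders(char const * const format)
{
  return (*format == '%') +
    (*format == '\0' ? 0 : CountPlaceholders(format + 1));
}

// Evaluated entirely by the compiler: a mismatch here fails the build.
static_assert(CountPlaceholders("% %\n") == 2,
  "expected two placeholders");
static_assert(CountPlaceholders("no placeholders") == 0,
  "expected no placeholders");
static_assert(CountPlaceholders("") == 0,
  "an empty format has no placeholders");

// Hypothetical helper: the same function still works at run time.
void CheckAtRunTime(char const * const format, unsigned const expected)
{
  assert(CountPlaceholders(format) == expected);
}
```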

When Visual C++ achieves full conformance with C++14, at least with regard to constexpr, you should be able to simply replace the assert inside the Write function with static_assert. Then it’s on to the internal overloaded Write function to roll out the argument-specific output at compile time. Here, I can rely on the compiler to generate and call the necessary overloads of the internal Write function to satisfy the expanded variadic parameter pack:

template <typename Target, typename First, typename ... Rest>
void Write(Target & target, char const * const value,
  size_t const size, First const & first, Rest const & ... rest)
{
  // Magic goes here
}

I’ll focus on that magic in a moment. Ultimately, the compiler will run out of arguments and a non-variadic overload will be required to complete the operation:

template <typename Target>
void Write(Target & target, char const * const value, size_t const size)
{
  Append(target, value, size);
}

Both internal Write functions accept a value, along with the value’s size. The variadic Write function template must further assume there is at least one placeholder in the value. The non-variadic Write function need make no such assumption and can simply use the generic Append function to write any trailing portion of the format string. Before the variadic Write function can write its arguments, it must first write any leading characters and, of course, find the first placeholder or metacharacter:

size_t placeholder = 0;
while (value[placeholder] != '%')
{
  ++placeholder;
}

Only then can it write the leading characters:

assert(value[placeholder] == '%');
Append(target, value, placeholder);

The first argument can then be written and the process repeats until no further arguments and placeholders are left:

WriteArgument(target, first);
Write(target, value + placeholder + 1, size - placeholder - 1, rest ...);
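
Putting the fragments together, the following is a minimal sketch of the whole driver with just enough adapters to format into a std::string. It assumes only the string target and the string and int argument adapters shown earlier; the Greeting function is a hypothetical example of a caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Target adapters for std::string.
void Append(std::string & target, char const * const value, std::size_t const size)
{
  target.append(value, size);
}

template <typename ... Args>
void AppendFormat(std::string & target, char const * const format, Args ... args)
{
  std::size_t const back = target.size();
  int const size = std::snprintf(nullptr, 0, format, args ...);
  target.resize(back + static_cast<std::size_t>(size));
  std::snprintf(&target[back], static_cast<std::size_t>(size) + 1, format, args ...);
}

// Argument adapters.
template <typename Target>
void WriteArgument(Target & target, std::string const & value)
{
  Append(target, value.c_str(), value.size());
}

template <typename Target>
void WriteArgument(Target & target, int const value)
{
  AppendFormat(target, "%d", value);
}

namespace Internal
{
  constexpr unsigned CountPlaceholders(char const * const format)
  {
    return (*format == '%') +
      (*format == '\0' ? 0 : CountPlaceholders(format + 1));
  }

  // No arguments left: write the trailing portion of the format string.
  template <typename Target>
  void Write(Target & target, char const * const value, std::size_t const size)
  {
    Append(target, value, size);
  }

  // At least one argument left, so at least one placeholder must remain.
  template <typename Target, typename First, typename ... Rest>
  void Write(Target & target, char const * const value,
    std::size_t const size, First const & first, Rest const & ... rest)
  {
    std::size_t placeholder = 0;
    while (value[placeholder] != '%')
    {
      ++placeholder;
    }
    Append(target, value, placeholder); // leading characters
    WriteArgument(target, first);       // the argument itself
    Write(target, value + placeholder + 1, size - placeholder - 1, rest ...);
  }
}

template <typename Target, unsigned Count, typename ... Args>
void Write(Target & target, char const (&format)[Count], Args const & ... args)
{
  assert(Internal::CountPlaceholders(format) == sizeof ... (args));
  Internal::Write(target, format, Count - 1, args ...);
}

// Hypothetical caller.
std::string Greeting()
{
  std::string text;
  std::string const name = "World";
  Write(text, "Hello %, it is %!", name, 2015);
  return text;
}
```

Each recursive step consumes one placeholder and one argument, so the recursion depth always equals the number of arguments supplied.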

I can now support the generic output in Figure 1. I can even convert a GUID to a string quite simply:

std::string text;
Write(text, "{%}", __uuidof(IUnknown));
assert(text == "{00000000-0000-0000-C000-000000000046}");

What about something a little more interesting? How about visualizing a vector:

std::vector<int> const numbers{ 1, 2, 3, 4, 5, 6 };
std::string text;
Write(text, "{ % }", numbers);
assert(text == "{ 1, 2, 3, 4, 5, 6 }");

For that I simply need to write a WriteArgument function template that accepts a vector as an argument, as shown in Figure 2.

Figure 2 Visualizing a Vector

template <typename Target, typename Value>
void WriteArgument(Target & target, std::vector<Value> const & values)
{
  for (size_t i = 0; i != values.size(); ++i)
  {
    if (i != 0)
    {
      WriteArgument(target, ", ");
    }
    WriteArgument(target, values[i]);
  }
}

Notice how I didn’t force the type of the elements in the vector. That means I can now use the same implementation to visualize a vector of strings:

std::vector<std::string> const names{ "Jim", "Jane", "June" };
std::string text;
Write(text, "{ % }", names);
assert(text == "{ Jim, Jane, June }");

Of course, that raises the question: What if I want to further expand a placeholder? Sure, I can write a WriteArgument for a container, but it provides no flexibility in tweaking the output. Imagine I need to define a palette for an app’s color scheme and I have primary colors and secondary colors:

std::vector<std::string> const primary = { "Red", "Green", "Blue" };
std::vector<std::string> const secondary = { "Cyan", "Yellow" };

The Write function will gladly format this for me:

Write(printf,
  "<Colors>%%</Colors>",
  primary,
  secondary);

The output, however, is not quite what I want:

<Colors>Red, Green, BlueCyan, Yellow</Colors>

That’s obviously wrong. Instead, I’d like to mark up the colors such that I know which are primary and which are secondary. Perhaps something like this:

<Colors>
  <Primary>Red</Primary>
  <Primary>Green</Primary>
  <Primary>Blue</Primary>
  <Secondary>Cyan</Secondary>
  <Secondary>Yellow</Secondary>
</Colors>

Let’s add one more WriteArgument function that can provide this level of extensibility:

template <typename Target, typename Arg>
void WriteArgument(Target & target, Arg const & value)
{
  value(target);
}

Notice the operation seems to be flipped on its head. Rather than passing the value to the target, the target is being passed to the value. In this way, I can provide a bound function as an argument instead of just a value. I can attach some user-defined behavior and not merely a user-defined value. Here’s a WriteColors function that does what I want:

void WriteColors(int (*target)(char const *, ...),
  std::vector<std::string> const & colors, std::string const & tag)
{
  for (std::string const & color : colors)
  {
    Write(target, "<%>%</%>", tag, color, tag);
  }
}

Notice this isn’t a function template and I’ve had to essentially hardcode it for a single target. This is a target-specific customization, but shows what’s possible even when you need to step out of the generic type deduction directly provided by the Write driver function. But how can it be incorporated into a larger write operation? Well, you might be tempted for a moment to write this:

Write(printf,
  "<Colors>\n%%</Colors>\n",
  WriteColors(printf, primary, "Primary"),
  WriteColors(printf, secondary, "Secondary"));

Putting aside for a moment the fact that this won’t compile, it really wouldn’t give you the right sequence of events, anyway. If this were to work, the colors would be printed before the opening <Colors> tag. Clearly, they should be called as if they were arguments in the order that they appear. And that’s what the new WriteArgument function template allows. I just need to bind the WriteColors invocations such that they can be called at a later stage. To make that even simpler for someone using the Write driver function, I can offer up a handy bind wrapper:

template <typename F, typename ... Args>
auto Bind(F call, Args && ... args)
{
  return std::bind(call, std::placeholders::_1,
    std::forward<Args>(args) ...);
}

This Bind function template simply ensures that a placeholder is reserved for the eventual target to which it will be written. I can then rightly format my color palette as follows:

Write(printf,
  "<Colors>%%</Colors>\n",
  Bind(WriteColors, std::ref(primary), "Primary"),
  Bind(WriteColors, std::ref(secondary), "Secondary"));

And I get the expected tagged output. The ref helper functions aren’t strictly necessary, but avoid making a copy of the containers for the call wrappers.
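
To see the ordering at work in a form that can be checked with an assertion, here is a sketch of the same idea with the target switched to std::string. This WriteColors is hardcoded for the string target instead of printf, and FormatPalette is a hypothetical caller; the driver is trimmed to the pieces this example needs. It relies on return type deduction, so it assumes a C++14 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

void Append(std::string & target, char const * const value, std::size_t const size)
{
  target.append(value, size);
}

template <typename Target>
void WriteArgument(Target & target, std::string const & value)
{
  Append(target, value.c_str(), value.size());
}

// Callable arguments: the target is handed to the value.
template <typename Target, typename Arg>
void WriteArgument(Target & target, Arg const & value)
{
  value(target);
}

namespace Internal
{
  template <typename Target>
  void Write(Target & target, char const * const value, std::size_t const size)
  {
    Append(target, value, size);
  }

  template <typename Target, typename First, typename ... Rest>
  void Write(Target & target, char const * const value,
    std::size_t const size, First const & first, Rest const & ... rest)
  {
    std::size_t placeholder = 0;
    while (value[placeholder] != '%') { ++placeholder; }
    assert(value[placeholder] == '%');
    Append(target, value, placeholder);
    WriteArgument(target, first);
    Write(target, value + placeholder + 1, size - placeholder - 1, rest ...);
  }
}

template <typename Target, unsigned Count, typename ... Args>
void Write(Target & target, char const (&format)[Count], Args const & ... args)
{
  Internal::Write(target, format, Count - 1, args ...);
}

// Hardcoded for the string target in this sketch.
void WriteColors(std::string & target,
  std::vector<std::string> const & colors, std::string const & tag)
{
  for (std::string const & color : colors)
  {
    Write(target, "<%>%</%>", tag, color, tag);
  }
}

// Reserves the first bound parameter for the eventual target.
template <typename F, typename ... Args>
auto Bind(F call, Args && ... args)
{
  return std::bind(call, std::placeholders::_1, std::forward<Args>(args) ...);
}

// Hypothetical caller.
std::string FormatPalette()
{
  std::vector<std::string> const primary = { "Red", "Green", "Blue" };
  std::vector<std::string> const secondary = { "Cyan", "Yellow" };
  std::string text;
  Write(text, "<Colors>%%</Colors>",
    Bind(WriteColors, std::ref(primary), "Primary"),
    Bind(WriteColors, std::ref(secondary), "Secondary"));
  return text;
}
```

Because each bound call runs only when the driver reaches its placeholder, the opening tag is written first and the colors follow in argument order.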

Not convinced? The possibilities are endless. You can handle character string arguments efficiently for both wide and normal characters:

template <typename Target, unsigned Count>
void WriteArgument(Target & target, char const (&value)[Count])
{
  Append(target, value, Count - 1);
}
template <typename Target, unsigned Count>
void WriteArgument(Target & target, wchar_t const (&value)[Count])
{
  // The '*' precision is read as an int, so convert the unsigned count
  AppendFormat(target, "%.*ls", static_cast<int>(Count - 1), value);
}

In this way I can easily and safely write output using different character sets:

Write(printf, "% %", "Hello", L"World");

What if you don’t specifically or initially need to write output, but instead just need to calculate how much space would be required? No problem, I can simply create a new target that sums it up:

void Append(size_t & target, char const *, size_t const size)
{
  target += size;
}
template <typename ... Args>
void AppendFormat(size_t & target,
  char const * const format, Args ... args)
{
  target += snprintf(nullptr, 0, format, args ...);
}

I can now calculate the required size quite simply:

size_t count = 0;
Write(count, "Hello %", 2015);
assert(count == strlen("Hello 2015"));

I think it’s safe to say that this solution finally addresses the type frailty inherent in using printf directly while preserving most of the performance benefits. Modern C++ is more than able to meet the needs of developers looking for a productive environment with reliable type checking while maintaining the performance for which C and C++ are traditionally known.


Kenny Kerr is a computer programmer based in Canada, as well as an author for Pluralsight and a Microsoft MVP. He blogs at kennykerr.ca and you can follow him on Twitter at twitter.com/kennykerr.

Thanks to the following Microsoft technical expert for reviewing this article: James McNellis