C++ At Work

Rationales, Highlights, and a Farewell

Paul DiLascia

Contents

The Rationale
Ditching Managed Extensions
Future Fun
The Evolution of Programming
The Ultimate Goal
Farewell, Though Not Goodbye

This month I'm going to deviate from my regular Q&A format to tell you about a really great document I discovered online. A few weeks ago, someone wrote me asking why he couldn't declare a const function in C++/CLI:

// reference class
ref class A {
   void f() const; // NOT!
};

To which I replied: you just can't, them's the rules. The Common Language Infrastructure (CLI) is designed to support many languages (like Visual Basic®, Java, and even COBOL) that don't even know what const means. The CLI doesn't know what const member functions are, so you can't use them.

After I fired off my reply, I vaguely remembered something buried way down in the depths of my memory banks, something about const, something to do with compiler hints that other languages are free to ignore. I searched my old columns and discovered that I answered a question about const back in July 2004. In fact, C++/CLI does let you declare const data members and parameters, but not const member functions. Figure 1 shows a little program with a reference class that has a const static data member. If you compile this program, then use ILDASM to disassemble it, you'll see a line that looks like this:

.field public static int32
    modopt([mscorlib]System.Runtime.CompilerServices.IsConst)
        g_private = int32(0x00000001)

Figure 1 const.cpp

////////////////////////////////////////////////////////////////
// To compile type:
//    cl /clr const.cpp
//
#include <stdio.h>

ref class A {
   int m_val;
   // const data member allowed, will generate modopt
   static const int g_private = 1;
public:
   // public const member could be modified by Visual Basic or other
   // programs that don't know const -- so use literal instead.
   literal int g_public = 1;
   A(int i) { m_val = i; }
   void print();  // const;             // NO!--const fn not allowed
};

void A::print()
{
   printf("Val is %d\n",m_val);
}

int main()
{
   A a(17);
   a.print();
}

Modopt (optional modifier) is an MSIL declarator that says to CLI consumers: if you understand what this thing is, great; if not, you can safely ignore it. Conversely, modreq (required modifier) says: if you don't understand this thing, you can't use this function. Volatile is an example of a modreq. Since a volatile reference can be changed at any time by the OS/hardware (or even another thread), a CLI consumer had better know what volatile means if it wants to use a volatile object. But const is optional. Note, however, that whereas the Managed Extensions would convert a C++ const object into a CLI literal, C++/CLI no longer does this. So watch out: if you declare a public const data member, clients written in a language like Visual Basic could change its value. If you don't want CLI clients to be able to change the value, you should declare your object literal, as in Figure 1. But what about member functions? Why are const member functions not allowed?

The Rationale

Well, in the course of researching the answer, I stumbled upon a really nice paper called "A Design Rationale for C++/CLI" by Herb Sutter from Microsoft. Herb was one of the architects of C++/CLI. You can get the document, which I refer to hereafter as "the Rationale," from Herb's blog (the URL is a long ugly string; just search the Web for "C++/CLI Rationale" and you'll find it). As the title suggests, the Rationale explains why most everything in C++/CLI is the way it is. It answers everything from "Why extend C++ in the first place?" to my own query about const functions. I consider the Rationale required reading for anyone who wants to understand the reasoning behind C++/CLI. Since so much of the Rationale bears repeating, I thought I'd give you some highlights.

Let's start with the most important question, why extend C++ in the first place? The simple and obvious answer is: to make C++ a first-class CLI citizen. The Microsoft® .NET Framework is the future of Windows® development. Heck, even COBOL supports the CLI. So C++/CLI is designed to ensure C++'s continued success.

But why muck with C++? Why extend the language with yucky new concepts like ^ and % and new keywords like ref, value, and property? Well, there's no other way. The Rationale quotes none other than Bjarne Stroustrup on this: "Classes can represent almost all the concepts we need. Only if the library route is genuinely infeasible should the language extension route be followed."

For plain C++ code, targeting the CLI is like writing a compiler for a different processor: no problem. But CLI introduces new concepts that require special code generation and simply can't be expressed in C++. For example, properties require special metadata. You can't implement properties with a library or template. The Rationale discusses some of the alternative syntaxes considered for properties (and other CLI features) and why they were rejected.

Ditching Managed Extensions

The Rationale also explains why Microsoft decided to ditch the Managed Extensions. The Managed Extensions' use of * for both managed and native objects was a clever and valiant attempt to unify C++ and CLI without changing the language, but it obscures the important differences between reference and native objects. The two kinds of pointers look the same, but behave differently. For example, destruction semantics, copy constructors, and virtual calls within constructors/destructors all behave differently depending on what kind of object the pointer really points to. Bad! As the Rationale says, "It is important not only to hide unnecessary differences, but also to expose essential differences," and again cites Bjarne: "I try to make significant operations highly visible." Native and managed classes are fundamentally different beasts, so better to highlight the difference than gloss over it. Various mechanisms were considered and rejected, and the C++/CLI team finally settled on gcnew, ^ (handle), and % (tracking reference). Separating managed and native classes like this has unanticipated side-benefits. For example, using a separate gcnew operator to allocate managed objects leaves open the possibility that one day native classes could be allocated from the managed heap-and vice versa!

Did you ever wonder why the Managed Extensions have all those ugly underbar keywords like __gc and __value? Because the Managed Extensions religiously follow the C++ standard, which says, "If thou really must introduce new keywords, thou shalt name them beginning with double underscore!" But guess what? When Microsoft introduced __gc, __value and the rest, the Redmondtonians received "unexpectedly strong" complaints from programmers. Yeah! Programmers of the world, unite! You have nothing to lose but your underbars. Underbars make your code look icky, like some kind of assembly language program or something. So C++/CLI has ref and value, sans underscores. This meant adding new keywords to C++, but so what? As Bjarne says, "My experience is that people are addicted to keywords for introducing concepts to the point where a concept that doesn't have its own keyword is surprisingly hard to teach. This effect is more important and deep-rooted than people's vocally expressed dislike for new keywords." (True, true. I love that Bjarne is describing the psychology of programming.) So C++/CLI ditched the underbars. By making them positional keywords instead of reserved keywords, they can't conflict with programs that may already use these words as variable or function names.

Curious readers might wonder: how come gc changed to ref in the process of losing its underbars? As the Rationale explains, the important thing about managed classes isn't that they live on the managed (garbage collected) heap. The important thing is their reference semantics. Handles (^) act like references, not pointers. See? Once you read the Rationale, everything makes sense.

If exposing essential differences is important, so is hiding superficial ones. For example, all the operator overloading "just works" the way any C++ programmer would expect, whether the object is native, ref, or value. For another example where C++/CLI papers over a difference, consider this snippet:

// ref class R as local variable
void f()
{ 
   R r; // ref class on stack???
   r.DoSomething();
   ...
}

Here r looks like a stack object, but even your Aunt Mopsie knows managed classes can't be physically allocated on the stack; they must be allocated from the managed heap. So? The compiler can make this snippet work as expected by generating the equivalent of this:

// how the compiler sees it.
void f()
{ 
   R^ r = gcnew R; // allocate on gc heap
   try
   {
      r->DoSomething();
      ...
   }
   finally
   {
      delete r;
   }
}

The object doesn't live on the physical stack, but who cares? The important thing is that the local variable syntax behaves the way any C++ programmer would expect. In particular, r's destructor is invoked before leaving f. And speaking of destructors, the same reasoning explains why C++/CLI restored deterministic destruction: so destruction follows the same semantics every C++ programmer has come to know and love. Yippee! Nondeterministic destruction was one of the worst nightmares of the Managed Extensions. C++/CLI maps the destructor to Dispose and introduces a special ! syntax for finalizers. The memory used by reference objects still isn't reclaimed until the garbage collector gets around to it, but that's no big deal. C++ programmers don't care so much that the memory be reclaimed the moment an object is destroyed; what really matters is that the destructor run when expected. C++ programmers frequently use the construct/destruct pattern to initialize/release non-memory resources like file handles, database locks, and so on. With C++/CLI, the familiar construct/destruct pattern works like you'd expect, for reference as well as native classes. The Redmondtonians deserve credit for owning up to the problems with the Managed Extensions, and for fixing them.

Future Fun

Among the more intriguing sections in the Rationale is one called "Future Unifications." It gives some tantalizing hints about where C++/CLI might go in the future. For example, you can't currently derive native classes from managed ones or vice versa. But you could achieve the same effect by implementing a workaround: include the "base" class as a data member, then write passthrough wrappers that do nothing but invoke the contained instance. Well, that sounds pretty formulaic, so why couldn't the compiler do it? The compiler could represent mixed objects as a two-part object with one CLI object containing all the CLI parts and one C++ object containing all the C++ parts.

And here Sutter reports an interesting anecdote: when he first showed this mixed-class idea to Bjarne Stroustrup, Bjarne went to the bookshelf "and opened a book to show where, despite criticism, he had always insisted that C++ must not require objects to be laid out contiguously in a single memory block." No one at the time saw any benefit to noncontiguous objects, but then no one could ever have anticipated .NET and the CLI. Bjarne's insistence on leaving the noncontiguous door open makes mixed objects possible. Don't be surprised if you see them in a future version of C++/CLI. And the moral is: when you're designing a new language or complex program that you expect will live forever and grow in unexpected ways, don't make unnecessary assumptions just because it makes your life easy!

The Rationale offers another interesting historical tidbit: the Redmondtonians' original internal name for C++/CLI was MC^2. As in M(anaged)C(++), with a tip of the hat (^) to Albert Einstein. But, as Sutter says, that was just "too cute." I agree. Do you really think anyone would embrace C++/CLI if it were called MC^2? In deciding to call it C++/CLI, the architects again followed Bjarne's footsteps. "I picked C++ because it was short, had nice interpretations, and wasn't of the form 'adjective C'," Bjarne says. C++/CLI suggests that C++ comes first, and also deliberately avoids the form "adjective C++."

The Rationale justifies other C++/CLI extensions like property, gcnew, generic, and const, and concludes with a useful FAQ section. For details, download the Rationale and read it yourself. And speaking of const, what about my original question: why does C++/CLI allow const data, but not const functions? The short answer is that CLI doesn't let you put a modopt/modreq directly on a function (the CLI does actually have a way to encode this information in the metadata; it's just not tested). At least, not yet, as the Rationale is careful to say, implying that this feature may one day be added.

The Evolution of Programming

C++/CLI makes C++ a first-class CLI citizen. Moreover, if you read the Rationale, you'll realize it does so with minimum perturbation of C++. And C++ is still the best language for systems programming, because it provides more direct access to the CLI than any other language, and because you can still drop down to C if you want to call the old Win32® API, which will remain with us for a while longer, I'm sure.

To understand how important C++/CLI is and what it represents, you have to consider where we are in the evolution of programming. Let me give you a brief and idiosyncratic whirlwind review. In the old days, programmers used toggle switches to write their programs. Paper tape was an improvement over that, but each computer had its own "machine" language. As computers proliferated, programmers had to write their programs all over again for each new machine. Well, that was pretty tiresome, so programmers invented high-level languages like FORTRAN, BASIC, and C, that used something called a "compiler" to translate the high-level language into machine instructions for each machine. Figure 2 illustrates this. Now you could write your program once and compile it for different machines. Cool! C became the language of choice for systems programming because it was the "lowest-level high-level language," meaning it introduced the least baggage between itself and the machine. Most operating systems in use today are written in C, perhaps with a few performance-critical or hardware-interfacing sections coded in assembly language.

Many years later, C++ improved on C by making it object-oriented and, well, more fun. To quote Bjarne again: "C++ was designed so that the author and his friends would not have to program in assembler, C, or various modern high-level languages. Its main purpose is to make writing good programs easier and more pleasant for the individual programmer." C++ was great, but high-level languages didn't talk to each other very well. If you wrote some code in C++, you couldn't use it in BASIC or vice versa, at least not without great difficulty. Each language operated inside its own private world. That was fine for standalone apps, but as apps grew more complex, and the environment more distributed, the need to share code became more pressing. Ever since the first subroutine, programmers have sought the ultimate goal of totally encapsulated reusable components: little building blocks of software that programmers can assemble to create applications. And why should all the pieces have to be written in the same language?

Various solutions to the interoperable components problem evolved over the years. From the beginning, languages came with libraries (think C runtime and printf). In the Windows world, DLLs provided delay loading (the Dynamic in DLL) to conserve memory. DLLs also afforded interoperability because languages like Visual Basic and COBOL could call DLLs by introducing import statements that directed their compilers to emit the proper C-linkage calls into the DLL. But the linkage between apps and DLLs is too tight, too brittle, and too apt to break. Each app has to know the names and signatures of each entry in the DLL. Also, calling the other way (from DLL to app) is cumbersome, as you have to pass function pointers as callbacks. So programmers invented VBX, which became OCX, which became COM. Happily, COM is language-neutral: it has "type libraries" so languages don't have to know functions at link time; they can interrogate the typelib at run time. COM is pretty cool, but notoriously difficult to program. (Ironically, COM is most difficult to program in C++, the language in which it was conceived and implemented!) COM has other problems, too: it's too low-level and doesn't address things like security or memory management.

The Ultimate Goal

Now it's 2007 and we have the .NET Framework and its standardized subset, CLI. CLI solves the reusability problem in a completely different way, by inserting a new layer of abstraction between programming languages and the machine. Instead of generating machine instructions, compilers now generate MSIL code, which the CLI virtual machine/just-in-time (JIT) compiler then compiles to machine code on the fly. The virtual machine (VES, or Virtual Execution System, for those who love acronyms) provides a level of abstraction above the machine. Virtual machines aren't new. In fact, they've been around quite a while. Languages like Pascal and ZIL (the Zork Implementation Language, used internally to write games at Infocom, where I was once employed), work by compiling high-level programs into P-code (or Z-code), which is then interpreted by a virtual machine. But the CLI provides a common virtual machine (the C in CLI) all languages can use. The CLI supports fundamental concepts like classes, properties, inheritance, reflection, and so on. The VES/VM provides things like memory management and security, so programmers don't have to fret about buffer overruns or other bugs that might open the door to malicious viruses. As the .NET Framework and the CLI have become more popular, high-level languages like Visual Basic, Fortran, COBOL, and C# have all morphed to resemble each other, and C++, more and more, because they must all support the underlying CLI concepts: classes, inheritance, member functions, and the rest. Each language still retains its distinct feel, so programmers don't have to retool themselves to use the .NET Framework; they only have to learn a few new concepts.

So now programmers can write classes in whatever language they choose and other programmers can consume those classes in whatever language they choose. Anyone can program any component in any language, and all the components work together seamlessly with very little programming effort. They all benefit from security, garbage collection, and other infrastructure features (the I in CLI). And when the Redmondtonians add new CLI features tomorrow, all languages will benefit from those, too. With Windows Vista™ and the .NET Framework 3.0 (which has something like 10,000 new classes associated with new technologies like Windows Presentation Foundation, Windows Communication Foundation, Windows Workflow Foundation, and Windows CardSpace™), Windows itself is being rewritten as CLI classes. It seems the goal of reusable, language-independent interoperable components has finally arrived. This represents a huge paradigm shift, and yet one which, amazingly, has proceeded in a relatively gradual, evolutionary manner. You should rejoice! Programming really is getting easier.

If you step back to contemplate all that, how could C++ not participate in this Brave New World? That's where C++/CLI comes in! Without C++/CLI, C++ would be in the lonely position of being the only modern programming language you can't use to program Windows! C++ might otherwise die a slow death or at least become severely marginalized. C++/CLI ensures that won't happen. It guarantees that programmers (like me) who love C++ can go on using it for generations to come. Figure 2 illustrates the paradigm shift to a virtual instead of physical machine and where "C++" fits in with and without "/CLI".

Figure 2 How Programming Evolved


Farewell, Though Not Goodbye

And now a special message. This is the last time Yours Truly will pen (type?) his name to C++ At Work. My column is being retired. I like to think it's in honor of my longevity. You know, the way teams retire the numbers of great athletes? My departure does not reflect any lack of commitment to C++ on the part of Microsoft or MSDN® Magazine. Rather, it reflects the simple fact that after 164 columns (this is my 165th) spanning almost 14 years, Mr. DiLascia is getting a little tired. And here's an interesting anecdote: Josh Trupin, the Executive Editor, recently called me the "Cal Ripken of MSDN Magazine." Well, I'm not a baseball fan (I like football better) but I know who Cal Ripken is. I was curious, though: how many games did Cal actually play? I searched the Web to find the answer: 2,632. I also found a bio that said something like, "most experts think Cal would've been a better player if he'd taken a break." Well, I took that as a message. But never fear, it doesn't mean I won't write for MSDN Magazine ever again.

Before I go, I want to thank all my loyal readers who've sent their questions, kept me honest by reporting errors and omissions, and inflated my ego by sending kind words of praise that I'll always treasure. Some readers even helped write my columns for me by testing solutions before I shipped them. I also want to thank all the great folks at MSDN Magazine I've worked with, past and present, in no particular order: Steve, Josh, Joanne, Eric, Etya, Terry, Laura, Joan (both of them), Nancy, Val, and Milo. Did I leave anyone out? These people have generously supported me over the years and given me wide latitude to write mostly whatever I wanted. I especially want to thank Gretchen Bilson. She left several years ago, but she's the one who hired me way back in 1993, when MSDN Magazine was still Microsoft Systems Journal. Thanks, Gretchen!

Well, that's all folks. B4N. That's for now, not forever. In the immortal Austrian monotone of Arnold Schwarzenegger: I'll be back. As always, happy programming!

Paul DiLascia is a freelance software consultant and Web/UI designer-at-large. He is the author of Windows++: Writing Reusable Windows Code in C++ (Addison-Wesley, 1992). In his spare time, Paul develops PixieLib, an MFC class library available from his Web site, www.dilascia.com.

Gak Pile: Say What?

Now that I've dispensed with formalities, I thought I'd end on a humorous note by sharing with you some of the most, er, unusual questions I've received over the years. I've been collecting these in a folder named Gak, as in: "Gak!" I've tried to group them in categories—for example, I always get a steady dose of language questions.

Where can I download C++?

Everybody speaks so highly of C++, why is it? What is the exact kind of stuff you can do in C++ and you can't on Visual Basic or Delphi? I've worked in both these, even in Visual C++®, and most everything I ever needed I've found in all of these languages. Is it just because most old senior programmers are used to it, it sounds fancy?

Gosh, I hope the "old senior programmers" doesn't mean me. Pointers are often a source of confusion, as these next questions attest:

I still could not understand pointers (var, object, functional) kindly explain me or send me some study materials.

I'm new to C++ and I was wondering what Pointers are and what they are used for. Many books I've seen described this in a very complex manner. I know that pointers point to a specific memory location but I don't really know why you need them. I always thought that the computer keeps track of each location automatically. Please reply soon!

(I forwarded this last question to my fellow writer pals Matt Pietrek and John Robbins for advice. Matt offered this suggestion: "Tell him he's right. Pointers are an old, obsolete concept. There's no need to learn about them now." John wrote, "I thought a Pointer was a kind of dog. You mean there's more?")

Moving along, I always get a compiler question or two.

Dir [sic] Paul, During compile time. How Compile link MFC library? Sometimes DLL compile. Because Hooking. So far, I'm not compile this source. Please. How to compile or send source. Have a Nice Day.

But more often I get Windows questions:

I came upon your keyboard/mouse detector program, it was an astounding experience for me. Yet, I have a question for you, as of now your program works if you have the "time counting panel" focused, is there a way to make it so that the program still does its job when it isn't on focus? I tried to figure this out but I am coming to a deadend so I email to ask, thank you for your time.

The thing is, I'm not sure what keyboard/mouse detector program he's referring to. I can't recall ever writing one. You'd think I'd remember, if it was astounding. Oh well.

Every now and then I get a math question:

I'm working on a program that uses a certain sequence of operations to "code" a letter, and later "decode" the letter. I'm having extreme difficulty in the decode part, where I have to do the reverse of the % operation. I know what the remainder is, and I know what you percent by, but I can't seem to find out what the starting number would be. For example, if the remainder of the operation was 22, and the operation would be (the number)%63, what would the number be? If you can help me, that would be great.

Hm. I can understand why this chap might have "extreme difficulty" decoding his letters, since there are an infinite number of solutions for n%a=b, where a and b are given and n is unknown. Alas, I suspect he didn't get an A in the class. Speaking of classes, I get a steady dose of questions that clearly fall into the "can you do my homework for me?" category.

I want a short program to: Enter 20 numbers and arranged [sic] them in assending [sic] order.

Could I get some help with these two programs? I need them tomorrow before noon. Please help me!

Write a program that will request a users beginning balance, request withdrawals debited against the account until the users enters a -1 or user's account balance goes below zero. Print the user's beginning balance, ending balance, total number of checks written, total amount of checks written and the check average.

Write a payroll program and print a users biweekly net pay. Ensure that the user provides their hours worked and pay rate. Also accommodations should be made for overtime pay. Assume that all pay a total of 18 percent of their gross pay towards deductions. Print the users pay, net pay and annual salary.

(There's a #3 too, but I'll spare you.) I often get questions that include long pages of source code, which I usually delete without reading. I sometimes get questions about topics that have nothing to do with my column—like Visual Basic, Microsoft Access™, or how to program Excel®. I even get questions from ordinary users, like "My machine crashed—what should I do?" or "How come to shut down my computer I have to press the Start button?"—which, when you think about it, is a very sensible question! Why indeed? (Raymond Chen answers this question.) Then there are the ones that defy categorization.

Dear Paul, Can you kindly guide to me to some statistical site which shows the average number of lines of Visual C++ code that an average developer can write in a day? Thanks for your help.

Dear Mr DiLascia, How can I be like you?

Dear Sir: I buy the MSDN to so much, look you in MSDN for MFC, feel so good, so ask you a question about "What can want to work for Microsoft." Very like the programming for Visual C++. I have dream then, is want to work for Microsoft, this is for skill, for value, for dream. Today, I can use MFC write the multimedia programming. In school, I can say I am goodhand. But, my English is so-so. I not girlfriend to five year for programming. Because programming is so fun. Maybe you will think, this good for nothing. I can understand. But hope you can reply letter to me. Tell me, What can want to work for Microsoft, and what can I do. Thank you very so much.

What I love is how this fellow's enthusiastic-geek personality shines through, despite "so-so" English (which, I assure you, is way better than my Mandarin!). I usually try to respond to most questions, even if all I can say is "Sorry, I don't know." Many readers are surprised when I actually write back.

Paul, I was not sure that a famous author like you would so readily and promptly help a stranger. You are not only a very knowledgeable professional, but also a very nice person. Thank you.

Well gosh—thank YOU, Mike! Finally, there's this gem:

May I know what is psychic window in windows programming. I don't know the spelling actually. My friend asked me this. Plz help me.

I'm afraid I don't know the answer to that one. Perhaps he should consult Uri Geller. If anyone out there knows what a psychic window is, please drop me a line—via cosmic telepathy.
