Understanding Member Functions in Visual C++ 2005

02/03/2012

Kenny Kerr

April 2006

Applies to:
   Visual C++ 2005
   Visual Studio 2005
   Common Language Infrastructure (CLI)

Summary: Review new features related to member functions that are introduced by Visual C++ 2005. (10 printed pages)

Introduction
Classic C++ Member Functions
CLI Methods under the Hood
Virtual Functions and the Runtime
Abstract and Sealed Functions
Conclusion

Introduction

Microsoft Visual C++ 2005 introduces new features in the C++ language related to member functions. Although they were largely motivated by the design effort to support the Common Language Infrastructure (CLI) in the language, they can also be used when writing native code.

This article explores these new features, as well as the mechanics behind them, as they relate to the CLI metadata that the compiler produces and, ultimately, how the Common Language Runtime (CLR) implements the behavior defined by the CLI specification.

Classic C++ Member Functions

Before we dive into some of the new features, let's quickly recap the essentials of member functions in Standard C++. To keep this article from turning into a lengthy treatise, we'll limit the discussion to virtual as well as non-static non-virtual functions. To understand the semantics around member functions, it is helpful to keep some concepts and terminology in mind. Consider the following variable declarations.

Base val;
Base* ptr = &val;
Base& ref = val;

val is a Base object. ptr uses the pointer declarator operator (*) to declare that it can store the address of a Base object or an object of a derived class. The declaration for ptr also includes an initializer, setting the pointer's initial value to the address of the val object. ref uses the reference declarator operator (&) to declare that it stores the address of a Base object or an object of a derived class. Its initializer sets the reference to the val object.

val is a simple object allocated on the stack, and as such will always have the same type, namely Base. ptr and ref, on the other hand, may refer to Base objects or objects of derived classes. To understand polymorphism you need to appreciate that ptr and ref can have different types at compile time and runtime. The compile time type of a variable is referred to as its static type since it is the same at compile time as it is at runtime. A variable's static type is declared in its variable declaration. The type of the object a variable refers to at runtime is referred to as its dynamic type since it can be different from the static type and, in the case of a pointer, may change during the lifetime of the variable.
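The difference between static and dynamic type can be observed with the Standard C++ typeid operator. The following is a minimal sketch: the empty Base and Derived classes and the PointsToDerived helper are stand-ins invented for illustration, and Base is given a virtual destructor only so that typeid reports the dynamic type.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical classes for illustration. The virtual destructor makes Base
// polymorphic, which is what allows typeid to report the dynamic type.
class Base { public: virtual ~Base() {} };
class Derived : public Base {};

// Returns true when the object that ptr points to is actually a Derived.
bool PointsToDerived(const Base* ptr)
{
    return typeid(*ptr) == typeid(Derived);
}

// Walks one pointer through two different dynamic types.
void StaticVersusDynamic()
{
    Base val;
    Derived derived;

    Base* ptr = &val;                       // static type Base*, dynamic type Base
    assert(!PointsToDerived(ptr));

    ptr = &derived;                         // static type unchanged, dynamic type Derived
    assert(PointsToDerived(ptr));

    Base& ref = derived;                    // static type Base&, refers to a Derived
    assert(typeid(ref) == typeid(Derived));
    assert(typeid(val) == typeid(Base));    // val is always exactly a Base
}
```

The declaration of ptr never changes, yet the answer from typeid does, which is exactly the distinction the terminology is meant to capture.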

With those concepts defined, let's return to member functions. Consider the following class declaration.

class Base
{
public:
    void NotVirtual();
    virtual void Virtual();
};

NotVirtual is a non-virtual function since it lacks the virtual keyword. This indicates that, given a pointer or reference to a Base object and a call to the NotVirtual member function, the function's behavior is determined at compile time and the dynamic type of a Base variable is irrelevant.

Virtual, on the other hand, is a virtual function thanks to the virtual keyword. The function's behavior is determined independently of the static type of a Base variable. The runtime takes into account the dynamic type of the variable to determine the appropriate behavior.

Now consider the following derived class.

class Derived : public Base
{
public:
    void NotVirtual();
    virtual void Virtual();
};

The behavior of the NotVirtual function is again determined by the static type of the variable it is called on, and is thus baked into the executable by the compiler.

The Derived class's Virtual member function is said to override the implementation in the Base class. Given a pointer or reference variable, the behavior of the Virtual function is determined by the dynamic type of the variable, and so the decision regarding which implementation to call is deferred until runtime. On the other hand, if a virtual function is called on an object without the indirection of a pointer or reference, its behavior is then determined by the object itself and can once again be baked into the executable by the compiler. This should explain why even virtual functions can be inlined by the compiler in some cases.
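To see both rules side by side, here is a small sketch based on the Base and Derived classes above. As an assumption made purely so the result can be checked, the functions return a string naming the implementation that ran rather than void; the CallNotVirtual, CallVirtual, and DispatchDemo helpers are likewise invented for this example.

```cpp
#include <cassert>
#include <string>

class Base
{
public:
    virtual ~Base() {}
    std::string NotVirtual() { return "Base::NotVirtual"; }
    virtual std::string Virtual() { return "Base::Virtual"; }
};

class Derived : public Base
{
public:
    std::string NotVirtual() { return "Derived::NotVirtual"; }
    virtual std::string Virtual() { return "Derived::Virtual"; }
};

// Both helpers see only a Base reference, whatever object is behind it.
std::string CallNotVirtual(Base& ref) { return ref.NotVirtual(); }
std::string CallVirtual(Base& ref) { return ref.Virtual(); }

void DispatchDemo()
{
    Derived derived;
    assert(CallNotVirtual(derived) == "Base::NotVirtual");  // chosen from the static type
    assert(CallVirtual(derived) == "Derived::Virtual");     // chosen from the dynamic type
    assert(derived.NotVirtual() == "Derived::NotVirtual");  // static type is Derived here
}
```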

CLI Methods under the Hood

To fully appreciate the motivation behind the additions to the language, it is helpful to have a good understanding of how the CLI defines methods. Consider the following minimal Common Intermediate Language (CIL) type definition.

.class Base
{
    .method public void .ctor()
    {
        .maxstack 1
    
        ldarg.0 // this instance
        call instance void [mscorlib]System.Object::.ctor()
        
        ret
    }

    .method public void NotVirtual()
    {
        .maxstack 0
        
        ret
    }
}

Base is a CLI reference type and has two methods: .ctor and NotVirtual. .ctor is an example of what the CLI specification calls a special name and represents the type's instance constructor. To be specific, the Base type definition contains two method definitions. A method definition consists of a method head followed by a method body. A method head includes information about the method, such as its calling convention, various method attributes, a method name, and a signature. A method body typically contains CIL instructions, as well as various directives used to describe aspects of the method's implementation.

The compiler takes the method head information and writes it to various CLI metadata tables. Strings such as the method and argument names are placed in a special location in the resulting portable executable (PE) file known as the string heap. The metadata table entries that represent named entities include offsets into the string heap. The method bodies are not stored in the CLI metadata; rather, the CLI method table stores the relative virtual address (RVA) of each method body in the appropriate row. An RVA is simply an offset in memory relative to where the module is loaded. The compiler determines the offset for each method when generating the PE file, and the CLR determines the physical address at runtime by considering the RVA as well as where the module is loaded in memory.
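The address arithmetic is simple enough to sketch in a few lines. The ResolveRva helper and the load address of 0x00400000 are assumptions chosen for illustration; the actual load address is decided when the module is mapped into memory.

```cpp
#include <cassert>
#include <cstdint>

// Adds an RVA to the address at which the module was loaded.
std::uintptr_t ResolveRva(std::uintptr_t moduleBase, std::uint32_t rva)
{
    return moduleBase + rva;
}

void RvaExample()
{
    // Hypothetical load address combined with a sample RVA of 0x00002050.
    assert(ResolveRva(0x00400000u, 0x00002050u) == 0x00402050u);
}
```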

Table 1 illustrates the TypeDef table entry describing the Base type. Don't read too much into the actual indexes and offsets; they will almost always be different since they depend heavily on what types, strings, and attributes you declare and use in your assembly. For this discussion, assume the assembly contains only the Base type.

Table 1. TypeDef table entry for the Base type

Fields	Base type
Flags	0x00000000
Name	0x00000036
Namespace	0x00000000
Extends	1
FieldList	1
MethodList	1

The Flags field stores a bitmask of type attributes. The Base type has no interesting type attributes defined, and the 0 bitmask merely indicates that it is not a public type. The Name field stores an offset into the string heap in which the string Base can be found. The Namespace field again stores an offset into the string heap, and in this case points to the first entry in the heap that is always the empty string. The Extends field stores an index into the TypeRef table referring to the Object type from the mscorlib assembly. The FieldList and MethodList fields store indexes into the Field and Method tables, respectively.

Both FieldList and MethodList represent a one-to-many relationship. Each index marks the first of a contiguous set of rows in the respective table; however, the number of rows belonging to the type is not actually stored anywhere. You can determine the rows belonging to a type by looking at the indexes of the subsequent type in the TypeDef table. If yours is the last type in the table, then the sequence continues to the end of the referenced table. In this case, the FieldList field points to the first row in the Field table, but this table does not contain any rows, thus indicating that the type does not define any fields. The MethodList field points to the first row in the Method table. The Method table contains two rows, both belonging to this type, as illustrated in Table 2.
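The rule for finding a type's rows can be sketched as follows. This is not how any particular metadata reader is written; RowRange and RowsFor are hypothetical names, and the list column (FieldList or MethodList) is modeled as a plain vector with one 1-based index per TypeDef row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range of 1-based row indexes: [first, end).
struct RowRange
{
    std::size_t first;
    std::size_t end;
    std::size_t Count() const { return end - first; }
};

// listColumn holds the FieldList or MethodList value of every TypeDef row.
// typeIndex is 0-based here; rowCount is the size of the referenced table.
RowRange RowsFor(const std::vector<std::size_t>& listColumn,
                 std::size_t typeIndex,
                 std::size_t rowCount)
{
    RowRange range;
    range.first = listColumn[typeIndex];
    // The run ends where the next type's run begins, or at the end of the
    // referenced table when this is the last type.
    range.end = (typeIndex + 1 < listColumn.size())
        ? listColumn[typeIndex + 1]
        : rowCount + 1;
    return range;
}

void RowRangeExample()
{
    std::vector<std::size_t> onlyBase(1, 1);      // a single type whose list index is 1
    assert(RowsFor(onlyBase, 0, 2).Count() == 2); // Method table with two rows
    assert(RowsFor(onlyBase, 0, 0).Count() == 0); // empty Field table
}
```

With a single type, a MethodList of 1 and a two-row Method table yield two methods, while a FieldList of 1 and an empty Field table yield no fields.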

Table 2. Method table entries for the Base type

Fields	.ctor method	NotVirtual method
RVA	0x00002050	0x00002064
ImplFlags	0x0000	0x0000
Flags	0x1806	0x0006
Name	0x00000018	0x0000003b
Signature	0x00000001	0x00000005
ParamList	1	1

As mentioned before, the RVA field is the relative virtual address at which the method structure can be found. The ImplFlags field is a bitmask containing information used by the JIT compiler. Both methods have a 0 bitmask, which simply indicates that the method bodies contain only CIL, or managed, code. The Flags field is a bitmask of method attributes. The .ctor method includes the public, specialname, and rtspecialname attributes, while the NotVirtual method only includes the public attribute. The public attribute indicates the accessibility of the method, as you would expect. The specialname attribute indicates that the method name should be treated as special by compilers and tools. The rtspecialname attribute indicates that the runtime provides special behavior for the method, based on the method name. The Name field is again an offset into the string heap in which the method name can be found. The Signature field stores an offset into the blob heap in which each method's signature is stored. The ParamList field stores an index into the Param table and uses the same technique for representing the one-to-many relationship as described earlier for the TypeDef table's FieldList and MethodList fields.
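As a worked example, the two Flags values from Table 2 can be decoded with a handful of bit tests. The numeric values below are the method attribute values defined by the CLI specification; the constant names and the DescribeFlags helper are invented for this sketch, and only the attributes discussed in this article are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// A subset of the method attribute bits. Accessibility occupies the low
// three bits and has to be compared as a value rather than tested as a flag.
const std::uint16_t MemberAccessMask = 0x0007;
const std::uint16_t PublicAccess     = 0x0006;
const std::uint16_t VirtualFlag      = 0x0040;
const std::uint16_t SpecialName      = 0x0800;
const std::uint16_t RTSpecialName    = 0x1000;

// Builds a space-separated list of the attributes present in flags.
std::string DescribeFlags(std::uint16_t flags)
{
    std::string text;
    if ((flags & MemberAccessMask) == PublicAccess) text += "public ";
    if (flags & VirtualFlag)   text += "virtual ";
    if (flags & SpecialName)   text += "specialname ";
    if (flags & RTSpecialName) text += "rtspecialname ";
    if (!text.empty()) text.erase(text.size() - 1); // drop the trailing space
    return text;
}

void FlagsExample()
{
    assert(DescribeFlags(0x1806) == "public specialname rtspecialname"); // .ctor
    assert(DescribeFlags(0x0006) == "public");                           // NotVirtual
}
```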

Now that we have a basic understanding of methods in the CLI, let's examine some of the new language features introduced by Visual C++ 2005. These features make writing code for the CLR a breeze, while providing all the power and flexibility you might require.

Virtual Functions and the Runtime

The virtual keyword declares a virtual function. Unfortunately, Standard C++ does not distinguish between declaring a new virtual function and overriding a virtual function in a derived class. Consider the following class hierarchy.

class Base
{
public:
    virtual void DoSomething();
    void DoSomethingElse();
};

class Derived : public Base
{
public:
    void DoSomething();
    void DoSomethingElse();
};

Nothing about the function declarations in Derived indicates whether they are virtual or not. The programmer needs to look at the base class to see whether any virtual functions are declared with a matching name and signature. It is pretty common to see programmers decorating the override in the derived class with the virtual keyword to hint at the fact that it is a virtual function.

class Derived : public Base
{
public:
    virtual void DoSomething();
    void DoSomethingElse();
};

The problem with this, of course, is that it's not clear, just looking at Derived, whether DoSomething declares a new virtual function or whether it overrides a virtual function in the base class. Before we take a look at some of the new keywords that can help to make your code a lot more explicit and clear, let's take a moment to examine how the CLR implements virtual functions.

To understand the effects of the virtual keyword, it is helpful to consider how the CLR represents the metadata in the TypeDef and Method tables at runtime. Whereas the metadata and instructions stored in a PE file make no assumptions about the physical layout of types, the CLR builds concrete data structures to represent types and method tables to allow efficient calls to methods, without requiring method bodies to be looked up based on name and signature information following JIT compilation.

Each type is represented at runtime by an opaque data structure. Each instance of a given type stores the same type handle, referring to this structure, in its object header. The structure is not directly accessible to the programmer, but is exposed through the Type object returned by the GetType member function that all types inherit from Object.

This type structure includes a method table containing an entry for each of the type's methods. A method entry stores the address of a unique stub routine that initially calls the JIT compiler to compile the CIL instructions for the method. Once the JIT compiler has completed, it overwrites the machine code in the method's stub routine to jump to the native code that was just produced.
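The compile-on-first-call idea can be modeled in Standard C++ with a function pointer that replaces itself. This is only an analogy under loud assumptions: the real runtime patches machine code in the stub, whereas this sketch swaps a pointer, and MethodEntry, PreStub, CompiledBody, and Invoke are all hypothetical names.

```cpp
#include <cassert>

struct MethodEntry;
typedef int (*Code)(MethodEntry&);

struct MethodEntry
{
    Code target;       // where calls to the method currently land
    int compileCount;  // how many times the pretend JIT compiler has run
};

// Stand-in for the native code the JIT compiler would produce.
int CompiledBody(MethodEntry&) { return 42; }

// Stand-in for the initial stub: "compile" the method, redirect the entry
// to the result, then run it so that the first caller still gets an answer.
int PreStub(MethodEntry& entry)
{
    ++entry.compileCount;
    entry.target = &CompiledBody;
    return entry.target(entry);
}

// Every call site goes through the entry, whatever it currently points at.
int Invoke(MethodEntry& entry) { return entry.target(entry); }

void StubExample()
{
    MethodEntry entry = { &PreStub, 0 };
    assert(Invoke(entry) == 42);       // first call triggers compilation
    assert(Invoke(entry) == 42);       // second call goes straight to the body
    assert(entry.compileCount == 1);   // compilation happened only once
}
```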

When the JIT compiler encounters CIL call and callvirt instructions in a method body, it finds the call target in the target type's method table and produces machine code to call the method's stub routine.

For the call instruction, the machine code is pretty straightforward. The object reference is loaded into the appropriate register, based on the calling convention and processor architecture the code happens to be running on. This is the physical representation of a method's this variable. The method stub is then called, taking the address stored in the type's method table. Since the address of the method stub is baked into the machine code, it should be clear why this corresponds to a non-virtual function call.

The callvirt instruction is not so simple. The object reference is once again loaded into the appropriate register. An offset is then taken from the address of the object's type structure. The resulting address points to the method stub of the target method. Therefore, the dynamic type of the object, determined by the object's type structure, determines the ultimate method body to be called.
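The contrast between the two instructions can be sketched with a table of function pointers. This is a simplified model rather than the CLR's actual data layout: TypeStructure, Object, Call, and CallVirt are hypothetical names, and each type has just one virtual slot.

```cpp
#include <cassert>
#include <string>

typedef std::string (*Method)();

std::string BaseVirtual()    { return "Base::Virtual"; }
std::string DerivedVirtual() { return "Derived::Virtual"; }
std::string BaseNotVirtual() { return "Base::NotVirtual"; }

// One method table per type; slot 0 holds the Virtual method.
struct TypeStructure { Method slots[1]; };
const int VirtualSlot = 0;

const TypeStructure BaseType    = { { &BaseVirtual } };
const TypeStructure DerivedType = { { &DerivedVirtual } }; // override, same slot

// Every object carries a handle to its type in its header.
struct Object { const TypeStructure* type; };

// Models call: the target is fixed when the call site is compiled.
std::string Call(const Object&) { return BaseNotVirtual(); }

// Models callvirt: the target is read from the object's type at run time.
std::string CallVirt(const Object& obj) { return obj.type->slots[VirtualSlot](); }

void CallExample()
{
    Object base = { &BaseType };
    Object derived = { &DerivedType };
    assert(CallVirt(base) == "Base::Virtual");
    assert(CallVirt(derived) == "Derived::Virtual"); // same call site, other body
    assert(Call(derived) == "Base::NotVirtual");     // object's type is ignored
}
```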

At this point it should be clear that getting the right behavior from virtual functions at runtime depends heavily on the layout of method tables for derived types. In order for the appropriate override of a virtual function to be called, its offset in the method table must match that of the base type. A given method table consists of addresses to method stubs in the following order:

Virtual functions from the base type
Virtual functions from any interfaces
Virtual functions from the current type
Non-virtual functions

As derived types are instantiated for the first time and the runtime constructs the method tables, it places the sequence of virtual functions at the beginning of the new method table, preserving the order so that native method bodies that call virtual functions through method table offsets will function correctly.

Needless to say, the runtime cannot guarantee the placement of interface method pointers in the method table since a type can implement any number of interfaces. To solve this challenge, the type structure for a given type contains an additional table of interface offsets that the runtime uses to determine the offset into the method table in which a particular interface's method addresses reside. A complete discussion of interfaces is beyond the scope of this article.

To allow the programmer some control over the method tables produced by the CLR, Visual C++ 2005 provides the override and new keywords.

The override keyword is more of an aid at development time to allow a function declaration to explicitly declare that it is overriding a virtual function in a base class. If a virtual function cannot be found in a base class, the compiler will make it clear in the form of a compiler error. The override will be placed at the same offset in the derived type's method table as the virtual function holds in the base type. The override keyword thus implies that the member function is virtual and the compiler will faithfully add the virtual method attribute to the metadata produced for it.

The new keyword declares that a member function should not override any virtual function with the same name and signature in a base class. If a base class contains a virtual function that would normally be overridden, a special method attribute is added to the new function's CLI Method table entry to indicate to the CLR that the method must instead get a unique entry in the type's method table at runtime.

Both new and override are examples of context-sensitive keywords. That is, if they appear in a special location in the source code where traditionally those names would have been illegal, they take on a special meaning. Here is an example of both keywords in action.

ref class Fruit
{
public:
    virtual void Peel();
    virtual void Slice();
    virtual void Eat();
};

ref class TropicalFruit : Fruit
{
public:
    void Peel() override;
    virtual void Slice() new;
    void Eat() new;
};

The Peel function in the derived class overrides the virtual function with the same name in the base class. The override keyword ensures that a matching virtual function exists in the base class and that it will hold the same offset in the type's method table.

The Slice function in the TropicalFruit class is also virtual, but does not override the Slice function in the base class and, as such, will obtain a new entry in the type's method table, with a unique offset compared to other virtual functions in the hierarchy.

The distinction between Slice and Eat is important. Both use the new keyword but only Slice sports the virtual keyword. Slice is a new virtual function while Eat is a new, non-virtual member function. So the new keyword says nothing about virtual functions. It merely ensures that the function does not override a virtual function in the base class.
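The effect of the two keywords on the derived type's method table can be modeled in Standard C++. The sketch below is a toy model, not the CLR's algorithm: Slot, MethodTable, Override, NewSlot, and the Build functions are hypothetical, each slot stores the name of the implementation it would dispatch to, and the non-virtual Eat in TropicalFruit is left out because it never occupies a virtual slot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Slot { std::string name; std::string implementation; };
typedef std::vector<Slot> MethodTable;

// Returns the index of the first slot with the given name, or table.size().
std::size_t FindSlot(const MethodTable& table, const std::string& name)
{
    for (std::size_t i = 0; i < table.size(); ++i)
        if (table[i].name == name) return i;
    return table.size();
}

// override: reuse the base slot so that existing offsets reach the new body.
void Override(MethodTable& table, const std::string& name, const std::string& impl)
{
    std::size_t index = FindSlot(table, name);
    assert(index < table.size()); // the compiler would report an error here
    table[index].implementation = impl;
}

// new: leave the base slot untouched and append a slot with a fresh offset.
void NewSlot(MethodTable& table, const std::string& name, const std::string& impl)
{
    Slot slot = { name, impl };
    table.push_back(slot);
}

MethodTable BuildFruit()
{
    MethodTable table;
    NewSlot(table, "Peel", "Fruit::Peel");
    NewSlot(table, "Slice", "Fruit::Slice");
    NewSlot(table, "Eat", "Fruit::Eat");
    return table;
}

MethodTable BuildTropicalFruit()
{
    MethodTable table = BuildFruit(); // base virtuals first, order preserved
    Override(table, "Peel", "TropicalFruit::Peel");
    NewSlot(table, "Slice", "TropicalFruit::Slice");
    return table;
}
```

In this model, a caller that knows only Fruit uses slot 0 for Peel and slot 1 for Slice. Against the TropicalFruit table, slot 0 reaches TropicalFruit::Peel while slot 1 still reaches Fruit::Slice, because the new Slice lives in slot 3.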

The overriding discussed so far, using the override keyword, is known as explicit overriding, in contrast to the implicit overriding provided by Standard C++. Another form of overriding that is available in Visual C++ 2005 is named overriding. Consider the following example.

ref class Fruit
{
public:
    virtual void Eat();
};

ref class TropicalFruit : Fruit
{
public:
    virtual void Scoff() = Eat;
};

Here, the TropicalFruit class overrides the Eat virtual function with the Scoff virtual function. Calling the Eat virtual function on a TropicalFruit object through a Fruit reference will invoke the Scoff implementation. Phew. This may seem a bit obscure, but it can come in handy, especially in multiple inheritance scenarios in native class hierarchies. For managed types, it is a handy solution for implementing two methods with identical signatures but different names from two interfaces in a single implementation function. Named overriding is possible as long as the virtual function in the base class has a matching signature and is not sealed.

As far as the CLI is concerned, however, explicit overriding and named overriding are two very different things. Implicit and explicit overriding are based on name and signature matching. Named overriding, on the other hand, results in entries in the CLI MethodImpl metadata table, mapping method declarations to method bodies.
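For comparison, Standard C++ has no named overriding, but its observable behavior can be approximated by hand with a forwarding override. The sketch below is that approximation, not what the Visual C++ compiler generates; the functions return strings instead of void so the result can be checked, and EatThrough is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

class Fruit
{
public:
    virtual ~Fruit() {}
    virtual std::string Eat() { return "Fruit::Eat"; }
};

class TropicalFruit : public Fruit
{
public:
    virtual std::string Scoff() { return "TropicalFruit::Scoff"; }

    // Hand-written stand-in for "Scoff() = Eat": the override of Eat does
    // nothing except forward to Scoff.
    virtual std::string Eat() { return Scoff(); }
};

// Calls Eat through a base class reference.
std::string EatThrough(Fruit& fruit) { return fruit.Eat(); }

void NamedOverrideExample()
{
    TropicalFruit mango;
    assert(EatThrough(mango) == "TropicalFruit::Scoff");
    assert(mango.Scoff() == "TropicalFruit::Scoff");
}
```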

Abstract and Sealed Functions

Two other keywords are worth mentioning in the context of virtual functions and derived classes.

The abstract keyword is used to declare an abstract function, meaning its implementation must be defined by a derived class. This also has the effect of marking the containing type abstract, meaning that it cannot be instantiated directly. Using the abstract modifier on a function is equivalent to the pure-specifier in Standard C++. The sealed keyword is used to indicate that a virtual function cannot be overridden in a derived class. There is no equivalent to the sealed keyword in Standard C++. Consider the following example.

ref class Fruit
{
public:
    virtual void Eat() abstract;
};

ref class TropicalFruit : Fruit
{
public:
    void Eat() sealed;
};

Fruit is an abstract class due to the presence of the Eat abstract function. The TropicalFruit class implements the Eat virtual function and ensures that any derived classes don't override its behavior by using the sealed modifier on the function.

The abstract keyword can also be applied to the class directly, avoiding the requirement of having an abstract function. Similarly, the sealed keyword can be applied to a class to prevent derivation.
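For readers working in Standard C++, the same hierarchy can be sketched with the pure-specifier. The sketch assumes a C++11 or later compiler: that revision of the standard, published after this article was written, added a final specifier that plays a role similar to sealed. The string return type and the SealedExample helper are assumptions made so the result can be checked.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

class Fruit
{
public:
    virtual ~Fruit() {}
    virtual std::string Eat() = 0;  // pure-specifier: Fruit becomes abstract
};

class TropicalFruit : public Fruit
{
public:
    // final (C++11) rejects any further override in classes derived from
    // TropicalFruit, much as sealed does in the example above.
    std::string Eat() final { return "TropicalFruit::Eat"; }
};

// The abstract/concrete split is visible to the compiler.
static_assert(std::is_abstract<Fruit>::value, "Fruit cannot be instantiated");
static_assert(!std::is_abstract<TropicalFruit>::value, "TropicalFruit can be");

void SealedExample()
{
    TropicalFruit mango;
    Fruit& fruit = mango;
    assert(fruit.Eat() == "TropicalFruit::Eat");
}
```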

Conclusion

The CLI specification standardizes the binary and runtime representation of types and methods. This allows us to peer into the executables created by Visual C++ to gain a deeper understanding of how the CLR works, as well as how the compiler interprets the code we write.

The new keywords introduced in Visual C++ 2005 also help to provide the necessary control over a type's method table, as well as to promote more explicit coding practices.

About the author

Kenny Kerr spends most of his time designing and building distributed applications for the Microsoft Windows platform. He also has a particular passion for C++ and security programming. Reach Kenny at https://weblogs.asp.net/kennykerr/ or visit his Web site: https://www.kennyandkarin.com/Kenny/.