C++ at Work: Copy Constructors, Assignment Operators, and More

10/18/2019

Paul DiLascia

Code download available at: CAtWork0509.exe (276 KB)

Q I have a simple C++ problem. I want my copy constructor and assignment operator to do the same thing. Can you tell me the best way to accomplish this?

Shadi Hani

A At first glance this seems like a simple question with a simple answer: just write a copy constructor that calls operator=.

CFoo::CFoo(const CFoo& obj) 
{
   *this = obj;
}

Or, alternatively, write a common copy method and call it from both your copy constructor and operator=, like so:

CFoo::CFoo(const CFoo& obj) 
{
   CopyObj(obj);
}
CFoo& CFoo::operator=(const CFoo& rhs) 
{
  CopyObj(rhs);
  return *this;
}

This code works fine for many classes, but there's more here than meets the eye. In particular, what happens if your class contains instances of other classes as members? To find out, I wrote the test program in Figure 1. It has a main class, CMainClass, which contains an instance of another class, CMember. Both classes have a copy constructor and assignment operator, with the copy constructor for CMainClass calling operator= as in the first snippet. The code is sprinkled with printf statements to show which methods are called when. To exercise the constructors, cctest first creates an instance of CMainClass using the default ctor, then creates another instance using the copy constructor:

CMainClass obj1;
CMainClass obj2(obj1);

Figure 1 Copy Constructors and Assignment Operators

// cctest.cpp: Simple program to illustrate a problem calling 
// operator= from copy constructor. This can result in member 
// objects getting initialized twice. To compile: cl cctest.cpp
//
#include <stdio.h>

//#define GOOD

//////////////////
// Class used as member of main class.
//
class CMember {
public:
   CMember()
   {
      printf(" CMember: default ctor\n");
   }

   CMember(const CMember&)
   {
      printf(" CMember: copy-ctor\n");
   }

   CMember& operator=(const CMember&) {
      printf(" CMember: operator=\n");
      return *this;
   }
};

//////////////////
// Main class
//
class CMainClass {
protected:
   CMember m_obj;

public:
   CMainClass()
   {
      printf(" CMainClass: default ctor\n");
   }

#ifndef GOOD
   // Copy-ctor calls operator=. This is not generally recommended.
   //
   CMainClass(const CMainClass& rhs)
   {
      printf(" CMainClass: copy-ctor\n");
      *this = rhs;
   }
#else
   // Below is a better way to initialize m_obj.
   //
   CMainClass(const CMainClass& rhs) : m_obj(rhs.m_obj)
   {
      printf(" CMainClass: copy-ctor\n");
   }
#endif

   CMainClass& operator=(const CMainClass& rhs)
   {
      printf(" CMainClass: operator=\n");
      m_obj = rhs.m_obj;
      return *this;
   }
};

int main()
{
   printf("Creating instance with default ctor:\n");
   CMainClass obj1;

   printf("Creating instance with copy-ctor:\n");
   CMainClass obj2(obj1);

   printf("Performing assignment:\n");
   obj2 = obj1;
}

If you compile and run cctest, you'll see the following printf messages when cctest constructs obj2:

CMember: default ctor
CMainClass: copy-ctor
CMainClass: operator=
CMember: operator=

The member object m_obj got initialized twice! First by the default constructor, and again via assignment. Hey, what's going on?

In C++, assignment and copy construction are different because the copy constructor initializes uninitialized memory, whereas assignment starts with an existing initialized object. If your class contains instances of other classes as data members, the copy constructor must first construct these data members before it calls operator=. The result is that these members get initialized twice, as cctest shows. Got it? It's the same thing that happens with the default constructor when you initialize members using assignment instead of initializers. For example:

CFoo::CFoo() 
{
   m_obj = DEFAULT;
}

As opposed to:

CFoo::CFoo() : m_obj(DEFAULT)
{
}

Using assignment, m_obj is initialized twice; with the initializer syntax, only once. So, what's the solution to avoid extra initializations during copy construction? While it goes against your instinct to reuse code, this is one situation where it's best to implement your copy constructor and assignment operator separately, even if they do the same thing. Calling operator= from your copy constructor will certainly work, but it's not the most efficient implementation. My observation about initializers suggests a better way:

CFoo::CFoo(const CFoo& rhs) : m_obj(rhs.m_obj) {}

Now the main copy ctor calls the member object's copy ctor using an initializer, and m_obj is initialized just once by its copy ctor. In general, copy ctors should invoke the copy ctors of their members. Likewise for assignment. And, I may as well add, the same goes for base classes: your derived copy ctor and assignment operators should invoke the corresponding base class methods. Of course, there are always times when you may want to do something different because you know how your code works—but what I've described are the general rules, which are to be broken only when you have a compelling reason. If you have common tasks to perform after the basic objects have been initialized, you can put them in a common initialization method and call it from your constructors and operator=.

Q Can you tell me how to call a Visual C++® class from C#, and what syntax I need to use for this?

Sunil Peddi

Q I have an application that is written in both C# (the GUI) and in classic C++ (some business logic). Now I need to call from a DLL written in C++ a function (or a method) in a DLL written in Visual C++ .NET. This one calls another DLL written in C#. The Visual C++ .NET DLL acts like a proxy. Is this possible? I was able to use LoadLibrary to call a function present in the Visual C++ .NET DLL, and I can receive a return value, but when I try to pass some parameters to the function in the Visual C++ .NET DLL, I get the following error:

Run-Time Error Check Failure #0—The value of ESP was not properly saved
across a function call. This is usually a result of calling a function
declared with one calling convention with a function pointer declared
with a different calling convention.

How can I resolve this problem?

Giuseppe Dattilo

A I get a lot of questions about interoperability between the Microsoft® .NET Framework and native C++, so I don't mind revisiting this well-covered topic yet again. There are two directions you can go: calling the Framework from C++ or calling C++ from the Framework. I won't go into COM interop here as that's a separate issue best saved for another day.

Let's start with the easiest one first: calling the Framework from C++. The simplest and easiest way to call the Framework from your C++ program is to use the Managed Extensions. These Microsoft-specific C++ language extensions are designed to make calling the Framework as easy as including a couple of files and then using the classes as if they were written in C++. Here's a very simple C++ program that calls the Framework's Console class:

#using <mscorlib.dll>
#using <System.dll> // implied
using namespace System;
int main()
{
  Console::WriteLine("Hello, world");
}

To use the Managed Extensions, all you need to do is import <mscorlib.dll> and whatever .NET assemblies contain the classes you plan to use. Don't forget to compile with /clr:

cl /clr hello.cpp

Your C++ code can use managed classes more or less as if they were ordinary C++ classes. For example, you can create Framework objects with operator new, and access them using C++ pointer syntax, as shown in the following:

DateTime d = DateTime::Now;
String* s = String::Format("The date is {0}\n", d.ToString());
Console::WriteLine(s);
Console::WriteLine(s->Length);

Here, the String s is declared as pointer-to-String because String::Format returns a new String object.

The "Hello, world" and date/time programs seem childishly simple—and they are—but just remember that however complex your program is, however many .NET assemblies and classes you use, the basic idea is the same: use <mscorlib.dll> and whatever other assemblies you need, then create managed objects with new, and use pointer syntax to access them.

So much for calling the Framework from C++. What about going the other way, calling C++ from the Framework? Here the road forks into two options, depending on whether you want to call extern C functions or C++ class member functions. Again, I'll take the simpler case first: calling C functions from .NET. The easiest thing to do here is use P/Invoke. With P/Invoke, you declare the external functions as static methods of a class, using the DllImport attribute to specify that the function lives in an external DLL. In C# it looks like this:

using System;
using System.Runtime.InteropServices;

public class Win32 {
   [DllImport("user32.dll", CharSet=CharSet.Auto)]
   public static extern int MessageBox(IntPtr hWnd, String text,
      String caption, int type);
}

This tells the compiler that MessageBox is a function in user32.dll that takes an IntPtr (HWND), two Strings, and an int. You can then call it from your C# program like so:

Win32.MessageBox(IntPtr.Zero, "Hello World", "Platform Invoke Sample", 0);

Of course, you don't need P/Invoke for MessageBox since the .NET Framework already has a MessageBox class, but there are plenty of API functions that aren't supported directly by the Framework, and then you need P/Invoke. And, of course, you can use P/Invoke to call C functions in your own DLLs. I've used C# in the example, but P/Invoke works with any .NET-based language like Visual Basic® .NET or JScript®.NET. The names are the same, only the syntax is different.

Note that I used IntPtr to declare the HWND. I could have got away with int, but you should always use IntPtr for any kind of handle such as HWND, HANDLE, or HDC. IntPtr will default to 32 or 64 bits depending on the platform, so you never have to worry about the size of the handle.

DllImport has various modifiers you can use to specify details about the imported function. In this example, CharSet=CharSet.Auto tells the Framework to pass Strings as Unicode or Ansi, depending on the target operating system. Another little-known modifier you can use is CallingConvention. Recall that in C, there are different calling conventions, which are the rules that specify how the compiler should pass arguments and return values from one function to another across the stack. The default CallingConvention for DllImport is CallingConvention.Winapi. This is actually a pseudo-convention that uses the default convention for the target platform; for example, StdCall (in which the callee cleans the stack) on Windows® platforms and CDecl (in which the caller cleans the stack) on Windows CE .NET. CDecl is also used with varargs functions like printf.

The calling convention is where Giuseppe ran into trouble. C++ uses yet a third calling convention: thiscall. With this convention, the compiler uses the hardware register ECX to pass the "this" pointer to class member functions that don't have variable arguments. Without knowing the exact details of Giuseppe's program, it sounds from the error message that he's trying to call a C++ member function that expects thiscall from a C# program that's using StdCall—oops!

Aside from calling conventions, another interoperability issue when calling C++ methods from the Framework is linkage: C and C++ use different forms of linkage because C++ requires name-mangling to support function overloading. That's why you have to use extern "C" when you declare C functions in C++ programs: so the compiler won't mangle the name. In Windows, the entire windows.h file (now winuser.h) is enclosed in extern "C" brackets.

While there may be a way to call C++ member functions in a DLL directly using P/Invoke and DllImport with the exact mangled names and CallingConvention=ThisCall, it's not something to attempt if you're in your right mind. The proper way to call C++ classes from managed code—option number two—is to wrap your C++ classes in managed wrappers. Wrapping can be tedious if you have lots of classes, but it's really the only way to go. Say you have a C++ class CWidget and you want to wrap it so .NET clients can use it. The basic formula looks something like this:

public __gc class Widget
{
private:
  CWidget* m_pObj; // ptr to native object
public:
  Widget()  { m_pObj = new CWidget; }
  ~Widget() { delete m_pObj; }
  int Method(int n) { return m_pObj->Method(n); }
  // etc.
};

The pattern is the same for any class. You write a managed (__gc) class that holds a pointer to the native class, you write a constructor and destructor that allocate and destroy the instance, and you write wrapper methods that call the corresponding native C++ member functions. You don't have to wrap all the member functions, only the ones you want to expose to the managed world.

Figure 2 shows a simple but concrete example in full detail. CPerson is a class that holds the name of a person, with member functions GetName and SetName to read and change the name. Figure 3 shows the managed wrapper for CPerson. In the example, I converted Get/SetName to a property, so .NET-based programmers can use the property syntax. In C#, using it looks like this:

// C# client
MPerson.Person p = new MPerson.Person("Fred");
String name = p.Name;
p.Name = "Freddie";

Figure 3 Managed Person Class

// Managed Wrapper for native CPerson class. Compile with /clr. (Managed
// Extensions.)
//
#include <afxwin.h>
#include "person.h"

#using <mscorlib.dll>
#include <vcclr.h>
using namespace System;

// Should have Assembly Information here, but this is just a lame demo
// program.
//
//[assembly: AssemblyTitle("MPerson")]
//[assembly: AssemblyDescription("MPerson wrapper")]
//etc.

// Every assembly needs a namespace for managed code to use.
namespace MPerson {

//////////////////
// Managed wrapper for native CPerson class
//
public __gc class Person
{
private:
    CPerson* m_pPerson; // ptr to native object

    // Common construction allocates native object
    void Construct()
    {
        m_pPerson = new CPerson("");
    }

    // private because CPerson has no default ctor
    Person()
    {
        Construct();
    }

public:
    Person(String* n)
    {
        Construct();
        Name = n; // invoke property assignment
    }

    ~Person()
    {
        delete m_pPerson;
    }

    // Convert native Get/SetName functions to CLR properties.
    __property String* get_Name()
    {
        return m_pPerson->GetName();
    }

    __property void set_Name(String* n)
    {
        const WCHAR  __pin* str = PtrToStringChars(n);
        USES_CONVERSION;
        LPCTSTR lpsz = W2T(str);
        m_pPerson->SetName(lpsz);
    }
};

}

Figure 2 Native CPerson Class

person.h

#include <windows.h>
#include <string>
using namespace std;

#ifndef DLLCLASS
#define DLLCLASS __declspec(dllimport)
#endif

#ifdef _UNICODE
#define tstring wstring
#else
#define tstring string
#endif

//////////////////
// Native C++ class
//
class DLLCLASS CPerson {
protected:
    tstring m_name;
public:
    CPerson(LPCTSTR n);
    ~CPerson();
    LPCTSTR GetName()           { return m_name.c_str(); }
    void SetName(LPCTSTR n) { m_name = n; }
};

person.cpp

#define DLLCLASS __declspec(dllexport)
#include "person.h"

////////////////////////////////////////////////////////////////
// Implementation of CPerson class
//
CPerson::CPerson(LPCTSTR n) : m_name(n)
{
}

CPerson::~CPerson()
{
}

Using properties is purely a matter of style; I could equally well have exposed two methods, GetName and SetName, as in the native class. But properties feel more like .NET. The wrapper class lives in an assembly like any other, but one that links with the native DLL. This is one of the cool benefits of the Managed Extensions: you can link directly with native C/C++ code. If you download and compile the source for my CPerson example, you'll see that the makefile generates two separate DLLs: person.dll is a normal native DLL and mperson.dll is the managed assembly that wraps it. There are also two test programs: testcpp.exe, a native C++ program that calls the native person.dll, and testcs.exe, which is written in C# and calls the managed wrapper mperson.dll (which in turn calls the native person.dll).

Figure 4 Interop Highway


I've used a very simple example to highlight the fact that there are fundamentally only a few main highways across the border between the managed and native worlds (see Figure 4). If your C++ classes are at all complex, the biggest interop problem you'll encounter is converting parameters between native and managed types, a process called marshaling. The Managed Extensions do an admirable job of making this as painless as possible (for example, automatically converting primitive types and Strings), but there are times where you have to know something about what you're doing.

For example, you can't pass the address of a managed object or subobject to a native function without pinning it first. That's because managed objects live in the managed heap, which the garbage collector is free to rearrange. If the garbage collector moves an object, it can update all the managed references to that object, but it knows nothing of raw native pointers that live outside the managed world. That's what __pin is for; it tells the garbage collector: don't move this object. For strings, there is a special function, PtrToStringChars, that returns an interior pointer to the string's characters; assigning the result to a __pin pointer pins the string for as long as that pointer is in scope. (Incidentally, for those curious-minded souls, PtrToStringChars is the only function as of this date defined in <vcclr.h>. Figure 5 shows the code.) I used PtrToStringChars with a __pin pointer in MPerson to set the Name (see Figure 3).

Figure 5 PtrToStringChars

// PtrToStringChars, from vcclr.h

// get an interior gc pointer to the first character contained in a
// System::String object
//
inline const System::Char * PtrToStringChars(const System::String *s) {
    const System::Byte *bp = reinterpret_cast<const System::Byte *>(s);
    if( bp != 0 ) {
        unsigned offset = System::Runtime::CompilerServices::
            RuntimeHelpers::OffsetToStringData;
        bp += offset;
    }
    return reinterpret_cast<const System::Char*>(bp);
}

Pinning isn't the only interop problem you'll encounter. Other problems arise if you have to deal with arrays, references, structs, and callbacks, or access a subobject within an object. This is where some of the more advanced techniques come in, such as StructLayout, boxing, __value types, and so on. You also need special code to handle exceptions (native or managed) and callbacks/delegates. But don't let these interop details obscure the big picture. First decide which way you're calling (from managed to native or the other way around), and if you're calling from managed to native, whether to use P/Invoke or a wrapper.

In Visual Studio® 2005 (which some of you may already have as beta bits), the Managed Extensions have been renamed and upgraded to something called C++/CLI. Think of C++/CLI as Managed Extensions Version 2, or What the Managed Extensions Should Have Been. The changes are mostly a matter of syntax, though there are some important semantic changes, too. In general C++/CLI is designed to highlight rather than blur the distinction between managed and native objects. Using pointer syntax for managed objects was a clever and elegant idea, but in the end perhaps a little too clever because it obscures important differences between managed and native objects. C++/CLI introduces the key notion of handles for managed objects, so instead of using C pointer syntax for managed objects, C++/CLI uses ^ (hat):

// handle to managed string
String^ s = gcnew String(L"Hello");

As you no doubt noticed, there's also a gcnew operator to clarify when you're allocating objects on the managed heap as opposed to the native one. This has the added benefit that gcnew doesn't collide with C++ new, which can be overloaded or even redefined as a macro. C++/CLI has many other cool features designed to make interoperability as straightforward and intuitive as possible.

Send your questions and comments for Paul to cppqa@microsoft.com.

Paul DiLascia is a freelance software consultant and Web/UI designer-at-large. He is the author of Windows++: Writing Reusable Windows Code in C++ (Addison-Wesley, 1992). In his spare time, Paul develops PixieLib, an MFC class library available from his Web site, www.dilascia.com.
