Export (0) Print
Expand All

How to: Use Alloc and Free to Improve Memory Performance

This document shows how to use the concurrency::Alloc and concurrency::Free functions to improve memory performance. It compares the time that is required to reverse the elements of an array in parallel for three different types that each specify the new and delete operators.

The Alloc and Free functions are most useful when multiple threads frequently call both Alloc and Free. The runtime holds a separate memory cache for each thread; therefore, the runtime manages memory without the use of locks or memory barriers.

The following example shows three types that each specify the new and delete operators. The new_delete class uses the global new and delete operators, the malloc_free class uses the C Runtime malloc and free functions, and the Alloc_Free class uses the Concurrency Runtime Alloc and Free functions.

// A type that defines the new and delete operators. These operators  
// call the global new and delete operators, respectively. 
class new_delete
{
public:
   static void* operator new(size_t size)
   {
      return ::operator new(size);
   }

   static void operator delete(void *p)
   {
      return ::operator delete(p);
   }

   int _data;
};

// A type that defines the new and delete operators. These operators  
// call the C Runtime malloc and free functions, respectively. 
class malloc_free
{
public:
   static void* operator new(size_t size)
   {
      return malloc(size);
   }
   static void operator delete(void *p)
   {
      return free(p);
   }

   int _data;
};

// A type that defines the new and delete operators. These operators  
// call the Concurrency Runtime Alloc and Free functions, respectively. 
class Alloc_Free
{
public:
   static void* operator new(size_t size)
   {
      return Alloc(size);
   }
   static void operator delete(void *p)
   {
      return Free(p);
   }

   int _data;
};

The following example shows the swap and reverse_array functions. The swap function exchanges the contents of the array at the specified indices. It allocates memory from the heap for the temporary variable. The reverse_array function creates a large array and computes the time that is required to reverse that array several times in parallel.

// Exchanges the contents of a[index1] with a[index2]. 
template<class T>
void swap(T* a, int index1, int index2)
{
   // For illustration, allocate memory from the heap. 
   // This is useful when sizeof(T) is large.
   T* temp = new T;

   *temp = a[index1];
   a[index1] = a[index2];
   a[index2] = *temp;

   delete temp;
}

// Computes the time that it takes to reverse the elements of a  
// large array of the specified type. 
template <typename T>
__int64 reverse_array()
{
    const int size = 5000000;
    T* a = new T[size];   

    __int64 time = 0;
    const int repeat = 11;

    // Repeat the operation several times to amplify the time difference. 
    for (int i = 0; i < repeat; ++i)
    {
        time += time_call([&] {
            parallel_for(0, size/2, [&](int index) 
            {
                swap(a, index, size-index-1); 
            });
        });
    }

    delete[] a;
    return time;
}

The following example shows the wmain function, which computes the time that is required for the reverse_array function to act on the new_delete, malloc_free, and Alloc_Free types, each of which uses a different memory allocation scheme.

int wmain()
{  
   // Compute the time that it takes to reverse large arrays of  
   // different types. 

   // new_delete
   wcout << L"Took " << reverse_array<new_delete>() 
         << " ms with new/delete." << endl;

   // malloc_free
   wcout << L"Took " << reverse_array<malloc_free>() 
         << " ms with malloc/free." << endl;

   // Alloc_Free
   wcout << L"Took " << reverse_array<Alloc_Free>() 
         << " ms with Alloc/Free." << endl;
}

The complete example follows.

// allocators.cpp 
// compile with: /EHsc 
#include <windows.h>
#include <ppl.h>
#include <iostream>

using namespace concurrency;
using namespace std;

// Calls the provided work function and returns the number of milliseconds  
// that it takes to call that function. 
template <class Function>
__int64 time_call(Function&& f)
{
   __int64 begin = GetTickCount();
   f();
   return GetTickCount() - begin;
}

// A type that defines the new and delete operators. These operators  
// call the global new and delete operators, respectively. 
class new_delete
{
public:
   static void* operator new(size_t size)
   {
      return ::operator new(size);
   }

   static void operator delete(void *p)
   {
      return ::operator delete(p);
   }

   int _data;
};

// A type that defines the new and delete operators. These operators  
// call the C Runtime malloc and free functions, respectively. 
class malloc_free
{
public:
   static void* operator new(size_t size)
   {
      return malloc(size);
   }
   static void operator delete(void *p)
   {
      return free(p);
   }

   int _data;
};

// A type that defines the new and delete operators. These operators  
// call the Concurrency Runtime Alloc and Free functions, respectively. 
class Alloc_Free
{
public:
   static void* operator new(size_t size)
   {
      return Alloc(size);
   }
   static void operator delete(void *p)
   {
      return Free(p);
   }

   int _data;
};

// Exchanges the contents of a[index1] with a[index2]. 
template<class T>
void swap(T* a, int index1, int index2)
{
   // For illustration, allocate memory from the heap. 
   // This is useful when sizeof(T) is large.
   T* temp = new T;

   *temp = a[index1];
   a[index1] = a[index2];
   a[index2] = *temp;

   delete temp;
}

// Computes the time that it takes to reverse the elements of a  
// large array of the specified type. 
template <typename T>
__int64 reverse_array()
{
    const int size = 5000000;
    T* a = new T[size];   

    __int64 time = 0;
    const int repeat = 11;

    // Repeat the operation several times to amplify the time difference. 
    for (int i = 0; i < repeat; ++i)
    {
        time += time_call([&] {
            parallel_for(0, size/2, [&](int index) 
            {
                swap(a, index, size-index-1); 
            });
        });
    }

    delete[] a;
    return time;
}

int wmain()
{  
   // Compute the time that it takes to reverse large arrays of  
   // different types. 

   // new_delete
   wcout << L"Took " << reverse_array<new_delete>() 
         << " ms with new/delete." << endl;

   // malloc_free
   wcout << L"Took " << reverse_array<malloc_free>() 
         << " ms with malloc/free." << endl;

   // Alloc_Free
   wcout << L"Took " << reverse_array<Alloc_Free>() 
         << " ms with Alloc/Free." << endl;
}

This example produces the following sample output for a computer that has four processors.

Took 2031 ms with new/delete.
Took 1672 ms with malloc/free.
Took 656 ms with Alloc/Free.

In this example, the type that uses the Alloc and Free functions provides the best memory performance because the Alloc and Free functions are optimized for frequently allocating and freeing blocks of memory from multiple threads.

Copy the example code and paste it in a Visual Studio project, or paste it in a file that is named allocators.cpp and then run the following command in a Visual Studio Command Prompt window.

cl.exe /EHsc allocators.cpp

Show:
© 2014 Microsoft