
MSDN Magazine

Programming for 64-bit Windows
Matt Pietrek
Download the code for this article: Hood1100.exe (58KB)
I'm really excited this month because the time has finally come to talk about programming for 64-bit Windows®. Since computers with Itanium processors won't be widely available in the year 2000, most of you won't be doing active development for 64-bit Windows when this column first appears. Nonetheless, when it comes to designing your next great software masterpiece, knowing what's coming down the pike is invaluable.
      In my own work, CPU portability is now a much more tangible concept, since I've written code that works on both 32-bit and 64-bit Windows. Having worked with Intel's IA64 architecture and its Explicitly Parallel Instruction Computing (EPIC) technology for more than six months, I can tell you that there's no way to cover all of the interesting new concepts in a single article, much less a column. Nonetheless, there are significant topics that I think developers should be aware of now.
      My subject this month is the global pointer and its support code. Don't worry if you're new to the IA64 architecture. What I'll describe doesn't require an intimate knowledge of the chip or its instructions. My goal is to describe the global pointer in just enough detail so you can plan for it if needed.
      To be completely honest, many of you who eventually develop for Windows 2000, 64-bit Edition won't directly encounter the global pointer in your coding. The compiler, linker, and OS successfully conspire to hide it from you. However, as soon as you begin debugging code at the assembly level, you'll come face to face with the global pointer and its support code. I was certainly confused the first time I stepped into a seemingly simple call to a DLL and found that it took half a dozen instructions.
      One note before I continue: for convenience's sake I'll use the term "load module" to refer to any Portable Executable (PE) file, including EXEs and DLLs.

What is the Global Pointer?

      In the simplest sense, the global pointer is a preassigned value for accessing data within a load module. There is no global pointer equivalent on the x86 architecture because it's simply not needed. On the x86, the address space is 32 bits (4GB), and an instruction can contain a full 32-bit address as part of its encoding. For example, consider a DWORD global variable at address 0x78001234. To get the contents of that memory, the compiler might generate this code:

MOV ECX, [78001234h]
This instruction takes six bytes: two for the opcode and addressing-mode bytes that encode MOV ECX, and four for the DWORD value 0x78001234.
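      The six-byte count can be checked by assembling the instruction by hand. The following is a minimal sketch rather than a general assembler. The function name encode_mov_ecx_abs32 is made up for illustration, and it handles only this one form of MOV.

```cpp
#include <array>
#include <cstdint>

// Encode "MOV ECX, [address]" for 32-bit x86.
// 0x8B is the opcode for MOV r32, r/m32. The ModRM byte 0x0D selects ECX as
// the destination and a bare 32-bit displacement as the source operand.
std::array<std::uint8_t, 6> encode_mov_ecx_abs32(std::uint32_t address)
{
    return {{
        0x8B,                                              // opcode
        0x0D,                                              // ModRM: mod=00, reg=001 (ECX), r/m=101
        static_cast<std::uint8_t>(address & 0xFF),         // address bytes, little-endian
        static_cast<std::uint8_t>((address >> 8) & 0xFF),
        static_cast<std::uint8_t>((address >> 16) & 0xFF),
        static_cast<std::uint8_t>((address >> 24) & 0xFF),
    }};
}
```

      For the address 0x78001234 this produces the bytes 8B 0D 34 12 00 78, with the full 32-bit address sitting inside the instruction itself.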
      On the IA64, each instruction is 41 bits in length. As such, it's impossible for a 41-bit IA64 instruction to contain a complete 64-bit address. Instead, all memory accesses must be done by using a 64-bit pointer value loaded into a general-purpose register. When working with data defined within a load module, the global pointer makes it possible to put the target address that you want into a register using a single instruction. (There are some minor exceptions, which I'll cover later.)
      Under 64-bit Windows on the IA64 platform, each load module has its own global pointer value. This value points somewhere near the load module's static data area, which includes global variables, strings, COM vtables, and other assorted items. The global pointer value for the currently executing load module is stored in the r1 register, which is one of the IA64 general-purpose registers. In assembly code and in the debugger, the r1 register is usually referred to by another name: the GP register, which quite expectedly is shorthand for global pointer.
      Some of you Windows old-timers may be experiencing déjà vu at this moment. The concept of a GP register and per-load module data areas is very similar to the dreaded data segments of 16-bit programming in Windows. In both cases, a specific register must be set up to point at the load module's data area in memory. All accesses to that memory are done as offsets relative to that register. Fortunately, the IA64 data area is 4MB, rather than the 64KB segment limit in Win16. In addition, using the global pointer is much more seamless than Win16 data segments.
      You might be wondering why a load module's data area is limited to 4MB. This constraint exists because the largest immediate value that can be added to a register in a single IA64 instruction is 22 bits. Do the math and you'll find that 22 bits can express 2^22 (4,194,304) distinct byte offsets, which is 4MB.
      Let's look at a typical IA64 instruction that uses the GP register:
add r29 = 0x123456, gp
This instruction adds 0x123456 to the value in the GP register and stores the result in the r29 register. The r29 register now contains a full 64-bit pointer to some piece of data defined by the load module. Figure 1 shows a load module with the GP register pointing to its data area.

Figure 1 GP Register
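      The arithmetic behind the 4MB window and the GP-relative add can be modeled in a few lines of portable C++. Treat this as a sketch under stated assumptions: the names are invented for illustration, and the code assumes the 22-bit immediate is signed, so that it reaches 2MB on either side of the GP value.

```cpp
#include <cstdint>

// Assumption: the 22-bit immediate is signed, giving a range of
// -2MB .. +2MB-1 around the GP value, or 4MB in total.
constexpr std::int64_t kImm22Min = -(std::int64_t{1} << 21);
constexpr std::int64_t kImm22Max = (std::int64_t{1} << 21) - 1;

// Total span that a 22-bit immediate can cover: 2^22 bytes.
constexpr std::uint64_t kShortDataWindow = std::uint64_t{1} << 22;
static_assert(kShortDataWindow == 4u * 1024u * 1024u, "2^22 bytes is 4MB");

// True if the offset can be encoded in a single GP-relative add.
constexpr bool fits_in_imm22(std::int64_t offset)
{
    return offset >= kImm22Min && offset <= kImm22Max;
}

// Models "add r29 = offset, gp": the result is a full 64-bit address.
constexpr std::uint64_t gp_relative_address(std::uint64_t gp, std::int64_t offset)
{
    return gp + static_cast<std::uint64_t>(offset);
}
```

      With a purely illustrative GP value of 0x612000, gp_relative_address(0x612000, 0x123456) evaluates to 0x735456, and fits_in_imm22(0x123456) confirms that the offset from the example instruction is within reach of a single add.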

      Almost all of a load module's data is addressed relative to the GP register. One nice side effect of this addressing scheme is that the code instructions don't need to contain hardcoded addresses, as x86 code often does. By not having hardcoded addresses, the code is position-independent, which means that the code doesn't need to be modified to be correct for its load address in memory. Position-independent code eliminates the need for the base relocations that are used in the x86 version of Windows.
      Alert readers might be wondering how data that exceeds 4MB is handled. For example, you might have a very large, statically declared array. The details of how to handle this are left up to the compiler. However, the general scheme suggested by Intel is to store a pointer within the 4MB "short" data area. This pointer contains the address of the large memory block. The compiler is responsible for knowing when it needs to indirect through a pointer to get the address of the large memory block.
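      The pointer-in-the-short-data-area scheme can be pictured with an ordinary C++ model. This is only a sketch of the idea: ShortDataArea, read_small, and read_big are hypothetical names, and a real compiler does this invisibly rather than through a struct you declare.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of a load module's "short" data area. Everything in
// here is assumed to be reachable with one GP-relative add.
struct ShortDataArea {
    std::int32_t  small_counter;   // small global, accessed directly
    std::int32_t* big_table;       // pointer to data that lives outside the window
};

// A statically declared array of 8MB, too large for the 4MB short data area.
std::int32_t g_big_table[2 * 1024 * 1024];

// The short data area holds only the pointer to the large block.
ShortDataArea g_short_data = {7, g_big_table};

// Direct access: compute the GP-relative address, then load.
std::int32_t read_small(const ShortDataArea& sd)
{
    return sd.small_counter;
}

// Indirect access: load the pointer from the short data area first,
// then index through it to reach the large block.
std::int32_t read_big(const ShortDataArea& sd, std::size_t index)
{
    return sd.big_table[index];
}
```

      The extra load in read_big is the cost of the indirection. Small globals avoid it entirely.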
      It's worth noting that the IA64 did not introduce the concept of a global pointer. RISC processors such as the PowerPC used a global pointer long before the IA64 came about. You can see evidence of this in WINNT.H, where the #define IMAGE_DIRECTORY_ENTRY_GLOBALPTR has been around since the first release of the Win32 include files in 1992.

Setting Up the Global Pointer

      Since each IA64 load module has its own data area and associated GP value, some sort of mechanism is necessary to change the GP value when control goes from one load module to another. The convention is that the calling code sets the GP register to the correct value for the target before making the call. The called code assumes that the correct global pointer value has been loaded into the GP register. This is exactly the opposite of 16-bit Windows, where each exported function is responsible for setting up the correct data segment value in its prologue.
      Why make the calling code responsible for setting the GP register? Again, this relates to position-independent code. If the target routine needed to set its own GP, the code would need to somehow embed the GP value in its prologue instructions. If this were the case, instructions would need to be altered to reflect the address where the OS mapped the load module into memory. Modifying the code also makes it non-position-independent.
      Since the called code can't set its GP, the only other choice is for the caller to set it. The question then becomes, "How does the caller know the GP value for the target function?" The answer lies in a data structure defined by Intel, and adopted by Microsoft for its implementation of 64-bit Windows.

Introducing the PLABEL_DESCRIPTOR

      In WINNT.H, you'll find the following structure definition, which has been around at least since the introduction of Visual Studio® 6.0:

typedef struct _PLABEL_DESCRIPTOR {
   ULONGLONG EntryPoint;
   ULONGLONG GlobalPointer;
} PLABEL_DESCRIPTOR, *PPLABEL_DESCRIPTOR;
With some exceptions that I'll explain shortly, every function in a load module has an associated PLABEL_DESCRIPTOR in the load module. A PLABEL_DESCRIPTOR is just two 64-bit pointers, and is automatically generated by the compiler. The EntryPoint field is the address of the associated function. The second field, GlobalPointer, contains the GP value to be used by the associated function. In normal situations, all PLABEL_DESCRIPTORs in a given load module contain the same global pointer value. The OS initializes all the PLABEL_DESCRIPTOR values when the load module is loaded.
      Having spent quite some time figuring out how to manipulate PLABEL_DESCRIPTORs, I can tell you that the compiler and linker do a great job of hiding their presence. Any place where you'd expect a code address to appear, a PLABEL_DESCRIPTOR sits in its place. The compiler obviously knows about PLABEL_DESCRIPTORs, and when generating instructions for certain types of calls, loads the GP register from the GlobalPointer field before branching to the address in the EntryPoint field. Figure 2 shows a typical call using a PLABEL_DESCRIPTOR.

Figure 2 Using PLABEL_DESCRIPTOR
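      The calling sequence can be imitated in portable C++ to make the order of operations concrete. This is a simulation, not IA64 code: ModelPlabel, g_gp, and call_via_plabel are invented for the sketch, a plain variable stands in for the GP register, and restoring the caller's GP after the call is an assumption about the convention rather than something shown above.

```cpp
#include <cstdint>

// Portable stand-in for PLABEL_DESCRIPTOR. On IA64 both fields are 64-bit
// addresses. Here EntryPoint is an ordinary function pointer so that the
// sketch can run on any machine.
struct ModelPlabel {
    std::uint64_t (*EntryPoint)();
    std::uint64_t GlobalPointer;
};

// Stand-in for the r1 (GP) register.
std::uint64_t g_gp = 0;

// A "function in another load module". It trusts that GP is already
// correct and simply reports the value it sees.
std::uint64_t callee_reads_gp()
{
    return g_gp;
}

// What the calling code does for a call made through a descriptor.
std::uint64_t call_via_plabel(const ModelPlabel& target)
{
    const std::uint64_t saved_gp = g_gp;               // assumption: caller keeps its own GP
    g_gp = target.GlobalPointer;                       // load GP for the target module
    const std::uint64_t result = target.EntryPoint();  // branch to the entry point
    g_gp = saved_gp;                                   // restore the caller's GP
    return result;
}
```

      The callee never computes its own GP value. It sees whatever the caller loaded from the descriptor.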

      Let me give you some examples of places where a PLABEL_DESCRIPTOR is used in place of a code address. For starters, the PE header of a load module contains the Relative Virtual Address (RVA) for the load module's entry point. If you go to that RVA, you'll find a PLABEL_DESCRIPTOR. Likewise, the exports section of a load module specifies the RVAs of the exported functions. At each of these RVAs is a PLABEL_DESCRIPTOR for the exported function.
      The PLABEL_DESCRIPTOR pervasiveness extends to imported code addresses. The imports section of a load module contains import address tables (IATs) for each imported DLL. The IAT is really just an array of function pointers, one per imported function. In an IA64 load module, each IAT array element is the address of a PLABEL_DESCRIPTOR in the DLL being imported. The same goes for COM vtables, where each vtable entry points to a PLABEL_DESCRIPTOR, not the actual code address.
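      One practical consequence shows up if you redirect imports by overwriting IAT entries. The sketch below models an IAT slot as a pointer to a descriptor. ModelDescriptor, IatSlot, and redirect_import are hypothetical names, and the code deliberately leaves out page protection and everything else a real tool would need.

```cpp
#include <cstdint>

// Same two fields as PLABEL_DESCRIPTOR, using fixed-width types.
struct ModelDescriptor {
    std::uint64_t EntryPoint;
    std::uint64_t GlobalPointer;
};

// In this model an IAT slot holds the address of a descriptor,
// not a code address.
using IatSlot = const ModelDescriptor*;

// Redirect an import by pointing the slot at a different descriptor.
// Returns the previous descriptor so the original function stays reachable.
IatSlot redirect_import(IatSlot& slot, IatSlot replacement)
{
    IatSlot previous = slot;
    slot = replacement;
    return previous;
}
```

      The replacement has to be a complete descriptor carrying the GP value of the module that contains the new code. Writing a raw code address into the slot, as x86 tools do, would leave callers loading garbage into GP.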
      What about GetProcAddress? As you might guess, the address returned is a PLABEL_DESCRIPTOR, not the actual code address. Function pointers? The same story. The compiler assumes that any function pointer really points to a PLABEL_DESCRIPTOR and generates the appropriate code.
      Perhaps you think you can outsmart the compiler. What happens if you try to get the address of a function using the & operator in C++? For example:

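void foo(void)
{
}

int main()
{
    // On IA64 this yields the address of foo's PLABEL_DESCRIPTOR,
    // not the address of foo's first instruction.
    void * p = reinterpret_cast<void *>(&foo);
    (void)p;    // p is unused; this silences the compiler warning
    return 0;
}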
The expression "&foo" resolves to the PLABEL_DESCRIPTOR for foo, rather than the code address of foo.
      If you truly want to get the code address of a function, you can cheat by casting a function's address to a PLABEL_DESCRIPTOR *, and reading the EntryPoint field. The PLabel program shown in Figure 3 provides an example of this. It takes the address of function main, casts it to a PLABEL_DESCRIPTOR *, and prints out the two data fields.
      For good measure, the PLabel code also obtains and displays the global pointer of the EXE by reading the RVA out of the PE header. The two global pointer values should be identical. I've included the IA64 executable file (PLabel.EXE) in this month's download (see the link at the top of this article) so that you can run it (on a 64-bit machine) without installing the 64-bit compiler. Here's the output from running PLabel on my IA64 machine:
EntryPoint:    0000000000402060
GlobalPointer: 0000000000612000
GlobalPointer from PE header: 0000000000612000
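      For readers without an IA64 machine, the casting trick can be tried against a hand-built descriptor. The code below is not the PLabel program from Figure 3. It is a portable sketch with invented names (PlabelView, hex64, describe_plabel) that reads the two fields through a pointer and formats them in the same style as the output above.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Same layout as the WINNT.H structure, using fixed-width types.
struct PlabelView {
    std::uint64_t EntryPoint;
    std::uint64_t GlobalPointer;
};

// Format a 64-bit value as 16 uppercase hexadecimal digits.
std::string hex64(std::uint64_t value)
{
    char buffer[17];
    std::snprintf(buffer, sizeof buffer, "%016llX",
                  static_cast<unsigned long long>(value));
    return buffer;
}

// Treat an address as a descriptor and describe its two fields.
// Passing a real function's address is only meaningful on IA64, where that
// address really is the address of the function's descriptor.
std::string describe_plabel(const void* function_address)
{
    const PlabelView* p = static_cast<const PlabelView*>(function_address);
    return "EntryPoint:    " + hex64(p->EntryPoint) + "\n" +
           "GlobalPointer: " + hex64(p->GlobalPointer) + "\n";
}
```

      Feeding it a PlabelView filled with the values 0x402060 and 0x612000 reproduces the first two lines of the output shown above.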

Drilling into PLABEL_DESCRIPTORs

      You might wonder how the compiler makes the use of PLABEL_DESCRIPTORs so seamless. A little digging reveals that the Visual C++® compiler actually generates two symbols for each function. The normal symbol you'd expect (for instance, foo) resolves to the PLABEL_DESCRIPTOR's address. The function's code address is referred to by a second symbol. This second symbol is prefixed with a period (for instance, .foo).
      Generally speaking, there's always a PLABEL_DESCRIPTOR for each function. However, there are a couple of ways that a PLABEL_DESCRIPTOR for a function might not end up in the load module. To understand this, it's important to know that there are two types of call instructions on the IA64. One form is where control goes to an address relative to the calling instruction (this is known as an IP-relative call). The other call form is the indirect form, where the target address is placed in a branch register, and the call is made using that register. This second form is how calls made via a PLABEL_DESCRIPTOR are invoked.
      A compiler can skip emitting a PLABEL_DESCRIPTOR if a function isn't visible outside its source file (that is, a static function), and if the function is referenced only via normal (IP relative) calls. Taking the address of a function violates these rules and causes a PLABEL_DESCRIPTOR to be generated.
      Another way a PLABEL_DESCRIPTOR can avoid ending up in a load module is when only normal (IP-relative) calls are made to the function, and the function isn't exported. In this case, the compiler will generate a PLABEL_DESCRIPTOR, but the linker may decide to throw it away in the resulting load module, since it was never used.
      I encourage you to experiment by constructing scenarios like those I've described, and examining the .MAP file for the load module. When a PLABEL_DESCRIPTOR is in the load module, you'll see the function name twice: once with the period prefixed to the name, and a second time without the period. Figure 4 shows excerpts of the .MAP file for the PLabel program shown in Figure 3. Observe that in section 1 (the code section with symbol addresses like 0001:XXXXXXXX) the symbol names have the period prefix (for instance, .main and ._exit). In section 2 (the .rdata section) you'll see the matching PLABEL_DESCRIPTORs (for example, main and _exit).
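      If you have many symbols to check, the pairing rule is easy to automate. The sketch below assumes you have already extracted the symbol names from the .MAP file into a list. It does not parse the .MAP format itself, and the function name functions_with_plabel is made up for this example.

```cpp
#include <set>
#include <string>
#include <vector>

// Given symbol names taken from a .MAP file, return the functions that have
// both a code symbol (".name") and a descriptor symbol ("name").
// The result is ordered by the sorted order of the code symbols.
std::vector<std::string> functions_with_plabel(const std::vector<std::string>& symbols)
{
    const std::set<std::string> all(symbols.begin(), symbols.end());
    std::vector<std::string> result;
    for (const std::string& name : all) {
        // A leading period marks the code address. Look for the same name
        // without the period, which marks the descriptor.
        if (!name.empty() && name[0] == '.' && all.count(name.substr(1)) != 0) {
            result.push_back(name.substr(1));
        }
    }
    return result;
}
```

      A function that appears only with the period prefix has a code address but no descriptor in the load module, which is the case for functions reached solely through IP-relative calls.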
      As a final note on PLABEL_DESCRIPTORs, the IA64 is not the first CPU to use them. The PowerPC also used them when Windows NT® supported it. The terminology was slightly different though, with TOC (Table Of Contents) meaning the same thing as the global pointer.

Wrap-up

      In general day-to-day programming, most architectural details of the IA64 don't manifest themselves when moving code from Win32® to compile and run on 64-bit Windows. However, it often becomes necessary to go under the hood to see what's really going on in the compiler-generated code. When you're in the debugger's assembly window trying to decipher what's happening, knowing about the global pointer and PLABEL_DESCRIPTORs is invaluable.
      If you write certain types of software tools or generate code on the fly, an even deeper understanding of these concepts is necessary. Do you do things like replace IAT entries? Now is the time to start thinking about how your code will operate when compiled for the IA64 platform. While there's certainly more to the story than I've written about here, I've tried to spare you the pain of figuring out these details on your own. I also hope you've seen how different IA64 programming can be from x86 programming once you veer outside the straight and narrow.

Matt Pietrek does advanced research for the NuMega Labs of Compuware Corporation, and is the author of several books. His Web site, at https://www.wheaty.net, has a FAQ page and information on previous columns and articles.

From the November 2000 issue of MSDN Magazine.