From the June 2001 issue of MSDN Magazine.

IA-64 Registers
Matt Pietrek
With all the talk about Microsoft® .NET, it's easy to forget that another major new technology is coming to Windows® as well. I'm talking about 64-bit programming on Intel's IA-64 series of CPUs. In my next two columns, I'll look at the details of IA-64 parameter passing, stack frames, and return values. Don't worry if you haven't had the time to study the IA-64 architecture yet. You won't need to know all the gory details for what I'll be talking about, and I'll introduce just enough of the CPU basics to get you through.
      If you're curious, full documentation for the IA-64 is available on Intel's Web site (https://developer.intel.com/design/ia-64/manuals). However, it can be tough to wade through the documentation's very formal and precise definition of the CPU's behavior. For example, information that's only useful to a hardcore operating system writer is presented side by side with the most basic information that anyone should know. It's often difficult to tell what to pay attention to. What I'll describe here is a simplified version of the essential information I learned when I began porting the core of Compuware's BoundsChecker to 64-bit Windows.

A Review Of x86 Conventions

      Before I get to the IA-64 details, you might benefit from a quick review of parameter passing and stack frames on the x86 (Pentium) CPU. I've found it easier to understand the IA-64 if you can correlate it to something in the x86 architecture. If you know x86 programming cold, feel free to skip ahead to the next section.
      In the x86 world, parameters are traditionally passed on the stack. If you look at the assembly language for the call to a function that takes parameters, you'll usually see a PUSH instruction for each parameter that's passed. The PUSH is how the parameter gets onto the stack. Once all the parameters are on the stack, a CALL instruction transfers control to the target routine.
      In some cases, parameters are passed in registers rather than on the stack. The exact registers used depend upon the calling convention and the compiler. The fastcall convention is one example of a calling convention that uses registers. However, because of the relatively few general-purpose registers on the x86, at most two parameters are passed in registers for any given function. Also, there are other restrictions on using registers to pass parameters. For instance, floating-point parameters are almost never passed in a general-purpose register.
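      To make the two-register limit concrete, here is a minimal sketch that sorts a parameter list into registers and stack slots. It assumes Microsoft's __fastcall rule (the first two integer-sized parameters, scanning left to right, go in ECX and EDX); other compilers may choose differently. The function name assign_fastcall and the "int"/"float" tags are made up for the illustration.

```python
def assign_fastcall(params):
    """Split (name, kind) pairs into register and stack parameters.

    kind is "int" for an integer or pointer of four bytes or less;
    any other kind (for example "float") always goes on the stack.
    Assumption: Microsoft-style __fastcall, which uses ECX and EDX.
    """
    free_regs = ["ECX", "EDX"]
    in_regs = {}
    on_stack = []   # listed in declaration order, not push order
    for name, kind in params:
        if kind == "int" and free_regs:
            in_regs[name] = free_regs.pop(0)
        else:
            on_stack.append(name)
    return in_regs, on_stack


in_regs, on_stack = assign_fastcall(
    [("a", "int"), ("x", "float"), ("b", "int"), ("c", "int")])
print(in_regs)    # which parameters landed in ECX and EDX
print(on_stack)   # everything else is pushed
```

      Only a and b get registers here. The floating-point parameter x and the third integer c both end up on the stack.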
      Code accesses parameters and local variables using stack frames. The traditional x86 stack frame uses the EBP register to point at the frame. Such a frame is usually set up at the beginning of a function by the following code sequence:

  PUSH   EBP
  MOV    EBP,ESP
  SUB    ESP, XX

      Afterward, the stack frame looks like Figure 1. The key thing is that the various parts of the stack frame are accessible as offsets relative to the EBP register. At EBP+0 is the address of the previous stack frame. At EBP+4 is the function's return address (that is, the address of the instruction following the CALL to the current function). One point is worth restating because it matters later: on the x86 architecture, the CALL instruction places the return address in memory, specifically on the stack.

Figure 1 Stack Frame

      The function's parameters are at positive offsets from EBP. The first parameter is at EBP+8. Subsequent parameters are at larger offsets that are multiples of four (for instance, EBP+0xC, EBP+0x10, EBP+0x14, and so on). If the function has local variables, they will be at negative offsets from EBP; for instance, EBP-0x10.
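      The frame layout described above can be checked with a toy model of the stack. This is only a sketch: the addresses and values are invented, the class and function names are hypothetical, and it assumes the parameters are pushed right to left (as the common C conventions do) so that the first parameter ends up at EBP+8.

```python
WORD = 4  # bytes per 32-bit stack slot


class Stack:
    def __init__(self, esp):
        self.esp = esp   # stack pointer; the stack grows downward
        self.mem = {}    # address -> 32-bit value

    def push(self, value):
        self.esp -= WORD
        self.mem[self.esp] = value


def call_with_frame(stack, params, return_address, caller_ebp, locals_size):
    """Model the pushes, the CALL, and the three-instruction prologue.

    Returns the callee's EBP so the frame can be inspected.
    """
    for p in reversed(params):    # PUSH each parameter, last one first
        stack.push(p)
    stack.push(return_address)    # CALL pushes the return address
    stack.push(caller_ebp)        # PUSH EBP
    ebp = stack.esp               # MOV  EBP,ESP
    stack.esp -= locals_size      # SUB  ESP, XX
    return ebp


stack = Stack(esp=0x0012FF80)     # made-up starting stack pointer
ebp = call_with_frame(stack, params=[11, 22, 33],
                      return_address=0x00401234,
                      caller_ebp=0x0012FFC0,
                      locals_size=0x10)

print(hex(stack.mem[ebp + 0]))    # previous frame pointer
print(hex(stack.mem[ebp + 4]))    # return address
print(stack.mem[ebp + 8])         # first parameter
print(stack.mem[ebp + 0xC])       # second parameter
```

      After the prologue, ESP sits 0x10 bytes below EBP, which is the space the locals occupy at negative offsets.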
      Of course, rules are meant to be broken. Nothing requires an x86-based function to have a stack frame like I've described. Optimized code often dispenses with the EBP frame, and accesses both the parameters and local variables as positive offsets from the ESP (stack pointer) register. Additionally, some local variables may be kept in registers rather than on the stack. Again, the small number of general-purpose registers limits how many locals can be stored in registers. Thus, it would be fair to say that on the x86 architecture, both parameters and locals tend to be stack-based, which is an important point when comparing it to the IA-64 architecture.
      Finally, I'll review where function return values are kept. For a basic integer value that's four bytes or less, the convention is to use the EAX register. An 8-byte integer is typically returned in the EDX:EAX register pair, with EDX containing the high 32 bits and EAX the low 32 bits. Floating-point return values are returned on the top of the floating-point register stack.
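      The EDX:EAX split is just arithmetic on the two 32-bit halves of the value. A small sketch, with hypothetical helper names, shows how a 64-bit result divides between the registers and how a caller would reassemble it:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split_edx_eax(value):
    """Return (EDX, EAX) for a 64-bit integer return value."""
    value &= MASK64               # treat negatives as 64-bit two's complement
    edx = (value >> 32) & MASK32  # high 32 bits
    eax = value & MASK32          # low 32 bits
    return edx, eax


def join_edx_eax(edx, eax):
    """Rebuild the unsigned 64-bit value from the register pair."""
    return (edx << 32) | eax


edx, eax = split_edx_eax(0x1122334455667788)
print(hex(edx), hex(eax))
```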

Important IA-64 Registers

      IA-64 CPUs have an unbelievable number of registers. However, for the purpose of understanding function calls, frames, and return values, you can ignore many of the registers and focus on just a few sets of them. To begin with, the IA-64 has 128 general-purpose registers, each 64 bits wide. These registers are conceptually similar to the general-purpose registers such as EAX on the x86. The IA-64 general-purpose registers are named with an r, followed by the register number. Thus, r0 is the first general-purpose register, and r127 is the last general-purpose register.
      The first 32 general-purpose registers (r0-r31) are static. That is, any code that refers to one of these registers will always be referring to the exact same register in silicon. All of the x86 registers act this way, and thus can be considered static.
      Some of the static registers have predefined meanings, and are usually referred to by a name other than their r name. The global pointer and the stack pointer are two of the most important registers in this category. The r12 register is used as the stack pointer and is thus called the sp register. The r1 register is the global pointer, and you'll see it referred to as the gp register. I described the global pointer in my November 2000 MSDN® Magazine column, so I won't go into it here.
      In addition to the 32 static general-purpose registers, the IA-64 also has 96 dynamic general-purpose registers, as shown in Figure 2. Dynamic means that a given register name doesn't always refer to the exact same physical register on the CPU. That is, a register such as r34 in one function is likely to be assigned to a completely different physical register than r34 in another function.

Figure 2 General-purpose Registers

      If this is confusing, you shouldn't be surprised. It took me a while to wrap my head around dynamic registers and register renaming. However, as you'll see, register renaming is an essential element of achieving the performance goals of IA-64. If you want to be able to understand and read IA-64 assembly code (in the debugger for instance), you can't avoid this subject.
      There are some obvious questions like, "If I put a value in one of these registers and the name changes, how do I get at that register later?" It helps to understand that the mapping between a register name and a physical CPU register won't change within a function. Registers are only renamed when control goes from one function to another. (Actually, there is an IA-64 feature that uses register renaming within a function, but it's an advanced topic that I won't cover in this column.)
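      One way to picture static versus dynamic names is as a lookup that takes a per-function offset into account. The sketch below is a toy model only, not the hardware mechanism: frame_base is a hypothetical number standing in for whatever the CPU uses to place a function's dynamic registers, and the slot numbers it produces are made up.

```python
STATIC_COUNT = 32    # r0-r31 always name the same physical register
TOTAL_COUNT = 128    # r0-r127


def physical_register(name_index, frame_base):
    """Map the register name r<name_index> to a made-up physical slot.

    frame_base stays fixed for the whole function and only changes
    when control moves to another function.
    """
    if not 0 <= name_index < TOTAL_COUNT:
        raise ValueError("no such register: r%d" % name_index)
    if name_index < STATIC_COUNT:
        return ("static", name_index)
    return ("dynamic", frame_base + (name_index - STATIC_COUNT))


caller_base, callee_base = 0, 40   # invented values for two functions

# r12 (sp) is static: both functions see the same register.
print(physical_register(12, caller_base) == physical_register(12, callee_base))

# r34 is dynamic: the same name lands on different slots.
print(physical_register(34, caller_base) == physical_register(34, callee_base))
```

      The first comparison prints True and the second prints False. Within either function, though, r34 resolves to the same slot every time, which is why a value stored there can be read back later.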
      What's the point of dynamic registers? Everything about the IA-64 is focused on speed. Keeping values in registers and out of memory is one way to keep things running quickly. If a parameter is passed on the stack, and if that stack location isn't in the memory cache, the CPU might waste dozens of clock cycles just to read the parameter from the slow main memory. In contrast, registers can always be accessed in a single clock cycle.
      The dynamic registers exist so that each function can have its own set of up to 96 registers to work with. Inside a function, registers r32 through r127 are all essentially reserved for just that function. Hopefully, all of the function's local variables and parameters can be stored in these 96 registers. Of course, a function may not need all 96 registers, but they are available nonetheless.
      You might be wondering what happens when a child function is called. How are all the dynamic registers of the parent function preserved? The strange truth is that there's no need to explicitly save the dynamic registers. The magic that makes this possible is known as the Register Stack Engine, which is a feature of the IA-64 architecture. The Register Stack Engine can seem completely baffling unless you're a CPU freak, so I'll put off discussing it until next time. For now, the important thing to know is that thanks to the Register Stack Engine, each function can consider registers r32 through r127 to be reserved for itself alone.
      In addition to the 128 general-purpose registers, the IA-64 also has 128 floating-point registers. They're named f0 through f127 (somebody probably gets paid a lot of money to come up with these names). The floating-point registers are 82 bits wide, which is enough to hold a C++ long double. As with the integer registers, certain floating-point registers have predefined meanings. For instance, the f0 register is always set to 0.0, while the f1 register always holds the value 1.0.
      The last set of registers you need to know here are the branch registers. The IA-64 defines eight branch registers, named b0 through b7. These are 64-bit registers that contain the address of a code location that the CPU can transfer control to. On the IA-64, all control transfers take the form of a branch. The br.call instruction is equivalent to the x86 CALL; the br.ret instruction is like the x86 RET; and a simple br instruction is like an x86 JMP.
      Unlike the x86, the IA-64 can't just jump or call to an arbitrary address loaded in a general-purpose register or stored in memory. Instead, the address must be loaded into one of the branch registers before executing some form of branch instruction. The branch register to use is specified as an argument to the branch instruction. It is possible to branch without using a branch register, but those types of branches are always to addresses relative to the instruction pointer.
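      The rule for indirect transfers can be sketched as a tiny model of the branch registers. Everything here is illustrative: the class and method names are invented, the addresses are made up, and the model ignores instruction encoding details. It shows only that an arbitrary target must pass through one of b0 through b7, while a relative branch needs no branch register.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class BranchUnit:
    def __init__(self, ip=0):
        self.b = [0] * 8   # branch registers b0..b7, 64 bits each
        self.ip = ip       # instruction pointer

    def mov_to_branch(self, index, address):
        """Load a code address into branch register b<index>."""
        self.b[index] = address & MASK64

    def br_indirect(self, index):
        """Transfer control to the address held in b<index>."""
        self.ip = self.b[index]

    def br_relative(self, displacement):
        """Transfer control relative to the instruction pointer."""
        self.ip = (self.ip + displacement) & MASK64


cpu = BranchUnit(ip=0x1000)
target = 0x7FF00000            # made-up address, e.g. read from a function pointer

cpu.mov_to_branch(6, target)   # step 1: the address goes into b6
cpu.br_indirect(6)             # step 2: branch through b6
print(hex(cpu.ip))

cpu.br_relative(0x40)          # relative branch: no branch register involved
print(hex(cpu.ip))
```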
      That's it for this month. Next time, I'll get into the details of procedure frames and parameter passing conventions. Then I'll look at what happens when you return from a function. Finally, I'll finish this series by looking at how the Register Stack Engine works.

Send questions and comments for Matt to hood@microsoft.com.

Matt Pietrek is one of the founders of Mindreef LLC, an Internet company. Prior to this, he was the lead architect for Compuware/NuMega's BoundsChecker. His Web site, at https://www.wheaty.net, has a FAQ page and information on previous columns and articles.