Overview of x64 Calling Conventions
Two important modifications from x86 to x64 are the 64-bit addressing capability and a flat set of 16 64-bit registers for general use. Given the expanded register set, x64 just uses thecalling convention and a RISC-based exception-handling model. The __fastcall model uses registers for the first four arguments and the stack frame to pass the other parameters.
The following compiler option helps you optimize your application for x64:
The x64 Application Binary Interface (ABI) is a 4 register fast-call calling convention, with stack-backing for those registers. There is a strict one-to-one correspondence between arguments in a function, and the registers for those arguments. Any argument that doesn’t fit in 8 bytes, or is not 1, 2, 4, or 8 bytes, must be passed by reference. There is no attempt to spread a single argument across multiple registers. The x87 register stack is unused. It may be used, but must be considered volatile across function calls. All floating point operations are done using the 16 XMM registers. The arguments are passed in registers RCX, RDX, R8, and R9. If the arguments are float/double, they are passed in XMM0L, XMM1L, XMM2L, and XMM3L. 16 byte arguments are passed by reference. Parameter passing is described in detail in. In addition to these registers, RAX, R10, R11, XMM4, and XMM5 are volatile. All other registers are non-volatile. Register usage is documented in detail in and .
The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space for the 4 register parameters, even if the callee doesn’t have that many parameters. This aids in the simplicity of supporting C unprototyped functions, and vararg C/C++ functions. For vararg or unprototyped functions, any float values must be duplicated in the corresponding general-purpose register. Any parameters above the first 4 must be stored on the stack, above the backing store for the first 4, prior to the call. Vararg function details can be found in. Unprototyped function information is detailed in .
Most structures are aligned to their natural alignment. The primary exceptions are the stack pointer and malloc or alloca memory, which are aligned to 16 byte, in order to aid performance. Alignment above 16 bytes must be done manually, but since 16 bytes is a common alignment size for XMM operations, this should suffice for most code. For more information about structure layout and alignment see. For information about the stack layout, see .
All non-leaf functions [functions that neither call a function, nor allocate any stack space themselves] must be annotated with data [referred to as xdata or ehdata, which is pointed to from pdata] that describes to the operating system how to properly unwind them, to recover non-volatile registers. Prologs and epilogs are highly restricted, so that they can be properly described in xdata. The stack pointer must be aligned to 16 bytes, except for leaf functions, in any region of code that isn’t part of an epilog or prolog. For details about the proper structure of function prolog and epilogs, see. For more information about exception handling and the exception handling/unwinding pdata and xdata see .