The two calling conventions you'll actually encounter in 64-bit reverse engineering are System V AMD64 ABI (Linux, macOS, BSDs) and Microsoft x64 (Windows). Everything else — fastcall, stdcall, cdecl — is a 32-bit world artifact. This page is the one-screen reference you can keep open while reading IDA output.
System V AMD64 — the Linux/macOS one
Argument registers (in order)
- RDI, RSI, RDX, RCX, R8, R9 — first six integer/pointer args
- XMM0–XMM7 — first eight floating-point args
- Anything beyond goes on the stack, right-to-left, caller cleans up
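The assignment order above can be modeled in a few lines of Python. This is a minimal sketch, assuming only simple scalar arguments (no structs, `__int128`, or `long double`); the helper name `sysv_assign` and the `"int"`/`"float"` kind labels are hypothetical, chosen just for illustration.

```python
# Hypothetical helper: models System V AMD64 argument placement for
# simple scalar arguments only.
SYSV_INT_REGS = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]
SYSV_FLOAT_REGS = [f"XMM{i}" for i in range(8)]


def sysv_assign(kinds):
    """Map each argument kind ("int" or "float") to a register or stack slot."""
    int_regs = iter(SYSV_INT_REGS)
    float_regs = iter(SYSV_FLOAT_REGS)
    stack_offset = 0  # offset from RSP at the CALL instruction
    locations = []
    for kind in kinds:
        if kind not in ("int", "float"):
            raise ValueError(f"unsupported argument kind: {kind!r}")
        # Integer and float arguments draw from two independent register pools.
        reg = next(int_regs if kind == "int" else float_regs, None)
        if reg is None:
            # Pool exhausted: the argument takes an 8-byte stack slot.
            locations.append(f"[RSP+{stack_offset:#x}]")
            stack_offset += 8
        else:
            locations.append(reg)
    return locations


print(sysv_assign(["int", "float", "int"]))  # ['RDI', 'XMM0', 'RSI']
```

Note that the two pools are counted separately: a float in the second position still lands in XMM0, and the following integer still gets RSI.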
Return value
- RAX — integer/pointer return
- RDX:RAX — 128-bit return (e.g. __int128)
- XMM0 / XMM0:XMM1 — float/double return
Volatile (caller-saved) registers
RAX, RCX, RDX, RDI, RSI, R8–R11, all XMM registers. The callee can clobber these freely, so the caller must save them if they're needed after the call.
Non-volatile (callee-saved) registers
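RBX, RBP, RSP, R12–R15. If the callee uses these, it must save them on entry and restore them before returning (typically a push/pop pair). RSP must hold its entry value again when the callee executes RET.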
Stack
- 16-byte aligned right before a CALL instruction
- After CALL, the return address (8 bytes) is on the stack — first instruction usually pushes RBP, restoring 16-byte alignment
- Red zone: 128 bytes below RSP that leaf functions can use without adjusting the stack
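The alignment rule comes down to a little modular arithmetic. The sketch below assumes a non-leaf function that pushes some callee-saved registers and then issues a single `sub rsp, N`; the function name `sysv_frame_adjust` and its parameters are hypothetical.

```python
def sysv_frame_adjust(locals_bytes, pushed_regs):
    """Smallest N for `sub rsp, N` that fits the locals and keeps RSP
    16-byte aligned at the next CALL.

    At function entry RSP % 16 == 8, because CALL pushed an 8-byte return
    address onto a stack that was 16-byte aligned.
    """
    # Bytes consumed below the caller's aligned RSP so far.
    consumed = 8 + 8 * pushed_regs + locals_bytes
    padding = (-consumed) % 16
    return locals_bytes + padding


print(sysv_frame_adjust(0, 1))   # 0: `push rbp` alone restores alignment
print(sysv_frame_adjust(0, 0))   # 8: explains the bare `sub rsp, 8` you often see
print(sysv_frame_adjust(20, 1))  # 32
```

Leaf functions that stay inside the red zone may skip the adjustment entirely, so the absence of a `sub rsp` is not by itself a sign of a different convention.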
Varargs
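AL must hold an upper bound on the number of XMM registers used for arguments (0–8) before calling a varargs function. The callee's prologue typically tests AL and, if it is nonzero, spills XMM0–XMM7 into the register save area on its stack; va_arg later fetches floating-point arguments from there.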
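As a quick check when reading a call site, the AL value can be predicted from the argument kinds. This is a small sketch using a hypothetical helper `sysv_varargs_al`; it assumes scalar arguments labeled `"int"` or `"float"` and returns the exact count, which is one valid choice for the value in AL.

```python
def sysv_varargs_al(kinds):
    """Value to expect in AL before a call to a variadic function:
    the number of float arguments passed in XMM registers, capped at 8."""
    float_count = sum(1 for kind in kinds if kind == "float")
    return min(float_count, 8)


# printf("%d %f\n", 1, 2.0): format pointer, one int, one double
print(sysv_varargs_al(["int", "int", "float"]))  # 1
```

In disassembly this usually shows up as `mov eax, 1` (or `xor eax, eax` when no floats are passed) immediately before the CALL.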
Microsoft x64 — the Windows one
Argument registers (in order)
- RCX, RDX, R8, R9 — first four args (integer/pointer OR float, position-based)
- Floats go in XMM0–XMM3 corresponding to the same position
- Beyond four: stack, right-to-left, caller cleans up
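The position-based rule is easy to get wrong, so here is a minimal Python sketch of it. It assumes simple scalar arguments and a prototyped (non-variadic) callee; the helper name `ms_assign` and the `"int"`/`"float"` labels are hypothetical.

```python
# Hypothetical helper: models Microsoft x64 argument placement for
# simple scalar arguments only.
MS_INT_REGS = ["RCX", "RDX", "R8", "R9"]
MS_FLOAT_REGS = ["XMM0", "XMM1", "XMM2", "XMM3"]


def ms_assign(kinds):
    """Map each argument kind ("int" or "float") to a register or stack slot."""
    locations = []
    for position, kind in enumerate(kinds):
        if kind not in ("int", "float"):
            raise ValueError(f"unsupported argument kind: {kind!r}")
        if position < 4:
            # The slot is fixed by position; the kind only picks the bank.
            regs = MS_INT_REGS if kind == "int" else MS_FLOAT_REGS
            locations.append(regs[position])
        else:
            # Stack arguments sit just above the 32-byte shadow space,
            # measured from RSP at the CALL instruction.
            locations.append(f"[RSP+{0x20 + 8 * (position - 4):#x}]")
    return locations


print(ms_assign(["int", "float", "int"]))  # ['RCX', 'XMM1', 'R8']
```

Compare with System V, where the same argument list would use RDI, XMM0, RSI: on Windows a float in the second position uses XMM1 and RDX is simply skipped.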
Return value
RAX for integer/pointer. XMM0 for float/double. No 128-bit register-pair return.
Shadow space (the gotcha)
Even though the first four args go in registers, the caller MUST allocate 32 bytes of stack space for them, the "shadow space" (also called home space). The callee can spill the four register args into this space if it needs to. The space is required even when the callee takes fewer than four arguments. Forgetting to allocate it is a classic source of mysterious crashes in handwritten Windows assembly.
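The `sub rsp` constant you see in Windows prologues follows from the shadow-space and alignment rules together. A rough sketch, assuming a non-leaf function that reserves its whole frame with one `sub rsp, N`; the name `ms_frame_adjust` and its parameters are hypothetical.

```python
def ms_frame_adjust(max_call_args, locals_bytes=0, pushed_regs=0):
    """Smallest N for `sub rsp, N` in a non-leaf Microsoft x64 function.

    max_call_args is the largest argument count of any call the function
    makes; at least four 8-byte slots (the shadow space) are always reserved.
    """
    outgoing = 8 * max(4, max_call_args)  # shadow space plus stack arguments
    # RSP % 16 == 8 at entry because CALL pushed the return address.
    consumed = 8 + 8 * pushed_regs + outgoing + locals_bytes
    padding = (-consumed) % 16
    return outgoing + locals_bytes + padding


print(hex(ms_frame_adjust(3)))  # 0x28: 32 bytes of shadow space + 8 of padding
print(hex(ms_frame_adjust(6)))  # 0x38: 48 bytes of argument slots + 8 of padding
```

Constants such as 0x28, 0x38, and 0x48 in a prologue are therefore a strong hint that you are looking at Microsoft x64 code.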
Volatile (caller-saved)
RAX, RCX, RDX, R8–R11, XMM0–XMM5.
Non-volatile (callee-saved)
RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15. Note: XMM6+ are callee-saved here, unlike System V.
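When tracing whether a value survives a call, a lookup table covering both ABIs saves some mental effort. A minimal sketch: the table name `CALLEE_SAVED`, the function `survives_call`, and the `"sysv"`/`"ms"` keys are hypothetical, and only the 64-bit general-purpose and XMM register names are covered.

```python
# Registers a callee must preserve, per convention. RSP is included because
# it must hold its entry value again when the callee returns.
CALLEE_SAVED = {
    "sysv": {"RBX", "RBP", "RSP", "R12", "R13", "R14", "R15"},
    "ms": {"RBX", "RBP", "RSP", "RDI", "RSI", "R12", "R13", "R14", "R15"}
          | {f"XMM{i}" for i in range(6, 16)},
}


def survives_call(reg, abi):
    """True if `reg` is guaranteed to keep its value across a call under `abi`."""
    return reg.upper() in CALLEE_SAVED[abi]


print(survives_call("rdi", "ms"))     # True
print(survives_call("rdi", "sysv"))   # False
print(survives_call("xmm6", "sysv"))  # False
```

The RDI/RSI and XMM6+ differences are the ones that most often trip people up when porting handwritten assembly between the two platforms.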
Stack alignment
16-byte aligned before CALL, just like SysV. Once CALL has pushed the 8-byte return address, RSP is 8 bytes off 16-byte alignment; the first prologue instruction (push RBP or sub rsp) typically restores it.
Quick visual: a call in disasm
; System V — calling foo(0x10, 0x20, 0x30):
mov edi, 0x10
mov esi, 0x20
mov edx, 0x30
call foo
; Microsoft x64 — same call, same args:
sub rsp, 0x28 ; 32 shadow + 8 align
mov ecx, 0x10
mov edx, 0x20
mov r8d, 0x30
call foo
add rsp, 0x28
Spotting which convention you're looking at: if you see args going into RDI/RSI, you're on Linux/macOS. If you see args in RCX/RDX with a sub rsp, 0x?? before the call, you're on Windows.
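That rule of thumb can be turned into a toy classifier. This is only a rough sketch of the heuristic, not a reliable detector: the function name `guess_convention` is hypothetical, it assumes Intel-syntax lines like the ones shown here, and it looks only at the instructions before the first CALL.

```python
import re


def guess_convention(disasm_lines):
    """Guess "sysv", "ms", or "unknown" from the instructions before a CALL."""
    written = set()
    reserves_stack = False
    for line in disasm_lines:
        code = line.split(";")[0].strip().lower()  # drop trailing comments
        if code.startswith("call"):
            break
        if re.match(r"sub\s+rsp\s*,", code):
            reserves_stack = True
        match = re.match(r"(?:mov|lea|xor)\s+(\w+)\s*,", code)
        if match:
            written.add(match.group(1))  # destination register
    if written & {"rdi", "edi", "rsi", "esi"}:
        return "sysv"
    if written & {"rcx", "ecx"} and reserves_stack:
        return "ms"
    return "unknown"


sysv_call = ["mov edi, 0x10", "mov esi, 0x20", "mov edx, 0x30", "call foo"]
ms_call = ["sub rsp, 0x28 ; 32 shadow + 8 align", "mov ecx, 0x10",
           "mov edx, 0x20", "mov r8d, 0x30", "call foo", "add rsp, 0x28"]
print(guess_convention(sysv_call))  # sysv
print(guess_convention(ms_call))    # ms
```

Real code will defeat this easily, for example Windows functions that use RDI/RSI as scratch registers, so treat the result as a first guess and confirm it against the binary's file format.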
Common confusions
- Windows binaries running under Wine on Linux still use Microsoft x64 internally. Wine switches to System V only where it crosses into host (Unix-side) code, not at every function call.
- Some compilers add their own register-heavy conventions for XMM-heavy code: MSVC's `__vectorcall` and Intel's `__regcall` are the two you'll most likely see.
- On macOS arm64, the convention is AArch64 PCS; this cheatsheet doesn't apply. Apple Silicon binaries are an entirely different ABI.
- Functions tagged `naked` or `__attribute__((naked))` skip the prologue/epilogue entirely — no register save, no stack adjustment. You're on your own to comply with the convention.