How many registers does an x86-64 CPU have? (2020)

(blog.yossarian.net)

27 points | by tosh 2 hours ago

4 comments

  • JonChesterfield 1 hour ago
    Good post! Stuff I didn't know x64 has. Sadly it doesn't answer the "how many registers are behind rax" question I was hoping for: I'd love to know how many outstanding writes one can have to the various architectural registers before the renaming machinery runs out and things stall. Not really for immediate application to life, just a missing part of my mental cost model for x64.
  • nefsim 1 hour ago
    Even though this post is from 2020, it’s still a classic reference. It’s especially relevant to revisit this baseline now, considering Intel’s APX, which aims to double the GPRs to 32. Understanding how we got here is key to appreciating where the architecture is headed next.
  • fuhsnn 1 hour ago
    Intel's next gen will add 16 more general purpose registers. Can't wait for the benchmarks.
    • Joker_vD 1 hour ago
      So every function call will need to spill even more call-clobbered registers to the stack!

      Like, I get that leaf functions with truly huge computational cores are a thing that would benefit from more ISA-visible registers, but... don't we have GPUs for that now? And TPUs? NPUs? Whatever those things are called?

      • jandrewrogers 54 minutes ago
        Most function calls are aggressively inlined by the compiler such that they are no longer "function calls". More registers will make that even more effective.
      • throwaway17_17 1 hour ago
        Why does having more registers lead to spilling? I would assume, probably incorrectly, that more registers means less spilling. Are you talking about calls inside other calls, which cause the outer scope's arguments to be preemptively spilled so the inner scope's data can be pre-placed in registers?
        • Joker_vD 8 minutes ago
          So, let's take a function with 40 live temporaries at a point where it needs to call a helper function of, say, two arguments.

          On a 16-register machine with 9 call-clobbered registers and 7 call-invariant ones (one of which is the stack pointer), we put 6 temporaries into call-invariant registers (so there are 6 spills in the prologue of this big function) and another 9 into the call-clobbered registers; 2 of those 9 are the helper function's arguments, but the other 7 temporaries have to be spilled to survive the call. And the remaining 25 temporaries live on the stack in the first place.

          If we instead take a machine with 31 registers, 19 being call-clobbered and 12 call-invariant (one of which is the stack pointer), we can put 11 temporaries into call-invariant registers (so there are 11 spills in the prologue of this big function) and another 19 into the call-clobbered registers; 2 of those 19 are the helper function's arguments, so the other 17 temporaries have to be spilled to survive the call. And the remaining 10 temporaries live on the stack in the first place.

          So, there seems to be more spilling/reloading whether you count pre-emptive spills or the on-demand-at-the-call-site spills, at least to me.
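
The arithmetic in the comment above can be checked with a small sketch. This is only a model of the allocation scheme described there, not of any real compiler's register allocator; the names `Machine` and `spill_counts` are made up for illustration, and it assumes the helper's arguments are among the temporaries already sitting in call-clobbered registers.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    call_clobbered: int  # caller-saved registers
    call_invariant: int  # callee-saved registers, including the stack pointer


def spill_counts(machine, live_temporaries, helper_args):
    """Count spills under the scheme from the comment (hypothetical model)."""
    # The stack pointer is call-invariant but cannot hold a temporary.
    usable_invariant = machine.call_invariant - 1
    # Each call-invariant register we use costs one save in the prologue.
    prologue_spills = min(usable_invariant, live_temporaries)
    remaining = live_temporaries - prologue_spills
    # Next, fill the call-clobbered registers.
    in_clobbered = min(machine.call_clobbered, remaining)
    # Everything in a call-clobbered register except the helper's
    # arguments must be spilled to survive the call.
    call_site_spills = max(in_clobbered - helper_args, 0)
    # Whatever did not fit in any register lives on the stack already.
    stack_resident = remaining - in_clobbered
    return {
        "prologue_spills": prologue_spills,
        "call_site_spills": call_site_spills,
        "stack_resident": stack_resident,
    }


small = spill_counts(Machine(call_clobbered=9, call_invariant=7), 40, 2)
large = spill_counts(Machine(call_clobbered=19, call_invariant=12), 40, 2)
print("16 registers:", small)
print("31 registers:", large)
```

Under these assumptions the 16-register machine does 6 + 7 = 13 spills around the call and the 31-register machine does 11 + 17 = 28. The model deliberately ignores the 25 versus 10 temporaries that never get a register, which is where the larger register file helps.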

        • CamelCaseCondo 44 minutes ago
          OP is probably referring to the push all/pop all approach.
          • Joker_vD 6 minutes ago
            No, I'm not. I use the common "spill definitely reused call-invariant registers in the prologue, spill call-clobbered registers that need to survive a call at precisely the call site" approach; see the sibling comment for the arithmetic.
    • BobbyTables2 23 minutes ago
      How are they adding GPRs? Won’t that utterly break how instructions are encoded?

      That would be a major headache — even if current instruction encodings were somehow preserved.

      It’s not just about compilers and assemblers. Every single system implementing virtualization has a software emulation of the instruction set - easily 10k lines of very dense code/tables.

      • Joker_vD 5 minutes ago
        The same way AMD added 8 new GPRs, I imagine: by introducing a new instruction prefix.
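
For reference, the prefix AMD introduced for x86-64 is REX: in 64-bit mode a byte of the form `0100WRXB` supplies one extra bit each for the ModRM `reg` field (R), the SIB index (X), and the ModRM `rm` or SIB base field (B), which is how 3-bit register fields reach r8 through r15. A minimal sketch of that decoding follows, handling only the register-direct form (ModRM `mod` = 11). The helper names `decode_rex` and `modrm_registers` are invented for this example, and APX's own encoding is not modeled here.

```python
# Register names indexed by their 4-bit encoding.
GPR64 = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
]


def decode_rex(byte):
    """Return the W, R, X, B bits of a REX prefix, or None if not REX."""
    if byte & 0xF0 != 0x40:  # REX prefixes are 0x40..0x4F in 64-bit mode
        return None
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends ModRM.reg
        "X": (byte >> 1) & 1,  # extends SIB.index
        "B": byte & 1,         # extends ModRM.rm (or SIB.base)
    }


def modrm_registers(rex, modrm):
    """Combine the 3-bit ModRM fields with the REX extension bits."""
    reg = ((modrm >> 3) & 0b111) | (rex["R"] << 3)
    rm = (modrm & 0b111) | (rex["B"] << 3)
    return reg, rm


# 49 89 C0 encodes `mov r8, rax`: REX.W+B, opcode 0x89, ModRM 0xC0.
code = bytes([0x49, 0x89, 0xC0])
rex = decode_rex(code[0])
reg, rm = modrm_registers(rex, code[2])
# For opcode 0x89 the rm field is the destination and reg is the source.
print(f"mov {GPR64[rm]}, {GPR64[reg]}")
```

Without the prefix, the same opcode and ModRM bytes can only name the original eight registers, so existing encodings keep their meaning.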
  • sylware 2 hours ago
    Don't forget that x86_64, like ARM, is IP-locked; RISC-V is not.