Retrofitting JIT Compilers into C Interpreters

(tratt.net)

65 points | by ltratt 17 hours ago

7 comments

  • 9fwfj9r 19 minutes ago
    Those interested in this type of work can also visit https://cfallin.org/blog/2024/08/28/weval/. The difference is that they use this technique to derive an AOT compiler.
  • fuhsnn 5 hours ago
    Took me a while to figure out whether it's interpreters for C programs or if there's a particular class of interpreters called "C". Turns out it's about interpreters implemented in C, which they retrofit using a modified LLVM. But couldn't it be applicable to other languages that compile to LLVM IR, or to other switch-in-a-loop patterns in C?
    • itriednfaild 5 hours ago
      I've been a low-level C and C++ programmer for 30 years. Even with your explanation, and having read the webpage twice, I have no idea what this technology does or how it works. So it takes normal interpreted code and JITs it somehow? But you have to modify the source code of your program in some way?
      • hencq 4 hours ago
        I think the website does an amazing job explaining it, but it basically takes an interpreter written in C and turns it into a JIT with minimal changes to the code of the interpreter (i.e. not to the code of the program you're running in the interpreter). For example they took the Lua interpreter and with minimal changes were able to turn it into a JIT, which runs Lua programs about 2x faster.
      • vkazanov 1 hour ago
        Tracing JITs are slightly harder to grasp than the usual ones. The technique comes from real CPUs, so the mindset of the people behind the original idea is very different from the software world.

        Metatracing ones are kind of an interesting twist on the original idea.

        > So it takes normal interpreted code and jits it somehow?

        Anyway, they use a patched LLVM to JIT-compile not just interpreted code but the main loop of the bytecode interpreter. Like, the C implementation itself.

        > But you have to modify the source code of your program in some way?

        Generally speaking, this is not normally the goal. All JITs try to support as much of the target language as possible. Some JITs do limit the subset of features supported.

      • fuhsnn 4 hours ago
        I don't fully grasp it either. The closest analogy I can think of is how OpenMP turns #pragma-annotated loops into multi-threaded code: this work turns bytecode-interpreting loops into a JIT VM.
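
For readers unsure what the "switch-in-a-loop" pattern mentioned above looks like, here is a minimal sketch of a bytecode interpreter in C. The opcodes (`OP_PUSH`, `OP_ADD`, `OP_MUL`, `OP_HALT`) and the `run` function are hypothetical, invented for illustration; they are not taken from the article or from Lua. The `for`/`switch` dispatch loop is the kind of code the discussion is about.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Interprets `code` and returns the top of the stack at OP_HALT,
 * or -1 if an unknown opcode is encountered. No bounds checking:
 * the program is assumed to be well formed. */
long run(const int *code) {
    long stack[64];
    size_t sp = 0;  /* index of the next free stack slot */
    size_t pc = 0;  /* index of the next instruction */

    for (;;) {                  /* the interpreter loop ... */
        switch (code[pc]) {     /* ... dispatching on the current opcode */
        case OP_PUSH:           /* operand follows the opcode */
            stack[sp++] = code[pc + 1];
            pc += 2;
            break;
        case OP_ADD:            /* pop two values, push their sum */
            sp--;
            stack[sp - 1] += stack[sp];
            pc += 1;
            break;
        case OP_MUL:            /* pop two values, push their product */
            sp--;
            stack[sp - 1] *= stack[sp];
            pc += 1;
            break;
        case OP_HALT:
            return stack[sp - 1];
        default:
            return -1;
        }
    }
}
```

Running it on the program `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` computes (2 + 3) * 4.
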
  • djwatson24 5 hours ago
    It's quite impressive they're able to take nearly arbitrary C and do this! Very similar to what PyPy is doing here, but for C, and not a Python subset.

    However, it's not without downsides. It sounds like average code is only about 2x faster than the stock Lua interpreter, vs. LuaJIT, which is often 5-10x faster.

    • hypercube33 4 hours ago
      Hmm, I'm wondering how hard it would be to redo the old-timey Microsoft JVM from the 90s for modern days... Java > .NET assembly runtime
  • linzhangrun 2 hours ago
    It's truly a good thing to see a project like this in the era of Vibe Coding taking flight :)
  • measurablefunc 1 hour ago
    Why do they need to change LLVM? Why can't they make this another LLVM IR pass?
  • sgbeal 16 hours ago
    i tend to think of myself as a computing nerd, but posts like this one make me realize that i don't even rate on the computing nerd scale.
    • throwaway1492 4 hours ago
      Do you always make things about yourself? Have you written a parser or interpreter? You should, it’s an interesting exercise. The idea is to add meta-tracing to the interpreter (the C code) so that hot paths can be compiled to machine code and then executed instead of being interpreted.
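
The "hot path" idea can be sketched in a few lines of C. This is a toy under stated assumptions, not the article's implementation: the opcodes (`T_ADDI`, `T_DEC`, `T_JNZ`, `T_HALT`), the `VM` struct, `HOT_THRESHOLD`, and `run_with_tracing` are all hypothetical. The interpreter counts backward jumps per loop header; once a header becomes hot, it records the program counters of one loop iteration as a trace. A real tracing JIT would then compile that trace to machine code and insert guards for branches, which this sketch omits.

```c
#include <stddef.h>

/* Hypothetical toy opcodes (illustration only). */
enum { T_ADDI, T_DEC, T_JNZ, T_HALT };

#define HOT_THRESHOLD 3   /* assumed value; real systems tune this */
#define MAX_CODE 32
#define MAX_TRACE 32

typedef struct {
    long acc, n;            /* machine state: accumulator and loop counter */
    int hits[MAX_CODE];     /* backward-jump counts, indexed by loop header pc */
    int trace[MAX_TRACE];   /* pcs recorded during one loop iteration */
    size_t trace_len;
    int recording;          /* nonzero while a trace is being recorded */
} VM;

void run_with_tracing(VM *vm, const int *code) {
    size_t pc = 0;
    for (;;) {
        /* While recording, log every instruction the interpreter executes. */
        if (vm->recording && vm->trace_len < MAX_TRACE)
            vm->trace[vm->trace_len++] = (int)pc;

        switch (code[pc]) {
        case T_ADDI: vm->acc += code[pc + 1]; pc += 2; break;
        case T_DEC:  vm->n--;                 pc += 1; break;
        case T_JNZ: {
            size_t target = (size_t)code[pc + 1];
            if (vm->n == 0) { pc += 2; break; }   /* branch not taken */
            if (target < pc) {                    /* backward jump: a loop */
                if (vm->recording)
                    vm->recording = 0;            /* iteration done: trace complete */
                else if (vm->trace_len == 0 &&
                         ++vm->hits[target] == HOT_THRESHOLD)
                    vm->recording = 1;            /* loop is hot: start recording */
            }
            pc = target;
            break;
        }
        case T_HALT:
        default:
            return;
        }
    }
}
```

With the program `{ T_ADDI, 5, T_DEC, T_JNZ, 0, T_HALT }` and `n` set to 10, the loop at pc 0 becomes hot after three backward jumps, and the recorded trace holds the pcs of the three instructions in the loop body.
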
  • mwkaufma 5 hours ago
    TL;DR: compile with a fork of LLVM that enables runtime IR tracing. Very clever!
    • measurablefunc 1 hour ago
      That's not what they're doing. They're directly modifying the IR to convert it into a tracing JIT. The final artifact is a binary w/ no IR. The problem is of course not introducing any subtle bugs in the process, b/c they'd have to prove the modifications they're making do not change actual runtime semantics for the final binary artifact.