Show HN: RISC-V core written in 600 lines of C89

(github.com)

190 points | by mnurzia 320 days ago

14 comments

  • aportnoy 320 days ago
    How about a RISC-V disassembler in 200 lines of C99?

    https://github.com/andportnoy/riscv-disassembler/blob/master...

    • mnurzia 320 days ago
      This is really cool, thanks for sharing! Something like this would be a great tool to distribute with my emulator.
      • garganzol 320 days ago
        It would be nice if you could put a link to that project to your README file. Both projects are very impressive, especially when seen in conjunction with each other.
        • aportnoy 320 days ago
          I mean, his simulator already has a disassembler contained within it, would just need to replace comments with print statements.
  • bjourne 320 days ago
    Why stick with c89? Can't think of any compilers that don't support c99 nowadays. The major benefit is that you can use uint8_t and friends directly and don't need to define your own wrapper types.
    • mnurzia 320 days ago
      It's more of a fun exercise, I guess. But I do have experience with at least one compiler that doesn't support C99: Zilog's ez80 C compiler. Back in the day I used to program my TI-84+ CE for fun[0], and the only C solution was a pretty bespoke C89-only compiler[1] distributed with a community toolchain[2], which has since switched to clang. It's somewhat irrational, but in the back of my mind it bugs me if the software I write can't run on platforms like that.

      [0] https://github.com/mnurzia/chip8-ce

      [1] http://www.zilog.com/docs/appnotes/pb0098.pdf

      [2] https://ce-programming.github.io/toolchain/

    • flohofwoe 320 days ago
      One "advantage" (if one wants to call that) is that the code would also compile as C++, while C99 has diverged enough from the common C/C++ subset that one cannot use all C99 features in C++ mode.
      • mnurzia 320 days ago
        I totally missed this, good point.

        Slightly unrelated, but just thought I should mention: the sokol libraries are awesome!

      • bjourne 320 days ago
        There never was a "common C/C++ subset". See https://softwareengineering.stackexchange.com/a/298667/18260
        • flohofwoe 319 days ago
          Ah that old thing again ;)

          C isn't a subset of C++ (and never was). But there's still a common subset of both the C and C++ languages which compiles both in a C compiler and a C++ compiler (and behaves the same at runtime despite slightly different C vs C++ semantics), and that common subset is what I call C/C++: the pidgin dialect that's neither quite C nor quite C++ but compiles as both.
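
          A small illustration of what staying in that common subset looks like in practice. This is only a sketch with made-up names (point, point_create, point_sum), not code from any project mentioned in this thread; it should build unchanged as C89 and as C++.

```c
#include <stdlib.h>

/* Written in the common subset: valid C89 and valid C++.
 * - malloc's result is cast (C++ has no implicit conversion from void*)
 * - no identifiers that are C++ keywords (new, class, template, ...)
 * - no compound literals or designated initializers (not in C89)
 * - block comments only, declarations at the top of each block (C89)
 * All names here are hypothetical. */
struct point {
    int x;
    int y;
};

static struct point *point_create(int x, int y)
{
    struct point *p = (struct point *)malloc(sizeof(struct point));
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

static int point_sum(const struct point *p)
{
    return p->x + p->y;
}
```

          The cast on malloc is the classic tell: plain C doesn't need it, C++ requires it, and the "pidgin" dialect writes it so both compilers are happy.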

    • dezgeg 320 days ago
      I've met several people who seriously think that C89 is the peak of programming languages and that C99 just brings misfeatures (like allowing variable declarations in the middle of basic blocks, according to them).
      • foobarbaz33 320 days ago
        Forcing declares at the top makes it easy to estimate at a glance (or exactly calculate) how much space the stack frame will use.
        • dezgeg 319 days ago
          Maybe in the early days of C, but with modern compilers doing stuff like keeping variables only in registers, inlining functions, stack cookies, merging non-overlapping variables etc., that seems not really worth it. If you want to avoid accidental huge stack usage you can pass a flag to gcc/clang to trigger a warning when the stack usage of a function goes over the specified limit.
    • boricj 320 days ago
      Funnily enough, the file rv.h does use stdint.h if available and contains the following comment:

      > All I want for Christmas is C89 with stdint.h
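
      A minimal sketch of that kind of fallback, for illustration only. It is not the actual contents of rv.h, and every name in it (MY_HAVE_STDINT, my_u8, my_u32, my_add32) is made up here. The C89 branch assumes 8-bit chars.

```c
#include <limits.h>

/* Use <stdint.h> when the compiler advertises C99 or later; otherwise
 * fall back to C89 built-in types chosen with the help of <limits.h>.
 * All names here are hypothetical. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define MY_HAVE_STDINT 1
#include <stdint.h>
typedef uint8_t my_u8;
typedef uint32_t my_u32;
#else
#define MY_HAVE_STDINT 0
typedef unsigned char my_u8; /* assumes CHAR_BIT == 8 */
#if UINT_MAX == 0xFFFFFFFFUL
typedef unsigned int my_u32;
#else
typedef unsigned long my_u32; /* at least 32 bits in C89, may be wider */
#endif
#endif

/* Wrap to 32 bits explicitly so the result is the same even when the
 * fallback type is wider than 32 bits. */
static my_u32 my_add32(my_u32 a, my_u32 b)
{
    return (my_u32)((a + b) & 0xFFFFFFFFUL);
}
```

      The explicit mask is the price of the fallback: without exact-width types, every operation that should wrap at 32 bits has to say so.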

    • contrarian1234 320 days ago
      Did Visual Studio finally make the jump?! (you could always just compile it as C++ code though)
      • bjourne 320 days ago
        Nope, stdint.h has been in MSVC for over 10 years. Other C99 features may not be supported though.
        • flohofwoe 320 days ago
          Except for VLAs (which are optional post-C99 anyway), MSVC actually has pretty good support for recent C versions, and since 2020 they're basically back on the "modern C" train: https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-...
        • mort96 320 days ago
          Hasn't the main issue with MS been VLAs? I seem to recall that VLAs are the main reason MSVC won't ever support C99, and that MSVC is one of the main reasons why VLAs were made optional. It seems like MSVC supports C11 and C17 now, thanks to the removal of mandatory VLAs.
          • pjmlp 319 days ago
            The whole security industry has vetoed VLAs.

            Google even went the extra mile and paid for the effort to remove them from the Linux kernel.

            • mort96 318 days ago
              Yeah, that's my point. The situation isn't, "MS is garbage, they only support C89"; the situation is "MS supports modern C pretty well, their lack of official C99 support is just a technicality caused by VLAs which you shouldn't use anyway".
          • zabzonk 320 days ago
            vehement opposition from ms may be one of the reasons for them being optional (and thus worthless) but the main one is that they are impossible to use correctly. what happens if you make one too big?
            • mort96 320 days ago
              I think they could potentially have some very limited valid use cases, but I agree that a fixed length array and/or heap allocation is usually much better than VLAs.

              I was mainly just pointing out that MS's lack of C99 support isn't really a part of keeping C89 alive, especially now that they officially support C11.

            • Dylan16807 320 days ago
              > what happens if you make one too big?

              Doesn't that apply to fixed length arrays too?

              • mort96 318 days ago
                It does, but if you create a fixed length array that's too big, you'll just deterministically blow the stack regardless of user input. With VLAs (or alloca), your array length is determined by some runtime property. Whether you blow the stack doesn't just depend on the code path any more, but on the data you're operating on too.

                As a bonus, that data is often user input...
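
                A sketch of the usual bounded alternative, assuming a made-up stack budget (SMALL_BUF_LEN) and a made-up function (sum_bytes): small inputs use a fixed array, larger ones go to the heap, so the input size never decides how much stack is consumed.

```c
#include <stdlib.h>
#include <string.h>

#define SMALL_BUF_LEN 256 /* hypothetical fixed stack budget */

/* Copies the n bytes at src into scratch space and sums them.
 * A VLA version would declare `unsigned char tmp[n];` and let n
 * (possibly user input) determine the stack usage. Here the stack use
 * is the same for every n, and an oversized request fails cleanly.
 * Returns 0 on success, -1 if the heap allocation fails. */
static int sum_bytes(const unsigned char *src, size_t n, unsigned long *out)
{
    unsigned char small[SMALL_BUF_LEN];
    unsigned char *tmp = small;
    unsigned long total = 0;
    size_t i;

    if (n > SMALL_BUF_LEN) {
        tmp = (unsigned char *)malloc(n);
        if (tmp == NULL) {
            return -1;
        }
    }
    memcpy(tmp, src, n);
    for (i = 0; i < n; i++) {
        total += tmp[i];
    }
    if (tmp != small) {
        free(tmp);
    }
    *out = total;
    return 0;
}
```

                The difference from a VLA is that the failure mode is a checked return value rather than a stack overflow that depends on the data.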

        • jpfr 320 days ago
          MSVC did a big rewrite of the C frontend around MSVC2013. I haven’t encountered C99 idioms that don’t work nowadays. Granted, I might not use every feature in my typical coding style…
          • arp242 320 days ago
            It's been "fully" C99 (and C11, C17) compliant for about 2 or 3 years. The only missing C99 features before that were relatively rarely used ones like _Pragma.
            • flohofwoe 319 days ago
              It's not C99 compliant because that would require VLA support (which was made optional in C11, which in turn enabled MSVC to be a C11 and C17 compiler, but not a C99 one). Not that it matters much in practice though :)
  • garganzol 320 days ago
    Seeing the RISC-V instructions implemented in the emulator like that, it comes to my mind that RISC-V is really a reduced instruction set CPU.

    When compared to AVR 16-bit RISC instruction set, RISC-V looks so much simpler. (You may be indirectly familiar with AVR architecture by the household name "Arduino".)

    The intriguing part is that AVR is just a microcontroller, while RISC-V is intended to be a full-blown CPU.

    • opencl 320 days ago
      The base instruction set is tiny but there are quite a few extensions and pretty much every practical implementation includes at least a few of them.

      E.g. the GD32V microcontrollers implement RV32IMAC, and the Allwinner D1, which is a "full-blown" CPU meant to run Linux, implements RV64IMAFDCVU.

      RV32I/RV64I are the base 32/64 bit integer instruction sets and every letter after that is a different extension. Most of the extensions are relatively small and simple, but the C (compressed instructions) extension introduces some decoder complexity and the V (vector) extension adds several hundred instructions.

      Though even with all the extensions it is still a very small/simple ISA by modern standards.
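
      To make the naming scheme concrete, here is a rough sketch that splits such a string into a base width and one bit per extension letter. The names (EXT_BIT, parse_isa) are made up, and it only handles the plain single-letter RV32/RV64 form used above, not multi-letter extensions or version suffixes.

```c
#include <ctype.h>

/* One bit per single-letter extension, bit 0 = 'A'. Hypothetical helper;
 * assumes 'A'..'Z' are contiguous, as in ASCII. */
#define EXT_BIT(c) (1UL << ((c) - 'A'))

/* Parse a string such as "RV32IMAC" into a base width (32 or 64) and a
 * mask of extension letters. The base letter (I) is recorded in the mask
 * like any other letter. Returns 0 on success, -1 on anything else. */
static int parse_isa(const char *s, int *xlen, unsigned long *ext_mask)
{
    unsigned long mask = 0;
    int width;
    int c;

    if (toupper((unsigned char)s[0]) != 'R' ||
        toupper((unsigned char)s[1]) != 'V') {
        return -1;
    }
    if (s[2] == '3' && s[3] == '2') {
        width = 32;
    } else if (s[2] == '6' && s[3] == '4') {
        width = 64;
    } else {
        return -1;
    }
    for (s += 4; *s != '\0'; s++) {
        c = toupper((unsigned char)*s);
        if (c < 'A' || c > 'Z') {
            return -1;
        }
        mask |= EXT_BIT(c);
    }
    *xlen = width;
    *ext_mask = mask;
    return 0;
}
```

      With this, "RV32IMAC" comes back as width 32 with the I, M, A and C bits set, which is all the string really encodes.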

    • staunton 320 days ago
      Pretty much all architectures have "simple" instruction sets under the hood, that is, the microcode that executes the mess we put in binaries. RISC-V is based on the idea that you can skip most of this step. The difficulty is getting fusing and other optimizations to work so the throughput remains high, which seems to work so far.
  • bitwize 320 days ago
    I feel myself descending into old-fartitude more and more with every year. My wife and I were recently involved in a car accident (no one was hurt). While I was being checked out I overheard a 20-year-old firefighter exchange Facebook information with an 18-year-old EMT. I was like, "wait a minute, you guys seem really young and you still use Facebook? I thought Facebook was for your grandparents and all the kids now use Snapchat or TikTok?"

    I get that same feeling now. This kid is 20 and still using C89? Shouldn't people his age have been reared entirely in the crystal-spires-and-togas utopia of Rust, with raw pointers and buffer overruns being mere legends of their people's benighted past told to them by their elders?

    It's kind of comforting to see young programmers embracing the old ways, even if it's for hack value only.

    • mnurzia 320 days ago
      Admittedly, C89 has very little utility, especially among people my age. For example, my university progresses from Racket to Java to C++, and has a systems course that partially teaches C11. Although good for teaching, I don't think those languages artificially constrained me in the ways that C89 does. I felt that my programming skills improved the most when I forced myself to work in such an under-powered language.

      I also like the idea of being able to run my code anywhere, kind of like Doom.

    • sitkack 320 days ago
      I think there’s at least the risk of kids seeing old people romantically reenacting their eight-bit micro days and thinking that it’s something besides nostalgia.

      I was kind of the opposite as a kid, if it wasn’t crazy futuristic I didn’t want it. So even in the 80s I wanted FPGA accelerators in every machine.

      • bitwize 319 days ago
        It's not just nostalgia. Those old computers really are fun to operate -- like an MGB is fun to drive -- in ways modern systems aren't, even if they are far less useful than a modern system. In fact it's now possible to take advantage of modern software tools on modern systems and push those old beasts to new heights they couldn't have possibly reached during their heyday.
      • LoganDark 319 days ago
        > even in the 80s I wanted FPGA accelerators in every machine

        Mostly unrelated, but I recently discovered that you can buy TPUs, right now, as a consumer product, from https://coral.ai.

        The stock firmware already allows you to run these things so hard they overheat, which is amazing.

        But yes, I also want FPGA accelerators.

  • nevi-me 320 days ago
    Question: do the implementations of single instructions compile to single instructions when targeting RISC-V with optimisations enabled? It would be really awesome if compilers realised what your code is doing and replaced the implementations of instructions with those instructions.
    • mnurzia 320 days ago
      Not really, my implementation isn't smart enough to guide compilers to the right solution. Trivial instructions, like xor, are of course recognized, but for example the 32x32 mul implementation isn't. Maybe compilers will be smart enough one day...

      https://godbolt.org/z/WEcTzKf7M
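
      For readers wondering why a 32x32 multiply is hard to pattern-match: without a 64-bit type, the high half has to be assembled from 16-bit pieces. The sketch below shows the general technique for the unsigned case (what RISC-V calls MULHU). It is not the emulator's actual code, the name mulhu32 is made up, and it uses stdint.h only for brevity.

```c
#include <stdint.h>

/* High 32 bits of the unsigned 64-bit product a * b, computed with
 * 32-bit arithmetic only, as a program without a 64-bit integer type
 * has to do it. Hypothetical helper, not taken from the emulator. */
static uint32_t mulhu32(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu;
    uint32_t a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu;
    uint32_t b_hi = b >> 16;

    /* Four 16x16 partial products; each fits in 32 bits. */
    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;

    /* Everything that lands in bits 16..31 of the full product; the
     * part of this sum above 16 bits is the carry into the high word. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}
```

      A compiler would have to prove that all of this is equivalent to one widening multiply before it could emit a single instruction, which is a much taller order than recognizing a plain xor.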

    • dbcurtis 320 days ago
      Yeah, well, the rock that breaks your pick in that scenario is copying all the processor state back and forth to/from the emulation model, including flag register bits, and also correctly handling exceptions and faults. Emulating the instruction’s happy path is just scratching the surface.
      • sitkack 320 days ago
        In the guest, you trap on reading emulation state, so that the source of truth is the hardware. Rather than use something like KVM I wonder if you could run another child process and use ptrace?
      • duskwuff 320 days ago
        > including flag register bits

        RISC-V doesn't have those. Compare+branch is a single instruction.
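
        To illustrate, here is roughly what emulating a conditional branch involves: decode the offset, compare the two register values, pick the next pc. No flags register is read or written. This is a sketch with made-up names (btype_imm, branch_next_pc), not the code under discussion.

```c
#include <stdint.h>

/* Sign-extended B-type immediate (branch offset in bytes). */
static int32_t btype_imm(uint32_t insn)
{
    uint32_t imm = (((insn >> 31) & 0x1u) << 12) |  /* imm[12]   */
                   (((insn >> 7) & 0x1u) << 11) |   /* imm[11]   */
                   (((insn >> 25) & 0x3Fu) << 5) |  /* imm[10:5] */
                   (((insn >> 8) & 0xFu) << 1);     /* imm[4:1]  */
    /* Sign-extend from bit 12 without relying on signed shifts. */
    return (int32_t)(imm ^ 0x1000u) - 0x1000;
}

/* Next pc after a conditional branch. a and b are the values of rs1 and
 * rs2. The comparison and the branch decision happen in one step. */
static uint32_t branch_next_pc(uint32_t pc, uint32_t insn,
                               uint32_t a, uint32_t b)
{
    uint32_t funct3 = (insn >> 12) & 0x7u;
    /* Flipping the sign bit turns a signed compare into an unsigned one. */
    uint32_t sa = a ^ 0x80000000u;
    uint32_t sb = b ^ 0x80000000u;
    int taken;

    switch (funct3) {
    case 0: taken = (a == b); break;   /* BEQ  */
    case 1: taken = (a != b); break;   /* BNE  */
    case 4: taken = (sa < sb); break;  /* BLT  */
    case 5: taken = (sa >= sb); break; /* BGE  */
    case 6: taken = (a < b); break;    /* BLTU */
    case 7: taken = (a >= b); break;   /* BGEU */
    default: taken = 0; break;         /* reserved; not taken in this sketch */
    }
    return taken ? pc + (uint32_t)btype_imm(insn) : pc + 4u;
}
```

        A real emulator would raise an illegal-instruction exception for the reserved funct3 values rather than falling through, but the point stands: there is no condition-code state to copy around.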

  • peterfirefly 320 days ago
    'switch' is a really, really nice language construct that was fully implemented long before C89. Using lots of nested 'if's instead is not a good idea.
    • hgs3 320 days ago
      'switch' is good, but for VM's computed goto is better.
      • KerrAvon 320 days ago
        depends on the compiler implementation. modern compilers may be able to treat equivalent switch statements, gotos, and if/else statements pretty much the same
        • nsajko 320 days ago
          Only in trivial cases.
    • sylware 320 days ago
      nested "ifs" are optimized out by compilers. Moreover in the latest horrible gcc extensions you have the case statement using a _not_ compile-time-constant expression (you can find the usage of such horrible gcc extension in linux net code).
      • mnurzia 320 days ago
        This was one of my main justifications for making this design choice, in addition to the (in my opinion) overwhelming number of break statements that would result from using switches. But more importantly, many of the "if" statements have non-constant or more complex expressions in them that aren't supported in switch statements in ANSI C.
        • sylware 320 days ago
          Yep.

          And as you stated, it is important to stay as close as possible to c89, because ISO is literally doing planned obsolescence, but on a long time cycle (5-10 years).

          Hopefully risc-v will be a success, and all system components and interpreters of very-high-level languages will be rewritten in risc-v assembly and it will become actually very hard to do planned-obsolescence.

  • freecodyx 320 days ago
    This proves that, at the core, the things we rely on to achieve great software and life-impacting technologies are extremely simple. The complexity is in how to make them.
    • arcticbull 320 days ago
      The core concepts are generally very straightforward, however it's always the optimization that adds complexity. That's how you get the orders of magnitude improvement. This C89 core definitely doesn't do macro op fusion for instance.
      • freecodyx 319 days ago
        By complexity, I meant the hardware itself, not even the architecture or the instruction set. For example, you can design a virtual machine in days, but making a real one is at the core of the geopolitical issues we have today.
    • numpad0 320 days ago
      The complexity is in how to distribute dev workload and how to make it financially viable. No one pays for beautiful works of art unless it’s somehow anchored, tangled and aligned into their interests.
  • charcircuit 320 days ago
    This isn't a RISC-V core. It is a RISC-V emulator library.
  • sylware 320 days ago
    A bigger implementation, but it has 64-bit support:

    https://bellard.org/tinyemu/

  • RobotToaster 320 days ago
    Is this designed to be used with some kind of C to VHDL/verilog transpiler?
    • RealityVoid 320 days ago
      Not really, think of it like a... CPU emulator? Ish? You have registers as variables in the program. If you have register a1 and you are at an instruction adding 1 to it, it will add 1 to the variable representing a1. So on and so forth.

      This works because, well, memory operations are mostly (all?) that a CPU does, so this "core" takes the program and does the same kind of memory operations the silicon would do, only in SW.
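
      A toy version of that idea, to make it concrete. This is not the project's code: the names (tiny_cpu, tiny_step) are made up and it only understands one instruction, ADDI. The register file is just an array, and "addi a1, a1, 1" ends up as an addition on one array element.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state: the registers are plain variables. */
struct tiny_cpu {
    uint32_t x[32]; /* integer registers x0..x31; a1 is x11 */
    uint32_t pc;
};

/* Execute one instruction if it is ADDI. Returns 0 on success and -1
 * for anything this sketch does not understand. */
static int tiny_step(struct tiny_cpu *cpu, uint32_t insn)
{
    uint32_t opcode = insn & 0x7Fu;
    uint32_t rd = (insn >> 7) & 0x1Fu;
    uint32_t funct3 = (insn >> 12) & 0x7u;
    uint32_t rs1 = (insn >> 15) & 0x1Fu;
    uint32_t imm = insn >> 20; /* 12-bit immediate */

    if (imm & 0x800u) {
        imm |= 0xFFFFF000u; /* sign-extend to 32 bits */
    }
    if (opcode != 0x13u || funct3 != 0u) {
        return -1;
    }
    if (rd != 0u) { /* x0 is hard-wired to zero */
        cpu->x[rd] = cpu->x[rs1] + imm;
    }
    cpu->pc += 4u;
    return 0;
}
```

      Feeding it the word 0x00158593, which encodes "addi a1, a1, 1", bumps x[11] by one and advances pc by four. A full emulator is this pattern repeated for every instruction, plus memory and traps.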

  • userbinator 320 days ago
    Besides not using a switch() for the main instruction decode, there's nothing surprising here. Anyone who has worked with emulators before will find this code straightforward to read. RISC-V really is the new MIPS.
  • rowanG077 320 days ago
    The Readme doesn't answer it but I struggle to see why you want a c implementation of an ISA.
    • detrites 320 days ago
      Not sure if this was intended, but coming to this as someone vaguely aware of RISC-V, it's looking like a fantastic form of documentation for the ISA, that both describes and gives a way to play with it, but in an intuitive, even fun manner.

      Obviously this works best for someone who already knows C, but the fact that it's C89 mitigates this somewhat.

      • rowanG077 320 days ago
        A reference implementation would be in Verilog or VHDL.
    • nly 320 days ago
      So you can compile and run it on any platform with a C compiler
      • rowanG077 320 days ago
        That is just something you can do with C code. That is not a goal in itself. Why would you want to run a C ISA instead of just using a standard simulator? Why not use verilator + any of the open source RISC-V cores?
        • LoganDark 320 days ago
          Because those are slower, more complex, and more difficult to understand?
          • rowanG077 320 days ago
            I doubt verilator is much slower. The speed of it is insane. They are indeed more complex and difficult to understand. But I fail to see how that is a criterion. I would very much rather include an industry standard library in comparison to something homegrown.
            • LoganDark 319 days ago
              > They are indeed more complex and difficult to understand. But I fail to see how that is a criterion.

              It's... not? Like, if you want to merely use an ISA, you don't need it to be simple or easy to understand, in fact tons of people pride themselves on making extremely high-performance RISC-V cores with OoOE and so on.

              But the reason why someone might want a C implementation of an ISA is different from the reason people might want to go implement an ISA in a real project: maybe they want a software simulator that is easy to understand for one reason or another, perhaps for learning or demonstration purposes, or just as a fun hobby project.

              These people wouldn't benefit from just pulling down Verilator or using one of the existing BATTLE-TESTED INDUSTRY-STANDARD PROFESSIONALLY-AUDITED HIGH-PERFORMANCE implementations because they literally don't care about any of those things.

              In any case, it's a fallacy to assume that every programming project out there has to address a need in order to have a place. https://justforfunnoreally.dev

              • rowanG077 319 days ago
                Yes a hobby/learning is a great reason, why not just start with that? I was simply wondering what the reason was. Maybe the author had an interesting need for a C implementation to do something. Me asking for the reason is not me saying it doesn't deserve to exist. I don't think you are arguing in good faith here.
                • LoganDark 319 days ago
                  > why not just start with that?

                  I'm not the one who started the thread.

                  > I don't think you are arguing in good faith here.

                  With that, this discussion is over.

    • Farmadupe 320 days ago
      Considering it's allocation-free, maybe it's an ultralight simulator for checking large quantities of compiler output? (i.e. no VM to create and destroy for every testcase)

      Or the same but for testing some verilog/vhdl CPU implementation in a simulator?

      Or since it's only 500SLOC, maybe it's just for fun!

      • mnurzia 320 days ago
        This is an excellent idea. One limitation of a testing library of mine, `mptest`, is its inability to sandbox tests. I may take this idea and develop a more robust (and potentially parallel) testing framework around it.
      • rowanG077 320 days ago
        Then I would expect a comparison with verilator.
    • srgpqt 320 days ago
      Perhaps this could be used to run sandboxed code. Game engines could safely run mods using something like this, ala QuakeC.
      • mnurzia 320 days ago
        Definitely. My motivation for writing this was to have a simple CPU for a virtual game console-like project. I decided to release it on its own, though.
      • mcraiha 320 days ago
        For a modern game engine you most likely want WebAssembly support. e.g. Flight Simulator does that https://flightsimulator.zendesk.com/hc/en-us/articles/766290...
        • srgpqt 320 days ago
          Sure, I’d love to see your 600 line webassembly interpreter.
          • sitkack 320 days ago
            Run wasm on this core.
  • sutterbutter 320 days ago
    As a total newb to programming, what am I looking at here?
    • detrites 320 days ago
      There are several different types of CPU's, in two main classes, CISC and RISC. The difference is summarised by the first letter - "Complex" vs. "Reduced" - Instruction Set Computer. Or, what size "vocabulary" a CPU decodes.

      RISC-V is a type of CPU architecture (a set of plans for how to build one, not an actual CPU itself), that also happens to be open source. Anyone can build a RISC-V CPU without having to buy the rights to do so. (Many are.)

      This project is an emulation of a RISC-V CPU. A kind of virtual "reference" CPU in software. It can be used to run code built for a RISC-V type CPU, and to help understand what's happening inside the CPU when it runs.

      It's written in C, which is and was a very fundamental programming language that's influenced the design of many other languages. It is a language that is very close to the fundamental language CPU's natively decode and process.

      CPU's natively use a language referred to as "Assembly", but which actually has many varieties particular to each CPU design. Regardless of variety of CPU, assembly is usually about as reasonably "close to metal" as it gets.

      It's literally communicating with the CPU directly in its own language. This makes it extremely fast to run, but laborious to code, and also somewhat "dangerous" in that with such low-level control, it's easy to mess things up.

      This project takes an input of a text list of RISC-V assembly instructions (a "program") and pretends to be a RISC-V CPU with those instructions loaded into it and being run on it. Useful for understanding, prototyping and building a RISC-V program.

      CPU's are designed rather to run assembly that already "works", having been created programmatically (compiled or interpreted), by a higher level language that isn't going to give it things that make no sense (hopefully).

      So there is not usually a lot of provisioning done in the design of the CPU to make it easy to watch it and its state carefully at a low level and examine how your assembly program is working, or not working. Emulation eases this.

      • dragonwriter 320 days ago
        > CPU’s natively use a language referred to as “Assembly”, b

        Strictly, CPUs use machine code. Assembly targeting a particular CPU is a very thin more-human-readable abstraction around the underlying machine code, but it is not, itself, what the CPU executes. That’s why “assemblers” exist – they are compilers from assembly language to machine code (though, because assembly is a very thin abstraction, they are much simpler than most other compilers.)
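
        To show how thin that layer is, here is a sketch of the "assembler" for a single instruction. The function name (encode_addi) is made up and a real assembler also handles labels, directives and every other instruction format, but the core job is packing fields into a 32-bit word like this.

```c
#include <stdint.h>

/* Encode "addi rd, rs1, imm" as a 32-bit RISC-V machine word (I-type).
 * rd and rs1 are register numbers 0..31; imm is a signed 12-bit value.
 * Hypothetical helper for illustration; out-of-range inputs are simply
 * masked rather than reported. */
static uint32_t encode_addi(unsigned rd, unsigned rs1, int imm)
{
    return (((uint32_t)imm & 0xFFFu) << 20) | /* imm[11:0]       */
           (((uint32_t)rs1 & 0x1Fu) << 15) |  /* source register */
           (0x0u << 12) |                     /* funct3 = 000    */
           (((uint32_t)rd & 0x1Fu) << 7) |    /* dest register   */
           0x13u;                             /* opcode OP-IMM   */
}
```

        So the text "addi a1, a1, 1" (a1 is register x11) becomes the word 0x00158593, and that number is what the CPU, or an emulator, actually sees.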

        • detrites 320 days ago
          Agree. And deeper than that may be microcode, which we rarely see or reason about, and which, while it may very much be there, is rarely of practical use. (I.e., when learning, the distinctions may be somewhat of an impediment without payoff.)
        • tester756 320 days ago
          Would calling "Assembly" a CPU's frontend language be correct?

          The same way as it is in compilers

      • touggourt 316 days ago
        This is a well-written explanation!

        I wrote a short news item about the emulator on the French collaborative website linuxfr.org (see https://linuxfr.org/news/un-emulateur-et-un-desassembleur-ri...)

        I would like to translate your comment and add it. Can I?

      • sutterbutter 319 days ago
        Wow this is beyond helpful. Thank you so much for taking the time to explain this so thoroughly.