A better Ghidra MCP server – GhidrAssistMCP

(github.com)

99 points | by jtang613 1 day ago

9 comments

  • its-kostya 1 day ago
    It's been a few years since I've rolled up my sleeves and done some reverse engineering with Ghidra. The skill is very "use it or lose it" so I wonder if this will help me get back into it quicker. Or... a ton of hallucinations leading down dead-end rabbit holes.

    Curious if anyone has given it a shot and can speak to the experience.

    • axoltl 1 day ago
      I can't comment on MCP use specifically but I can comment on using an LLM while reversing. I use a local instance of whatever ends up being SOTA for local reasoning LLMs at 30B-70B params, quantized to 4-6 bits. I feed it decompiled code to identify functions that are 'tedious' to reverse engineer. I recently reversed a binary that was compiled with soft float and had no symbols or strings. A lot of those functions end up being a ton of bit-twiddling. While I reversed the business logic I had the reasoning model identify the soft float functions with very minimal prompting. It did quite well on those!

      I also tried to have it automatically build some structs from code showing the access patterns, and it failed miserably on that task. Likely a larger model (o3 or opus) would do better here.

      I personally don't think letting an LLM do large parts of the reversing would be useful to me as I build up a lot of my mental model of the system during the process, so I'd be missing out on that. But for handling annoying bits of code I'd likely just forgo otherwise? Go ham!
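
      For readers who haven't run into soft float: the "bit-twiddling" mentioned above usually means treating a float as a raw integer and picking apart its IEEE 754 fields by hand. The following is only a minimal illustrative sketch in Python, not code from the binary discussed; the helper names are made up for this example.

```python
import struct


def float_to_bits(value: float) -> int:
    """Reinterpret a float as its 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def unpack_fields(bits: int) -> tuple[int, int, int]:
    """Split a 32-bit pattern into (sign, biased exponent, mantissa)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa


def soft_neg(bits: int) -> int:
    """Negate by flipping the sign bit, as a soft-float routine would."""
    return bits ^ 0x80000000


def soft_abs(bits: int) -> int:
    """Absolute value by clearing the sign bit."""
    return bits & 0x7FFFFFFF


if __name__ == "__main__":
    raw = float_to_bits(-2.5)
    print(hex(raw), unpack_fields(raw))
    print(bits_to_float(soft_abs(raw)))
```

      In a stripped binary these show up as anonymous functions full of shifts and masks with constants like 0x7FFFFF and 0x80000000, which is the kind of pattern a model can plausibly label from decompiled code alone.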

      • segmondy 1 day ago
        You hit the target on what most miss about LLMs: part of the work is building up a mental model of the system you are working on. When an LLM does the work, it becomes easy to miss out on that mental model.
        • jhart99 23 hours ago
          I tried to use an LLM for assistance with reversing some embedded code and agree with this. I had built up a pretty decent model of what was going on before starting. It was able to explain what was going on in this one perplexing function quite well, but when I'd feed it decent-sized blocks of code it would hallucinate like crazy. But I was quite happy with the performance at finding the basic library and ROM functions and annotating them correctly. I think it is all in how you use it.
    • jtang613 1 day ago
      Thanks for the interest. I wrote GhidrAssistMCP and the original GhidrAssist plugin, which work hand-in-hand, because I find they improve my RE workflow. They're not immune from hallucinations because the underlying models are not. However, hallucinations are fairly rare and I have had very reliable results with both Claude and ChatGPT. When used together, GhidrAssist+GhidrAssistMCP have been able to do some impressive analysis tasks.

      If you're just getting back in the saddle, you might want to give both a try. In particular, GhidrAssist's "Explain Function" tool is really helpful at quickly summarizing code and reducing the mental overhead of making sense of large binaries.

    • justmarc 23 hours ago
      Applies to everything. If you never had it in muscle memory, you lose it.
  • PradeetPatel 23 hours ago
    Thanks so much for sharing!

    I'm interested to see how MCP and the development in AI will impact the CTF scene in the future.

  • flowerthoughts 15 hours ago
    Thanks for sharing!

    I was about to start doing this, then realized I shouldn't nerd-snipe myself... The original extension definitely felt user unfriendly, so I was using Claude Code manually, feeding it an exported listing file. The listing files lack full addresses, so it wasn't optimal source material.

  • 0xbadc0de5 7 hours ago
    Works great! GhidrAssist + MCP are awesome.
  • leoqa 1 day ago
    Why is this better than the other one?
    • jtang613 1 day ago
      GhidrAssistMCP features:

      - several additional tools (like get_class_info, search_classes, etc.),

      - it has GUI config and logging,

      - and it does not rely on an external Python bridge to host the MCP Server - it's monolithic (using the official MCP Java SDK).
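
      For context on what calling one of these tools looks like on the wire: MCP clients invoke tools with a JSON-RPC 2.0 "tools/call" request. Below is a rough sketch of building such a request in Python. The tool name search_classes comes from the list above, but the "pattern" argument is a hypothetical placeholder; the real parameter names are whatever the server advertises in its tool schema.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "pattern" is an assumed argument name, used here for illustration only.
request = build_tool_call(1, "search_classes", {"pattern": "Http"})

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```

      In practice an MCP client (such as an LLM desktop app) generates these messages itself after listing the server's tools, so this is only meant to show the shape of the exchange.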

  • gg82 17 hours ago
    I wonder if embeddings could be created from open source and library code and then used to convert back the code with all the correct variable and function names.
    • Everdred2dx 17 hours ago
      It's not AI but Ghidra has a cool feature called BSim which does something similar. Each function gets a "feature vector" which, now that I think about it, has some clear parallels to embeddings.
      • mixel 13 hours ago
        Wow, that is cool. I bet with that feature and a huge database of known "feature vectors" from open-source libraries, you could focus on the actual business logic of the binary instead of trying to reverse external library functions.
    • nekitamo 13 hours ago
      I've been wondering the same thing. However, you would have to have a very large database of embeddings for this to be useful, right?

      OTOH I can see this being disproportionately helpful with reverse engineering Rust and Go binaries, which usually include many open-source dependencies.

  • electroglyph 22 hours ago
    nice, now do x64dbg!