Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon

(github.com)

90 points | by MediaSquirrel 2 hours ago

5 comments

craze3 2 hours ago
Nice! I've been wanting to try local audio fine-tuning. Hopefully it works with music vocals too
LuxBennu 2 hours ago
I run Whisper large-v3 on an M2 Max 96GB, and even with just inference the memory gets tight on longer audio, so I can only imagine what fine-tuning looks like. Does 64GB vs 96GB make a meaningful difference for Gemma 4 fine-tuning, or does it just push the OOM wall back a bit? Been wanting to try local fine-tuning on Apple Silicon, but the tooling gap has kept me on inference only so far.
- MediaSquirrel 1 hour ago
  Attention memory grows roughly quadratically with sequence length, so using shorter sequences during fine-tuning keeps memory from blowing up. On my 64GB RAM machine, I'm limited to input sequences of about 2,000 tokens, given that the average output for my fine-tuning task is around 1,000 tokens (~3k tokens total).
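  A rough back-of-the-envelope sketch of the quadratic scaling, assuming naive attention that materializes the full score matrix for one layer. The head count and 2-byte dtype are illustrative guesses, not Gemma 4's actual config, and `attention_scores_bytes` is a hypothetical helper, not part of the tool:

```python
def attention_scores_bytes(seq_len, num_heads=8, bytes_per_elem=2, batch=1):
    """Bytes needed for one layer's attention score matrix.

    Assumes naive attention: a (seq_len x seq_len) matrix per head.
    num_heads and bytes_per_elem are illustrative, not Gemma 4's real values.
    """
    return batch * num_heads * seq_len * seq_len * bytes_per_elem


if __name__ == "__main__":
    for tokens in (1_000, 3_000, 6_000):
        mib = attention_scores_bytes(tokens) / 2**20
        print(f"{tokens:>5} tokens: {mib:8.1f} MiB per layer for scores alone")
```

  Doubling the sequence length quadruples this term. It leaves out weights, gradients, optimizer state, and other activations, and fused attention kernels can avoid materializing the matrix at all, so treat it as an upper-bound intuition rather than a measurement.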
yousifa 1 hour ago
This is super cool, will definitely try it out! Nice work
dsabanin 2 hours ago
Thanks for doing this. Looks interesting, I'm going to check it out soon.
- MediaSquirrel 1 hour ago
  You are welcome! It was a fun side quest.
pivoshenko 1 hour ago
nice!