5 comments

  • craze3 2 hours ago
    Nice! I've been wanting to try local audio fine-tuning. Hopefully it works with music vocals too.
  • LuxBennu 2 hours ago
    I run Whisper large-v3 on an M2 Max with 96GB, and even with just inference the memory gets tight on longer audio, so I can only imagine what fine-tuning looks like. Does 64GB vs 96GB make a meaningful difference for Gemma 4 fine-tuning, or does it just push the OOM wall back a bit? I've been wanting to try local fine-tuning on Apple Silicon, but the tooling gap has kept me on inference only so far.
    • MediaSquirrel 1 hour ago
      Memory usage increases quadratically with sequence length, so using shorter sequences during fine-tuning keeps memory from exploding. On my 64GB RAM machine, I'm limited to input sequences of about 2,000 tokens, given that my average output for the fine-tuning task is around 1,000 tokens (~3k tokens total).
  • yousifa 1 hour ago
    This is super cool, will definitely try it out! Nice work
  • dsabanin 2 hours ago
    Thanks for doing this. Looks interesting, I'm going to check it out soon.
    • MediaSquirrel 1 hour ago
      You're welcome! It was a fun side quest.
  • pivoshenko 1 hour ago
    nice!
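A rough sketch of the quadratic-memory point from MediaSquirrel's reply (about 2,000 input + 1,000 output tokens on a 64GB machine). It assumes naive attention that materializes the full seq_len x seq_len score matrix for every head, batch size 1, and 2-byte (fp16/bf16) elements; the head count is a hypothetical value for illustration, not Gemma's actual configuration. Fused kernels (FlashAttention-style) avoid materializing this matrix, so treat this only as a model of the naive case.

```python
def attention_scores_bytes(seq_len: int, num_heads: int, bytes_per_elem: int = 2) -> int:
    """Bytes for one layer's attention score matrix (naive attention, batch size 1).

    Assumes the full seq_len x seq_len matrix is stored for every head.
    During fine-tuning these activations are typically kept for the backward
    pass, so the real total scales with the number of layers as well.
    """
    return seq_len * seq_len * num_heads * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical head count chosen for illustration only.
    heads = 16
    for n in (1_000, 2_000, 3_000, 6_000):
        mib = attention_scores_bytes(n, heads) / 2**20
        print(f"{n:>5} tokens -> {mib:8.1f} MiB per layer")
```

Doubling the total sequence length (e.g. 3k to 6k tokens) quadruples this term, which is why trimming inputs helps so much more than the raw token count suggests.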