Ask HN: Is there an ML model that can go from an audio song to sheet music?

Trying to learn some foreign music where the sheet music is not exactly available (old folk music type stuff) so I was wondering if there is a model or an app that can go from an audio file to sheet music or notation.

25 points | by uptownfunk 865 days ago

7 comments

ikhatri 865 days ago
I was going to post basic pitch from Spotify but it looks like billconan beat me to it. That said I can give you a bit more advice. The Spotify basic pitch model isn't too good at multi-track input. It's capable of it, but you may actually get better results if you separate out the tracks first and then run them individually through the basic pitch model.
In order to do this you can use a source/stem separation model like spleeter (https://github.com/deezer/spleeter) and then run the basic pitch model (or any other midi transcription model). There's other you can try which may yield better results, for example: (https://github.com/Music-and-Culture-Technology-Lab/omnizart)
Either way the key words you want to be looking for are "midi transcription" and "stem separation", should help you find more models to try for both steps. Good luck! :)
EDIT: Oh it looks like there's even a stem separation leaderboard on papers with code, neat: https://paperswithcode.com/task/music-source-separation
[-]
- crtified 865 days ago
  That is insane(ly cool). Do you know how the separation models work, in principle?
  [-]
  - ikhatri 865 days ago
    Yup! It's a U-Net style convolutional neural network that runs on the spectrograms. There's a short 2 page abstract here: https://archives.ismir.net/ismir2019/latebreaking/000036.pdf and they cite this prior work as where they got the architecture from: https://arxiv.org/abs/1906.02618
tmaly 865 days ago
Ableton Live has something that can convert audio to midi.
Then there are open source programs that can convert midi to sheet music.
I have tried the process, but you tend to get artifacts in the midi from Ableton.
You have to know a little bit about music theory to clean it up.
[-]
- Lariscus 865 days ago
  Is that worth it tough? It seems to me the effort of cleaning up sheet music generated from MIDI would be higher than transcribing the whole piece by hand.
  [-]
  - tmaly 860 days ago
    It depends on how good you are at music theory.
    Some people can do this cleanup really quickly.
- uptownfunk 865 days ago
  Oh cool, do you know if this works for recordings where there are multiple instruments (clapping, organ, vocals, hand drums)
  [-]
  - jrflowers 865 days ago
    I imagine that if you separated a track into its components you could extract the sheet music for each part individually. I don’t know how sheet music for clapping would work though.
    [-]
    - solarmist 865 days ago
      I imagine just notes on the C line with an x plus stem like is used for basic drum rhythm notation.
  - tmaly 860 days ago
    It does, but it does not break them out in to tracks.
billconan 865 days ago
https://basicpitch.spotify.com/
[-]
- uptownfunk 865 days ago
  Thanks, I need something that can handle multiple instruments in the recording
  [-]
  - ksherlock 863 days ago
    spleeter, demucs (my favorite), etc, can split a song into 4 tracks (drum, bass, vocal, and other). Your other comments mentioned vocals, percussion (hand drums and claps), and organ; on paper it sounds like it should work.
- esafak 865 days ago
  wow, how did you find this? I wonder what else they have that they don't show.
  [-]
  - n3v3r3v3r 865 days ago
    <https://github.com/spotify> seems to be the place to look
CrypticShift 863 days ago
This is called (multitrack) music transcription. There are some commercial solutions (AudioScore, AnthemScore, ...). For OSS, look at Omnizart [1] and magenta/mt3 [2].
I suppose these models are trained on western / pop music, so they may not work nicely on ethnic music.
[1] https://github.com/Music-and-Culture-Technology-Lab/omnizart [2] https://github.com/magenta/mt3
radq 865 days ago
I've been working on training models for this problem. Haven't shipped as a product yet because the inference cost is too high (takes 10-15 minutes to transcribe each song). Would love to try it out on the music you're trying to learn and get your feedback. Email address is in my profile.
ndstephens 865 days ago
https://www.celemony.com/en/melodyne/what-is-melodyne
This might help. Not exactly what you're looking for.
moomoo11 864 days ago
If someone's seriously working on this, I would be happy to help out and build the mobile apps around it.
Imagine how cool, useful, and joyful this tech would be to most people.