I started with a variational autoencoder model but ended up deciding to do the hard work first: creating a small test model of “me”, trained on only six MIDI files of various works that I’ve created over the last two years (Figure 1).

Using Google Colab, I first needed to set up the environment by installing FluidSynth (a MIDI synth) and pretty_midi (used to handle MIDI files). From there I was able to upload my dataset (Figure 2).
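For reference, the setup step was roughly the following (a sketch of my Colab cells rather than the exact ones; pyfluidsynth is the Python binding I assume alongside the system package):

```python
# Minimal Colab setup sketch: install the MIDI synth and the MIDI library,
# then upload the training files (Figure 2).
!sudo apt install -y fluidsynth         # system-level MIDI synth
!pip install pretty_midi pyfluidsynth   # MIDI parsing + Python FluidSynth bindings

from google.colab import files

uploaded = files.upload()               # pick the six .mid files
```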

I then grab the first file, in this case “Theft By Rhythm (Split)2.mid”, as the main input (in a later version I will separate the model from the input; in this instance it was easier and quicker to put the two tasks together). From there I extract the notes from the input file (Figure 3).
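The extraction step looks roughly like the sketch below; the midi_to_notes helper name and the pandas layout are illustrative, but the pitch, step and duration fields are the ones I describe here.

```python
import collections

import pandas as pd
import pretty_midi

def midi_to_notes(midi_file: str) -> pd.DataFrame:
  """Extract pitch, step and duration for every note in a MIDI file."""
  pm = pretty_midi.PrettyMIDI(midi_file)
  instrument = pm.instruments[0]  # first instrument track only
  notes = collections.defaultdict(list)

  # Sort by start time so 'step' is the gap from the previous note.
  sorted_notes = sorted(instrument.notes, key=lambda note: note.start)
  prev_start = sorted_notes[0].start
  for note in sorted_notes:
    notes['pitch'].append(note.pitch)
    notes['step'].append(note.start - prev_start)
    notes['duration'].append(note.end - note.start)
    notes['start'].append(note.start)
    notes['end'].append(note.end)
    prev_start = note.start

  return pd.DataFrame(notes)

raw_notes = midi_to_notes('Theft By Rhythm (Split)2.mid')
```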

I can then visualize the piano roll to double-check that I’m using the correct MIDI data (Figure 4).
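The piano-roll check itself is just a matplotlib plot of pitch against time, along these lines (the plot_piano_roll name is my own):

```python
from typing import Optional

import matplotlib.pyplot as plt
import numpy as np

def plot_piano_roll(notes: pd.DataFrame, count: Optional[int] = None):
  """Draw each note as a short horizontal line: x = start..end, y = pitch."""
  if count:
    notes = notes[:count]
  plt.figure(figsize=(20, 4))
  note_times = np.stack([notes['start'], notes['end']], axis=0)
  note_pitches = np.stack([notes['pitch'], notes['pitch']], axis=0)
  plt.plot(note_times, note_pitches, color='b', marker='.')
  plt.xlabel('Time [s]')
  plt.ylabel('Pitch (MIDI note number)')
  plt.title('Piano roll')
  plt.show()

plot_piano_roll(raw_notes)
```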

After this, the geek in me decided to look at the distribution of each note variable (Figure 5). This lets me see the number of times each pitch occurs in the MIDI data, the step timing (the time elapsed since the previous note or the start of the track) and the duration of the notes.
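These distributions are three histograms over the extracted note table; roughly (the bin ranges here are arbitrary choices):

```python
import seaborn as sns

# One histogram per note variable: pitch, step and duration (Figure 5).
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
sns.histplot(raw_notes, x='pitch', bins=20, ax=axes[0])
sns.histplot(raw_notes, x='step', bins=np.linspace(0, 1, 21), ax=axes[1])
sns.histplot(raw_notes, x='duration', bins=np.linspace(0, 2.5, 21), ax=axes[2])
plt.tight_layout()
plt.show()
```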

Next, I create a dataset in TensorFlow using all the notes from the MIDI files uploaded previously; with a small dataset of only six files this is super quick. This is the place where I will need to make adjustments to get the best results from my model, either with magic numbers (trial and error to find what works best) or with a technique called hyperparameter tuning (https://www.tensorflow.org/tutorials/keras/keras_tuner). Simply put, KerasTuner is a library that helps pick the optimal set of parameters for an ML model rather than relying on trial and error.
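A sketch of what that dataset-building step involves is below; seq_length and batch_size are exactly the kind of magic numbers I mean, and the create_sequences helper is illustrative rather than my final code.

```python
import glob

import tensorflow as tf

# Pool the notes from every uploaded MIDI file into one table.
all_notes = pd.concat([midi_to_notes(f) for f in glob.glob('*.mid')])

key_order = ['pitch', 'step', 'duration']
train_notes = np.stack([all_notes[key] for key in key_order], axis=1)
notes_ds = tf.data.Dataset.from_tensor_slices(train_notes)

# The "magic numbers": obvious candidates for hyperparameter tuning later.
seq_length = 25   # notes per training sequence
batch_size = 64

def create_sequences(dataset: tf.data.Dataset, seq_length: int) -> tf.data.Dataset:
  """Turn the note stream into (sequence, next-note) training pairs."""
  seq_length = seq_length + 1  # the extra note becomes the label
  windows = dataset.window(seq_length, shift=1, drop_remainder=True)
  sequences = windows.flat_map(
      lambda window: window.batch(seq_length, drop_remainder=True))

  def split_labels(seq):
    inputs = seq[:-1]
    labels = {key: seq[-1][i] for i, key in enumerate(key_order)}
    return inputs, labels

  return sequences.map(split_labels, num_parallel_calls=tf.data.AUTOTUNE)

train_ds = (create_sequences(notes_ds, seq_length)
            .shuffle(len(all_notes) - seq_length)
            .batch(batch_size, drop_remainder=True)
            .prefetch(tf.data.AUTOTUNE))
```
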
We then train the model (Figure 6). What we are looking for is that the loss curve goes down (it doesn’t need to hit zero, but it does need to flatten out).
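For context, the model and training step look roughly like this (the LSTM size, learning rate and epoch count are placeholder values, not my tuned ones):

```python
# Predict the next note's pitch (128-way classification) plus its step and
# duration (regressions) from the previous seq_length notes.
inputs = tf.keras.Input(shape=(seq_length, 3))
x = tf.keras.layers.LSTM(128)(inputs)
outputs = {
    'pitch': tf.keras.layers.Dense(128, name='pitch')(x),
    'step': tf.keras.layers.Dense(1, name='step')(x),
    'duration': tf.keras.layers.Dense(1, name='duration')(x),
}
model = tf.keras.Model(inputs, outputs)

model.compile(
    loss={
        'pitch': tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        'step': tf.keras.losses.MeanSquaredError(),
        'duration': tf.keras.losses.MeanSquaredError(),
    },
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.005))

history = model.fit(train_ds, epochs=50)

# The curve I watch: total loss dropping and then flattening out (Figure 6).
plt.plot(history.epoch, history.history['loss'], label='total loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
```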

Finally, we start to generate notes from the model we have just built (Figure 7). We can use the “temperature” parameter: higher values create more chaotic note choices, while lower values preserve more of what the training data looks like.
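A sketch of how the temperature comes into the sampling (the predict_next_note helper and the 120-note length are illustrative):

```python
def predict_next_note(notes: np.ndarray, model: tf.keras.Model,
                      temperature: float = 1.0):
  """Sample the next (pitch, step, duration) from the trained model."""
  inputs = tf.expand_dims(notes, 0)          # add a batch dimension
  predictions = model.predict(inputs, verbose=0)

  # Dividing the logits by the temperature flattens (hot) or sharpens (cool)
  # the pitch distribution before sampling from it.
  pitch_logits = predictions['pitch'] / temperature
  pitch = tf.squeeze(tf.random.categorical(pitch_logits, num_samples=1))

  step = tf.maximum(0.0, tf.squeeze(predictions['step']))
  duration = tf.maximum(0.0, tf.squeeze(predictions['duration']))
  return int(pitch), float(step), float(duration)

# Seed with the last seq_length notes of the input, then roll forward.
generated = []
prev_notes = train_notes[-seq_length:].copy()
for _ in range(120):
  pitch, step, duration = predict_next_note(prev_notes, model, temperature=2.0)
  generated.append((pitch, step, duration))
  prev_notes = np.vstack([prev_notes[1:], [[pitch, step, duration]]])

generated_notes = pd.DataFrame(generated, columns=['pitch', 'step', 'duration'])
```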

I then look at the MIDI file output (Figure 8); I could regenerate it as-is or change the temperature and regenerate. This generation looks like the original input file (Figure 4) but with more influence from the other MIDI data that I added to the training dataset.
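To get that output file, the generated note table has to be converted back to MIDI with pretty_midi; roughly (the notes_to_midi name and the piano instrument are my illustrative choices):

```python
def notes_to_midi(notes: pd.DataFrame, out_file: str,
                  instrument_name: str = 'Acoustic Grand Piano') -> pretty_midi.PrettyMIDI:
  """Write a pitch/step/duration table back out as a playable MIDI file."""
  pm = pretty_midi.PrettyMIDI()
  instrument = pretty_midi.Instrument(
      program=pretty_midi.instrument_name_to_program(instrument_name))

  prev_start = 0.0
  for _, note in notes.iterrows():
    start = prev_start + float(note['step'])
    end = start + float(note['duration'])
    instrument.notes.append(
        pretty_midi.Note(velocity=100, pitch=int(note['pitch']),
                         start=start, end=end))
    prev_start = start

  pm.instruments.append(instrument)
  pm.write(out_file)
  return pm

notes_to_midi(generated_notes, 'generated.mid')
```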

This was a lot of work to complete, but I can confirm that it works as expected even considering the small dataset. I’ll work on getting a larger set of data together and making a few other modifications to make it easier to use.
I like how it modified the input, though iterating with the model isn’t an easy process yet. These future improvements will make this an interesting tool/collaborator.
References
FluidSynth (2024) FluidSynth (Version 2.4.3) [Computer program]. Available at: https://www.fluidsynth.org/ (Accessed: 2 October 2024)
Pretty_midi (2023) pretty_midi (Version 0.2.10) [Computer program]. Available at: https://github.com/craffel/pretty-midi (Accessed: 2 October 2024)