
ST4RT+ now in a handy RNN form factor!

I started with a variational autoencoder model but ended up deciding to do the hard work first. So, I created a test version of “me” as a small model, using only six midi files of various works that I’ve created over the last two years (Figure 1).

Figure 1: Screenshot of the six midi files I used to create the test version of ST4RT+

Using Google Colab, I first needed to do my setup and install fluidsynth (a midi synth) and pretty_midi (used to handle midi). From there I was then able to upload my dataset (Figure 2).

Figure 2: Code to upload the dataset (after removing any old versions if they exist).
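
For reference, the setup and upload steps boil down to something like the sketch below. This is a simplified reconstruction of the Colab cells, not a copy of the code in Figure 2; the exact install commands and file handling may differ.

```python
# Colab setup: fluidsynth is the midi synth, pretty_midi handles the midi files.
!apt-get -qq install -y fluidsynth
!pip install -q pretty_midi

import glob
import os
from google.colab import files

# Clear out any old copies of the dataset before uploading the six midi files.
for old_file in glob.glob('*.mid'):
    os.remove(old_file)

uploaded = files.upload()
filenames = sorted(uploaded.keys())
print(f'Uploaded {len(filenames)} midi files')
```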

I then grab the first file, in this case “Theft By Rhythm (Split)2.mid”, as the main file. (In a later version I will separate the model from the input; in this instance it was easier and quicker to put the two tasks together.)

From there I extract the notes from my input (Figure 3).

Figure 3: Extracting the notes from my sample file.
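
The extraction step is roughly the following. It is a sketch rather than the exact code in Figure 3: it assumes a single-instrument midi file and pulls out three values per note.

```python
import pandas as pd
import pretty_midi

def midi_to_notes(midi_file: str) -> pd.DataFrame:
    """Extract pitch, step and duration for every note in a midi file."""
    pm = pretty_midi.PrettyMIDI(midi_file)
    instrument = pm.instruments[0]  # assumes a single-instrument file
    notes = sorted(instrument.notes, key=lambda note: note.start)
    prev_start = notes[0].start
    rows = []
    for note in notes:
        rows.append({
            'pitch': note.pitch,                 # midi note number (0-127)
            'step': note.start - prev_start,     # time since the previous note started
            'duration': note.end - note.start,   # how long the note is held
        })
        prev_start = note.start
    return pd.DataFrame(rows)

raw_notes = midi_to_notes('Theft By Rhythm (Split)2.mid')
raw_notes.head()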

I can then visualize the piano roll to double-check that I’m using the correct midi data (Figure 4).

Figure 4: Piano Roll of the sample midi file.
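
Plotting the piano roll is just drawing each note as a horizontal line, pitch against time. A minimal version, working from the raw_notes table in the sketch above, looks something like this:

```python
import matplotlib.pyplot as plt

def plot_piano_roll(notes: pd.DataFrame) -> None:
    """Draw each note as a horizontal line: pitch on the y-axis, time on the x-axis."""
    start = notes['step'].cumsum()   # steps are relative, so their running total gives start times
    end = start + notes['duration']
    plt.figure(figsize=(15, 4))
    plt.hlines(notes['pitch'], start, end, linewidth=2)
    plt.xlabel('Time (s)')
    plt.ylabel('Pitch (midi note number)')
    plt.title('Piano roll')
    plt.show()

plot_piano_roll(raw_notes)
```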

After this, the geek in me decided to look at the distribution of each note variable (Figure 5). This lets me see the number of times each pitch occurs in the midi, the step timing (the time elapsed since the previous note or the start of the track), and the duration of the notes.

Figure 5: Distribution of Pitch, Step, and Duration of the sample midi file.
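
The distribution plots are just one histogram per column of the notes table, along these lines (bin count is a placeholder):

```python
import seaborn as sns

# One histogram per note variable: pitch, step and duration.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, column in zip(axes, ['pitch', 'step', 'duration']):
    sns.histplot(raw_notes[column], bins=20, ax=ax)
    ax.set_title(f'Distribution of {column}')
plt.tight_layout()
plt.show()
```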

Next, I create a dataset using all the midi files uploaded previously. With a small dataset of only six files this is super quick. From here we create a TensorFlow dataset from all the notes that were in the uploaded files. This is the place where I will need to make adjustments to get the best results from my model, using either magic numbers (trial and error for what works best) or a technique called hyperparameter tuning (https://www.tensorflow.org/tutorials/keras/keras_tuner). Simply put, it is a library that helps pick the optimal set of parameters for the ML model rather than relying on trial and error.
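
In outline, and continuing from the earlier sketches, this step stacks the notes from every file and windows them into fixed-length training sequences. The seq_length and batch_size values here are exactly the kind of knobs I would hand-tune or pass to the tuner; the numbers below are placeholders, not the ones in my notebook.

```python
import numpy as np
import tensorflow as tf

# Stack the notes from every uploaded midi file into one table.
all_notes = pd.concat([midi_to_notes(f) for f in filenames])
key_order = ['pitch', 'step', 'duration']
train_notes = np.stack([all_notes[key] for key in key_order], axis=1)

# Window the notes into fixed-length sequences; the note after each window is the label.
seq_length = 25
batch_size = 64

def split_labels(window):
    inputs, label = window[:-1], window[-1]
    return inputs, {'pitch': label[0], 'step': label[1], 'duration': label[2]}

seq_ds = (
    tf.data.Dataset.from_tensor_slices(train_notes)
    .window(seq_length + 1, shift=1, drop_remainder=True)
    .flat_map(lambda window: window.batch(seq_length + 1))
    .map(split_labels, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(batch_size, drop_remainder=True)
    .cache()
    .prefetch(tf.data.AUTOTUNE)
)
```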

We then train the model (Figure 6). What we are looking for is that the loss plot goes down (it doesn’t need to hit zero, but it does need to flatten out).

Figure 6: The loss as the model trains; it gets closer to zero and flattens out, which shows that the training is working.
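
The model itself is a small recurrent network with one output head per note variable. The sketch below shows the general shape of it; the layer sizes, learning rate and epoch count are placeholders rather than my actual settings.

```python
# A single-LSTM model with three output heads, one per note variable.
inputs = tf.keras.Input((seq_length, 3))
x = tf.keras.layers.LSTM(128)(inputs)
outputs = {
    'pitch': tf.keras.layers.Dense(128, name='pitch')(x),   # one logit per midi pitch
    'step': tf.keras.layers.Dense(1, name='step')(x),
    'duration': tf.keras.layers.Dense(1, name='duration')(x),
}
model = tf.keras.Model(inputs, outputs)

model.compile(
    loss={
        'pitch': tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        'step': tf.keras.losses.MeanSquaredError(),
        'duration': tf.keras.losses.MeanSquaredError(),
    },
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
)

# Train, then plot the loss curve: it should drop and then flatten out.
history = model.fit(seq_ds, epochs=50)
plt.plot(history.history['loss'], label='total loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
```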

Finally, we start to generate notes from the model we have just built (Figure 7). We can use the “temperature” parameter: higher values create more chaotic note choices, while lower values preserve more of what the training data looks like.

Figure 7: Generating notes from the model.
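
Generation works one note at a time: the model predicts the next pitch, step and duration from the last seq_length notes, with the temperature applied to the pitch logits before sampling. Roughly, and with the seed, temperature and note count below as placeholders:

```python
def predict_next_note(notes: np.ndarray, model: tf.keras.Model,
                      temperature: float = 1.0) -> tuple[int, float, float]:
    """Sample the next note; higher temperature means more chaotic pitch choices."""
    inputs = tf.expand_dims(notes, 0)
    predictions = model.predict(inputs, verbose=0)
    pitch_logits = predictions['pitch'] / temperature
    pitch = int(tf.squeeze(tf.random.categorical(pitch_logits, num_samples=1)))
    step = float(tf.maximum(tf.squeeze(predictions['step']), 0.0))          # no negative times
    duration = float(tf.maximum(tf.squeeze(predictions['duration']), 0.0))
    return pitch, step, duration

# Seed with the first notes of the input file, then generate new notes one by one.
temperature = 2.0
sample = train_notes[:seq_length]
generated = []
for _ in range(120):
    pitch, step, duration = predict_next_note(sample, model, temperature)
    generated.append((pitch, step, duration))
    sample = np.vstack([sample[1:], [pitch, step, duration]])
```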

I then look at the midi file output (Figure 8); I could regenerate it as-is, or change the temperature and regenerate. This generation looks like the original input file (Figure 4), but with more influence from the other midi files that I added to the training dataset.

Figure 8: Generated midi file from the input against the model.
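
To get from generated notes back to a midi file, the (pitch, step, duration) rows are written out with pretty_midi, along these lines (the instrument, velocity and output filename are placeholders):

```python
def notes_to_midi(notes, out_file: str,
                  instrument_name: str = 'Acoustic Grand Piano') -> None:
    """Write the generated (pitch, step, duration) rows back out as a midi file."""
    pm = pretty_midi.PrettyMIDI()
    program = pretty_midi.instrument_name_to_program(instrument_name)
    instrument = pretty_midi.Instrument(program=program)
    prev_start = 0.0
    for pitch, step, duration in notes:
        start = prev_start + step
        end = start + duration
        instrument.notes.append(
            pretty_midi.Note(velocity=100, pitch=int(pitch), start=start, end=end))
        prev_start = start
    pm.instruments.append(instrument)
    pm.write(out_file)

notes_to_midi(generated, 'st4rt_plus_test_output.mid')
```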

This was a lot of work to complete, but I can confirm that it works as expected even considering the small dataset. I’ll work on getting a larger set of data together and making a few other modifications to make it easier to use. 

I like how it modified the input, though it isn’t yet an easy process to iterate with the model. These future improvements will make this an interesting tool/collaborator.

References

FluidSynth (2024) FluidSynth (Version 2.4.3) [Computer program]. Available at: https://www.fluidsynth.org/ (Accessed: 2 October 2024)

Pretty_midi (2023) pretty_midi (Version 0.2.10) [Computer program]. Available at: https://github.com/craffel/pretty-midi (Accessed: 2 October 2024)