Weights
Some types of AI models use weights to help tune the output they produce. Simply put, weights are numbers that help define the internal structure of a machine learning model. They tell the model what we consider to be important when creating an output. The closer a weight is to 1.0, the more important that variable is to the model; the closer it gets to zero, the less important it becomes. In ST4RT+ we have Pitch, Step (start of notes), and Duration (end of notes). These work together to help the AI decide what notes to predict based on the input and the training data.
So how do these work to create an output?
Well, for the most part it’s the weights that give a level of control over what the model sees as the “right” predictions.
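To make that concrete, here is a minimal sketch of how per-output weights like these are commonly wired up in a Keras model. The layer sizes, losses, and sequence length are illustrative assumptions for this post, not the exact ST4RT+ code.

```python
import tensorflow as tf

# Illustrative sketch only: one output head per variable.
inputs = tf.keras.Input(shape=(25, 3))    # a window of [pitch, step, duration] rows
x = tf.keras.layers.LSTM(128)(inputs)     # the LSTM "memory" discussed below

outputs = {
    'pitch': tf.keras.layers.Dense(128, name='pitch')(x),  # one logit per note value (assumed)
    'step': tf.keras.layers.Dense(1, name='step')(x),
    'duration': tf.keras.layers.Dense(1, name='duration')(x),
}

model = tf.keras.Model(inputs, outputs)

model.compile(
    loss={
        'pitch': tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        'step': 'mse',
        'duration': 'mse',
    },
    # These are the weights discussed in this post; the experiments in
    # the figures below change only these three numbers.
    loss_weights={'pitch': 1.0, 'step': 1.0, 'duration': 1.0},
    optimizer=tf.keras.optimizers.Adam(),
)
```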
With the weights for the ST4RT+ model all at “1.0”, every variable in the model carries equal weight. Nothing is more important than anything else. So, when we add an input to the model, Pitch, Step, and Duration are all equally important. This results in what we can see in Figure 1.

With all weights set to “1.0” we can see that, because Step and Duration are just as important as Pitch, there is less variation in the pitches predicted. Also of note, around halfway through the output the Duration of the notes (their length) suddenly truncates. This is due to the Long Short-Term Memory (LSTM) window dropping out. The LSTM is the AI model’s memory and allows it to remember melodic phrases. I used a shorter window because of the memory usage on Google Colab: as you increase the LSTM window by a note, you increase the processing requirements exponentially (mHelpMe, 2020), since the software needs to remember not only the previous steps but also the predictions.
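To show why a longer window costs more, here is a rough sketch of how a note sequence gets sliced into fixed-length training windows; the function and array shapes are my illustration, not the actual ST4RT+ pipeline.

```python
import numpy as np

def make_windows(notes: np.ndarray, seq_length: int):
    """Yield (input window, next-note target) pairs from an (N, 3)
    array of [pitch, step, duration] rows."""
    for i in range(len(notes) - seq_length):
        yield notes[i:i + seq_length], notes[i + seq_length]

# Each extra note in seq_length enlarges every training example and
# forces the LSTM to carry state across more timesteps, so memory and
# compute climb quickly on a service like Colab.
notes = np.random.rand(200, 3)                    # stand-in data
windows = list(make_windows(notes, seq_length=25))
print(len(windows), windows[0][0].shape)          # 175 (25, 3)
```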
With the weights set as follows: Pitch “0.005”, Step “1.0”, and Duration “1.0”, we get what you can see in Figure 2.

This creates a more random fluctuation of pitches. The model is told that it can worry less about the predicted pitch values and instead concentrate on the Step and Duration parameters. In this instance, it created a more widely ranging melody, going down as low as note 40 (E2 – E, 2nd octave), whereas when Pitch was set to “1.0” the lowest note was note 72 (C5 – C, 5th octave). This can give me some control over the harmonic width of the melody.
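A quick back-of-the-envelope illustration (the per-output loss values here are invented) shows why a weight of 0.005 effectively silences the pitch term:

```python
# With a weight of 0.005 the pitch error barely moves the total loss
# the optimiser minimises, so "wrong" pitches go almost unpenalised.
pitch_loss, step_loss, duration_loss = 4.8, 0.9, 1.1  # made-up example values

total = 0.005 * pitch_loss + 1.0 * step_loss + 1.0 * duration_loss
print(round(total, 3))  # 2.024 -- dominated entirely by step and duration
```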
When we set the weights as follows: Pitch “1.0”, Step “0.005”, and Duration “1.0”, we get what you can see in Figure 3.

With the same input as the previous examples, you can see that a low Step value tells the model to make sure that Pitch and Duration are important. Step, which sets where the start of a pitch should be, now creates a large overlap between the notes. This is due to a large difference between the start of a note and its end (the Duration). The model correctly places the end of each note according to the input data, but the start of each note is wildly early, which creates these overlaps.
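A small sketch (the helper and values are illustrative, not ST4RT+ code) shows how undersized Step predictions produce exactly these overlaps once the notes are laid out in time:

```python
def to_note_times(steps, durations):
    """Convert predicted step/duration values into (start, end) times.
    Step is the gap from the previous note's start; duration is the
    note's length."""
    notes, start = [], 0.0
    for step, duration in zip(steps, durations):
        start += step
        notes.append((start, start + duration))
    return notes

# Tiny steps with normal durations: each note starts long before the
# previous one ends, giving the overlaps visible in Figure 3.
print(to_note_times(steps=[0.25, 0.25, 0.25], durations=[2.0, 2.0, 2.0]))
# [(0.25, 2.25), (0.5, 2.5), (0.75, 2.75)]
```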
Finally, when we set the weights as follows: Pitch “1.0”, Step “1.0”, and Duration “0.005”, we get what you can see in Figure 4.

In this result we can see that Pitch and Step are even, without too much pitch variation. And while there is some variation in the length of the notes, they tend to get shorter as the LSTM window starts to fill. Because Duration and Step are co-dependent, we can see how they influence one another.
How does this help me?
When I’m looking at any output generated by ST4RT+, I can make an educated determination of how to get something more musical from the model by making small modifications to the weights placed on it. This type of fine-tuning isn’t as necessary when you have a larger dataset, but as I’m working with a micro-sized dataset, I sometimes need these small adjustments to get the best from the model.
Bibliography
mHelpMe (2020) “LSTM network window size selection and effect”. Stack Exchange. Available at: https://stats.stackexchange.com/questions/465807/lstm-network-window-size-selection-and-effect (Accessed: 4 February 2025).