
Like a circle in a spiral, like a wheel within a wheel…

For the Final Major Project (FMP), I have decided to combine my other passion, technology, with my music practice.

Introducing ST4RT+! An AI version of me that I can co-create with when composing.

To do this, I have taken a dive into Machine Learning (ML) models, looking for a way to do this project without having to train on an exceptionally large dataset. This is important for a few reasons: I do not want to spend most of my time creating training data (I want to be able to feasibly complete the project in the time given), and I also want to make sure the model data is ethical.

So, to find a model that has these attributes, I looked first at Recurrent Neural Networks (RNN) and Variational AutoEncoders (VAE). Both can utilise “latent space” in a pre-trained model. Latent space is the compressed representation that a pre-trained model learns of its training data; a smaller model can then be trained inside that space on a much smaller set of data (Figure 1).

Figure 1: Using Latent Space in a VAE model to reduce the amount of training data needed to create a working ML model. (Source: https://magenta.tensorflow.org/midi-me)
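To make the idea concrete for myself, here is a toy sketch of training a small model in latent space. Everything in it is an assumption for illustration: `encode` is a hypothetical stand-in for a frozen pre-trained model (a real VAE learns its encoder), and the "small model" is just a mean and spread per latent dimension rather than the smaller VAE that MidiMe trains.

```python
import random
import statistics

def encode(melody):
    """Hypothetical frozen encoder: squash a melody (list of MIDI pitches)
    into a 2-number latent vector (average pitch, pitch range)."""
    return [statistics.mean(melody), max(melody) - min(melody)]

def fit_small_model(latents):
    """The small model: one (mean, spread) pair per latent dimension."""
    dimensions = list(zip(*latents))
    return [(statistics.mean(d), statistics.pstdev(d)) for d in dimensions]

def sample_latent(small_model, rng):
    """Draw a new latent vector that resembles the training examples."""
    return [rng.gauss(mean, spread) for mean, spread in small_model]

# Only three short examples of "my" playing are needed,
# because the big model is never retrained.
my_melodies = [[60, 62, 64], [62, 64, 66], [64, 66, 68]]
latents = [encode(m) for m in my_melodies]
small_model = fit_small_model(latents)
new_latent = sample_latent(small_model, random.Random(0))
```

In a real setup the sampled latent vector would be passed to the pre-trained decoder to turn it back into notes; that step is left out here because it depends on the model being used.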

There are two types of RNN model, Lookback and Attention, both created to give the model the ability to create long-term structure in the music it produces. The Lookback RNN is built on an LSTM (Long Short-Term Memory) network and can recognize patterns that occur over a 1-2 bar range, because the events from one and two bars earlier are fed in as extra inputs.
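As a rough illustration of the lookback idea, the sketch below pulls out the events from one and two bars earlier for a given step. It is not the Magenta implementation; the 16-steps-per-bar grid, the function name, and the feature names are my own assumptions.

```python
STEPS_PER_BAR = 16  # assumption: 4/4 time quantised to 16th notes

def lookback_features(melody, step):
    """Return the events 1 and 2 bars before `step`, plus repeat flags.

    `melody` holds one MIDI pitch (or None for a rest) per step.
    """
    def event_at(index):
        # Steps before the start of the melody count as silence.
        return melody[index] if 0 <= index < len(melody) else None

    current = event_at(step)
    one_bar_ago = event_at(step - STEPS_PER_BAR)
    two_bars_ago = event_at(step - 2 * STEPS_PER_BAR)
    return {
        "one_bar_ago": one_bar_ago,
        "two_bars_ago": two_bars_ago,
        "repeats_one_bar_ago": current is not None and current == one_bar_ago,
        "repeats_two_bars_ago": current is not None and current == two_bars_ago,
    }

# A one-bar motif played three times, so bar 3 repeats bars 1 and 2.
motif = [60, None, 62, None, 64, None, 65, None,
         67, None, 65, None, 64, None, 62, None]
melody = motif * 3
features = lookback_features(melody, step=32)
```

At step 32 (the first note of the third bar) both repeat flags are true, which is the kind of signal that lets the model notice it is inside a repeating pattern.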

To learn longer patterns, the Attention RNN uses an attention mechanism at every output step. Simply put, with every new note added to the sequence, the model looks back at its outputs from the last n steps and weighs them to decide what the next note should be.
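Here is a minimal sketch of that "look at the last n steps" idea. It is a simplification under my own assumptions: the real model learns how to score the earlier steps, whereas this version uses a plain dot product, and the function name and example vectors are made up for illustration.

```python
import math

def attend(previous_outputs, current_state, n=3):
    """Blend the last `n` output vectors, weighted by how closely each
    one matches `current_state`."""
    window = previous_outputs[-n:]
    # Score each earlier output by its dot product with the current state.
    scores = [sum(a * b for a, b in zip(vec, current_state)) for vec in window]
    # Softmax turns the scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the weighted sum of the outputs in the window.
    size = len(current_state)
    context = [sum(w * vec[i] for w, vec in zip(weights, window))
               for i in range(size)]
    return weights, context

outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
weights, context = attend(outputs, current_state=[0.0, 1.0], n=3)
```

With these example vectors the most recent output gets the largest weight, because it points most strongly in the same direction as the current state. The resulting context vector is what would be combined with the current step to help choose the next note.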

My work from here is to play with some of the code and produce a proof of concept of both an RNN and a VAE-based model. I will train both types of model using my own input in the latent space, to create a model that sounds more like me, and then see whether the results are acceptable.

From there, I should have the tools to use the AI model as a creative partner to bounce ideas off, as a form of co-creation, which is the aim of my project.

I was attempting to use only JavaScript for most of the melody generation, but on further research, I may have to use Python for the RNN, though I should still be able to use JavaScript for the VAE model. This is due to the complexity of implementing either the Lookback or Attention models, which require complex vector math that isn’t as efficient in JavaScript.

Bibliography

Dinculescu, M., Engel, J. and Roberts, A. (2019) MidiMe: Personalizing a MusicVAE model with user data. [Online]. Google Research. Available at: https://research.google/pubs/midime-personalizing-a-musicvae-model-with-user-data/ (Accessed: 10 October 2024)

Waite, E. (2016) Generating Long-Term Structure in Songs and Stories. [Online]. Google TensorFlow. Available at: https://magenta.tensorflow.org/2016/07/15/lookback-rnn-attention-rnn (Accessed: 12 October 2024)
