Categories
Creative Development Planning Research

Can you collaborate with an ANT?

Just kidding. But it does raise the question: how do I collaborate with a non-human entity? Is it even possible?

Well, from my perspective it is. However, let’s look at what others in the fields of creativity and sociology say.

I'll start with a sociological perspective. A definition of Actor Network Theory (ANT) is a good place to start so that we can break down how it relates to my Final Major Project (FMP) and what insights working with a non-human collaborator can offer.

Actor Network Theory is a theoretical framework within the field of science and technology studies (STS) that examines how social, technical, and material entities (referred to as "actors") interact to form complex networks that shape and influence outcomes. ANT challenges traditional distinctions between human and non-human agents by treating them symmetrically as participants in these networks.

This definition, while never overtly using the term "collaborator" for non-human entities, states that human and non-human "actors" interact with and influence one another.

ANT is a contrasting view to Technological Determinism (TD), the idea that technology develops independently of society and itself drives social change (Bimber, 1990). For example, Karl Marx believed that the railway in colonial India changed social hierarchies by introducing new economic activities (ibid.). While TD can look like a good place to start when you take a cursory view of any technology and its impact on how people use it, I believe a more nuanced approach leads to a better understanding of how we as humans interact with technology, and how we in turn shape it.

To gain a more holistic view of collaboration I'll bring in Fraser's "Collaboration bites back" (2022). In this paper Fraser creates a manifesto for collaboration as a tool for change, so I thought it best to go through her 10-point manifesto and explain how working with ST4RT+ meets each of her points.

  1.  Collaboration should not be predictable:
    This is an easy one. While ST4RT+ is based on my melodies and data, it doesn’t create melodies that are 100% what I would do.
  2. Collaboration should not be clean:
    This one is a little more nuanced. I will say that when I was struggling with the outputs of the model at the start of this project, I had to get my hands dirty and get to the point where I started thinking more like a music producer and less like a developer. 
  3. Collaboration should not be safe:
    This whole project was a risk: I was using technology I'd never used before, with no guarantee it would work, and at points I thought I would be lucky to generate anything worthwhile.
  4. Collaboration requires consent:
    Harder to do with a non-human collaborator; however, if the original generation of a set of melodies is objectively awful (all the notes overlapped and on bar one), I simply regenerate.
  5. Collaboration requires trust:
    This point is interesting; for me it was about trusting myself and the process. When I was fighting the model's output it was because I wasn't trusting my skills as a music producer. I wanted the model to generate clean melody lines. Trusting myself has really helped to get this project working.
  6. Collaboration requires time, and time (usually) costs money:
    This project has taken time to get working (far more time in the beginning than I anticipated). It has needed experimentation and failure to get to a point where the process and methodology are working.
  7. Collaboration requires vigilance:
    Even with a non-human collaborator this still applies, though it relies on me to do more of that work.
  8. Collaboration is not compulsory:
    Nothing to see here… in this case it was compulsory.
  9. Collaboration is not cool:
    I disagree here, if only because, within an ANT framework, almost everything is a collaboration even if you aren't aware of it.
  10. Collaboration is a tool for change:
    I agree that any collaboration should challenge the status quo. For me the idea of creating an ethical use for AI trained only on the data that I have given it challenges how AI is being used and the data it is trained on. For me this is important and a point of difference with this project.

Looking at Fraser's 10-point manifesto, I think this project still meets what she defines as collaboration.

Bibliography

Bimber, B. (1990) 'Karl Marx and the Three Faces of Technological Determinism', Social Studies of Science, 20(2), pp. 333-352. Available at: https://www.jstor.org/stable/285094 (Accessed: 2 December 2024).

Fraser, J. (2022) Collaboration bites back. Available at: https://www.julietfraser.co.uk/app/download/11414030/Collaboration+bites+back.pdf (Accessed: 18 October 2024).

Categories
Development Research

Weights for an AI model

Weights

Some types of AI models use weights to help tune the output that they produce. Simply put, weights are numbers that help to define the internal structure of a machine learning model; they tell the model what we consider to be important when creating an output. The closer a weight is to 1.0, the more important that variable is to the model; the closer it gets to zero, the less important it is. In ST4RT+ we have Pitch, Step (the start of notes), and Duration (the length of notes). These work together to help the AI decide which notes to predict based on the input and the training data.

So how do these work to create an output?

Well, for the most part it’s the weights that give a level of control over what the model sees as the “right” predictions.
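
To make this concrete, here is a rough sketch (not the exact ST4RT+ code) of how per-output weights can be set when compiling a Keras model with three heads named 'pitch', 'step', and 'duration'. The layer sizes, losses, and optimizer are assumed values for illustration only:

import tensorflow as tf

# A minimal sketch, not the actual ST4RT+ model: an LSTM with three output
# heads. With every loss weight at 1.0, no output is treated as more
# important than any other when the total loss is calculated.
seq_length = 25  # assumed window length

inputs = tf.keras.Input(shape=(seq_length, 3))  # pitch, step, duration per note
x = tf.keras.layers.LSTM(128)(inputs)
outputs = {
    'pitch': tf.keras.layers.Dense(128, name='pitch')(x),      # one logit per midi pitch
    'step': tf.keras.layers.Dense(1, name='step')(x),          # time since the previous note
    'duration': tf.keras.layers.Dense(1, name='duration')(x),  # length of the note
}
model = tf.keras.Model(inputs, outputs)

model.compile(
    loss={
        'pitch': tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        'step': tf.keras.losses.MeanSquaredError(),
        'duration': tf.keras.losses.MeanSquaredError(),
    },
    # These are the weights discussed in this post; lowering one (e.g. pitch
    # to 0.005) tells the model to care less about getting that output "right".
    loss_weights={'pitch': 1.0, 'step': 1.0, 'duration': 1.0},
    optimizer=tf.keras.optimizers.Adam(),
)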

With the weights for the ST4RT+ model all at "1.0", all variables in the model are of equal weight. Nothing is more important than anything else. So, when we give the model an input, Pitch, Step, and Duration are all equally important. This results in what we can see in Figure 1.

Figure 1: All weights are set to “1.0”

With all weights set to "1.0" we can see that, because Step and Duration are equally as important as Pitch, there is less variation in the pitches predicted. Also of note, around halfway through the output the Duration of the notes (their length) suddenly truncates. This is due to the Long Short-Term Memory (LSTM) window dropping out. The LSTM is the AI model's memory and allows it to remember melodic phrases. I used a shorter window due to the amount of memory usage on Google Colab. As you increase the LSTM window by a note, you increase the processing requirements exponentially (mHelpMe, 2020), because the software needs to remember not only the previous steps but also the predictions.

With the weights set as follows: Pitch “0.005”, Step “1.0”, and Duration “1.0” we get what you can see in Figure 2.

Figure 2: Weights set to Pitch "0.005", Step "1.0", and Duration "1.0"

This creates a more random fluctuation of pitches. The model is told that it can be less concerned about the predicted value of the pitches and can instead concentrate on the Step and Duration parameters. In this instance, it created a more widely ranging melody, going down as low as note 40 (E2 – E 2nd octave), whereas when Pitch was set to "1.0" the lowest note was note 72 (C5 – C 5th octave). This gives me some control over the harmonic width of the melody.

When we set the weights as follows: Pitch "1.0", Step "0.005", and Duration "1.0" we get what you can see in Figure 3.

Figure 3: Weights set to Pitch “1.0”, Step “0.005”, and Duration “1.0”

With the same input as the previous examples, you can see that the low Step value tells the model to make sure that Pitch and Duration are important. Step, which defines where the start of a pitch should be, now creates a large overlap between the notes. This is due to a large difference between the start of a note and its end (which is set by the Duration). The model correctly places the end of each note according to the input data, but the start of each note is wildly early, which creates these overlaps.

Finally, when we set the weights as follows: Pitch “1.0”, Step “1.0”, and Duration “0.005” we get what you can see in Figure 4.

Figure 4: Weights set to Pitch “1.0”, Step “1.0”, and Duration “0.005”

In this result we can see that Pitch and Step are even, with not too much pitch variation. And, while there is some variation in the length of notes, they tend to get shorter as the LSTM window starts to fill. Because Duration and Step are co-dependent, we can see how they influence one another.

How does this help me?

It helps because, when I'm looking at any output generated by ST4RT+, I can make an educated determination about how to get something more musical from the model by making small modifications to the weights placed on it. This type of fine-tuning isn't as necessary when you have a larger dataset, but as I'm working with a micro-sized dataset, I sometimes need to make small modifications to get the best from the model.

Bibliography

mHelpMe (2020) “LSTM network window size selection and effect”. StackExchange. Available at: https://stats.stackexchange.com/questions/465807/lstm-network-window-size-selection-and-effect (Accessed: 4 February 2025)

Categories
Development

ST4RT+ now in a handy RNN form factor!

I started with a variational autoencoder model but ended up deciding to do the hard work first: creating a test of "me" as a small model, using only six midi files of various works that I've created over the last two years (Figure 1).

Figure 1: Screenshot of the six midi files I used to create the test version of ST4RT+

Using Google Colab, I first needed to do my setup and install fluidsynth (a midi synth) and pretty_midi (a library used to handle midi). From there I was then able to upload my dataset (Figure 2).

Figure 2: Code to upload the dataset (after removing any old versions if they exist).
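
For context, the setup cells look roughly like this. This is a sketch of the idea rather than my exact code, and the clean-up pattern is an assumption:

# Install FluidSynth and pretty_midi in the Colab runtime.
!sudo apt install -y fluidsynth
!pip install pretty_midi

import glob
import os

from google.colab import files

# Remove any previously uploaded midi files before re-uploading (as in Figure 2).
for old_file in glob.glob('*.mid'):
    os.remove(old_file)

uploaded = files.upload()  # opens a file picker in the Colab notebook
filenames = sorted(uploaded.keys())
print(f'{len(filenames)} midi files uploaded')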

I then grab the first file, in this case "Theft By Rhythm (Split)2.mid", as the main input file. (In a later version I will separate the model from the input; in this instance it was easier and quicker to put the two tasks together.)

From there I extract the notes from my input (Figure 3).

Figure 3: Extracting the notes from my sample file.
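
A sketch of how that extraction can work with pretty_midi (not necessarily my exact code): each note becomes a row of pitch, step, and duration.

import pandas as pd
import pretty_midi

def midi_to_notes(midi_file: str) -> pd.DataFrame:
    """Turn a midi file into rows of pitch, step, and duration."""
    pm = pretty_midi.PrettyMIDI(midi_file)
    instrument = pm.instruments[0]  # assumes a single-instrument file
    notes = sorted(instrument.notes, key=lambda n: n.start)

    rows = []
    prev_start = notes[0].start
    for note in notes:
        rows.append({
            'pitch': note.pitch,
            'step': note.start - prev_start,    # time since the previous note started
            'duration': note.end - note.start,  # length of the note
        })
        prev_start = note.start
    return pd.DataFrame(rows)

notes_df = midi_to_notes('Theft By Rhythm (Split)2.mid')
print(notes_df.head())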

I can then visualize the piano roll to double-check that I'm using the correct midi data (Figure 4).

Figure 4: Piano Roll of the sample midi file.
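
One way to draw that piano roll from the extracted notes looks something like this (a sketch, assuming the notes_df DataFrame from the previous sketch):

import matplotlib.pyplot as plt

starts = notes_df['step'].cumsum()  # rebuild absolute start times from the steps
ends = starts + notes_df['duration']

plt.figure(figsize=(12, 4))
plt.hlines(notes_df['pitch'], starts, ends)  # one horizontal bar per note
plt.xlabel('Time (s)')
plt.ylabel('Midi pitch')
plt.title('Piano roll of the sample midi file')
plt.show()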

After this the geek in me decided to look at the distribution of each note variable (Figure 5). This allows me to see the number of times that a pitch occurs in the midi, the step timing (which is the time elapsed from the previous note or start of the track) and the duration of the notes.

Figure 5: Distribution of Pitch, Step, and Duration of the sample midi file.
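
The distribution plots can be produced with something as simple as this (again a sketch, assuming notes_df from the extraction step):

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, column in zip(axes, ['pitch', 'step', 'duration']):
    ax.hist(notes_df[column], bins=20)  # how often each value occurs
    ax.set_title(column)
plt.show()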

Next, I create a dataset using all the midi files uploaded previously. With a small dataset of only six files this is super quick. From here we create a dataset in TensorFlow from all the notes that were in the uploaded files. This is the place where I will need to make adjustments to get the best results from my model, using either magic numbers (trial and error to find what works best) or a technique called hyperparameter tuning (https://www.tensorflow.org/tutorials/keras/keras_tuner). Simply put, Keras Tuner is a library that helps pick the optimal set of parameters for the ML model rather than relying on trial and error.
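
The windowing step looks roughly like this. It is a sketch rather than my exact code; the sequence length and batch size are assumed values, and it assumes notes_df now holds the notes from every uploaded file:

import numpy as np
import tensorflow as tf

seq_length = 25
batch_size = 64

# Stack the note columns into one array and wrap it in a tf.data pipeline.
all_notes = np.stack(
    [notes_df[c] for c in ['pitch', 'step', 'duration']], axis=1
).astype(np.float32)
notes_ds = tf.data.Dataset.from_tensor_slices(all_notes)

def split_labels(window):
    # The first seq_length notes are the input; the note that follows is the label.
    inputs = window[:-1]
    labels = {'pitch': window[-1][0], 'step': window[-1][1], 'duration': window[-1][2]}
    return inputs, labels

sequences = (
    notes_ds
    .window(seq_length + 1, shift=1, drop_remainder=True)
    .flat_map(lambda w: w.batch(seq_length + 1))
    .map(split_labels)
    .batch(batch_size)
)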

We then train the model (Figure 6). What we are looking for is that the plot goes down (it doesn't need to hit zero, but it does need to flatten out).

Figure 6: The model as it trains; the loss gets closer to zero and flattens out, which shows that the training is working.
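
The training cell is essentially one call to model.fit, something like this sketch (assuming a compiled Keras model with 'pitch', 'step', and 'duration' outputs and the sequences dataset from the previous sketch; the epoch count and patience are assumed values):

import matplotlib.pyplot as plt
import tensorflow as tf

history = model.fit(
    sequences,
    epochs=50,
    callbacks=[
        # Stop early once the loss stops improving.
        tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5, restore_best_weights=True),
    ],
)

# Plot the loss curve to check that it drops and then flattens out.
plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('Total loss')
plt.show()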

Finally, we start to generate notes from the model we have just built (Figure 7). We can use the "temperature" parameter: higher values create more chaotic note choices, while lower values preserve more of what the training data looks like.

Figure 7: Generating notes from the model.
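
The temperature trick boils down to scaling the pitch logits before sampling. Roughly, as a sketch (assuming a trained model whose outputs are a dict of 'pitch' logits plus 'step' and 'duration' values):

import numpy as np
import tensorflow as tf

def predict_next_note(input_notes: np.ndarray, model, temperature: float = 1.0):
    """Predict one note (pitch, step, duration) from a window of input notes."""
    inputs = tf.expand_dims(tf.cast(input_notes, tf.float32), 0)  # add a batch dimension
    predictions = model(inputs)

    # Higher temperatures flatten the pitch distribution (more chaotic choices);
    # lower temperatures stay closer to the training data.
    pitch_logits = predictions['pitch'] / temperature
    pitch = tf.random.categorical(pitch_logits, num_samples=1)[0, 0]

    step = tf.maximum(0.0, predictions['step'][0, 0])          # times can't be negative
    duration = tf.maximum(0.0, predictions['duration'][0, 0])
    return int(pitch), float(step), float(duration)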

I then look at the midi file output (Figure 8) and can either regenerate it or change the temperature and regenerate. This generation looks like the original input file (Figure 4) but with more influence from the other midi data that I added to the training dataset.

Figure 8: Generated midi file from the input against the model.
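
Writing the generated notes back out as a midi file with pretty_midi looks roughly like this (a sketch; the instrument program, velocity, and example notes are assumed):

import pandas as pd
import pretty_midi

def notes_to_midi(notes: pd.DataFrame, out_file: str) -> None:
    """Write rows of pitch, step, and duration back out as a midi file."""
    pm = pretty_midi.PrettyMIDI()
    instrument = pretty_midi.Instrument(program=0)  # 0 = Acoustic Grand Piano

    prev_start = 0.0
    for _, row in notes.iterrows():
        start = prev_start + row['step']  # rebuild absolute timing from the steps
        end = start + row['duration']
        instrument.notes.append(
            pretty_midi.Note(velocity=100, pitch=int(row['pitch']), start=start, end=end)
        )
        prev_start = start

    pm.instruments.append(instrument)
    pm.write(out_file)

# Hypothetical example: three generated notes written out to a new file.
generated_notes = pd.DataFrame(
    {'pitch': [60, 64, 67], 'step': [0.0, 0.5, 0.5], 'duration': [0.4, 0.4, 0.8]}
)
notes_to_midi(generated_notes, 'st4rtplus_output.mid')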

This was a lot of work to complete, but I can confirm that it works as expected even considering the small dataset. I’ll work on getting a larger set of data together and making a few other modifications to make it easier to use. 

I like how it modified the input, though it isn't yet an easy process to iterate with the model. These future improvements will make this an interesting tool/collaborator.

References

FluidSynth (2024) FluidSynth (Version 2.4.3) [Computer program]. Available at: https://www.fluidsynth.org/ (Accessed: 2 October 2024).

Pretty_midi (2023) pretty_midi (Version 0.2.10) [Computer program]. Available at: https://github.com/craffel/pretty-midi (Accessed: 2 October 2024)