Biting off more than I can chew: Songification

So I had a thought: the human voice has been picked over to find sonic commonalities (at least within cultures) of expression. For instance, when someone gets vibrantly angry, they tend to get louder and their muscles tense up all over – including their vocal tract. This translates to a change in how their voice sounds. These changes are relative to a person’s own baseline, but the voice changes predictably.

I had another thought: verbal performance of poetry is awfully close to music. It has a sense of pitch, rhythm, flow, tension, and release. It’s structured differently than most popular music, but the pieces are there.

So then I had an idea: I should build an automated system that could take a recording of any (Western, English-language) meaningful verbal performance of a poem and turn it into music that reflects or supports the aesthetics and affective nature of the performance. The system would effectively deconstruct not the meaning of the words, but the feeling of the voice performing the poem, and turn that into music. Patterns that suggest anger should generate more tense music. Patterns that suggest ennui should generate more detached music. You get the idea.
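
(For the curious: the kind of analysis I have in mind looks roughly like the sketch below, written in Python with librosa. The feature choices and the weighting are illustrative assumptions on my part, not what my actual setup does; the point is just that the cues are measured against the speaker’s own baseline.)

```python
import numpy as np
import librosa

def tension_curve(path, hop_length=512):
    # Load the spoken-word recording
    y, sr = librosa.load(path, sr=None, mono=True)

    # Loudness proxy: RMS energy per frame
    rms = librosa.feature.rms(y=y, hop_length=hop_length)[0]

    # Brightness proxy: the spectral centroid tends to rise as the voice tenses up
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop_length)[0]

    # Normalize each feature against this speaker's own range (their baseline)
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    # Crude per-frame "tension" estimate: louder + brighter = tenser
    return 0.5 * norm(rms) + 0.5 * norm(centroid)
```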

I’ve spent more time on this than I really should, so I’m going to share my failures with you and move on. Actually, that’s too harsh and I should give myself a little more credit. Besides, who doesn’t like reading a blog post where the author talks about how great they are?

Here’s where I’ve succeeded:

  • making some objectively creepy music (Happy Halloween?)
  • the system does respond reasonably well to building tension
  • recording of a poem goes in, unique music comes out

Here’s where I’ve failed:

  • I wanted the system to be fully automated. I’m not sure if fully automated is possible, but I do believe that nearly fully automated is possible.
  • While the music is unique, there’s too much homogeneity between the poems
  • I used a between-poems design, when maybe I should have used a within-poems design.

I think my biggest stumbling block here is that I got so lost in my proof-of-concept stage while using a typical DAW (Logic 9) that I never got around to using more specialized and arguably better-suited tools like MaxMSP. I went through several iterations of the prototype before I decided to cut my losses and move on. Perhaps the problem is that I was using the wrong tools.

So as I mentioned, I wanted this system to be fully automated. With no user input other than the source poem, I wanted meaningful musical output. My system ended up needing massaging, kind of like Brian Eno & Peter Chilvers’ Bloom app. I had to set the musical scales used for each poem instead of that choice being driven by data in the sound file. This is a technological constraint of Logic, or at least a constraint of my understanding of Logic. I couldn’t figure out how to get plugins to send control data to each other (with a few exceptions). MaxMSP can do this.
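
To give a sense of what a data-driven scale choice could look like outside Logic, here’s a rough Python sketch: average the pitch-class content of the recording and correlate it against the classic Krumhansl-Kessler major/minor key profiles. This is an assumption about how I’d automate the step I did by hand, not something my system currently does.

```python
import numpy as np
import librosa

# Krumhansl-Kessler key profiles (pitch-class weights for major and minor)
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def guess_scale(path):
    # Average pitch-class energy across the whole performance
    y, sr = librosa.load(path, sr=None, mono=True)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)

    # Correlate against every rotation of the major and minor profiles
    best_score, best_key = -np.inf, None
    for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
        for tonic in range(12):
            score = np.corrcoef(np.roll(profile, tonic), chroma)[0, 1]
            if score > best_score:
                best_score, best_key = score, f"{NOTES[tonic]} {mode}"
    return best_key
```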

The homogeneity is also a product of the system not being able to manipulate itself, or to have the pieces interrelate in anything beyond a simple A -> B fashion. Data flowed from one plugin to the next, then through a series of busses to be routed and mixed. Some interesting events still happen, and the music is emotive, but it’s more like a baby’s first steps than elegant functioning. Additionally, I chose poems with speakers that have similar voices. This was a choice I debated but stuck with. I didn’t want the aesthetic responses to be driven by vastly different voices rather than by vastly different poems.
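
To make the A -> B problem concrete, here’s a toy contrast between a strict chain and a version where the pieces also hear each other. The modules here are made-up stand-ins, not my actual plugin and bus routing:

```python
def chain(signal, modules):
    # Strict A -> B -> C routing: each stage only hears the stage before it
    for module in modules:
        signal = module(signal)
    return signal

def with_feedback(signal, modules, passes=3):
    # On every pass, each module hears a blend of what all the others produced,
    # so later behaviour can bend back and reshape earlier behaviour
    for _ in range(passes):
        outputs = [module(signal) for module in modules]
        signal = sum(outputs) / len(outputs)
    return signal
```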

What I meant above about a “between-poems” and “within-poems” design is that I’m starting to wonder if I should have done my proof-of-concept on multiple different performances of a single poem. That way I could have learned whether the system could generate meaningful output in that setting before expecting it to work on several different poems. I think I not only dove into the deep end, but I decided to do it at midnight. In a quarry. With a new moon.

Anyway, enough running myself down. Have I got you excited to listen to these recordings yet? Yeesh. The three poems I chose are “Ursonate” by Kurt Schwitters, “Do Not Go Gentle Into That Good Night” by Dylan Thomas, and “May I feel said he” by EE Cummings. Jaap Blonk performed “Ursonate,” Anthony Hopkins performed “Do Not Go Gentle…,” and Tom Hiddleston performed “May I feel said he.”

I envisioned “Ursonate” as my control – the poem contains no words, just mouth sounds. Blonk gives a stirring, bizarre, and emphatic performance. It’s vocal Olympics. Amazing stuff! Definite thanks to Norbert Herber for suggesting this poem and performance. In a sort of backwards way, this proves my system works to some degree. The music does not sound like the poem outright, but it follows the flow and energy of Blonk’s performance.

Here’s the version of “Ursonate” I used as input:

And here is the output my system generated:

I chose this Dylan Thomas poem partially because it is a favorite of mine, but also because of its quiet rage and its understated climactic moments. Here is Hopkins’ performance of Dylan Thomas:

Here is the output my system generated:

I chose EE Cummings because I feel like his poems lend themselves to being spoken aloud. Also, this poem is gentle, playful, and sweet. Hiddleston’s performance is muted, much like Hopkins’, but there’s meaning and feeling that I wanted to tease apart. Finally, here is the EE Cummings poem performed by Hiddleston. (Do ignore the video content, please):

And last but not least, my system’s output:

I feel like “Ursonate” demonstrates that this is a responsive system, but perhaps I went too fine-grained with my stimuli and should have focused on broader strokes instead of nuanced performances. More Shatner, and less, uh, Shatner. Or maybe this is as far as it’ll ever get. All I know is that I’ll go crazy if I don’t put some distance between myself and this project for a while!