Today’s Lesson: An Luu – Pourquoi tu me fous plus des coups?


I have not been spending enough time on my blog lately, and it’s time for that to change with “Today’s Lesson:” a weekly blog segment in which I review a song or an album that I’ve recently been listening to and find myself interested in. Unlike a traditional review, I won’t be assigning scores or anything. Instead, I focus on writing about music that I think is worth writing about. In some tacit sense, you can assume that these are all recommendations. However, the real purpose of this segment is to use a song as a stepping off point to talk about whatever else is on my mind. We’re too quick to separate music out from other things in life, and this is my own small way of questioning the merit of doing that.

 

Just bask in the glory of this unabashedly 80s pop single by An Luu, a French actress. I don’t know how or why Spotify recommended this track to me, but I sure am grateful they did. I love it. There is absolutely no pretense, posturing, or even showing off. The song sets up a basic, spartan groove as An’s breathy voice floats over top. There is a certain innocence to the vocal delivery, too. A vulnerability. I’ve listened to the song maybe dozens of times over the past week or so. I’m utterly fascinated by it, partially because I challenged myself to understand the song without looking up a translation of the lyrics.

In writing this post, though, I decided to lift that curtain and track down a translation of the lyrics. Despite being molested by Google Translate, the song’s lyrics are immediately understandable: she’s asking a lover why he stopped beating her – maybe he doesn’t love her anymore? That’s heavy stuff, to put it mildly. The scenes are immediately evocative of Lou Reed’s Berlin concept album: domestic, bluntly laying out the cruelty, and also confronting the hard-to-understand inner workings of the abused that stay in the relationship. Though it might just be coincidence, the lover in An’s song is referred to as LouLou…

I won’t lie, I’m kind of bowled over right now. And this is the kind of musical experience I really cherish: being drawn into the complicated beauty of a song. Listen to it. It’s just pretty. But it’s simultaneously so baldly ugly too. This is something special because it gives us a slice of life. A bad slice, but a slice nonetheless. Music can contain truths that are lost in mere words and, disgusting as they may be, we mustn’t run from this. This is us, this is who we are. This is reality.

In our social media, we often find ourselves framing ourselves in a certain light, highlighting the good and erasing the bad. Hiding it away, and only putting your best self forward. But it’s a lie, isn’t it? And even when someone tries to show a more complete picture, we look down our noses at them for sharing drama or just being too sad. Something media scholars study at length is this phenomenon of how we present ourselves and how we act when (we think) someone is watching: we change. We pretend to be something else. An Luu strips away all of that with this song, and it’s arresting in its beauty.

Eno: Music is political


Recently, an interview with Brian Eno appeared in Pitchfork about Ambient music, what it means, and where it might be going. As much as I generally find Pitchfork annoying with its condescending attitude, this is a great interview done by Philip Sherburne. I debated what I wanted to highlight from the interview, but one quote from Eno seems to be resonating with me the most right now:

You can’t really make apolitical art. We started out talking about ways of composing; ways of composing are political statements. If your concept of how something comes into being goes from God, to composer, conductor, leader of the orchestra, section principals, section sub-principals, rank and file, that’s a picture of society, isn’t it? It’s a belief that things work according to that hierarchy. That’s still how traditional armies work; the church still works like that. Nothing else does, really. We’ve largely abandoned that as an idea of how human affairs work. We have more sophisticated ways of looking at things. [Emphasis mine. – JDS]

Do be sure to read the full interview, linked above.

What ever happened to surround sound music?


This post is based on a presentation I gave at the inaugural Indiana University Media School Graduate Student Conference.

About 10 or so years ago, it seemed like there was a new game in town: surround sound music! Of course, those of you old enough can recall that this wasn’t the first time such a promise was made. But this time, by golly, it was going to work! And if you believe that, then I have a 3D TV to sell you, too. Like 3D TV, surround sound music seems like such a natural evolution. Yet time and again, it has failed to launch.

Back in my undergrad I took a course where, for one assignment, we had to produce a 5.1 mix of a project one of our peers had recorded. Even while explaining the assignment, the professor seemed doubtful about surround sound music really taking off and this being a relevant skill to build. It sure was fun to play with for the assignment, even if my mix was terrible.

As much as I try to avoid jargon, this post is going to have some. So, before I really dive in here, I’m going to hit you with some definitions:

  • Mono: one channel of audio information. It might come out of one, two, or more speakers, but when every speaker is playing the exact same sound, it’s mono.
  • Stereo: two channels of audio information.
  • Surround: more than 2 channels of audio, where some number of channels are positioned in such a way that the sound that comes out of them is coming from the sides, behind, above, or below the listener.

Also, I think to contextualize my argument properly, I need to give a (painfully!) brief history of recorded music, too.

  • Sheet music: circa 2000 BCE, cuneiform tablets had musical notation on them
  • Mechanical reproduction: circa 9th century (!!!), a hydro-powered organ that performed music etched into interchangeable cylinders by the Banū Mūsā brothers

A diagram of the hydro-powered organ.

  • Phonograph: 1877, Thomas Edison. Wax cylinders that could have audio waveforms etched into them and played back later.
    • Recorded live; recording and playback on one mechanism.
  • Disc phonograph: 1889, Emile Berliner. Platters instead of cylinders.
    • 33 ⅓ rpm (the LP): 1948, Columbia Records
  • (Practical) stereo sound: Bell Labs, 1937
  • Surround first attempted by Disney’s premiere of Fantasia in 1940
  • First “big” consumer format was Quadraphonic in the very early 1970s
    • Actually 3 competing and not cross-compatible formats
    • Could be done on tape or vinyl
    • CDs hypothetically could contain quadraphonic sound and it is allowed (but under-specified) in the “Red Book” but this was never commercially attempted 
  • Once the DVD and home theater setups became largely ubiquitous, DVD-A was attempted (among others)

OK, now to the good stuff!

The Case Against Stereo

It’s kind of hard to imagine given its ubiquity and sort of obvious design, but stereo music was not met with a resounding embrace. Perhaps most understandably, the public needed to be convinced that it was more than a mere gimmick. But even musical luminaries like Brian Wilson of the Beach Boys and Phil Spector, of the famous “Wall of Sound” production aesthetic (and, later, a murder conviction), spoke out against stereo. Spector thought that stereo would take control away from the producer, and power away from his Wall of Sound. It was an issue of scale: the Wall of Sound didn’t seem to work in stereo.

Wilson’s concerns were similar, but more focused on the feeling that stereo necessitated trusting the public to set up their stereo systems correctly. If the speakers weren’t placed right, the stereo image would be strange and the balance between the left and right sides of the music would be bizarre or, at the least, transformative to the recording. By contrast, with a mono system you just need to plug it in and turn it on. There’s nothing to calibrate.

To make matters worse, when companies were pushing stereo, they needed to be able to sell stereo records to people. As such, lots of recordings that were designed for mono were reprocessed as stereo. Back to Spector’s concerns, these recordings were not conceptualized for stereo. Even on a well set up stereo system, it is ultimately a perversion of what it was meant to be. Even more damning is that audiences had mixed reactions to “stereoized” recordings.

Surround Sound: more channels = more music?

It was only about 10 or 15 years after the initial foray into stereo music that surround sound first came to the consumer market in the form of quadraphonic sound: four speakers positioned around the perimeter with the listener in the middle. Just think about the physical reality of that for a moment! A few years ago a mono system could just be plunked down wherever convenient. No wires running every which way, sounded pretty good in a large portion of the room, and it was cheap. Then two speakers, but the sweet spot was still pretty large and the wires were limited to one side of the room at least. But then quad? This required an entire room to be dedicated to the listening of music, and you couldn’t stray too far from the sweet spot and have it still sound “good.” Wires would have to run the perimeter of the room, too. And the cost of four speakers and the specialized playback systems. Yes, systems. There were several competing quad formats that were not cross compatible. Yikes. Couple that with the quad-ized recordings and it was a bit of a mess.

All of that aside, there is a certain parity for the mono-to-stereo move and the stereo-to-surround move. But one worked and the other didn’t. Why?

Affordances of the Medium

Every medium is unique: Van Gogh’s Starry Night rendered in watercolor would be a different work because watercolor and oil do different things. The same thing applies to music formats: each has a unique set of strengths and weaknesses. Things tend to be most interesting, it seems to me, when artists leverage these affordances of the medium to create something that only works in that medium. The concerns about surround sound delivery are becoming less and less pronounced, thanks to modern surround emulation on headphones, and even home theater soundbars can kind of fake surround sound. But where’s the music?

Starry Night as painted by Van Gogh

A watercolor re-interpretation.

I think that it has to do, largely, with the fact that not many artists need (or want) a surround sound space to do their work. In the West, our music listening traditions are deeply rooted in musicians being collected together in one area and the audience paying attention to them. (It hasn’t always been this way, but it has been for a few hundred years for the most part.) With our two ears in any physical space, we will hear stereo sound. So between our cultural practices of music and our built-in stereo receiver, stereo music works nicely.

Let’s go back to Spector’s Wall of Sound. The Wall of Sound didn’t scale well to stereo because it was built upon the idea that, by adding multiple, multiple layers of a single part together, he could create an all-encompassing assault of music. Splitting this into stereo meant he would need to double what were already some of the largest, most complex recording sessions. It just couldn’t be done effectively. Now recall that surround is at least doubled yet again in terms of channels.

What does surround sound music even sound like?

Ever listen to early stereo recordings? You might hear the drums all the way in the left, the bass all the way in the right, and so on. Maybe there would be extra reverb added to fill the space. It was a bit extreme, but it was a necessity. Those sources weren’t recorded to be stereo, so all they could do was put individually mono signals in different spaces in the mix. And by golly, if people are paying for stereo, let’s make sure they hear it! This was also due to limitations of early stereo recording consoles where panning (placing things in the stereo field) was reduced to “L-C-R:” a three way toggle for left, center, or right. But back to surround… what should go in those additional channels?

“This 5.1 mix of Megadeth is so going to be worth it.”

The answer to this question is similar to the answer for early stereo: grabbing elements from recordings conceptualized for stereo and distributing them across the additional channels. The result is an emaciated surround mix, spread thin around the room. Crucial pieces would be excised and hung out in periphery. Even worse, those sounds from aside or behind the listener have very different psychological meanings than sounds from in front of you. On a fundamental and animal level, sounds from sources that we can’t see are startling.

Other approaches were to take a stereo recording and make it sound like you were listening in an idealized listening environment. Some kind of emulation of a space. This is an interesting idea, but there’s no way to account for what the listener’s room already sounds like. Once more, this is ultimately noise. The signal is the music!

#NotAllSurround

I don’t mean to universalize. There are some wonderful examples of surround sound music out there, but it’s very niche. And it’s because it necessitates the entire process of recording the music (if not the conceptualization of the music itself!) to be done, from the ground up, for surround sound. And it’s hard. It’s very, very hard to do because there is so little basis for comparison. Part of successful artistic endeavor is pushing against the boundaries of the possible. In surround sound, those boundaries are so much more distant than in stereo or mono that it’s hard to even find them. It’s for these reasons that I think surround sound music will never leave the niche. If the content is good, people will make excuses to jump through the hoops to listen to it.

A Recommendation

Even though I’ve been dumping on surround sound music, I don’t want you to think that I dislike it or think it’s dumb. Far from it! It’s just hard to find examples of surround sound music that sound like they should be surround or that they are doing something that can only be done in surround. But those examples do exist, and I’d like to recommend one:

The Flaming Lips: Yoshimi Battles The Pink Robots 

I recommend this one in particular because it’s a reasonably well known recording in its own right, but also because the stereo and surround versions allow for a compare and contrast: the ‘Lips didn’t just release a surround version of stereo mixes: they’re different versions of the songs with different elements and different vibes. The Flaming Lips have long played around with surround sound, so it only seems fitting that they knocked this one out of the park. And despite its age, it still sounds like the future – and that’s what surround sound is all about, right?

“Brains?” Do you mind?


I think it would be a bit conceited to think that anyone reads my blog posts and hangs on every word, but it may or may not have struck you that I tend to avoid mentioning the brain unless I’m talking about the organ. Since I study cognition, I tend to say things like “cognitive system” or “mind.” A lot of people conflate these things, but I try to avoid doing so. This difference may seem pedantic or insignificant, but I would protest that on a few points.

  1. The brain is a piece of meat. It’s certainly a critical portion of the cognitive system and mind, but it is not the whole system. For example, the enteric nervous system is sometimes referred to as “the second brain.” While that label may or may not overstate its neuronal activity by comparing it to our brains, it is undeniably a factor in our cognition.
  2. Beyond the second brain, our bodies play a major role in our cognition. Sensory organs transform raw inputs before they ever reach our brains. Making changes to posture can impact mood. Hell, even holding a pencil in our teeth can change how we feel about things we see and hear. Our brains are part of our bodies and are intimately linked to form our cognition.
  3. Words matter. A concept known as Cartesian Dualism – a separation of the brain (mind!) and body – still permeates our culture today despite overwhelming evidence to the contrary. Even researchers that focus exclusively on the brain when studying cognition, when pressed, would acknowledge that the brain is NOT the mind.

Clearly, I subscribe to the Embodied Cognition paradigm. In a nutshell, it means I doubt that the brain and body can be separated in a meaningful way when talking about cognition. The two are part of the same whole system. The brain controls the body, and the body controls the brain. It’s all interrelated and should be considered holistically.

I’ll freely admit that there’s more than one way to interpret this thing called the mind. But I won’t go around reducing our cognitive systems to “the brain.” That would be like pointing to a tiger and merely calling it a cat: it’s mostly right, but also incredibly misleading in some crucial ways in the wrong context.

I come not to praise the album, but to bury it


Ok, not really. The album has been and probably will be a viable and vibrant means of artful expression for the foreseeable future. But like the vinyl medium, I think its halcyon days are over. All because of this little bastard:

shuffle-icon-614x460

Damn you!

An album is, at least, a loose collection of related songs. Typically the recordings contained within are from one band or artist, from a similar time period, and contain sonic cues to relate the songs to one another. In short, they sound like they belong together. But now if you were to open up your music player of choice, I’d put even money on you having some kind of playlist or aggregate view that contains works from multiple artists, times, genres, and so forth. And you probably (gasp!) mix these together to organize them into ad hoc compilations that suit some purpose or setting.

As a music consumer, being able to make playlists is an endlessly fun and fruitful way to explore connections between my life and music. I find new ways to connect and relate to artists and music when I have free rein to build playlists. Was Friendly Rich’s “Mr. Skin’s Hymn” meant to be put in conversation with Scott Walker’s “30th Century Man?” I don’t know, but now that I’ve put them on a playlist together, I quite enjoy it.

scottwalker-horzb

Please don’t confuse the Scott Walker of “30th Century Man” with this Scott Walker, a 17th century man

I’m sure you have your own similar experiences with making playlists. Even the simple act of putting music of a similar tempo together for the purpose of a workout playlist is destructive to the concept of the album. So why, then, this disconnect between the way people listen to music and the way music is released?

Putting on my musician hat, something I’ve grappled with a long time is “should I record an album?” Aside from my legions of adoring fans demanding such a release, why should I? My music listening habits inform my music creation habits. I don’t have the material for an album per se because my collection of recordings is more like a playlist. There is some kind of implicit thread through them all but an album implies a genre, a mood, a production sense… something. And I don’t have it.

empty-theater

Pictured: the wild throngs of fans

I don’t think it’s a matter of discipline, either. This is a pointed effort: I want each song to exist in what I think is its best and truest form. I want to celebrate diverse inspirations. I want my music to reflect the way I listen to music. That means sacrificing the obvious sonic cues that these recordings belong together. And I know I’m not alone. In fact, composers have been playing with this idea for a long time. The likes of Stockhausen and Cage challenged us to question what sounds belong together in music. Moving up a level, what songs belong together on an album?

There’s something coming over the horizon – a new way to think about a collection of recordings that belong together – and it isn’t an album as we know it. It’ll be some new way to approach the underlying logic of how and why songs belong together, and what it means for them to exist in one release. I’m excited to find out what it is.

Once More, with (Less) Feeling: artificialized vocals

This semester has been challenging and fun. One class, in particular, really pushed me. It’s a class on music information processing. In other words, it’s a class on how computers interpret and process music as audio. I’ll spare you a lot of the technical stuff, but generally speaking we were treating audio recordings as vectors, with each value of the vector corresponding to the amplitude of a sample. This allowed us to do all sorts of silly and interesting things to the audio files.
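To make the "audio as a vector" idea concrete, here is a minimal sketch in base R. The 440 Hz tone, the 44.1k sample rate, and the variable names are my own assumptions for illustration, not anything from the class.

```r
# Assumed values for illustration: a one-second, 440 Hz tone at 44.1k.
sr <- 44100                      # samples per second
time_s <- (0:(sr - 1)) / sr      # the time of each sample, in seconds
tone <- sin(2 * pi * 440 * time_s)  # one amplitude value per sample

# Because it is just a numeric vector, ordinary vector operations
# become audio effects.
quieter <- 0.5 * tone            # halve every amplitude: a volume change
backwards <- rev(tone)           # reverse the sample order: play it backwards

length(tone)                     # one second of audio = sr samples
```

Nothing here needs an audio library; a package like tuneR only comes in when reading or writing actual WAV files.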

The culmination of the class is an independent project that utilizes principles learned from the class. This presented a unique opportunity to design an effect that I’ve wanted but couldn’t find: a way to make my voice sound like a machine. Sure, there’s vocoders, pitch quantizers, ring modulators, choruses, and more… but they don’t quite do what I want. The vocoder gets awfully close, but having to speak the vocals and also perform the melody on a keyboard is no fun. iZotope’s VocalSynth actually gets very close to what I want, but even with that it’s hard to blend the real and the artificial. There had to be something different!

And now there is. Before I can explain what I did, here’s a little primer on some stuff:

Every sound we hear can be broken down into a combination of sine waves. Each wave has 3 parameters: frequency (pitch), amplitude (loudness), and phase. You’ll note that phase doesn’t have an everyday analog like frequency does with pitch. That’s probably because our hearing isn’t sensitive to phase (with some exceptions not covered here). Below is a picture of a sine wave.

[Image: a sine wave]

See how the wave starts at the horizontal line that bisects the wave? This sine wave has a phase of 0 degrees. If it started at the peak and went down, it would have a phase of 90 degrees. If it started in the middle and went down, it would have a phase of 180, and so forth.

As I said, we don’t really hear phase, but it’s a crucial part of a sound because multiple sine waves are added together to make complex sounds. Some of them reinforce each other, others cancel each other out. All in all, they have a very complex relationship to each other.
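Here is a small sketch of that reinforcing and cancelling, in base R. The helper name `sine_wave`, the 440 Hz frequency, and the sample rate are assumptions I picked for the example.

```r
# Assumed values for illustration: one second at 44.1k.
sr <- 44100
time_s <- (0:(sr - 1)) / sr

# Hypothetical helper: a sine wave with a frequency (Hz), a phase (degrees),
# and an amplitude.
sine_wave <- function(freq, phase_deg = 0, amp = 1) {
  amp * sin(2 * pi * freq * time_s + phase_deg * pi / 180)
}

a <- sine_wave(440, phase_deg = 0)

# Same frequency, same phase: the two waves reinforce each other.
reinforced <- a + sine_wave(440, phase_deg = 0)

# Same frequency, 180 degrees apart: the two waves cancel each other.
cancelled <- a + sine_wave(440, phase_deg = 180)

max(abs(reinforced))  # close to 2
max(abs(cancelled))   # close to 0

# A phase of 90 degrees starts at the peak, as described above.
sine_wave(440, phase_deg = 90)[1]
```

Real sounds sit somewhere between these two extremes, with many sine waves at many frequencies partly reinforcing and partly cancelling.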

This notion of a complex wave represented by a series of sine waves comes from a guy named Fourier. (He’s French so it’s “Four-E-ay.”) There are a lot of different flavors of the Fourier Transform, but the type relevant here is the Discrete Fourier Transform, which is almost always computed with an algorithm called the Fast Fourier Transform (FFT). This one only deals with a finite number of samples, which is very computer friendly.
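As a rough illustration of what the transform gives you, the sketch below builds a signal out of two sine waves and uses R's built-in `fft()` to find them again. The frequencies, amplitudes, and sample rate are made up for the example; I chose one second at 8000 samples per second so that each FFT bin lands on a whole number of Hz.

```r
# Assumed values for illustration.
sr <- 8000
n <- 8000                                   # one second of samples
time_s <- (0:(n - 1)) / sr

# A "complex" sound: a 200 Hz sine plus a quieter 450 Hz sine.
signal <- sin(2 * pi * 200 * time_s) + 0.5 * sin(2 * pi * 450 * time_s)

spectrum <- fft(signal)                     # complex numbers: magnitude and phase
magnitude <- Mod(spectrum[1:(n / 2)]) / (n / 2)  # amplitude of each sine component
freq_hz <- (0:(n / 2 - 1)) * sr / n         # the frequency each bin stands for

# The two strongest components are the two sines we put in.
strongest <- freq_hz[order(magnitude, decreasing = TRUE)[1:2]]
strongest
```

`Mod()` gives the amplitude of each component and `Arg()` would give its phase, which is the part that gets manipulated later in this post.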

Building on the FFT is the STFT (short-time Fourier Transform), which runs an FFT on one short slice of audio at a time and so keeps the phase information in a form that’s easier to play with. One of the simplest tricks is to set all of the phases to 0. This makes a monotone, robotic voice with only a few parameters changed. Hm! That’s fun, but not very musical.
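Here is what the "set all of the phases to 0" trick looks like on a single slice of audio, as a minimal sketch. The function name `robotize_frame` is hypothetical, and the slice is random noise standing in for a bit of voice.

```r
# Hypothetical helper: zero the phase of one frame, keep the magnitudes.
robotize_frame <- function(frame) {
  spec <- fft(frame)
  # Rebuild each component with its original magnitude and a phase of 0.
  zero_phase <- complex(modulus = Mod(spec), argument = rep(0, length(spec)))
  # Inverse FFT back to samples; the imaginary part is only rounding error.
  Re(fft(zero_phase, inverse = TRUE)) / length(frame)
}

set.seed(1)
frame <- rnorm(1024)          # stand-in for 1024 samples of audio
robot <- robotize_frame(frame)

# The loudness of every frequency is unchanged; only the phases moved.
all.equal(Mod(fft(robot)), Mod(fft(frame)))
```

With every component starting at its peak together, the energy of the slice piles up into a spike at its very first sample. Repeating that spike at a regular interval is what produces the buzzy, robotic quality.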

STFTs, as the name implies, analyze very short segments of audio then jump forward and analyze another short segment. Short, in this case, means something like 0.023 seconds (1024 samples at 44.1k) of audio at a time. Here’s where the robot voice comes in: instead of jumping ahead to the next unread segment, I’ll tell it to jump ahead, say, a quarter of the way and grab 0.023 seconds, then jump another quarter and so on. This imposes a sort of periodicity to the sound, and periodicity is pitch!
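The arithmetic behind that can be sketched in a few lines of R. The assumption I am making is that the imposed pitch is simply how often the jump repeats, so the sample rate divided by the jump size in samples; `imposed_pitch_hz` is a name I made up for the example.

```r
N <- 1024        # samples per segment
sr <- 44100      # sample rate assumed in the text above

frame_seconds <- N / sr          # length of one segment, about 0.023 seconds

# Assumption: jumping ahead `hop` samples each time repeats the pattern
# sr / hop times per second, and that repetition rate is heard as pitch.
imposed_pitch_hz <- function(hop, sr) sr / hop

hop <- N / 4                     # "jump ahead a quarter of the way"
imposed_pitch_hz(hop, sr)        # 44100 / 256 = 172.265625 Hz

# A smaller jump repeats more often, so the pitch goes up.
imposed_pitch_hz(N / 8, sr) > imposed_pitch_hz(N / 4, sr)
```

Read the other way around, a desired pitch of f Hz needs a jump of sr / f samples, which works out to a ratio of N / hop = f * N / sr.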

By manipulating the distance I am jumping ahead, I can impose different pitches on the audio. This is essentially what I did in my project. More specifically, I:

  1. Made a sample-accurate score of the desired pitches
  2. Made a bunch of vectors for start time, end time, and desired pitches (expressed as a ratio)
  3. Made a loop to step through these vectors
  4. Grabbed a chunk of sound from a WAV file
  5. Performed an STFT using the pitches I plugged in
  6. Did an inverse STFT to turn it back into a vector with just amplitude values for samples
  7. Turned that back into a WAV file

(See the end of the post for a copy of my code.)
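Steps 1 and 2 could be sketched like this. To be clear, this is not how I built my vectors (I typed them out by hand); `make_score`, the onsets, and the pitches below are hypothetical, and the ratio formula assumes the hop size is N divided by the ratio, as in the code at the end of the post.

```r
# Hypothetical helper: turn note onsets (in samples) and pitches (in Hz)
# into start, end, and ratio columns.
make_score <- function(onsets, pitch_hz, total_length, N = 1024, sr = 44100) {
  stopifnot(length(onsets) == length(pitch_hz))
  data.frame(
    start = onsets,
    # Each segment ends one sample before the next one starts;
    # the last segment runs to the end of the file.
    end = c(onsets[-1] - 1, total_length),
    # Assumption: hop = N / ratio and pitch = sr / hop, so ratio = pitch * N / sr.
    ratio = pitch_hz * N / sr
  )
}

# Made-up example: three half-second notes at 44.1k.
score <- make_score(
  onsets = c(1, 22051, 44101),
  pitch_hz = c(220, 246.94, 261.63),
  total_length = 66150
)
score
```

Building the end times from the start times this way avoids gaps or overlaps between segments.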

Here’s what I ended up with!

And here’s what it started as:

Please be forgiving of the original version. It’s not great… I was trying to perform in such a way that would make this process easier. It did, but the trade off was a particularly weak vocal performance. Yeesh. My pitch, vowels, and timbre were all over the place!

Anyway, here’s the code. You’ll need R (or RStudio!) and the tuneR package. Oh, and the solo vocal track.

library(tuneR) # load tuneR first: setWavPlayer() comes from this package
setWavPlayer("/Library/Audio/playRWave")

stft = function(y,H,N) {
 v = seq(from=0,by=2*pi/N,length=N) 
 win = (1 + cos(v-pi))/2
 cols = floor((length(y)-N)/H) + 1
 stft = matrix(0,N,cols)
 for (t in 1:cols) {
 range = (1+(t-1)*H): ((t-1)*H + N)
 chunk = y[range]
 stft[,t] = fft(chunk*win)
 } 
 stft
}

istft = function(Y,H,N) {
 v = seq(from=0,by=2*pi/N,length=N) 
 win = (1 + cos(v-pi))/2
 y = rep(0,N + H*ncol(Y))
 for (t in 1:ncol(Y)) {
 chunk = fft(Y[,t],inverse=T)/N
 range = (1+(t-1)*H): ((t-1)*H + N)
 y[range] = y[range] + win*Re(chunk)
 }
 y
}

spectrogram = function(y,N) {
 power = .2
 bright = seq(0,1,by=.01)^power # brightness ramp, reshaped by the power above
 grey = rgb(bright,bright,bright) # this will be our color palette --- all grey
 frames = floor(length(y)/N) # number of "frames" (like in movie)
 spect = matrix(0,frames,N/2) # initialize frames x N/2 spectrogram matrix to 0
 # N/2 is # of freqs we compute in fft (as usual)
 v = seq(from=0,by=2*pi/N,length=N) # N evenly spaced pts 0 -- 2*pi
 win = (1 + cos(v-pi))/2 # Our Hann window --- could use something else (or nothing)
 for (t in 1:frames) {
 chunk = y[(1+(t-1)*N):(t*N)] # the frame t of audio data
 Y = fft(chunk*win)
 # Y = fft(chunk)
 spect[t,] = Mod(Y[1:(N/2)]) 
 # spect[t,] = log(1+Mod(Y[1:(N/2)])/1000) # log(1 + x/1000) transformation just changes contrast
 }
 image(spect,col=grey) # show the image using the color map given by "grey"
}


library(tuneR) 
N = 1024
w = readWave("VoxRAW.wav")
y = w@left
full_length = length(y)


bits = 16
i = 1
# this is a vector containing all of the pitch change onsets, in samples
start = c(0,131076,141117,152552,241186,272557,292584,329239,402666,
 459154,474012,491649,697317,786623,804970,824932,900086,924171,
 944914,968743,984086,1082743,1088571,1120457,1132371,1151571,
 1335171,1476343,1614943,1643400,1666886,1995600,2133514,2274429,
 2300571,2325686,3332571,3412114,3437400,3451800,3526457,3540343,
 3569314,3581657,3600943,3610371,3681086,3694800,3745200,3763371,
 3990000,4072371,4091143,4113000,4195286,4216200,4233429,4254000,
 4286743,4380771,4407701,4422086,4443686,4630114,4750886,4768029,
 4906371,4934829,4958914,5286171,5409686,5428714,5565943,5595086,
 5618829,5944543,6068829,6086057,6223714,6250543,6275057)

#this is a vector containing all of the last samples necessary for pitch changes. in samples
end = c(131075,141116,152551,241185,272556,292583,329238,402665,459153,
 474011,491648,697316,786622,804969,824931,900085,924170,944913,
 968742,984085,1082742,1088570,1120456,1132370,1151570,1335170,
 1476342,1614942,1643399,1666885,1995599,2133513,2274428,2300570,
 2325685,3332570,3412113,3437399,3451799,3526456,3540342,3569313,
 3581656,3600942,3610370,3681085,3694799,3745199,3763370,3989999,
 4072370,4091142,4112999,4195285,4216199,4233428,4253999,4286742,
 4380770,4407700,4422085,4443685,4630113,4750885,4768028,4906370,
 4934828,4958913,5286170,5409685,5428713,5565942,5595085,5618828,
 5944542,6068828,6086056,6223713,6250542,6275056, full_length)

#this ratio determines the pitch we hear by manipulating the window size
ratio = c(4.18128465,3.725101135,3.318687826,3.725101135,4.693333333,
 4.972413456,4.693333333,3.132424191,4.18128465,3.725101135,
 3.318687826,3.132424191,4.18128465,3.725101135,3.318687826,
 3.725101135,4.693333333,4.972413456,4.693333333,3.725101135,
 3.318687826,4.18128465,4.18128465,3.725101135,3.318687826,
 3.132424191,4.972413456,5.581345393,3.725101135,4.18128465,
 4.972413456,4.972413456,5.581345393,3.725101135,4.18128465,
 4.972413456,4.18128465,3.725101135,3.318687826,3.725101135,
 4.18128465,4.693333333,4.972413456,4.693333333,3.725101135,
 3.318687826,2.486206728,4.18128465,3.725101135,3.132424191,
 4.18128465,3.725101135,3.318687826,3.725101135,4.693333333,
 4.972413456,4.693333333,3.725101135,3.318687826,4.18128465,
 3.725101135,3.318687826,3.132424191,4.972413456,3.725101135,
 5.581345393,3.725101135,4.18128465,4.972413456,4.972413456,
 3.725101135,5.581345393,3.725101135,4.18128465,4.972413456,
 4.972413456,3.725101135,5.581345393,3.725101135,4.18128465,
 4.972413456)

w = readWave("VoxRAW.wav")
sr = w@samp.rate
y = w@left
ans = 0

for (i in seq_along(start)) {
#the loop steps through each of the 3 above vectors, one pitch segment at a time
frame = y[start[i]:end[i]] #take a bit of the wave from start to end
 
H = N/ratio[i] #make the window this size to change the perceived pitch
Y = stft(frame,H,N)


Y = matrix(complex(modulus = Mod(Y), argument = rep(0,length(Y))),nrow(Y),ncol(Y)) # robotization
ybar = istft(Y,H,N)
ans = c(ans,ybar) #concatenate all of the steps along the way; the for loop advances i on its own
}

ans = (2^14)*ans/max(abs(ans)) #scale so the loudest sample, positive or negative, fits in 16 bits
u = Wave(round(ans), samp.rate = sr, bit=bits) # make wave struct
writeWave(u, "robotvox.wav") #save the robot version (it is read back in below)
o = readWave("VoxRAW.wav")
o = o@left
spectrogram(o, 1024) #what does the original recording look like?
r = readWave("robotvox.wav")
r = r@left
spectrogram(r, 1024) #what does the robot version look like?
#play(u) #listen to the robot version

MP3s don’t matter (until they do)


I’ve written before on some of the differences in MP3s vs WAVs, specifically how MP3s seem to invoke more negativity than WAVs in a blind test. I don’t know about you, but I thought those results were interesting and weird. So, I thought it made sense to kind of zoom out and try and get a bigger picture of this phenomenon.

A logical first step was to ask “Can people even hear the difference between WAVs and MP3s in their day-to-day life? If so, in what circumstances?” As the title implies, people generally can’t tell in most circumstances but once they do, it is a very pronounced shift.

The Experiment

I made an online experiment, asking people to listen to 16 different pairs of song segments and select the one they thought sounded better. There were 4 levels of MP3 compression: 320k, 192k, 128k, and 64k.

‘Why those levels of compression?’ you might be wondering. Amazon and Tidal deliver at 320k, Spotify premium does 192k, YouTube does 128k, and Pandora’s free streaming is 64k.

For each pair, one version of the segment was a WAV and the other was an MP3. (See below for more detail.) I also asked for basic demographic information, how they usually listen to music, and how they were listening to the experiment. For example, a lot of people use Spotify regularly for music listening on their phones, and a lot of people used their phones to do the experiment. Doing the experiment online gave up a lot of control over how and where people listened, but the goal was to capture a realistic listening environment.

The Songs

I selected songs that are generally considered to be good recordings capable of offering a kind of audiophile experience. Also, I tried to choose “brighter” sounding recordings because they are particularly susceptible to MP3 artifacts. The thought behind this was to maximize the chance for identification of sonic differences, because I was doubtful there would be any difference until a very high level of compression.

I also split the songs into eras: Pre and Post MP3. I thought that maybe music production techniques might change to accommodate the MP3 medium, and maybe MP3s would be easier to detect in recordings that were not conceived for the medium.

The Song List by Era

Pre MP3 (pre 1993):

  1. David Bowie – Golden Years (1999 remaster)
  2. NIN – Terrible Lie
  3. Cowboy Junkies – Sweet Jane
  4. U2 – With Or Without You
  5. Lou Reed – Underneath the Bottle
  6. Lou Reed & John Cale – Style It Takes
  7. Yes – You and I
  8. Pink Floyd – Time

Post MP3:

  1. Buena Vista Social Club – Chan Chan
  2. Lou Reed – Future Farmers of America
  3. Air – Tropical Disease
  4. David Bowie – Battle for Britain
  5. Squarepusher – Ultravisitor
  6. The Flaming Lips – Race for the Prize
  7. Daft Punk – Giving Life Back to Music
  8. Nick Cave & The Bad Seeds – Jesus Alone

The Song List by Compression Level

320k

  1. Cowboy Junkies – Sweet Jane
  2. Lou Reed – Underneath the Bottle
  3. Squarepusher – Ultravisitor
  4. Daft Punk – Giving Life Back to Music

192k

  1. David Bowie – Golden Years (1999 remaster)
  2. NIN – Terrible Lie
  3. The Flaming Lips – Race for the Prize
  4. Air – Tropical Disease

128k

  1. U2 – With Or Without You
  2. Lou Reed & John Cale – Style It Takes
  3. Buena Vista Social Club – Chan Chan
  4. Nick Cave & The Bad Seeds – Jesus Alone

64k

  1. Pink Floyd – Time
  2. David Bowie – Battle for Britain
  3. Lou Reed – Future Farmers of America
  4. Yes – You and I

The Participants

I had a total of 17 participants complete the experiment (and 1 more do part of the listening task) and a whole lot of bogus entries by bots…. sigh. Here’s some info on the real humans that did the experiment:

Pie Charts2.png

Note: options with 0 responses are not shown

Pie Charts3.png

Pie Charts4.png

“Which best describes your favorite way to listen to music that you have regular access to?” was the full question. I didn’t want everyone to think back to that one time they heard a really nice stereo!

Pie Charts5.png

Pie Charts6.png

Pie Charts7.png

“This includes informal or self-taught training. Examples of this include – but are not limited to – musicians, audio engineers, and audiophiles.”

 

Unfortunately, the sample size wasn’t big enough to do any interesting statistical analyses with this demographic info, but it’s still informative to help understand who created this data set.

The Results

Participants reliably (meaning, a statistically significant binomial test) selected WAVs as higher fidelity when the MP3s were 64k. At the other three compression levels, there was no statistically significant difference.

OUTPUT.png

OUTPUT1.png

OUTPUT2.png

OUTPUT3.png

11 to 57 in favor of WAV, p < 0.001
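
For anyone who wants to check a tally like this, here is a minimal sketch of an exact two-sided binomial test in plain Python. It assumes the null hypothesis is a 50/50 coin flip between WAV and MP3, and the function name `binomial_two_sided_p` is just something made up for this example. It is not necessarily the exact procedure or software used for the numbers above.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: add up the probability of every
    outcome that is no more likely than the observed one under the null."""
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Small tolerance so floating-point ties count as "equally likely".
    total = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, total)

# Counts reported for the 64k pairs.
wav_picks, mp3_picks = 57, 11
trials = wav_picks + mp3_picks  # 68, consistent with 17 listeners x 4 songs

p_value = binomial_two_sided_p(wav_picks, trials)
print(f"{wav_picks}/{trials} chose WAV, p = {p_value:.2e}")
```

The same function works for any of the other tallies in this post: pass in the number of WAV picks and the total number of choices.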

When I first looked at the Pre/Post MP3 comparison, I was flummoxed. There is a statistical difference in the Post MP3 category… favoring WAVs.

That’s pretty counter-intuitive. That would be like finding that people preferred listening to the Beatles on CD instead of vinyl. It just doesn’t make sense. Why would recordings sound worse in the new hip medium that everyone’s using?

They don’t. My categorization was clumsy. Yes, I selected 8 songs that were recorded after MP3s were invented, but what I didn’t consider is that the MP3 was not a cultural force until about a decade later, and not a force in the music industry until even later than that. So I went back, looked at just the Post MP3 category, and split it again. Figuring out when the MP3 became a major force in the recording industry was a rabbit hole I didn’t want to go down, so I used a proxy: Jonathan Sterne, a scholar who looks at recording technology, published an article in 2006 discussing the MP3 as a cultural artifact. Luckily, using 2006 ended up being fruitful because, of my 8 songs in the Post MP3 category, none were released in or even near 2006. I had 5 released before and 3 released after, and when I analyzed those groups, there was a strong preference for WAV in the older recordings but not in the newest recordings. This suggests that yes, recordings made after a certain date generally sound just as good as MP3s of a certain quality as they do as WAVs. Here’s the analysis:

better-mp31

25 to 60 in favor of WAV, p < 0.001

 

better-mp3
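
The 2006 split can be sketched in a few lines of Python. One loud caveat: the release years below are not from the experiment data. They are filled in for this example from each album's original release and should be treated as assumptions. The sketch also assumes all 17 participants answered every pair.

```python
CUTOFF_YEAR = 2006   # year of the Sterne article used as the proxy
PARTICIPANTS = 17    # listeners who completed the experiment

# ASSUMPTION: release years are supplied for illustration only,
# they are not part of the original experiment data.
post_mp3_songs = {
    "Buena Vista Social Club - Chan Chan": 1997,
    "Lou Reed - Future Farmers of America": 2000,
    "Air - Tropical Disease": 2009,
    "David Bowie - Battle for Britain": 1997,
    "Squarepusher - Ultravisitor": 2004,
    "The Flaming Lips - Race for the Prize": 1999,
    "Daft Punk - Giving Life Back to Music": 2013,
    "Nick Cave & The Bad Seeds - Jesus Alone": 2016,
}

# Split the Post MP3 category around the cutoff year.
before = sorted(s for s, year in post_mp3_songs.items() if year < CUTOFF_YEAR)
after = sorted(s for s, year in post_mp3_songs.items() if year > CUTOFF_YEAR)

# Each listener makes one WAV-vs-MP3 choice per song.
choices_before = len(before) * PARTICIPANTS
choices_after = len(after) * PARTICIPANTS

print(f"before {CUTOFF_YEAR}: {len(before)} songs, {choices_before} choices")
print(f"after {CUTOFF_YEAR}: {len(after)} songs, {choices_after} choices")
```

With these years, the older group has 5 songs, which lines up with the 25 + 60 = 85 choices reported for it.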

So, to sum up: the debate between WAV and MP3 doesn’t matter in terms of identifying fidelity differences in real world situations for these participants UNTIL the compression levels are extreme. And recordings designed for CDs and not MP3s sound better on CDs than MP3s, but it doesn’t matter for older recordings. If I had to guess, it could be because some of the limitations of the vinyl medium are similar to MP3 (gasp! Heresy!), and so recordings designed for vinyl work kinda well as MP3s, too.