Today’s Lesson: An Luu – Pourquoi tu me fous plus des coups?


I have not been spending enough time on my blog lately, and it’s time for that to change with “Today’s Lesson:” a weekly segment in which I review a song or an album that I’ve recently been listening to and find myself interested in. Unlike a traditional review, I won’t be assigning scores or anything. Instead, I focus on writing about music that I think is worth writing about; in some tacit sense, you can assume that these are all recommendations. The real purpose of this segment, however, is to use a song as a jumping-off point to talk about whatever else is on my mind. We’re too quick to separate music from the rest of life, and this is my own small way of questioning the merit of doing that.

 

Just bask in the glory of this unabashedly 80s pop single by An Luu, a French actress. I don’t know how or why Spotify recommended this track to me, but I sure am grateful they did. I love it. There is absolutely no pretense, posturing, or even showing off. The song sets up a basic, spartan groove as An’s breathy voice floats over the top. There is a certain innocence to the vocal delivery, too. A vulnerability. I’ve listened to the song dozens of times over the past week or so. I’m utterly fascinated by it, partially because I challenged myself to understand the song without looking up a translation of the lyrics.

In writing this post, though, I decided to lift that curtain and track down a translation of the lyrics. Even mangled by Google Translate, the song’s meaning is immediately understandable: she’s asking a lover why he stopped beating her – maybe he doesn’t love her anymore? That’s heavy stuff, to put it mildly. The scenes are immediately evocative of Lou Reed’s Berlin concept album: domestic, bluntly laying out the cruelty, and confronting the hard-to-understand inner workings of the abused who stay in the relationship. Though it might just be coincidence, the lover in An’s song is called LouLou…

I won’t lie, I’m kind of bowled over right now. And this is the kind of musical experience I really cherish: being drawn into the complicated beauty of a song. Listen to it. It’s just pretty. But it’s simultaneously so baldly ugly, too. This is something special because it gives us a slice of life. A bad slice, but a slice nonetheless. Music can contain truths that are lost in mere words and, disgusting as they may be, we mustn’t run from them. This is us, this is who we are. This is reality.

On social media, we often find ourselves framing our lives in a certain light, highlighting the good and erasing the bad – hiding it away, putting only our best selves forward. But it’s a lie, isn’t it? And even when someone tries to show a more complete picture, we look down our noses at them for sharing drama or just being too sad. Media scholars study this phenomenon at length: how we present ourselves and how we act when (we think) someone is watching. We change. We pretend to be something else. An Luu strips away all of that with this song, and it’s arresting in its beauty.

Eno: Music is political


Recently, an interview with Brian Eno appeared in Pitchfork about ambient music, what it means, and where it might be going. As much as I generally find Pitchfork’s condescending attitude annoying, this is a great interview, done by Philip Sherburne. I debated what I wanted to highlight from it, but one quote from Eno resonates with me the most right now:

You can’t really make apolitical art. We started out talking about ways of composing; ways of composing are political statements. If your concept of how something comes into being goes from God, to composer, conductor, leader of the orchestra, section principals, section sub-principals, rank and file, that’s a picture of society, isn’t it? It’s a belief that things work according to that hierarchy. That’s still how traditional armies work; the church still works like that. Nothing else does, really. We’ve largely abandoned that as an idea of how human affairs work. We have more sophisticated ways of looking at things. [Emphasis mine. – JDS]

Do be sure to read the full interview, linked above.

What ever happened to surround sound music?


This post is based on a presentation I gave at the inaugural Indiana University Media School Graduate Student Conference.

About 10 or so years ago, it seemed like there was a new game in town: surround sound music! Of course, those of you old enough can recall that this wasn’t the first time such a promise was made. But this time, by golly, it’s going to work! And if you believe that, then I have a 3D TV to sell you, too. Like 3D TV, surround sound music seems like such a natural evolution – and yet, time and again, it has failed to launch.

Back in my undergrad I took a course where, for one assignment, we had to produce a 5.1 mix of a project one of our peers had recorded. Even while explaining the assignment, the professor seemed doubtful that surround sound music would really take off and that this would be a relevant skill to build. It sure was fun to play with for the assignment, even if my mix was terrible.

As much as I try to avoid jargon, this post is going to have some. So, before I really dive in here, I’m going to hit you with some definitions:

  • Mono: one channel of audio information. It might come out of one speaker or several, but when every speaker plays the exact same signal, it’s mono. (See the toy sketch just after this list.)
  • Stereo: two channels of audio information.
  • Surround: more than two channels of audio, with some of the speakers positioned so that sound reaches the listener from the sides, behind, above, or below.
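
Since the channel/speaker distinction trips people up, here’s a toy sketch in R with the tuneR package (the same one I use in a later post on this blog). This isn’t from my talk, just an illustration: a two-speaker Wave object is still mono if both channels carry the identical signal.

library(tuneR)

t = (0:44099)/44100 # one second of time at 44.1 kHz
s = round((2^14) * sin(2*pi*440*t)) # a 440 Hz tone
mono = Wave(left = s, right = s, samp.rate = 44100, bit = 16) # two speakers, one channel of information: mono
stereo = Wave(left = s, right = round((2^14) * sin(2*pi*220*t)), samp.rate = 44100, bit = 16) # two distinct channels: stereo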

Also, I think to contextualize my argument properly, I need to give a (painfully!) brief history of recorded music, too.

  • Sheet music: circa 2000 BCE, cuneiform tablets had musical notation on them
  • Mechanical reproduction: circa 9th century (!!!), a hydro-powered organ that performed music etched into interchangeable cylinders by the Banū Mūsā brothers

A diagram of the hydro-powered organ.

  • Phonograph: 1877, Thomas Edison. Wax cylinders that could have audio waveforms etched into them and played back later.
    • Recorded live; recording and playback on one mechanism.
  • Disc phonograph: 1889, Emile Berliner. Platters instead of cylinders.
    • 33 ⅓ rpm (the LP): 1948, Columbia Records
  • (Practical) stereo sound: Bell Labs, 1937
  • Surround first attempted by Disney’s premiere of Fantasia in 1940
  • First “big” consumer format was Quadraphonic in the very early 1970s
    • Actually 3 competing and not cross-compatible formats
    • Could be done on tape or vinyl
    • CDs hypothetically could contain quadraphonic sound and it is allowed (but under-specified) in the “Red Book” but this was never commercially attempted 
  • Once the DVD and home theater setups became largely ubiquitous, DVD-A was attempted (among others)

OK, now to the good stuff!

The Case Against Stereo

It’s kind of hard to imagine given its ubiquity and obvious-seeming design, but stereo music was not met with a resounding embrace. Perhaps most understandably, the public needed to be convinced that it was more than a mere gimmick. But even musical luminaries like Brian Wilson of the Beach Boys and Phil Spector – convicted murderer and architect of the famous “Wall of Sound” production aesthetic – spoke out against stereo. Spector thought that stereo would take control away from the producer, and power away from his Wall of Sound. It was an issue of scale: the Wall of Sound didn’t seem to work in stereo.

Wilson’s concerns were similar, but focused more on the fact that stereo necessitated trusting the public to set up their systems correctly. If the speakers weren’t placed right, the stereo image would be strange and the balance between the left and right sides of the music would be bizarre or, at the least, transformative to the recording. Contrast that with a mono system: you just plug it in and turn it on. There’s nothing to calibrate.

To make matters worse, when companies were pushing stereo, they needed stereo records to sell. As such, lots of recordings that were designed for mono were reprocessed into stereo. Back to Spector’s concerns: these recordings were not conceptualized for stereo, so even on a well-set-up system, the result is ultimately a perversion of what the recording was meant to be. Even more damning, audiences had mixed reactions to “stereoized” recordings.

Surround Sound: more channels = more music?

It was only about 10 or 15 years after the initial foray into stereo music that surround sound first came to the consumer market in the form of quadraphonic sound: four speakers positioned around the perimeter of the room with the listener in the middle. Just think about the physical reality of that for a moment! A few years earlier, a mono system could be plunked down wherever convenient: no wires running every which way, it sounded pretty good in a large portion of the room, and it was cheap. Then came two speakers, but the sweet spot was still pretty large and the wires were at least limited to one side of the room. But quad? This required an entire room dedicated to the listening of music, and you couldn’t stray far from the sweet spot and have it still sound “good.” Wires had to run the perimeter of the room, too. Add the cost of four speakers and the specialized playback systems – yes, systems, plural, because there were several competing quad formats that were not cross-compatible. Yikes. Couple that with quad-ized recordings and it was a bit of a mess.

All of that aside, there is a certain parity between the mono-to-stereo move and the stereo-to-surround move. But one worked and the other didn’t. Why?

Affordances of the Medium

Every medium is unique: Van Gogh’s Starry Night rendered in watercolor would be a different work, because watercolor and oil do different things. The same applies to music formats: each has a unique set of strengths and weaknesses. Things tend to be most interesting, it seems to me, when artists leverage these affordances of the medium to create something that only works in that medium. The delivery concerns about surround sound are becoming less and less pronounced, thanks to surround emulation on headphones, and even home theater soundbars can fake surround sound reasonably well. But where’s the music?

Starry Night as painted by Van Gogh.

A watercolor re-interpretation.

I think that it has to do, largely, with the fact that not many artists need (or want) a surround sound space to do their work. In the West, our music listening traditions are deeply rooted in musicians being collected together in one area and the audience paying attention to them. (It hasn’t always been this way, but it has been for a few hundred years for the most part.) With our two ears in any physical space, we will hear stereo sound. So between our cultural practices of music and our built-in stereo receiver, stereo music works nicely.

Let’s go back to Spector’s Wall of Sound. The Wall of Sound didn’t scale well to stereo because it was built on the idea that, by layering many recordings of a single part together, Spector could create an all-encompassing assault of music. Splitting this into stereo meant doubling what were already some of the largest, most complex recording sessions. It just couldn’t be done effectively. Now recall that surround at least doubles the channel count yet again.

What does surround sound music even sound like?

Ever listen to early stereo recordings? You might hear the drums all the way to the left, the bass all the way to the right, and so on, maybe with extra reverb added to fill the space. It was a bit extreme, but it was a necessity. Those sources weren’t recorded to be stereo, so all the engineers could do was place individual mono signals in different spots in the mix. And by golly, if people are paying for stereo, let’s make sure they hear it! This was also due to limitations of early stereo recording consoles, where panning (placing things in the stereo field) was reduced to “L-C-R:” a three-way toggle for left, center, or right. But back to surround… what should go in those additional channels?

“This 5.1 mix of Megadeth is so going to be worth it.”

The answer to this question is similar to the answer for early stereo: grab elements from recordings conceptualized for stereo and distribute them across the additional channels. The result is an emaciated surround mix, spread thin around the room. Crucial pieces get excised and hung out in the periphery. Even worse, sounds from beside or behind the listener have very different psychological meanings than sounds from in front. On a fundamental, animal level, sounds from sources we can’t see are startling.

Another approach was to take a stereo recording and make it sound like you were listening in an idealized listening environment – some kind of emulation of a space. This is an interesting idea, but there’s no way to account for what the listener’s room already sounds like. Once more, this is ultimately noise. The signal is the music!

#NotAllSurround

I don’t mean to universalize. There are some wonderful examples of surround sound music out there, but they’re very niche. That’s because surround demands that the entire process of recording the music (if not the conceptualization of the music itself!) be done, from the ground up, for surround sound. And it’s hard. It’s very, very hard to do because there is so little basis for comparison. Part of successful artistic endeavor is pushing against the boundaries of the possible; in surround sound, those boundaries are so much more distant than in stereo or mono that it’s hard to even find them. It’s for these reasons that I think surround sound music will never leave the niche. But if the content is good, people will find excuses to jump through the hoops to listen to it.

A Recommendation

Even though I’ve been dumping on surround sound music, I don’t want you to think that I dislike it or think it’s dumb. Far from it! It’s just hard to find examples of surround sound music that sound like they should be in surround, or that do something that can only be done in surround. But those examples do exist, and I’d like to recommend one:

The Flaming Lips: Yoshimi Battles The Pink Robots 

I recommend this one in particular because it’s a reasonably well known recording in its own right, but also because the stereo and surround versions allow for a compare and contrast: the ‘Lips didn’t just remix the stereo versions for surround – they’re different versions of the songs, with different elements and different vibes. The Flaming Lips have long played around with surround sound, so it only seems fitting that they knocked this one out of the park. And despite its age, it still sounds like the future – and that’s what surround sound is all about, right?

I come not to praise the album, but to bury it


Ok, not really. The album has been and probably will remain a viable and vibrant means of artful expression for the foreseeable future. But like the vinyl medium, I think its halcyon days are over. All because of this little bastard:

[Image: the shuffle icon]

Damn you!

An album is, at least, a loose collection of related songs. Typically the recordings contained within are from one band or artist, from a similar time period, and contain sonic cues to relate the songs to one another. In short, they sound like they belong together. But now if you were to open up your music player of choice, I’d put even money on you having some kind of playlist or aggregate view that contains works from multiple artists, times, genres, and so forth. And you probably (gasp!) mix these together to organize them into ad hoc compilations that suit some purpose or setting.

As a music consumer, I find making playlists an endlessly fun and fruitful way to explore connections between my life and music. I find new ways to connect and relate to artists and music when I have free rein to build playlists. Was Friendly Rich‘s “Mr. Skin’s Hymn” meant to be put in conversation with Scott Walker’s “30th Century Man?” I don’t know, but now that I’ve put them on a playlist together, I quite enjoy it.

[Image: another Scott Walker]

Please don’t confuse the Scott Walker of “30th Century Man” with this Scott Walker, a 17th century man

I’m sure you have your own similar experiences with making playlists. Even the simple act of putting music of a similar tempo together for the purpose of a workout playlist is destructive to the concept of the album. So why, then, this disconnect between the way people listen to music and the way music is released?

Putting on my musician hat, something I’ve grappled with for a long time is “should I record an album?” Aside from my legions of adoring fans demanding such a release, why should I? My music listening habits inform my music creation habits. I don’t have the material for an album per se, because my collection of recordings is more like a playlist. There is some kind of implicit thread through them all, but an album implies a genre, a mood, a production sense… something. And I don’t have it.

[Image: an empty theater]

Pictured: the wild throngs of fans

I don’t think it’s a matter of discipline, either. This is a pointed effort: I want each song to exist in what I think is its best and truest form. I want to celebrate diverse inspirations. I want my music to reflect the way I listen to music. That means sacrificing the obvious sonic cues that say these recordings belong together. And I know I’m not alone. In fact, composers have been playing with this idea for a long time. The likes of Stockhausen and Cage challenged us to question what sounds belong together in music. Moving up a level: what songs belong together on an album?

There’s something coming over the horizon – a new way to think about a collection of recordings that belong together – and it isn’t an album as we know it. It’ll be some new way to approach the underlying logic of how and why songs belong together, and what it means for them to exist in one release. I’m excited to find out what it is.

Once More, with (Less) Feeling: artificialized vocals


 

This semester has been challenging and fun. One class in particular really pushed me: a class on music information processing. In other words, a class on how computers interpret and process music as audio. I’ll spare you a lot of the technical stuff, but generally speaking we treated audio recordings as vectors, with each value of the vector corresponding to the amplitude of one sample. This allowed us to do all sorts of silly and interesting things to the audio files.
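
To make that concrete, here’s roughly what it looks like in R with the tuneR package (the same one the full project code at the end of this post uses); the file name is just a stand-in:

library(tuneR)

w = readWave("somesong.wav") # any WAV file; the name here is a stand-in
y = w@left # one channel of audio as a plain numeric vector
y[1:5] # the first five sample amplitudes
length(y)/w@samp.rate # samples divided by sample rate = duration in seconds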

The culmination of the class is an independent project that utilizes principles learned in the class. This presented a unique opportunity to design an effect that I’ve wanted but could never find: a way to make my voice sound like a machine. Sure, there are vocoders, pitch quantizers, ring modulators, choruses, and more… but they don’t quite do what I want. The vocoder gets awfully close, but having to speak the vocals while also performing the melody on a keyboard is no fun. iZotope’s VocalSynth actually gets very close, but even there it’s hard to blend the real and the artificial. There had to be something different!

And now there is. Before I can explain what I did, here’s a little primer on some stuff:

Every sound we hear can be broken down into a combination of sine waves. Each wave has three parameters: frequency (pitch), amplitude (loudness), and phase. You’ll note that phase doesn’t have an everyday analog the way frequency does with pitch. That’s probably because our hearing isn’t sensitive to phase (with some exceptions not covered here). Below is a picture of a sine wave.

[Image: a sine wave]

See how the wave starts at the horizontal line that bisects it? This sine wave has a phase of 0 degrees. If it started at its peak and went down, it would have a phase of 90 degrees. If it started in the middle and went down, it would have a phase of 180 degrees, and so forth.
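
Here’s a quick R sketch of those three parameters. The numbers are arbitrary; note that a phase of pi/2 radians (90 degrees) would start the wave at its peak, just as described above:

sr = 44100 # sample rate
t = seq(0, 0.01, by = 1/sr) # ten milliseconds of time
freq = 440 # frequency in Hz (pitch)
amp = 0.8 # amplitude (loudness)
phase = 0 # phase in radians; try pi/2 to start the wave at its peak
wave = amp * sin(2*pi*freq*t + phase)
plot(t, wave, type = "l") # draws a sine wave like the one above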

As I said, we don’t really hear phase, but it’s a crucial part of a sound because multiple sine waves are added together to make complex sounds. Some of them reinforce each other, others cancel each other out. All in all, they have a very complex relationship to each other.

This notion of a complex wave represented by a series of sine waves comes from a guy named Fourier. (He’s French, so it’s “Four-E-ay.”) There are a lot of different flavors of the Fourier transform, but the type relevant here is the Discrete Fourier Transform, usually computed with the Fast Fourier Transform (FFT). It deals only with finite lists of numbers, which are very computer friendly.
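
R has an FFT built right in. Each element of the output is a complex number whose magnitude and angle give the amplitude and phase of one sine wave. A little sketch (not part of the project code):

chunk = sin(2*pi*440*(0:1023)/44100) # 1024 samples of a 440 Hz sine
Y = fft(chunk) # the complex spectrum
Mod(Y[1:5]) # amplitudes of the first few frequency bins
Arg(Y[1:5]) # phases of the first few frequency bins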

There’s a windowed variant built on the FFT called the STFT (short-time Fourier transform) that maintains phase information in a way that’s easy to play with. One of the simplest tricks is to set all of the phases to 0: change that one parameter and you get a monotone, robotic voice. Hm! That’s fun, but not very musical.
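
Here’s that zero-phase trick on a single chunk, stripped to the bare bones; it’s the same move as the “robotization” line in the full code at the end of this post:

chunk = sin(2*pi*440*(0:1023)/44100) # stand-in for 1024 samples of a voice recording
Y = fft(chunk)
Y0 = complex(modulus = Mod(Y), argument = 0) # keep every amplitude, zero every phase
robot = Re(fft(Y0, inverse = TRUE))/length(Y0) # back to the time domain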

STFTs, as the name implies, analyze very short segments of audio, then jump forward and analyze another short segment. Short, in this case, means something like 0.023 seconds (1024 samples at 44.1 kHz) of audio at a time. Here’s where the pitched robot voice comes in: instead of jumping ahead to the next unread segment, I tell it to jump ahead, say, a quarter of the way and grab the next 0.023 seconds, then jump another quarter, and so on. This imposes a sort of periodicity on the sound, and periodicity is pitch!
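
If I have the arithmetic right, the imposed pitch is simply the sample rate divided by the hop size. Using one of the actual ratio values from the score vectors in the full code:

sr = 44100
N = 1024 # window length in samples
ratio = 4.18128465 # one of the values from the ratio vector below
H = N/ratio # hop size in samples (about 245)
sr/H # imposed pitch in Hz: about 180 Hz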

By manipulating the distance I am jumping ahead, I can impose different pitches on the audio. This is essentially what I did in my project. More specifically, I:

  1. Made a sample-accurate score of the desired pitches
  2. Made a bunch of vectors for start time, end time, and desired pitches (expressed as a ratio)
  3. Made a loop to step through these vectors
  4. Grabbed a chunk of sound from a WAV file
  5. Performed an STFT with the hop size set by the desired pitch, then zeroed the phases
  6. Did an inverse STFT to turn it back into a vector with just amplitude values for samples
  7. Turned that back into a WAV file

(See the end of the post for a copy of my code.)

Here’s what I ended up with!

And here’s what it started as:

Please be forgiving of the original version. It’s not great… I was trying to perform in a way that would make this process easier. It did, but the trade-off was a particularly weak vocal performance. Yeesh. My pitch, vowels, and timbre were all over the place!

Anyway, here’s the code. You’ll need R (or RStudio!) and the tuneR package. Oh, and the solo vocal track.

setWavPlayer("/Library/Audio/playRWave")

stft = function(y,H,N) {
 # y: audio vector; H: hop size in samples; N: window length in samples
 v = seq(from=0,by=2*pi/N,length=N) # N evenly spaced points from 0 to 2*pi
 win = (1 + cos(v-pi))/2 # Hann window
 cols = floor((length(y)-N)/H) + 1 # how many windows fit in the signal
 stft = matrix(0,N,cols)
 for (t in 1:cols) {
 range = (1+(t-1)*H): ((t-1)*H + N) # the t-th windowed chunk of audio
 chunk = y[range]
 stft[,t] = fft(chunk*win) # one column of complex spectrum per window
 }
 stft
}

istft = function(Y,H,N) {
 # inverse STFT: inverse-FFT each column, then overlap-add with hop H
 v = seq(from=0,by=2*pi/N,length=N)
 win = (1 + cos(v-pi))/2 # the same Hann window
 y = rep(0,N + H*ncol(Y))
 for (t in 1:ncol(Y)) {
 chunk = fft(Y[,t],inverse=T)/N # R's inverse fft is unnormalized, so divide by N
 range = (1+(t-1)*H): ((t-1)*H + N)
 y[range] = y[range] + win*Re(chunk) # overlap-add back into the output
 }
 y
}

spectrogram = function(y,N) {
 power = .2
 bright = seq(0,1,by=.01)^power
 grey = rgb(bright,bright,bright) # this will be our color palette --- all grey
 frames = floor(length(y)/N) # number of "frames" (like in a movie)
 spect = matrix(0,frames,N/2) # initialize frames x N/2 spectrogram matrix to 0
 # N/2 is # of freqs we compute in fft (as usual)
 v = seq(from=0,by=2*pi/N,length=N) # N evenly spaced pts 0 -- 2*pi
 win = (1 + cos(v-pi))/2 # Our Hann window --- could use something else (or nothing)
 for (t in 1:frames) {
 chunk = y[(1+(t-1)*N):(t*N)] # frame t of the audio data
 Y = fft(chunk*win)
 spect[t,] = Mod(Y[1:(N/2)])
 # spect[t,] = log(1+Mod(Y[1:(N/2)])/1000) # log(1 + x/1000) transformation just changes contrast
 }
 image(spect,col=grey) # show the image using the color map given by "grey"
}


library(tuneR) 
N = 1024
w = readWave("VoxRAW.wav")
y = w@left
full_length = length(y)


bits = 16
# this is a vector containing all of the pitch change onsets, in samples
start = c(0,131076,141117,152552,241186,272557,292584,329239,402666,
 459154,474012,491649,697317,786623,804970,824932,900086,924171,
 944914,968743,984086,1082743,1088571,1120457,1132371,1151571,
 1335171,1476343,1614943,1643400,1666886,1995600,2133514,2274429,
 2300571,2325686,3332571,3412114,3437400,3451800,3526457,3540343,
 3569314,3581657,3600943,3610371,3681086,3694800,3745200,3763371,
 3990000,4072371,4091143,4113000,4195286,4216200,4233429,4254000,
 4286743,4380771,4407701,4422086,4443686,4630114,4750886,4768029,
 4906371,4934829,4958914,5286171,5409686,5428714,5565943,5595086,
 5618829,5944543,6068829,6086057,6223714,6250543,6275057)

#this is a vector containing the last sample of each pitch segment, in samples
end = c(131075,141116,152551,241185,272556,292583,329238,402665,459153,
 474011,491648,697316,786622,804969,824931,900085,924170,944913,
 968742,984085,1082742,1088570,1120456,1132370,1151570,1335170,
 1476342,1614942,1643399,1666885,1995599,2133513,2274428,2300570,
 2325685,3332570,3412113,3437399,3451799,3526456,3540342,3569313,
 3581656,3600942,3610370,3681085,3694799,3745199,3763370,3989999,
 4072370,4091142,4112999,4195285,4216199,4233428,4253999,4286742,
 4380770,4407700,4422085,4443685,4630113,4750885,4768028,4906370,
 4934828,4958913,5286170,5409685,5428713,5565942,5595085,5618828,
 5944542,6068828,6086056,6223713,6250542,6275056, full_length)

#these ratios determine the pitch we hear by setting the hop size
ratio = c(4.18128465,3.725101135,3.318687826,3.725101135,4.693333333,
 4.972413456,4.693333333,3.132424191,4.18128465,3.725101135,
 3.318687826,3.132424191,4.18128465,3.725101135,3.318687826,
 3.725101135,4.693333333,4.972413456,4.693333333,3.725101135,
 3.318687826,4.18128465,4.18128465,3.725101135,3.318687826,
 3.132424191,4.972413456,5.581345393,3.725101135,4.18128465,
 4.972413456,4.972413456,5.581345393,3.725101135,4.18128465,
 4.972413456,4.18128465,3.725101135,3.318687826,3.725101135,
 4.18128465,4.693333333,4.972413456,4.693333333,3.725101135,
 3.318687826,2.486206728,4.18128465,3.725101135,3.132424191,
 4.18128465,3.725101135,3.318687826,3.725101135,4.693333333,
 4.972413456,4.693333333,3.725101135,3.318687826,4.18128465,
 3.725101135,3.318687826,3.132424191,4.972413456,3.725101135,
 5.581345393,3.725101135,4.18128465,4.972413456,4.972413456,
 3.725101135,5.581345393,3.725101135,4.18128465,4.972413456,
 4.972413456,3.725101135,5.581345393,3.725101135,4.18128465,
 4.972413456)

w = readWave("VoxRAW.wav")
sr = w@samp.rate
y = w@left
ans = 0

for (i in 1:81) {
 # the loop steps through each of the 3 vectors above
 frame = y[start[i]:end[i]] # take one segment of the wave, from start to end

 H = N/ratio[i] # set the hop size to impose the perceived pitch
 Y = stft(frame,H,N)

 # robotization: keep each bin's magnitude, set every phase to zero
 Y = matrix(complex(modulus = Mod(Y), argument = rep(0,length(Y))),nrow(Y),ncol(Y))
 ybar = istft(Y,H,N)
 ans = c(ans,ybar) # concatenate the processed segments as we go
}

ans = (2^14)*ans/max(abs(ans)) # normalize and scale so the samples fit comfortably in 16 bits
u = Wave(round(ans), samp.rate = sr, bit=bits) # make Wave object
writeWave(u, "robotvox.wav") # save the robot version (the spectrogram below reads it back)
o = readWave("VoxRAW.wav")
o = o@left
spectrogram(o, 1024) # what does the original recording look like?
r = readWave("robotvox.wav")
r = r@left
spectrogram(r, 1024) # what does the robot version look like?
#play(u) # listen to the robot version

MP3s don’t matter (until they do)


I’ve written before on some of the differences between MP3s and WAVs, specifically how MP3s seem to invoke more negativity than WAVs in a blind test. I don’t know about you, but I thought those results were interesting and weird. So, I thought it made sense to zoom out and try to get a bigger picture of the phenomenon.

A logical first step was to ask: can people even hear the difference between WAVs and MP3s in their day-to-day lives? If so, in what circumstances? As the title implies, people generally can’t tell in most circumstances, but once they can, the shift is very pronounced.

The Experiment

I made an online experiment, asking people to listen to 16 different pairs of song segments and select the one they thought sounded better. There were 4 levels of MP3 compression: 320k, 192k, 128k, and 64k.

‘Why those levels of compression?’ you might be wondering. Amazon and Tidal deliver at 320k, Spotify premium does 192k, YouTube does 128k, and Pandora’s free streaming is 64k.

For each pair, one version of the segment was a WAV and the other was an MP3. (See below for more detail.) I also asked for basic demographic information, how they usually listen to music, and how they were listening during the experiment. For example, a lot of people use Spotify regularly for music listening on their phones, and a lot of people used their phones to do the experiment. Running the experiment online gave up a lot of control over how and where people listened, but the goal was to capture a realistic listening environment.

The Songs

I selected songs that are generally considered to be good recordings, capable of offering a kind of audiophile experience. I also tried to choose “brighter” sounding recordings because they are particularly susceptible to MP3 artifacts. The thought behind this was to maximize the chance of hearing sonic differences, because I was doubtful there would be any difference until a very high level of compression.

I also split the songs into eras: Pre and Post MP3. I thought that music production techniques might have changed to accommodate the MP3 medium, and that MP3s might be easier to detect in recordings that were not conceived for the medium.

The Song List by Era

Pre MP3 (pre 1993):

  1. David Bowie – Golden Years (1999 remaster)
  2. NIN – Terrible Lie
  3. Cowboy Junkies – Sweet Jane
  4. U2 – With Or Without You
  5. Lou Reed – Underneath the Bottle
  6. Lou Reed & John Cale – Style It Takes
  7. Yes – And You and I
  8. Pink Floyd – Time

Post MP3:

  1. Buena Vista Social Club – Chan Chan
  2. Lou Reed – Future Farmers of America
  3. Air – Tropical Disease
  4. David Bowie – Battle for Britain
  5. Squarepusher – Ultravisitor
  6. The Flaming Lips – Race for the Prize
  7. Daft Punk – Giving Life Back to Music
  8. Nick Cave & The Bad Seeds – Jesus Alone

The Song List by Compression Level

320k

  1. Cowboy Junkies – Sweet Jane
  2. Lou Reed – Underneath the Bottle
  3. Squarepusher – Ultravisitor
  4. Daft Punk – Giving Life Back to Music

192k

  1. David Bowie – Golden Years (1999 remaster)
  2. NIN – Terrible Lie
  3. The Flaming Lips – Race for the Prize
  4. Air – Tropical Disease

128k

  1. U2 – With Or Without You
  2. Lou Reed & John Cale – Style It Takes
  3. Buena Vista Social Club – Chan Chan
  4. Nick Cave & The Bad Seeds – Jesus Alone

64k

  1. Pink Floyd – Time
  2. Bowie – Battle for Britain
  3. Lou Reed – Future Farmers of America
  4. Yes – And You and I

The Participants

I had a total of 17 participants complete the experiment (and 1 more who did part of the listening task), plus a whole lot of bogus entries from bots… sigh. Here’s some info on the real humans who did the experiment:

[Pie charts summarizing the participants; options with 0 responses are not shown.]

“Which best describes your favorite way to listen to music that you have regular access to?” was the full question. I didn’t want everyone to think back to that one time they heard a really nice stereo!

[More pie charts, covering listening setups and musical training.]

“This includes informal or self-taught training. Examples of this include – but are not limited to – musicians, audio engineers, and audiophiles.”

Unfortunately, the sample size wasn’t big enough to do any interesting statistical analyses with this demographic info, but it’s still informative to help understand who created this data set.

The Results

Participants reliably (meaning, a statistically significant binomial test) selected WAVs as higher fidelity when the MP3s were 64k. Other than that, there was no statistical difference.

[Charts of WAV vs. MP3 picks at 320k, 192k, and 128k: no reliable differences. At 64k:]

11 to 57 in favor of WAV, p < 0.001
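
(For the curious, that’s the binomial test I mentioned above; R’s default is two-sided. With the 64k counts just shown – 11 picks for the MP3, 57 for the WAV – it looks like this:)

binom.test(x = 57, n = 57 + 11, p = 0.5) # is 57 WAV picks out of 68 more than chance?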

When I first looked at the Pre/Post MP3 comparison, I was flummoxed. There is a statistical difference in the Post MP3 category… favoring WAVs.

[Chart of WAV vs. MP3 picks, split by Pre vs. Post MP3 era.]

That’s pretty counter-intuitive. That would be like finding that people preferred listening to the Beatles on CD instead of vinyl. It just doesn’t make sense. Why would recordings sound worse in the new hip medium that everyone’s using?

They don’t. My categorization was clumsy. Yes, I selected 8 songs that were recorded after MP3s were invented, but what I didn’t consider is that the MP3 was not a cultural force until about a decade later, and not a force in the music industry until even later than that. So I went back to the Post MP3 category and split it again. Figuring out exactly when the MP3 became a major force in the recording industry was a rabbit hole I didn’t want to go down, so I used a proxy: Jonathan Sterne, a scholar who studies recording technology, published an article in 2006 discussing the MP3 as a cultural artifact. Luckily, 2006 turned out to be a clean dividing line, because none of my 8 Post MP3 songs were released in or even near 2006: 5 came before and 3 after. When I analyzed those groups, there was a strong preference for WAV in the older recordings but not in the newest ones. This suggests that yes, after a certain date, recordings are generally made to sound just as good as MP3s (of a certain quality) as they do as WAVs. Here’s the analysis:

[Chart of picks for Post MP3 songs released before 2006:]

25 to 60 in favor of WAV, p < 0.001

 

[Chart of picks for songs released after 2006: no reliable difference.]

So, to sum up: the WAV vs. MP3 debate doesn’t matter in terms of identifying fidelity differences in real-world situations for these participants UNTIL the compression gets extreme. And recordings designed for CDs rather than MP3s sound better as WAVs than as MP3s, but that difference disappears for older recordings. If I had to guess, it could be because some of the limitations of the vinyl medium are similar to MP3 (gasp! Heresy!), and so recordings designed for vinyl work kind of well as MP3s, too.

Let’s define music!


Goodness, I have written lots of words about music, but I’m not sure I have ever thoroughly defined what I mean by “music.” In this post you’ll find my definition, of course, but I want to clarify right up front that this may read as slightly antagonistic. In a sense it is meant to be, but ultimately it is about how to define music in the context of communication. I’m trying to push boundaries, not hurt feelings.

I don’t claim all of these thoughts as my own, but this may be a unique synthesis of standing ideas. I’ve also touched on some of these ideas in previous posts, but I wanted to put them all together.

Music describes a way of thinking about sound.

Music is a bit like the infamous Supreme Court definition of pornography: hard to define, but when you’re presented with an example, you recognize it immediately. Once you leave the very obvious examples, it gets hard to find the boundary between music and regular sound. That’s because music describes a way of thinking about sound, not a specific kind of sound.

I think the most famous example of pushing the boundaries of music in the Western world might be John Cage’s 4’33”. A pianist sits down, prepares to play, then does nothing for 4 minutes and 33 seconds. Is that music? Well, Cage would certainly say so, but the audience in the music hall is split. Some say yes, some say no. Who is right?

I would argue that 4’33” in that example is definitively music, and here is why: the context. In his autobiography, Frank Zappa argued that context is key. He called it “putting a frame around it.” Let’s explore this a bit. The audience in my example above is at a music hall to hear music. A performer sits at an instrument, prepares to play, then plays silence for 4’33”. While it is certainly up to audience members to decide how much they enjoy the performance, they can’t really argue about whether or not music happened because the context clearly articulated that music happened.

Here’s another example: you’re walking in the woods alone, and you come to a clearing to find a pianist sitting at a piano. As you approach, she hops up and says “ah! I just finished my performance of 4’33”! What did you think?” Did you hear music for the last 4 minutes and 33 seconds? I don’t think so. There was no contextual clue to encourage you to think about sound as music for the previous four and a half minutes. (Unless, of course, you just so happened to be doing it of your own free will, but the odds of that are remote.)

Another way to think about it is the old paradox: don’t think about an elephant. It’s impossible to not think about an elephant when you are given this prompt. Similarly, the people in the music hall are thinking about music and thinking about sound as music. Even if they’re thinking “ugh, this is stupid, this isn’t music,” they are still thinking about sound as music.

Music is communication.

When we hear sound as music, we are interpreting and processing it. Music is inherently more vague in its meaning than language, but there is still meaning. Music has emotional impacts, triggers memories, and causes physiological responses. Language does all of these things, too.

I think a lot of people get hung up on the idea of “music is communication” because music isn’t specific or declarative. I agree wholly that music is non-specific and non-declarative. I can’t play you a tune on a recorder to ask you to get me a beer (I would if I could, though!). And if you ask 10 people to listen to the same song, they’ll each tell you something different when asked what it means.

However, language suffers some of the same faults. Has anyone ever misunderstood you? Or have you ever said something that came out wrong? Of course you have. Language is specific, but the interpretation is difficult. I think music suffers a somewhat similar fate: a composer can intend to convey a scene or a feeling, but different audience members will have different responses.

Also, I’m blogging right now. (Duh.) But why? Well, blogging has a certain set of affordances that other kinds of communication lack. I could say this out loud, but only the other people near my desk would hear me, and once I’ve said it, it’s gone forever. I could write a book, but that means people need to buy it to read my thoughts. I could write a poem, but my verse is terrible. The point is that I’m writing this in blog form because it seems to be the best way for me to share these specific ideas in the way that I want to share them. Music is no different. I can express things in music that are difficult or impossible to express outside of it.

I think a more complete analysis of the affordances of music would be a swell thing to do, but here’s a short sketch: musical expression has no substitute mode of expression. I can’t accurately tell you about a piece of music, I can only approximate it in words. Information is lost when I talk about it compared to you experiencing it first hand. I think what is lost is the thrill and the emotion. Not only am I sharing words, but I’m sharing my interpretation of it. I’ve taken the experience out of it. It’s like baby food: the nutrition is there, but the experience of texture is lost in the processing.

Music is interesting.

Unlike language, music is inherently interesting. Language is designed to convey specific ideas. The goal is clarity and meeting expectations of normal patterns of communication. Sentences have at least a noun and a verb. Normal communication is utilitarian and functional. Musical communication is impressionistic and fanciful.

Part of the joy of listening to music is the blend of having your expectations met and defied in unexpected but carefully constructed ways. A piece of music establishes or implies a set of rules, but then defies those rules for your enjoyment. For example, a common thing to do in a pop song is to modulate up part of the way through the song. This defies expectations because the song has clearly established itself to exist in a given key, but then everything suddenly shifts upwards. The foundation the song was built on just got pushed upward a little bit. It’s startling, but it can be pleasant when done artfully. Another example is establishing a phrase (a pattern) by repeating the structure, but then unexpectedly stopping the pattern short. Again, this can be quite exhilarating and pleasant when done carefully. Imagine that happening in a conversation, though. Someone is talking to you and they just stop right in the

… Language doesn’t work that way, does it? Language is meant to inform and music is meant to challenge and entertain you, in a broad sense. Attempts to describe music in terms of musical forces (like physical forces) sometimes stumble because music does unexpected things. A thrown ball will always obey physical forces. In that sense, it is uninteresting. Music, however, will only sometimes obey musical forces and that’s part of the point.

Music is important.

Music is a means of expression for both performers and listeners. It is therapeutic. Music helps build identity both for individuals and groups. These are concrete, real psychological benefits. Music helps us survive, and it helps shape societies.

And now, I think a brief explanation of what music is not would be useful.

Sheet music is a lie.

Sheet music is not music, nor is it an accurate representation of music. It is a shorthand expression and was a necessary means to preserve musical ideas in the era before recording audio was possible. It is a useful guide for memorization and performance. Systems that explicitly or implicitly rely on sheet music as if it were real music are faulty. Sheet music captures onsets and durations in an abstract and imperfect way, and makes little to no attempt to capture feeling.

Schenkerian analysis is a way to analyze music, but it is not the way.

Schenkerian analysis is a useful tool for analyzing music of a certain type when asking certain questions. However, since it is by far the dominant (heh) method of musical analysis, it is often applied to situations where it is not relevant or meaningful. Schenkerian analysis also presumes that sheet music is an accurate representation of music: it is performed on sheet music, not actual music. It also produces a tautological result: each piece of music can be reduced to simpler and simpler versions, eventually ending in a descending pattern of notes. On the surface, this is a stunning revelation about how music works; the problem is that Schenkerian analysis demands this outcome.

When studying the psychological implications of music, it is important to ask questions about the music that most people actually experience.

Remember, music is a phenomenon that exists in the mind. It then follows that it is important to study the kinds of music found in most minds. And I think it’s safe to say that Schubert isn’t it. It’s time to roll up our sleeves and dig into the music of the now.

Music perception and cognition research largely limits itself to SERIOUS CLASSICAL MUSIC, and maybe jazz when it’s feeling cheeky. This is a problem! And please don’t think I’m knocking serious classical music or jazz, or the study of that music. It’s very important and relevant, and I am grateful that people do it, because both forms of music profoundly influence our current popular music.

What I am advocating is that music be studied in a way that more closely resembles how most people actually experience it. Artificiality is a challenge in any line of research, but this particular stumbling block seems easy enough to avoid. The barriers to studying popular music are matters of institutional elitism, not practical ones.

Anyway, I hope you enjoyed this or at the very least found it provocative. I know it helped me a lot to codify all of these thoughts in one place, so I thank you for the indulgence.