AI Love?
I’m always curious and learning about music production, and Dan Worrall is a content creator and music mixer with more than 30 years’ experience whom I always enjoy, even if I don’t always understand what he gets on about. But he’s real. He has opinions and passions and beefs and lots of technical experience, and all of that is great, and relatable.
Today he released this video review of an AI program that purports to do the full job of mixing a song automatically. Mixing a song means taking all the “tracks,” the individually recorded instruments and vocals, and blending them together to make a coherent and pleasant whole such that each part sonically complements the composition, and each part sticks out just the right amount at the right times to serve the emotional and artistic purpose of the song. It’s actually very complex stuff - it’s hard! - and it requires some engineering and technical knowledge combined with artistic sense, the ability to look at all these parts not only at the level of the forest but also at the level of each tree and leaf.
Anyway, here’s the video. It’s long, and the only part that interests me for the purposes of this post is the last four minutes, starting at 40:16, where I have time-stamped the video link on the right.
He describes what he has figured out the AI’s process to be (do yourself a favor: click the video and listen to his rant!). Its method is to “listen” to each track in the mix and then process it to a sonic quality and level of volume based on an aggregated or algorithmic reference standard for loudness and sonic quality built from its reference data set of tracks for an instrument, such as a kick drum or a rhythm guitar. After doing that for each constituent part, it performs another round of algorithmic processing to make sure things even out and match a reference range for loudness and dynamics (“dynamics” means the range in volume level of loud parts to quieter parts, including for each individual track as well as the song as a whole). And then the AI mix is done. Then it sounds like a song.
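For the programmers in the audience, here is a rough sketch in Python of the two-pass idea as I understand it from Dan's description. To be clear, this is my own toy illustration, not the product's actual code: the function names, the instrument labels, and every target number are made up, and it only handles loudness, not tone or dynamics.

```python
import math

# Hypothetical per-instrument target levels in dB (0 dB = amplitude 1.0).
# These numbers are invented for illustration, not taken from any real product.
REFERENCE_LEVELS_DB = {"kick": -12.0, "rhythm_guitar": -18.0, "vocal": -14.0}
MIX_TARGET_DB = -10.0  # hypothetical target level for the summed mix


def rms_db(samples):
    """Return the RMS level of a list of samples in dB."""
    mean_square = sum(s * s for s in samples) / len(samples)
    # Clamp to avoid taking the log of zero on a silent track.
    return 10.0 * math.log10(max(mean_square, 1e-12))


def apply_gain_to_target(samples, target_db):
    """Scale samples by one fixed gain so their RMS level lands on target_db."""
    gain = 10.0 ** ((target_db - rms_db(samples)) / 20.0)
    return [s * gain for s in samples]


def average_mix(tracks):
    """Mix tracks given as {instrument name: list of samples of equal length}."""
    # Pass 1: push every track toward the reference level for its instrument.
    leveled = [
        apply_gain_to_target(samples, REFERENCE_LEVELS_DB[name])
        for name, samples in tracks.items()
    ]
    # Sum the leveled tracks sample by sample.
    mix = [sum(frame) for frame in zip(*leveled)]
    # Pass 2: match the whole mix to the overall reference level.
    return apply_gain_to_target(mix, MIX_TARGET_DB)
```

Notice that the only inputs are the audio and a table of reference numbers. Each track gets one fixed gain for the whole song, so nothing in this sketch can decide to change a level for just four bars.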
Or, perhaps it sounds like. . . the average of every song, or at least of every song in its reference data set. Let’s leave aside the question of where the reference music data set comes from, and whether it is derived from stolen copyrighted works. Here’s my question: when you listen to music, are the songs that stand out most to you, the ones you turn to at times of great joy or sadness in your life, from your point of view, average or unique? Do they stand out to you because they sound to you like other songs, or do they leave their mark on you because they don’t sound like other songs and because they hit you in just the right way at just the right time to have an impact on you that never really goes away?
From a music mixing point of view, what makes a great record unique at the mixing stage, assuming you already have good or great raw inputs to begin with, are the little choices along the way to, for example, make that snare drum peek out just a little bit right here in this section for the next four bars, because that little extra, barely perceptible infusion of energy that the listener won’t exactly notice will nevertheless create a little more feeling and emotional intensity at just the right part of the song, matching the emotional direction and message of the overall composition. In another example, maybe adding just these few individually undetectable layers of reverb and delay (these are sonic effects that create ambiance, a sense of space, and little echoes) on Billie Eilish’s voice at just this part of the song creates that special haunting texture that takes the listener’s experience from “that sounds nice” to “now I’m crying.”
AI algorithmic mixing can’t do that because it can’t know what it’s like to feel abandoned, or to love or be loved, and then translate those emotions into sound for any individual song, artist, and set of instruments. The songs you love that stick with you through life went through that human, unique, idiosyncratic, intentional process of creation. Someone like Dan in that video above ranted their way through mixing a final product, passionately, intentionally, with opinions and life experience. AI can’t do that.
This is not a trivial point. We live in an age where technology is diverting us from human connection and psychological growth and health. AI server farms themselves are consuming the planet’s potable water at alarming rates just as we’re charging into a global climate catastrophe during the next twenty years. People are turning to AI for companionship and therapy now. I was trained as a clinical psychologist, and I can tell you with certainty that the process of knowing, feeling, intuiting what to say, how to say it, and when to say it (and when to shut up) constitutes the art and craft and vocation of therapy, and it is inherently driven and co-created in the moment by the therapist and client as unique human beings. The epidemics of loneliness, despair, suicide, and flight to violence and authoritarianism of the modern age can only be accelerated, not resolved, by the erasure of the human and the complexities of building positive, authentic relationships.
Those are all big topics, and I may unpack them more during future blog posts. I have a lot to say about this stuff.