The new ‘crate-digging’: how Spotify uses artificial intelligence to find you new music

Mark Stent · Feb 19, 2021

Category: Music Industry

One of the best parts of being a DJ in the vinyl era was ‘crate-digging’ – the act of searching through ‘crates’ of records to find new music. It was fun and exciting. I would go to my favourite record shop and spend a morning listening and sifting through hundreds of tracks to find the one or two that caught my ear.

As time went on, the digital era took over (I remember being disgusted at my best mate, Ricardo Da Costa, moving from vinyl to playing music off CDs…he had sold out!) and record shops closed down. Over time CDs were phased out too, and streaming took over.

And boy did it take over…Spotify alone currently has 144 million premium subscribers and shows no sign of slowing down in its quest for world domination.

So how are they satisfying users’ thirst for new music? How are they keeping users engaged? How are they servicing the music industry?

The answer lies in the dark dungeons of mathematics, statistics and predictive analytics, and in a set of clever machine learning and artificial intelligence models.

Machine learning is defined in the Oxford dictionary as: “the use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyse and draw inferences from patterns in data.”

Just think about that…learning without following explicit instructions…sounds like something out of a science fiction movie.

If you use Spotify regularly you will notice that your feed is constantly updated with content tailored to your listening habits. Personalisation is something Spotify has mastered and is what truly makes the user experience so incredible.

Collaborative filtering

Spotify uses a process called collaborative filtering to track how you use the app and compare it with the usage of similar users. So, for example, if person A listens to tracks 1, 3 and 6 and person B listens to the same, it’s highly likely that persons A and B have similar tastes. It also tracks how users ‘rate’ the tracks they listen to by monitoring how often they play a song, how long they listen, whether they save the song, whether they click through to the artist’s home page, etc.

The goal of the collaborative filtering process is to find patterns in how users consume music; it does not depend on the song itself (i.e. the audio components) for information.
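To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. It is not Spotify's actual system: the users, tracks and scores are made up, and I am assuming the listening signals (plays, saves and so on) have already been boiled down to a single score per user and track.

```python
from math import sqrt

# Hypothetical implicit-feedback scores per user and track (made-up data).
plays = {
    "person_a": {"track_1": 5, "track_3": 3, "track_6": 4},
    "person_b": {"track_1": 4, "track_3": 2, "track_6": 5, "track_9": 4},
    "person_c": {"track_2": 5, "track_7": 3},
}


def cosine_similarity(u, v):
    """Similarity of two users based only on the scores they gave to tracks."""
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user, plays, top_n=3):
    """Rank tracks the user has not heard, weighted by how similar each listener is."""
    scores = {}
    for other, other_plays in plays.items():
        if other == user:
            continue
        sim = cosine_similarity(plays[user], other_plays)
        if sim <= 0:
            continue  # no overlap in taste, so ignore this listener
        for track, score in other_plays.items():
            if track not in plays[user]:
                scores[track] = scores.get(track, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("person_a", plays))  # ['track_9']
```

Notice that the code never looks at the audio. Person A gets track 9 purely because person B, who shares tracks 1, 3 and 6, also listens to it.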

The only flaw with this process is that it is weighted towards popular music: there is more usage data for these songs, so new and unknown songs suffer. This problem is solved by analyzing the songs themselves with another category of artificial intelligence model: deep learning.

Deep learning and audio

Deep learning is the latest and greatest in the arsenal of data scientists. It uses models loosely based on the workings of the human brain to analyze data and find solutions. Spotify uses a subcategory of deep learning called convolutional neural networks to analyze the raw audio of songs and categorize and evaluate them accordingly. Incidentally, a similar process is used in facial recognition. This model helps the collaborative filtering process to reach unknown and new songs that have little usage data and recommend them in users’ playlists.
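The basic building block of a convolutional neural network is a small filter that slides along the input and responds to a particular pattern. The sketch below shows a single one-dimensional filter on a toy signal. It is only an illustration: the signal and the filter values are hand-picked assumptions, whereas a real network learns many filters from data and stacks them in layers.

```python
def conv1d(signal, kernel):
    """Slide the kernel along the signal (no padding) and sum the products at each step."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive responses and zero out the rest, as CNN layers commonly do."""
    return [max(0.0, v) for v in values]


# Toy 'audio': silence, a short loud burst, then silence again (made-up values).
signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]

# Hand-picked filter that responds when the amplitude jumps up.
onset_kernel = [-1.0, 1.0]

feature_map = relu(conv1d(signal, onset_kernel))
print(feature_map)  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

The single non-zero value marks the spot where the sound kicks in. A trained network builds up far richer descriptions of a track by combining many such responses.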

Natural language processing (NLP)

To add another layer to its personalised recommendation model, Spotify uses NLP, a branch of artificial intelligence that helps computers understand, interpret and manipulate human language. It does this by scraping web feeds, blogs, articles, social media and other sources for keywords and phrases describing songs and artists and the conversations around them.

Spotify then categorizes this information and weights it according to its relative importance. Each song or artist can have a dictionary of thousands of keywords and phrases assigned to it, which helps algorithms and users find it. These dictionaries are updated on an ongoing basis and change from day to day.
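One common way to weight keywords by importance is TF-IDF, which scores a term higher the more often it appears for one artist and the less often it appears for everyone else. I am not claiming this is the weighting Spotify uses; the sketch below simply illustrates the idea, and the artist names and text snippets are made up.

```python
from collections import Counter
from math import log

# Made-up snippets standing in for text scraped about each artist.
documents = {
    "artist_x": "dreamy synth pop with dreamy vocals",
    "artist_y": "heavy guitar rock with loud drums",
    "artist_z": "synth driven dance pop",
}


def tf_idf(documents):
    """Return a {artist: {term: weight}} dictionary of TF-IDF keyword weights."""
    tokenized = {name: text.lower().split() for name, text in documents.items()}
    n_docs = len(tokenized)

    # In how many artists' text does each term appear?
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    weights = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        weights[name] = {
            term: (count / len(words)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


weights = tf_idf(documents)
top_term = max(weights["artist_x"], key=weights["artist_x"].get)
print(top_term)  # dreamy
```

Here ‘dreamy’ comes out on top for artist_x because it is repeated and no other artist is described that way, while a shared word like ‘synth’ carries less weight.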

Since the system is data-based, it follows that more data leads to a higher chance of being recommended. This means that popular songs still have the upper hand, but the algorithm is constantly being updated and is evolving to help new artists get more play.

A formula for a hit?

If we analyse Spotify’s data over the past 10 years and, more specifically, the top 100 tracks, is it possible to predict which songs are hit material?

If we take this data and run some machine learning models on it, we can pick up patterns in these songs, and the things they have in common, to make very educated predictions. Spotify describes songs using a bunch of features such as key, tempo, loudness and danceability, and these are fed into the model to help make the predictions.

As a data scientist myself, I have played around with a very crude random forest model using 26 of these variables and got close to 70 percent accuracy!
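To give a feel for how such a model votes, here is a heavily simplified stand-in for a random forest: a bag of one-split trees (decision stumps), each trained on a random resample of the songs and a randomly chosen feature. It is not the 26-variable model mentioned above. The eight songs, their feature values and the hit labels are all made up, and a real project would use a proper library and real chart data.

```python
import random

# Made-up toy data: (tempo in BPM, loudness in dB, danceability 0-1), 1 = hit, 0 = not a hit.
SONGS = [
    ((128, -4.0, 0.85), 1),
    ((122, -5.0, 0.80), 1),
    ((118, -4.5, 0.75), 1),
    ((125, -3.5, 0.90), 1),
    ((90, -11.0, 0.40), 0),
    ((84, -12.5, 0.35), 0),
    ((96, -10.0, 0.50), 0),
    ((78, -14.0, 0.30), 0),
]


def stump_predict(stump, features):
    """A stump is (feature index, threshold, polarity); polarity 1 means 'at or above is a hit'."""
    feature, threshold, polarity = stump
    above = features[feature] >= threshold
    return int(above) if polarity == 1 else int(not above)


def train_stump(sample, feature):
    """Pick the threshold and direction on one feature with the fewest training errors."""
    best, best_errors = None, len(sample) + 1
    for features, _ in sample:
        for polarity in (1, -1):
            stump = (feature, features[feature], polarity)
            errors = sum(stump_predict(stump, x) != y for x, y in sample)
            if errors < best_errors:
                best, best_errors = stump, errors
    return best


def train_forest(data, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    while len(forest) < n_trees:
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        if len({label for _, label in sample}) < 2:
            continue  # skip samples that contain only one class
        forest.append(train_stump(sample, rng.randrange(n_features)))
    return forest


def predict(forest, features):
    """Majority vote across all stumps."""
    votes = sum(stump_predict(stump, features) for stump in forest)
    return int(votes * 2 > len(forest))


forest = train_forest(SONGS)
print(predict(forest, (130, -3.0, 0.95)))  # fast, loud, danceable
print(predict(forest, (70, -15.0, 0.20)))  # slow, quiet, not danceable
```

On this toy data the fast, loud, danceable song is voted a hit and the slow, quiet one is not. Real chart data is far messier, which is why an accuracy figure well short of 100 percent is nothing to be ashamed of.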

If this type of thing interests you, an article by Hisham Hawara entitled ‘The science of predicting a hit song!’ goes into much more detail.

As much as I miss getting my hands dirty and finding new physical records, I am learning to love and embrace the new world of artificial intelligence in music. In a vinyl shop I was limited to the music in that shop, but with machine learning the WHOLE world of music is available to me.

Spotify are leaders in the game and are constantly innovating. As a music lover, a music producer and an avid consumer of music, I can’t wait for the next step in Spotify ‘crate-digging’.

I would love to know your thoughts.
