Analyzing Spotify Audio Features

What are the audio features Spotify uses to build your custom playlists?

Spotify provides some really useful features that it has calculated for each song. They describe the song in the form of numbers, and we like numbers.

Spotify uses these features, among other things, to build out the custom playlists you see on your home page and cater to your personal music taste.

All sorts of analyses and modeling have gone on behind the scenes to produce these final audio feature values. This means that by using the features Spotify provides, we cut out vast amounts of legwork (e.g. signal processing) to get to this point.

Definitions for each feature can be found on the Spotify Audio Feature Reference page.
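I won't walk through the data collection itself here, but as a rough sketch, the features can be pulled with the spotipy client. The client credentials are assumed to be set in the environment, and the track IDs below are placeholders, not part of the original analysis:

```python
# Minimal sketch of fetching audio features via spotipy.
# Assumes SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET are set in the
# environment; the track IDs are placeholders.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

track_ids = ["<track-id-1>", "<track-id-2>"]  # placeholder IDs

# audio_features accepts up to 100 track IDs per call
for f in sp.audio_features(track_ids):
    print(f["id"], f["energy"], f["danceability"], f["acousticness"])
```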

Exploratory Data Analysis (EDA)

I've pulled the Spotify audio features for 729,191 songs from the past 4 years (2018 through November 2021). Let's explore the data first by looking at a correlation matrix.

Audio Feature Correlation Matrix

We immediately see some features with high correlation; take energy, for example. There is a high positive correlation between energy and loudness and a high negative correlation between energy and acousticness.

This matches my own listening experience: high-energy songs tend to be louder, and acoustic songs tend to be lower energy.
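The matrix above was generated ahead of time, but as a rough sketch, something like the following would reproduce it, assuming the pulled features live in a CSV (the file name `spotify_audio_features.csv` is just a placeholder) with one row per song:

```python
# A minimal sketch, not the exact code behind the figure above.
# "spotify_audio_features.csv" is a placeholder file name with one row
# per song and one column per Spotify audio feature.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("spotify_audio_features.csv")

features = ["acousticness", "danceability", "energy", "instrumentalness",
            "liveness", "loudness", "speechiness", "tempo", "valence"]

corr = df[features].corr()  # pairwise Pearson correlations

sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Audio Feature Correlation Matrix")
plt.tight_layout()
plt.show()
```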

Now let's look at each of these features individually.

Acousticness

A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

Distribution

Acousticness Distribution

The majority of songs from the past 4 years have tended towards "not acoustic".
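For reference, each of the distribution plots in this post can be sketched along these lines, assuming the same `df` DataFrame loaded above; only the column name changes from section to section:

```python
# Sketch of a feature distribution plot; swap the column name to
# reproduce the other distributions in this post.
import matplotlib.pyplot as plt

plt.hist(df["acousticness"], bins=50)
plt.xlabel("acousticness")
plt.ylabel("number of songs")
plt.title("Acousticness Distribution")
plt.show()
```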

Correlation

The correlation matrix shows acousticness has a high negative correlation with:

  • Energy - Acoustic songs tend to be lower energy
  • Loudness - Acoustic songs tend to be quieter

Danceability

Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

Distribution

Danceability Distribution

A nice, roughly normal distribution here shows that songs from the past 4 years are spread fairly evenly across the danceability range.

Correlation

The correlation matrix shows danceability has a high negative correlation with:

  • Valence - Interestingly, songs with a sad, depressed, or angry valence tend to score higher for danceability

Energy

Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

Distribution

Energy Distribution

The majority of songs from the past 4 years have tended to be high-energy.

Correlation

The correlation matrix shows energy has a high positive correlation with:

  • Loudness - High energy songs tend to be louder

And it has a high negative correlation with:

  • Acousticness - Acoustic songs tend to be lower energy

Instrumentalness

Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

Distribution

Instrumentalness Distribution

Here's an interesting distribution. We see a small peak around 0.9 for instrumental songs, but the huge spike at 0.0 shows that tons of songs are not instrumental at all. In fact, 101,792 songs (14%) have a score of exactly zero. Clearly, most songs contain vocals, and Spotify is pretty good at picking up on that.
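As a quick sketch, that zero count can be checked directly on the same DataFrame (the exact figure will depend on the underlying dataset):

```python
# Count songs whose instrumentalness is exactly 0.0
n_zero = (df["instrumentalness"] == 0.0).sum()
pct_zero = n_zero / len(df) * 100
print(f"{n_zero:,} songs ({pct_zero:.0f}%) have an instrumentalness of exactly 0.0")
```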

Key

The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.

Distribution

Key Distribution

Here is the distribution of the first categorical variable we've seen. The most popular key for songs is G (corresponding to number 7), and the least popular key by far is D♯/E♭ (corresponding to number 3).
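Since the API only returns the integer pitch class, a small mapping makes the key distribution easier to read. A sketch using the same DataFrame:

```python
# Map Spotify's integer key values to pitch-class names
# (-1 means no key was detected).
PITCH_CLASSES = {
    -1: "none", 0: "C", 1: "C♯/D♭", 2: "D", 3: "D♯/E♭", 4: "E", 5: "F",
    6: "F♯/G♭", 7: "G", 8: "G♯/A♭", 9: "A", 10: "A♯/B♭", 11: "B",
}

key_counts = df["key"].map(PITCH_CLASSES).value_counts()
print(key_counts)  # G should come out on top, D♯/E♭ at the bottom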

Liveness

Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

Distribution

Liveness Distribution

The vast majority of songs from the past 4 years have been studio recordings.

Loudness

The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

Distribution

Loudness Distribution

The median loudness for the past 4 years is -7.979 dB.

Correlation

The correlation matrix shows that loudness has a high positive correlation with:

  • Energy - Loud songs tend to be more energetic

And a high negative correlation with:

  • Acousticness - Acoustic songs tend to be quieter

Mode

Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

Distribution

Mode Distribution

Just under 62% of songs from the past 4 years have been in a major key.
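Because mode is already a 0/1 flag, that share falls out of a one-liner on the same DataFrame:

```python
# Share of songs in a major key (mode == 1)
pct_major = df["mode"].mean() * 100
print(f"{pct_major:.1f}% of songs are in a major key")
```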

Speechiness

Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

Distribution

Speechiness Distribution

Since our dataset does not contain podcasts or audiobooks, the distribution unsurprisingly skews towards low speechiness.
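The 0.33 and 0.66 thresholds from the definition above also make it easy to bucket the tracks. A sketch using the same DataFrame:

```python
# Bucket tracks by the speechiness thresholds from the Spotify docs
import pandas as pd

buckets = pd.cut(
    df["speechiness"],
    bins=[0.0, 0.33, 0.66, 1.0],
    labels=["music / non-speech", "music and speech mixed", "mostly spoken word"],
    include_lowest=True,
)
print(buckets.value_counts(normalize=True))
```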

Tempo

The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

Distribution

Tempo Distribution

A fairly normal distribution showing that songs from the past 4 years are evenly distributed around the median of 121.306 BPM.

Correlation

The correlation matrix at the beginning of the section does not show much of a correlation between tempo and time signature, because Pearson correlation treats tempo as numeric while time signature is really a categorical variable. However, it is worth noting that the Phik correlation coefficient, which handles mixed variable types, shows a high correlation between the two.
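A rough sketch of that Phik check, assuming the phik package and the same DataFrame (column names follow the Spotify API field names):

```python
# Phik handles mixed numeric/categorical variables; importing phik
# registers a .phik_matrix() accessor on pandas DataFrames.
import phik  # noqa: F401

phik_corr = df[["tempo", "time_signature"]].phik_matrix(interval_cols=["tempo"])
print(phik_corr)
```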

Time Signature

An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7, indicating time signatures of "3/4" to "7/4".

Distribution

Nothing fancy here: songs with 4 beats per measure make up 86% of the songs over the past 4 years.

Correlation

As noted in the Tempo section, the standard correlation matrix does not show much of a relationship between tempo and time signature (since time signature is categorical), but the Phik correlation coefficient does show a high correlation between the two.

Valence

A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

Distribution

Valence Distribution

The majority of songs over the past 4 years have leaned more towards the "negative" emotion side of the spectrum.

Correlation

The correlation matrix shows that valence has a high negative correlation with:

  • Danceability - Interestingly, the correlation shows that sad, angry, or depressed songs tend to have higher danceability.

Conclusion

There are interesting correlations and distributions for each of the features that Spotify has made available. We will use these observations in future models that we build.

This data covers songs from the past 4 years; do you think anything would look different if we looked at songs from, say, 10-20 years ago?