Collecting Music Data

Before any analysis can be done or insights derived, the first step is collecting data. In this post, I'll take a look at some of the sources of music data.

Music data can cover many areas. It could describe a song:

  • Name of the song
  • Who wrote it
  • BPM

It could be about artists:

  • Artist name
  • Band members
  • Records produced
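To make those two lists concrete, here is a minimal sketch of how a song and an artist might be represented in Python. The class and field names are my own placeholders for illustration, not a schema taken from any of the sources discussed in this post.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Song:
    """One song, mirroring the song attributes listed above (hypothetical schema)."""
    name: str
    writers: List[str] = field(default_factory=list)
    bpm: Optional[float] = None  # beats per minute, if known


@dataclass
class Artist:
    """One artist or band, mirroring the artist attributes listed above (hypothetical schema)."""
    name: str
    members: List[str] = field(default_factory=list)
    records: List[str] = field(default_factory=list)
```

A record could then be built with something like `Song(name="Example Song", writers=["A. Writer"], bpm=120)`.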

We could even collect data on venues, publishers, record labels, and the list goes on. What data is available for our consumption and where can we find it?



MusicBrainz

MusicBrainz is one data source that I'm really fond of. Why? Because it is free and open-source, meaning anyone can contribute and anyone can use it. This is the primary data source that The Record Industry will use in our posts, since we are also free and can't afford to pay the expensive data vendors...

From their site:

MusicBrainz aims to be:

  • The ultimate source of music information by allowing anyone to contribute and releasing the data under open licenses.
  • The universal lingua franca for music by providing a reliable and unambiguous form of music identification, enabling both people and machines to have meaningful conversations about music.

What data do they have?

There is a plethora of data captured here including artists, albums/releases, works, and recordings.

How to access

MusicBrainz offers an API, or you can download the full database. You can also simply search straight from their website. There are also client libraries for different programming languages, including Python.
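As a rough sketch of what an API lookup might look like, the snippet below builds a search URL for the MusicBrainz web service and fetches the JSON response using only the standard library. The helper names, the contact string in the User-Agent, and the example query are my own placeholders; check the MusicBrainz API documentation for the exact parameters before relying on this.

```python
import json
import urllib.parse
import urllib.request

# Root of the MusicBrainz web service (version 2).
MB_ROOT = "https://musicbrainz.org/ws/2"

# Placeholder value: replace with your own application name and contact details.
USER_AGENT = "music-data-example/0.1 ( you@example.com )"


def build_search_url(entity, query, limit=5):
    """Return a search URL for an entity type such as 'artist' or 'recording'."""
    params = urllib.parse.urlencode({"query": query, "fmt": "json", "limit": limit})
    return f"{MB_ROOT}/{entity}?{params}"


def search(entity, query, limit=5):
    """Run the search over the network and return the decoded JSON response."""
    request = urllib.request.Request(
        build_search_url(entity, query, limit),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `search("artist", "artist:Radiohead")` should return a dictionary whose `"artists"` entry lists the matches. MusicBrainz asks clients to send a meaningful User-Agent and to keep to roughly one request per second, so anything bigger than a handful of lookups is probably better served by the database download.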



Discogs

Added November 12, 2021

Discogs is another open-source data source that I discovered a bit later on. It has stricter submission standards, which makes the data feel a bit cleaner. For example, there is a standard list of 15 genres. If somebody wants to submit a new release, they'll need to tag it as one of those 15 genres. As a data scientist, I find that clean and standardized data makes my job a whole lot easier.

Discogs' mission:

Since Discogs was started in the year 2000, our mission has always been to build the biggest and most comprehensive interactive public music Database in the world - a site with discographies of all labels and all artists, all cross-referenced with clickable links. It’s like Wikipedia for music. Why? Because music is what makes us human, and keeping a well-organised, public archive of all the recorded music in the world helps preserve a full picture of who we are, with all the natural diversity intact.

What do they have?

The primary objects in their database are artists, labels, and releases.

How to access

Their website is very user-friendly if you want to start browsing right away. However, for our purposes, we'll need dumps of their data that we can plug into our own database. They provide monthly dumps of their database, which are free to download.
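Here is a minimal sketch of how a releases dump could be streamed in Python without loading the whole file into memory. It assumes the dump is a gzipped XML file made of `<release>` elements with an `id` attribute, a `<title>` child, and `<genres>` holding `<genre>` entries. That layout and the function names are my assumptions for illustration, so verify them against a real dump first.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_releases(xml_stream):
    """Yield one small dict per <release> element found in an XML stream."""
    for _event, element in ET.iterparse(xml_stream, events=("end",)):
        if element.tag != "release":
            continue
        yield {
            "id": element.get("id"),
            "title": element.findtext("title"),
            "genres": [genre.text for genre in element.findall("genres/genre")],
        }
        # Drop the element's children once processed to keep memory use low.
        element.clear()


def iter_releases_from_dump(path):
    """Open a gzipped dump file (path is a placeholder) and stream its releases."""
    with gzip.open(path, "rb") as stream:
        yield from iter_releases(stream)
```

Each yielded dictionary can then be written straight into a database table, one row per release.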

Third-Party Data Vendors

The sources below are data vendors that charge for access to the data they have already collected. We have no experience with these vendors, so we cannot speak to their value.

A Nielsen company that, in addition to music data, also offers video and sports data.