A Fully Convolutional Deep Auditory Model for Musical Chord Recognition
July 24, 2016
This site contains information on reproducing the experiments in the paper
A Fully Convolutional Deep Auditory Model for Musical Chord Recognition
Korzeniowski, F. and Widmer, G.
In Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Salerno, Italy, 2016.
Software
The pre-trained chord recognition model is available as part of the madmom audio processing framework. It differs only slightly from the model presented in the paper: instead of padding the input in the first few convolutional layers, we use a larger context window in the input layer, such that the feature map size after the first pooling layer is retained.
The model provided with madmom is trained on a variety of datasets: Isophonics, RWC Popular, Robbie Williams, and Billboard. If you use it in research, keep in mind that its performance on these datasets might be optimistic, since they were part of its training data. We plan to provide model files for the individual folds in the future.
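As an illustration, the following sketch applies the pre-trained model through madmom's processor interface. The class names CNNChordFeatureProcessor and CRFChordRecognitionProcessor are taken from madmom's documentation (the exact API may differ between versions), and song.flac is just a placeholder file name:

# Sketch: apply the pre-trained CNN chord model shipped with madmom.
# Processor names follow madmom's documentation; adapt if your version differs.
from madmom.features.chords import CNNChordFeatureProcessor, CRFChordRecognitionProcessor

# compute frame-wise chord features with the pre-trained CNN
features = CNNChordFeatureProcessor()('song.flac')  # 'song.flac' is a placeholder
# decode the features into chord segments (start time, end time, chord label)
chords = CRFChordRecognitionProcessor()(features)
for segment in chords:
    print(segment)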
Reproducing the experiments
Although we tried to make reproducing our experiments as easy as possible, doing so is much more involved than applying the pre-trained model. You will have to re-create the experimental pipeline, install all necessary libraries, and prepare the audio and chord data.
Experimental Pipeline Setup
Install the chordrec framework by following the instructions in the README file.
Data Setup
Put all datasets into their respective sub-directories under chordrec/experiments/data: beatles, queen, zweieck, robbie_williams, and rwc. Each dataset has to contain three types of data: audio files in .flac format, corresponding chord annotations in lab format with the file extension .chords, and the cross-validation split definitions. Audio and annotation files can be organised in a directory structure, but do not need to be; the programs look for .flac and .chords files in all directories recursively. The split definition files, however, must reside in a splits sub-directory of each dataset directory (e.g. beatles/splits). File names of audio and annotation files must correspond to the names given in the split definition files. For more information regarding the data, see the Data section below, where we provide a .zip file with the annotations and split definitions that you only need to extract into the experiments directory.
The data directory should look like this, where the internal structure of the queen, robbie_williams, rwc and zweieck directories follows that of beatles:
experiments
+-- data
    +-- beatles
    |   +-- *.flac
    |   +-- *.chords
    |   +-- splits
    |       +-- 8-fold_cv_album_distributed_*.fold
    +-- queen
    +-- robbie_williams
    +-- rwc
    +-- zweieck
Make sure the link chordrec/experiments/mlsp2016/data refers to this directory and works.
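Before running anything, you can sanity-check the layout with a small script along the following lines (pure standard library; the DATA_DIR path and the script itself are not part of chordrec, just an illustrative sketch):

import os

DATA_DIR = 'chordrec/experiments/data'  # adjust to your checkout
DATASETS = ['beatles', 'queen', 'robbie_williams', 'rwc', 'zweieck']

for name in DATASETS:
    ds_dir = os.path.join(DATA_DIR, name)
    flacs, chords = 0, 0
    # audio and annotation files may be nested arbitrarily, so walk recursively
    for root, _, files in os.walk(ds_dir):
        flacs += sum(f.endswith('.flac') for f in files)
        chords += sum(f.endswith('.chords') for f in files)
    has_splits = os.path.isdir(os.path.join(ds_dir, 'splits'))
    print('{}: {} .flac files, {} .chords files, splits present: {}'.format(
        name, flacs, chords, has_splits))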
Running the Experiment
Follow the instructions in the chordrec/experiments/mlsp2016/README file to run the experiments. Training the CNN can take a while.
Results are stored in the results sub-directory created when running the experiment. The results of each experiment are stored in the artifacts/results.yaml file in the respective sub-directory. To get a quick overview of the results, you can simply use the grep command, e.g.:
grep majmin results/*/artifacts/results.yaml
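If you prefer to collect the numbers programmatically, a sketch like the following reads all result files with PyYAML; note that treating majmin as a top-level key in results.yaml is an assumption, so inspect one of the files and adjust the key lookup accordingly:

import glob
import yaml  # requires PyYAML

# gather the 'majmin' score of every experiment run; the key layout inside
# results.yaml is assumed here -- check one file and adapt if necessary
for path in sorted(glob.glob('results/*/artifacts/results.yaml')):
    with open(path) as f:
        results = yaml.safe_load(f)
    print(path, results.get('majmin'))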
Data
We trained the neural network on a compound dataset comprising the Beatles, Queen, Zweieck, Robbie Williams, and RWC Popular Music datasets. While we cannot provide the audio files for copyright reasons, we do provide links with more information about these datasets and to the chord annotations we used. Note that our file naming scheme differs from that of the annotation archives you can download on the respective sites. For convenience, we provide a .zip archive with annotations following our naming scheme further below.
- Beatles, Queen and Zweieck: See the Isophonics website for chord annotations and information about the audio files.
- Robbie Williams: The dataset description can be found in Bruno Di Giorgi et al., “Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony”. Download the annotations from here.
- RWC: For information on obtaining the audio files, see the RWC website. Chord annotations are on GitHub.
Download
For convenience, we provide the annotations renamed according to our naming scheme here. This archive also includes the cross-validation fold definitions for each dataset. You can extract this archive into the experiments/data directory and add the audio files to the respective directories. Then you should be ready to go.