The Deep Chroma Extractor

May 23, 2016

This page contains additional information and data necessary to reproduce the results of the following paper:

F. Korzeniowski and G. Widmer. “Feature Learning for Chord Recognition: The Deep Chroma Extractor”. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, USA.

Software

The pre-trained feature extractor is available as part of the madmom audio processing framework as the DeepChromaProcessor class. Note that the model bundled with madmom differs from the one we used in the paper: it uses fewer units in the hidden layers of the neural network and operates on a narrower frequency band. This reduces the size of the model files without compromising the results too much.
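
For example, extracting chroma vectors with the bundled model takes only a few lines (a minimal sketch; the audio file name is a placeholder):

# Compute deep chroma vectors with the model bundled with madmom.
from madmom.audio.chroma import DeepChromaProcessor

dcp = DeepChromaProcessor()
chroma = dcp('some_song.flac')  # array of shape (num_frames, 12)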

The best performing model trained for the paper is available here. To use it with madmom, set the models parameter of the DeepChromaProcessor class accordingly and adapt the spectrogram parameters: fmin=30, fmax=5500, unique_filters=False. Note that the file actually contains 8 models, one for each test fold. If you load all of them, the final prediction will be the average of the individual models' predictions.
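
A sketch of how this could look, assuming the downloaded model files were extracted to a models/ directory (the path is a placeholder):

# Use the models trained for the paper instead of the bundled one.
from madmom.audio.chroma import DeepChromaProcessor

model_files = ['models/chroma_dnn_%d.pkl' % i for i in range(8)]
dcp = DeepChromaProcessor(fmin=30, fmax=5500, unique_filters=False,
                          models=model_files)
chroma = dcp('some_song.flac')  # averaged prediction of all 8 models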

If you are using the models in your research, make sure to only use models that were not trained on the files you test on. A model with index i (e.g. chroma_dnn_i.pkl) was trained on all files in folds other than i. For example, the model in chroma_dnn_2.pkl was trained on all files defined in the folddef_$j.fold files (where $j is in {0, 1, 3, 4, 5, 6, 7}) of all datasets used (beatles, queen, zweieck, rwc, robbie_williams). See the download section at the bottom for the fold definitions we used.
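
To illustrate, here is a rough sketch of how one could select the admissible model(s) for a given test file. It assumes that each split definition file name ends in the fold index and lists one file name per line; both are assumptions about the file format, not something guaranteed by chordrec.

# Hypothetical helper: find the model(s) that may be evaluated on a test file.
# Assumes split definition files end in the fold index (e.g. ..._2.fold) and
# contain one file name per line -- adapt to the actual file format.
import glob
import os

def admissible_models(test_file_name, splits_dirs, num_folds=8):
    models = []
    for i in range(num_folds):
        fold_files = set()
        for splits_dir in splits_dirs:  # one splits directory per dataset
            for fold_def in glob.glob(os.path.join(splits_dir, '*_%d.fold' % i)):
                with open(fold_def) as f:
                    fold_files.update(line.strip() for line in f if line.strip())
        # model i was *not* trained on fold i, so it may be tested on its files
        if test_file_name in fold_files:
            models.append('chroma_dnn_%d.pkl' % i)
    return models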

Reproducing the experiments

Although we tried to make reproducing our experiments as easy as possible, doing so is much more involved than applying the pre-trained model. You will have to re-create the experimental pipeline, install all necessary libraries, and prepare the audio and chord data.

Experimental Pipeline Setup

Install the chordrec framework by following the instructions in the README file.

Data Setup

Put all datasets into their respective subdirectories under chordrec/experiments/data: beatles, queen, zweieck, robbie_williams, and rwc. Each dataset has to contain three types of data: audio files in .flac format, corresponding chord annotations in lab format with the file extension .chords, and the cross-validation split definitions. Audio and annotation files can be organised in a directory structure, but do not need to be; the programs will look for .flac and .chords files in all directories recursively. However, the split definition files must be in a splits sub-directory of each dataset directory (e.g. beatles/splits). File names of audio and annotation files must correspond to the names given in the split definition files. For more information regarding the data, take a look at the Data section below, where we provide a .zip file with the annotations and split definitions that you only need to extract into the experiments directory.

The data directory should look like this, with the internal structure of the queen, robbie_williams, rwc and zweieck directories following that of beatles:

experiments
 +-- data
      +-- beatles
           +-- *.flac
           +-- *.chords
           +-- splits
                +-- 8-fold_cv_album_distributed_*.fold
      +-- queen
      +-- robbie_williams
      +-- rwc
      +-- zweieck

Make sure the link chordrec/experiments/ismir2016/data refers to this directory and works.
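
The following sketch (assuming Python 3.5 or newer and the directory names above) counts the files in each dataset directory and resolves the data link, which can help spot setup problems:

# Rough sanity check of the data layout described above (not part of chordrec).
import os
from glob import glob

data_dir = 'chordrec/experiments/data'
for ds in ['beatles', 'queen', 'zweieck', 'robbie_williams', 'rwc']:
    ds_dir = os.path.join(data_dir, ds)
    flacs = glob(os.path.join(ds_dir, '**', '*.flac'), recursive=True)
    chords = glob(os.path.join(ds_dir, '**', '*.chords'), recursive=True)
    folds = glob(os.path.join(ds_dir, 'splits', '*.fold'))
    print(ds, len(flacs), 'audio /', len(chords), 'annotations /', len(folds), 'fold files')
# the experiments expect chordrec/experiments/ismir2016/data to point here
print(os.path.realpath('chordrec/experiments/ismir2016/data'))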

Running the Experiment

We provide scripts and configuration files that run the exact experiments that produced the results shown in Table 1 of the paper. They reside under experiments/ismir2016. If everything is set up correctly, running the experiment should be as easy as calling the run.sh script. This can take a considerable amount of time, because each configuration is run ten times; you can change that by editing the run.sh script itself.

Results are stored in the results sub-directory created when running the experiment. In results, there is one directory for each of the four tested configurations, containing the results of all runs. The results of each run are stored in artifacts/results.yaml. To get a quick overview of the results for a specific configuration, you can simply use the grep command, e.g.:

grep majmin results/deep_chroma_test/*/artifacts/results.yaml
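
If you prefer aggregating the scores in Python, a small sketch could look like this; it assumes results.yaml holds a top-level majmin entry, as the grep call above suggests.

# Average the 'majmin' score over all runs of one configuration.
# The exact YAML structure is an assumption -- adapt the key if it differs.
from glob import glob
import yaml

scores = []
for path in glob('results/deep_chroma_test/*/artifacts/results.yaml'):
    with open(path) as f:
        scores.append(float(yaml.safe_load(f)['majmin']))
print('majmin: %.4f averaged over %d runs' % (sum(scores) / len(scores), len(scores)))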

Data

We trained the neural network on a compound dataset comprising the Beatles, Queen, Zweieck, Robbie Williams and RWC popular music datasets. While we cannot provide audio files for copyright reasons, we do provide links with more information about these datasets and to the chord annotations we used. Note that our file naming scheme differs from the annotation archives you can download on the respective sites. For convenience, we provide a .zip archive with annotations following our naming scheme further below.

  • Beatles, Queen and Zweieck: See the isophonics website for chord annotations and information about audio files.
  • Robbie Williams: The dataset description can be found in Bruno Di Giorgi et al., “Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony”. Download the annotations from here.
  • RWC: For information on obtaining the audio files, see the RWC website. Chord annotations are on GitHub.

Download

For convenience, we provide the annotations renamed to our naming scheme here. This archive also includes the cross-validation fold definitions for each dataset. You can extract this archive into the experiments/data directory and add the audio files to the respective directories. Then you should be ready to go.