The Deep Chroma Extractor
May 23, 2016
This page contains additional information and data necessary to reproduce the results of the following paper:
F. Korzeniowski and G. Widmer. “Feature Learning for Chord Recognition: The Deep Chroma Extractor”. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, USA.
Software
The pre-trained feature extractor is available as part of the madmom audio processing framework as the DeepChromaProcessor class. Note that the model bundled with madmom differs from the one we used in the paper: it uses fewer units in the hidden layers of the neural network and operates on a narrower frequency band. We thus reduced the size of the model files without compromising the results too much.
The best-performing model trained for the paper is available here. To use it with madmom, set the models parameter of the DeepChromaProcessor class accordingly, and adapt the spectrogram parameters: fmin=30, fmax=5500, unique_filters=False. Note that the file actually contains 8 models, one for each test fold. If you load all of them, the final prediction will be the average of the individual models' predictions.
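For illustration, a minimal sketch of how this could look with madmom's DeepChromaProcessor; the model path and audio file name are placeholders, and the parameters follow the description above:

from glob import glob
from madmom.audio.chroma import DeepChromaProcessor

# Paths to the paper's models (chroma_dnn_0.pkl ... chroma_dnn_7.pkl);
# adjust the pattern to wherever you extracted the downloaded archive.
models = sorted(glob('/path/to/chroma_dnn_*.pkl'))

# Spectrogram parameters adapted as described above; loading all eight
# models averages their predictions.
dcp = DeepChromaProcessor(fmin=30, fmax=5500, unique_filters=False,
                          models=models)
chroma = dcp('some_song.flac')  # one chroma vector per frame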
If you are using the models in your research, make sure to only use models that were not trained on the files you test on. The model with index i (e.g. chroma_dnn_i.pkl) was trained on all files in folds other than i. For example, the model in chroma_dnn_2.pkl was trained on all files listed in the folddef_$j.fold files (where $j is in {0, 1, 3, 4, 5, 6, 7}) of all datasets used (beatles, queen, zweieck, rwc, robbie_williams). See the download section at the bottom for the fold definitions we used.
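In other words, for test files from fold i, only the model with the same index is safe to use. A minimal sketch of this selection (the model directory and the helper name are placeholders, not part of chordrec or madmom):

import os
from madmom.audio.chroma import DeepChromaProcessor

def processor_for_test_fold(i, model_dir='/path/to/models'):
    # chroma_dnn_i.pkl is the only model that never saw fold i during training
    model = os.path.join(model_dir, 'chroma_dnn_{}.pkl'.format(i))
    return DeepChromaProcessor(fmin=30, fmax=5500, unique_filters=False,
                               models=[model])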
Reproducing the experiments
Although we tried to make reproducing our experiments as easy as possible, doing so is much more involved than applying the pre-trained model. You will have to re-create the experimental pipeline, install all necessary libraries, and prepare the audio and chord data.
Experimental Pipeline Setup
Install the chordrec framework by following the instructions in the README file.
Data Setup
Put all datasets into respective subdirectories under chordrec/experiments/data: beatles, queen, zweieck, robbie_williams, and rwc. Each dataset has to contain three types of data: audio files in .flac format, corresponding chord annotations in lab format with the file extension .chords, and the cross-validation split definitions. Audio and annotation files can be organised in a directory structure, but do not need to be; the programs will look for any .flac and .chords files in all directories recursively. However, the split definition files must be in a splits sub-directory in each dataset directory (e.g. beatles/splits). File names of audio and annotation files must correspond to the names given in the split definition files. For more information regarding the data, take a look at the Data section below, where we provide a .zip file with the annotations and split definitions that you just need to extract into the experiments directory.
The data directory should look like this, where the internal structures of the queen, robbie_williams, rwc and zweieck directories follow that of beatles:
experiments
+-- data
    +-- beatles
    |   +-- *.flac
    |   +-- *.chords
    |   +-- splits
    |       +-- 8-fold_cv_album_distributed_*.fold
    +-- queen
    +-- robbie_williams
    +-- rwc
    +-- zweieck
Make sure the link chordrec/experiments/ismir2016/data refers to this directory and works.
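If you want to check the layout programmatically, here is a minimal sketch; the data path is a placeholder and the checks simply mirror the requirements described above:

import os

DATA_DIR = 'chordrec/experiments/data'  # adjust to your checkout
DATASETS = ['beatles', 'queen', 'zweieck', 'robbie_williams', 'rwc']

for ds in DATASETS:
    root = os.path.join(DATA_DIR, ds)
    flacs = chords = 0
    # audio and annotation files may sit anywhere below the dataset directory
    for dirpath, _, filenames in os.walk(root):
        flacs += sum(f.endswith('.flac') for f in filenames)
        chords += sum(f.endswith('.chords') for f in filenames)
    has_splits = os.path.isdir(os.path.join(root, 'splits'))
    print('{}: {} audio, {} annotation files, splits present: {}'.format(
        ds, flacs, chords, has_splits))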
Running the Experiment
We provide a script and configuration files that run the exact experiments that produced the results shown in Table 1 of the paper. They reside under experiments/ismir2016. If everything is set up correctly, running the experiment should be as easy as calling the run.sh script. This can take a considerable amount of time, because each configuration is run ten times. You can change that by editing the run.sh script itself.
Results are stored in the results sub-directory created when running the experiment. In results, there are four directories, one for each tested configuration, containing the results of all experiment runs. The results of each run are stored in artifacts/results.yaml. To get a quick overview of the results for a specific configuration, you can simply use the grep command, e.g.:
grep majmin results/deep_chroma_test/*/artifacts/results.yaml
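If you prefer to collect the scores programmatically, a minimal Python sketch could look like this; it assumes results.yaml contains a top-level majmin entry, which the grep example above suggests:

import glob
import yaml  # PyYAML

# Print the 'majmin' score of every run of one configuration.
for path in sorted(glob.glob('results/deep_chroma_test/*/artifacts/results.yaml')):
    with open(path) as f:
        results = yaml.safe_load(f)
    print(path, results.get('majmin'))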
Data
We trained the neural network on a compound dataset comprising the Beatles, Queen, Zweieck, Robbie Williams and RWC popular music datasets. While we cannot provide the audio files for copyright reasons, we do provide links with more information about these datasets and to the chord annotations we used. Note that our file naming scheme differs from that of the annotation archives you can download from the respective sites. For convenience, we provide a .zip archive with annotations following our naming scheme further below.
- Beatles, Queen and Zweieck: See the isophonics website for chord annotations and information about audio files.
- Robbie Williams: The dataset description can be found in Bruno Di Giorgi et al., “Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony”. Download the annotations from here.
- RWC: For information on obtaining the audio files, see the RWC website. Chord annotations are on GitHub.
Download
For convenience, we provide the annotations renamed according to our naming scheme here. This archive also includes the fold definitions for cross-validation for each dataset. You can extract this archive into the experiments/data directory and add the audio files to the respective directories. Then you should be ready to go.