A Fully Convolutional Deep Auditory Model for Musical Chord Recognition
July 24, 2016
This site contains information on reproducing the experiments in the paper
A Fully Convolutional Deep Auditory Model for Musical Chord Recognition
Korzeniowski, F. and Widmer, G.
In Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Salerno, Italy, 2016.
Software
The pre-trained chord recognition model is available as part of the madmom audio processing framework. It differs only slightly from the model presented in the paper: instead of padding the input in the first few convolutional layers, we use a larger context window in the input layer, such that the feature map size after the first pooling layer is retained.
The model provided with madmom is trained on a variety of datasets: Isophonics, RWC Popular, Robbie Williams, and Billboard. If you use it in research, keep in mind that its performance on these datasets might be optimistic, since they were part of its training data. We plan to provide model files for the individual folds in the future.
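As an illustration, the following sketch applies the pre-trained model through madmom's processor interface. The class names CNNChordFeatureProcessor and CRFChordRecognitionProcessor are taken from madmom's documentation (the exact API may differ between versions), and song.flac is just a placeholder file name:

# Sketch: apply the pre-trained CNN chord model shipped with madmom.
# Processor names follow madmom's documentation; adapt if your version differs.
from madmom.features.chords import CNNChordFeatureProcessor, CRFChordRecognitionProcessor

# compute frame-wise chord features with the pre-trained CNN
features = CNNChordFeatureProcessor()('song.flac')  # 'song.flac' is a placeholder
# decode the features into chord segments (start time, end time, chord label)
chords = CRFChordRecognitionProcessor()(features)
for segment in chords:
    print(segment)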
Reproducing the experiments
Although we tried to make reproducing our experiments as easy as possible, doing so is much more involved than applying the pre-trained model. You will have to re-create the experimental pipeline, install all necessary libraries, and prepare the audio and chord data.
Experimental Pipeline Setup
Install the chordrec framework by following the instructions in the README file.
Data Setup
Put all datasets into their respective sub-directories under chordrec/experiments/data: beatles, queen, zweieck, robbie_williams, and rwc. Each dataset has to contain three types of data: audio files in .flac format, corresponding chord annotations in lab format with the file extension .chords, and the cross-validation split definitions. Audio and annotation files can be organised in a directory structure, but do not need to be; the programs look for .flac and .chords files in all directories recursively. The split definition files, however, must reside in a splits sub-directory of each dataset directory (e.g. beatles/splits). File names of audio and annotation files must correspond to the names given in the split definition files. For more information regarding the data, see the Data section below, where we provide a .zip file with the annotations and split definitions that you only need to extract into the experiments directory.
The data directory should look like this, where the internal structure of the queen, robbie_williams, rwc and zweieck directories follows that of beatles:
experiments
+-- data
    +-- beatles
    |   +-- *.flac
    |   +-- *.chords
    |   +-- splits
    |       +-- 8-fold_cv_album_distributed_*.fold
    +-- queen
    +-- robbie_williams
    +-- rwc
    +-- zweieck
Make sure the link chordrec/experiments/mlsp2016/data refers to this directory and works.
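Before running anything, you can sanity-check the layout with a small script along the following lines (pure standard library; the DATA_DIR path and the script itself are not part of chordrec, just an illustrative sketch):

import os

DATA_DIR = 'chordrec/experiments/data'  # adjust to your checkout
DATASETS = ['beatles', 'queen', 'robbie_williams', 'rwc', 'zweieck']

for name in DATASETS:
    ds_dir = os.path.join(DATA_DIR, name)
    flacs, chords = 0, 0
    # audio and annotation files may be nested arbitrarily, so walk recursively
    for root, _, files in os.walk(ds_dir):
        flacs += sum(f.endswith('.flac') for f in files)
        chords += sum(f.endswith('.chords') for f in files)
    has_splits = os.path.isdir(os.path.join(ds_dir, 'splits'))
    print('{}: {} .flac files, {} .chords files, splits present: {}'.format(
        name, flacs, chords, has_splits))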
Running the Experiment
Follow the instructions in the chordrec/experiments/mlsp2016/README file to run the experiments. Training the CNN can take a while.
Results are stored in the results sub-directory created when running the experiment. The results of each experiment are stored in the artifacts/results.yaml file in the respective sub-directory. To get a quick overview of the results, you can simply use the grep command, e.g.:
grep majmin results/*/artifacts/results.yaml
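If you prefer to collect the numbers programmatically, a sketch like the following reads all result files with PyYAML; note that treating majmin as a top-level key in results.yaml is an assumption, so inspect one of the files and adjust the key lookup accordingly:

import glob
import yaml  # requires PyYAML

# gather the 'majmin' score of every experiment run; the key layout inside
# results.yaml is assumed here -- check one file and adapt if necessary
for path in sorted(glob.glob('results/*/artifacts/results.yaml')):
    with open(path) as f:
        results = yaml.safe_load(f)
    print(path, results.get('majmin'))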
Data
We trained the neural network on a compound dataset comprising the Beatles, Queen, Zweieck, Robbie Williams, and RWC Popular Music datasets. While we cannot provide the audio files for copyright reasons, we do provide links with more information about these datasets and to the chord annotations we used. Note that our file naming scheme differs from that of the annotation archives you can download on the respective sites. For convenience, we provide a .zip archive with annotations following our naming scheme further below.
- Beatles, Queen and Zweieck: See the Isophonics website for chord annotations and information about the audio files.
- Robbie Williams: The dataset description can be found in Bruno Di Giorgi et al., “Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony”. Download the annotations from here.
- RWC: For information on obtaining the audio files, see the RWC website. Chord annotations are on GitHub.
Download
For convenience, we provide the annotations renamed according to our naming scheme here. This archive also includes the cross-validation fold definitions for each dataset. You can extract this archive into the experiments/data directory and add the audio files to the respective directories. Then you should be ready to go.