Tom Lieber. COS 325, Spring 2009.
The goal was to create a beat detector that would also guess at measure length. That beat detector could then be made to synchronize to a human's performance for tasks like improvising alongside, or for maintaining the clock on a performer's tool like Ableton Live.
I was not able to achieve this goal, although it does seem attainable with more work. I have created a rudimentary beat detector, some code that may be useful for taking the beat detector to a synchronizing tool as originally intended, and discussion on stumbling blocks that I ran into that kept me from reaching my goal.
My work is based on the paper "Tempo and beat analysis of acoustic musical signals" by Eric Sheirer. He argues that amplitude envelopes of a few subbands are enough to derive a signal's rhythm, given that rhythm is still mostly perceived when a white noise signal is vocoded with the subband envelopes of a musical signal. (Scheirer 589). He proposes an algorithm for determining the tempo of a signal by extracting subband envelopes, then using a bank of comb filters tuned to resonate at frequencies of beats at each interesting tempo. The entire schematic from the paper is reproduced in Figure 1.

Figure 1. Schematic of Scheirer's beat-tracking algorithm. Reproduced from the paper without permission.
I created this model ChucK code efficient enough to run in real-time, although my results were not very motivating.
Indented code in the following sections is written in ChucK and comes from my finished beat detector, which can be downloaded as a single file here: beat.ck.
Wanting to replicate Scheirer's results, I used the same filterbanks as he used: a low-pass with a 200 Hz cutoff; band-passes at 200–400 Hz, 400–800 Hz, 800–1600 Hz, and 1600–3200 Hz; and a high-pass with a 3200 Hz cutoff. Each band was a sixth-order elliptical filter with 3 dB of ripple in the passband and 40 dB of rejection in the stopband, implemented as three chained BiQuad unit generators with coefficients generated by MATLAB R2008b. More exact instructions can be found in the comments of my beat detector's source code so that the filters can be changed.
6 => int bands;
Gain filterbank_outs[bands];
BiQuad bq1[bands];
BiQuad bq2[bands];
BiQuad bq3[bands];
for(0 => int i; i < bands; i++) {
input => bq1[i] => bq2[i] => bq3[i] => filterbank_outs[i];
// ... code for setting coefficients of bq1, bq2, bq3 ...
}

Click from a click track, third band.
Envelopes of each band were generated by filtering a rectified version of each band through a low-pass filter (second-order Butterworth) with cutoff a frequency of 30 Hz. The low-pass filter may make the signal become negative on its way to zero, so a half-wave rectifier is also used.
Gain envelope_outs[bands];
for(0 => int i; i < bands; i++) {
filterbank_outs[i] => FullRect rect => LPF envlpf =>
HalfRect rect2 => envelope_outs[i];
envlpf.set( 30, 1 );
}

Click from a click track, envelope extracted from third band.
The envelopes are sampled at 200 Hz, differentiated, then half-rectified, resulting in a signal with positive, non-zero values at the start of an envelope peak, and zero otherwise.
Gain differentiator_outs[bands];
fun void differentiator(UGen in, Step out) {
in => blackhole; // suck samples
in.last() => float last;
while((second / differentiator_freq) => now) {
in.last() - last => out.next;
in.last() => last;
}
}
for(0 => int i; i < bands; i++) {
Step step => HalfRect rect => differentiator_outs[i];
0 => step.next; // otherwise, clicks at the start
spork ~ differentiator(envelope_outs[i], step);
}

Click from a click track, differentiated version of envelope from third band.
To determine the tempo of the song, each band's differentiated envelope is fed through a bank of comb filters with periods at each possible tempo. Each feedback comb filter is implemented as a Delay unit generator, one per comb. The period of a delay line is (1 / tempo) / differentiator_frequency because they are only fed new samples at the rate of the differentiator (by default, at 200 Hz). To accomplish this in ChucK, all resonators are connected to sinks only for one sample at a time, at the desired resampling frequency. This way it is possible to use a great number of comb filters at once without too much processing for real-time.
Because all combs for the same band differ only in their length, a single delay buffer could be used for all of them. However, this is not possible with ChucK's Delay unit generator, and this implementation method is feasible for a small number of resonators, because the length of each is only a few hundred samples (110 samples for 120 beats per minute, 130 for 100 beats per minute).
It is desired that the output of all delay lines can be directly compared with one another for determining the tempo of the input, but the filters lose energy at different rates depending on their lengths. To compensate, the gains on the comb inputs and delay lines are adjusted to match the difference equation y_t = a * y_{t-T} + (1 - a) * x_t for delay length T and a given by a = 0.5 ^ (t / T). Scheirer recommends values of t between 1500ms and 2000ms. In my implementation, 1500mb is used.
60 => float mintempo; // beats per minute
120 => float maxtempo;
50 => int numcombs; // number of tempos to sample (resonators)
Delay combs[bands][numcombs];
Gain combgains[bands][numcombs];
Gain resonator_outs[bands][numcombs];
// map a value from one range linearly onto another range
fun float map(float val, float frommin, float frommax,
float tomin, float tomax) {
return (val - frommin) / (frommax - frommin) *
(tomax - tomin) + tomin;
}
// numcombs resonators on each band
for(0 => int i; i < bands; i++) {
for(0 => int p; p < numcombs; p++) {
differentiator_outs[i] => combgains[i][p] => resonator_outs[i][p];
resonator_outs[i][p] => combs[i][p] => resonator_outs[i][p];
map(p, 0, numcombs-1, mintempo, maxtempo) => float tempo;
(1.0 / tempo)::minute => dur period;
period / differentiator_freq => combs[i][p].max => combs[i][p].delay;
Math.pow(0.5, 1500::ms/period) => float alpha;
1 - alpha => combgains[i][p].gain;
alpha => combs[i][p].gain;
}
}
// only connect the comb filters
// at the sample rate of the differentiators
fun void resample_resonators() {
while(second / differentiator_freq => now) {
for(0 => int band; band < bands; band++)
for(0 => int comb; comb < numcombs; comb++)
resonator_outs[band][comb] => blackhole;
1::samp => now;
for(0 => int band; band < bands; band++)
for(0 => int comb; comb < numcombs; comb++)
resonator_outs[band][comb] =< blackhole;
}
}
spork ~ resample_resonators();
According to Scheirer, the most likely tempos can be determined by finding the resonators with the peak energy at any point in time. The code below prints the number of the maximum resonator, but the code provided with this paper shows the tempo graphically using an ASCII read-out or a MAUI-based interface.
Scheirer's beat detector, once it found a tempo, would inspect the most energetic delay line's history for a peak in order to determine phase. This is not possible with ChucK's Delay unit generator without modification. Thus some other means would be necessary to determine phase, a problem I have not tackled.
fun void pick_tempo() {
float values[numcombs];
int maxcomb;
while(second / differentiator_freq * 10. => now) {
0 => maxcomb;
for(0 => int comb; comb < numcombs; comb++) {
0 => values[comb];
for(0 => int band; band < bands; band++)
resonator_outs[band][comb].last() +=> values[comb];
if(values[comb] > values[maxcomb])
comb => maxcomb;
}
<<< maxcomb, values[maxcomb] >>>;
}
}
spork ~ pick_tempo();
Due to an as yet unfound bug, I have not achieved my goal of creating software that can synchronize a computer's metronome with that of a live performer. However, by finding peaks from the differentiators' output, I have created a beat detector. The performance of the detector depends on how well peaks can be detected, which depends a lot on the input. See the following two figures for examples of how well the algorithm performs.

Low-hanging fruit: the sub-200 Hz band of a clip of dance music, its extracted envelope, and the differentiated version of that envelope. The bottom two signals have been normalized to [-1, 1].

400–800 Hz band (with envelope, and differentiated signal) of a clip of lively piano music with voice and a cheering crowd.
comp.mp3 is provided as a demonstration of the performance of the beat detector in the two example situations above (in reverse order) using a fixed cut-off on the differentiator's output for determining beat peaks from noise. The clips are "One Angry Dwarf and 200 Solemn Faces" by Ben Folds from Ben Folds Live and "The Girl in Byakkoya" by Susumu Hirasawa from the Paprika soundtrack. While it detects more beats than exist in the Ben Folds track (for example, during the applause), it correctly detects many beats from all over the frequency spectrum in "The Girl in Byakkoya". Due to the static nature of the beat-detecting code, it does not perform well in both low-volume and high-volume situations, so no beats are detected in the intro portion of "The Girl in Byakkoya," even though an ideal detector would count the notes there.

Last modified September 16, 2009.