Sound examples for Data-Driven Recomposition Using the Hierarchical Dirichlet Process Hidden Markov Model, submitted to ICMC 2008.

These sound examples were produced by a model trained on the dance-pop song Chewing Gum by the Norwegian pop singer Annie, which can be heard in its entirety on her MySpace page.

LFCC Resynthesis: Much of the audio on this page was produced by inverting the Log-Frequency Cepstral Coefficient (LFCC) extraction process and normalizing the power of the resulting signal. Because the LFCCs keep only the smoothed spectral envelope of each frame, this removes all pitch and fine spectral detail from the sound. Below is a clip synthesized simply by extracting the LFCCs from the first ~80 seconds of Chewing Gum and then inverting them. This makes no use of the HDP-HMM techniques described in the paper, but it is useful for getting a sense of what resynthesized LFCCs sound like.

download
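As a rough illustration of the LFCC extraction and inversion described above, here is a per-frame Python sketch. It is not the paper's implementation: the frame length, the number of log-spaced frequency bins and kept coefficients, the use of linear interpolation to warp the spectrum onto a log-frequency axis, and the random-phase excitation are all assumptions. Truncating the cepstrum is what throws away pitch and fine spectral detail, and the last step rescales the frame to a target RMS to mimic the power normalization.

```python
import numpy as np

N_FFT = 1024       # assumed frame length in samples
N_LOG_BINS = 128   # assumed number of log-spaced frequency bins
N_COEFFS = 20      # assumed number of cepstral coefficients kept

lin_bins = np.arange(1, N_FFT // 2 + 1)            # FFT bin indices, DC excluded
log_bins = np.geomspace(1, N_FFT // 2, N_LOG_BINS)  # log-spaced positions on the same axis

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)
    return m

D = dct_matrix(N_LOG_BINS)

def lfcc(frame):
    """Windowed log-magnitude spectrum, warped to a log-frequency axis, then DCT."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    log_mag = np.log(mag[1:] + 1e-10)                 # drop the DC bin
    warped = np.interp(log_bins, lin_bins, log_mag)   # resample onto log-spaced bins
    return (D @ warped)[:N_COEFFS]                    # truncation keeps only the envelope

def invert_lfcc(coeffs, target_rms, rng):
    """Rebuild a frame from truncated LFCCs using a random-phase excitation."""
    padded = np.zeros(N_LOG_BINS)
    padded[:len(coeffs)] = coeffs
    warped = D.T @ padded                             # smoothed log-magnitude envelope
    log_mag = np.interp(lin_bins, log_bins, warped)   # back to linear FFT bins
    mag = np.concatenate([[0.0], np.exp(log_mag)])    # restore a zero DC bin
    phase = rng.uniform(0.0, 2.0 * np.pi, mag.shape)  # random phase: no pitch survives
    frame = np.fft.irfft(mag * np.exp(1j * phase), n=N_FFT)
    rms = np.sqrt(np.mean(frame ** 2))
    return frame * (target_rms / rms)                 # power normalization
```

A full resynthesis would apply this to overlapping frames and overlap-add the results, taking each frame's target RMS from the original audio.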

N-gram Markov Chain Synthesis: The examples below were produced by training an HDP-HMM on Chewing Gum, finding the maximum-likelihood Nth-order Markov chain for the resulting state sequence, and using that Markov chain to generate a new state sequence. The new state sequence was turned into audio in two ways: first, by drawing a feature vector from the emission density associated with each state in the sequence and transforming it into audio by LFCC resynthesis; and second, by using the cluster mosaicing technique described in the paper to recombine windows of audio from various parts of the song.

1st-order Markov chain:
LFCC synthesis: download
Cluster mosaicing: download

4th-order Markov chain:
LFCC synthesis: download
Cluster mosaicing: download

8th-order Markov chain:
LFCC synthesis: download
Cluster mosaicing: download
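The maximum-likelihood Nth-order Markov chain used in these examples can be estimated by counting how often each state follows each length-N context in the HDP-HMM state sequence and normalizing the counts. Below is a minimal Python sketch of that estimate and of sampling a new sequence from it. The function names are hypothetical, and the fallback for a context with no observed successor (possible only for a context that appears solely at the very end of the training sequence) is an assumed choice, not something taken from the paper.

```python
import random
from collections import Counter, defaultdict

def fit_markov_chain(states, order):
    """ML estimate of P(next state | previous `order` states) from one sequence."""
    counts = defaultdict(Counter)
    for i in range(order, len(states)):
        counts[tuple(states[i - order:i])][states[i]] += 1
    chain = {}
    for ctx, c in counts.items():
        total = sum(c.values())
        chain[ctx] = {s: n / total for s, n in c.items()}
    return chain

def generate(chain, seed_context, length, order, rng):
    """Sample a state sequence of `length` states starting from `seed_context`."""
    assert order >= 1 and len(seed_context) == order
    out = list(seed_context)
    while len(out) < length:
        ctx = tuple(out[-order:])
        if ctx not in chain:
            # Assumed fallback: borrow the distribution of a random observed context.
            ctx = rng.choice(list(chain))
        dist = chain[ctx]
        out.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return out[:length]
```

Higher orders make the chain copy longer verbatim runs of the original state sequence, which is one way to read the difference between the 1st-, 4th-, and 8th-order examples.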

N-gram Markov Chain Synthesis Trained on Multiple Songs: The following longer clip was produced by training a hierarchical HDP-HMM on four songs from Annie's album Anniemal: Chewing Gum, Heartbeat, Helpless Fool for Love, and The Greatest Hit. The state sequences from the four songs were then combined and used to derive a maximum-likelihood 7th-order Markov chain. That chain generated a sequence of 10,000 LFCC feature vectors, from which 3 minutes and 52 seconds of audio were synthesized. The result stitches together bits and pieces of all four songs in a way that is sometimes, though not always, surprisingly smooth.
download
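The text does not say exactly how the four state sequences were combined. One reasonable reading, sketched below under that assumption, is to pool transition counts across the songs while never counting a context that straddles the boundary between two songs, so that the chain does not learn spurious song-to-song transitions from simple concatenation. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def pooled_transition_counts(songs, order):
    """Count (context -> next state) over several state sequences, one per song,
    without letting any context cross from the end of one song into the next."""
    counts = defaultdict(Counter)
    for states in songs:
        for i in range(order, len(states)):
            counts[tuple(states[i - order:i])][states[i]] += 1
    return counts

def ml_probabilities(counts):
    """Normalize pooled counts into maximum-likelihood transition probabilities."""
    probs = {}
    for ctx, c in counts.items():
        total = sum(c.values())
        probs[ctx] = {s: n / total for s, n in c.items()}
    return probs
```

Because the HDP-HMM is shared across the songs, a state index means the same thing in every song, which is what makes pooling the counts meaningful.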