These sound examples were produced by a model trained on the dance-pop song Chewing Gum by the Norwegian pop singer Annie, which can be heard in its entirety on her MySpace page.
LFCC Resynthesis: Much of the audio on this page was produced by reversing the Log-Frequency Cepstral Coefficient (LFCC) extraction process and normalizing the power of the signal thus produced. This removes all pitch and fine spectral detail from the sound. Below is a clip synthesized simply by extracting and reversing the LFCCs from the first ~80 seconds of Chewing Gum. This makes no use of the HDP-HMM techniques described in the paper, but is useful for getting a sense of what reversed LFCCs sound like.
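To make the extract-and-reverse round trip concrete, here is a minimal NumPy sketch. It is not the code used for the clips on this page. The frame size, hop, number of bands and coefficients, the triangular log-spaced filterbank, the random-phase reconstruction, and the target RMS are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def log_filterbank(n_bins, n_bands, sr, fmin=50.0):
    """Triangular filters whose edges are spaced logarithmically in frequency."""
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    edges = np.geomspace(fmin, sr / 2.0, n_bands + 2)
    fb = np.zeros((n_bands, n_bins))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_matrix(n_coef, n_bands):
    """First n_coef rows of the orthonormal DCT-II matrix."""
    k = np.arange(n_coef)[:, None]
    n = np.arange(n_bands)[None, :]
    m = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bands)) * np.sqrt(2.0 / n_bands)
    m[0] /= np.sqrt(2.0)
    return m

def extract_lfcc(x, sr, n_fft=1024, hop=512, n_bands=20, n_coef=13):
    """Power spectrum -> log-spaced band energies -> log -> DCT."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    fb = log_filterbank(n_fft // 2 + 1, n_bands, sr)
    log_band = np.log(power @ fb.T + 1e-10)
    return log_band @ dct_matrix(n_coef, n_bands).T

def invert_lfcc(lfcc, sr, n_fft=1024, hop=512, n_bands=20, target_rms=0.1, seed=0):
    """Inverse DCT -> exp -> smooth spectrum -> random-phase overlap-add."""
    rng = np.random.default_rng(seed)
    n_frames, n_coef = lfcc.shape
    win = np.hanning(n_fft)
    fb = log_filterbank(n_fft // 2 + 1, n_bands, sr)
    # Truncated inverse DCT: only a smooth spectral envelope survives.
    band_energy = np.exp(lfcc @ dct_matrix(n_coef, n_bands))
    # Crude (assumed) mapping from band energies back to linear-frequency bins.
    fb_norm = fb / np.maximum(fb.sum(axis=1, keepdims=True), 1e-10)
    magnitude = np.sqrt(band_energy @ fb_norm)
    # The original phase is gone, so random phase is used (an assumption).
    phase = rng.uniform(0.0, 2.0 * np.pi, size=magnitude.shape)
    spectrum = magnitude * np.exp(1j * phase)
    y = np.zeros((n_frames - 1) * hop + n_fft)
    for i in range(n_frames):
        y[i * hop:i * hop + n_fft] += np.fft.irfft(spectrum[i], n=n_fft) * win
    # Normalize the power of the resulting signal.
    return y * (target_rms / np.sqrt(np.mean(y ** 2)))
```

Because only a handful of cepstral coefficients are kept and the phase is discarded, the output of a sketch like this is shaped noise that follows the broad spectral envelope of the input, which is consistent with the loss of pitch and fine spectral detail described above.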
N-gram Markov Chain Synthesis: The examples below were produced by training an HDP-HMM on Chewing Gum, finding the maximum-likelihood Nth-order Markov chain that explained the resulting state sequence, and using that Markov chain to generate a new state sequence. The new state sequence was turned into audio in two ways: first, by drawing a new feature vector from the emission density associated with each state in the sequence and transforming it into audio; and second, by using the cluster mosaicing technique described in the paper to recombine windows of audio from various parts of the song.
1st-order Markov chain:
LFCC synthesis: download
Cluster mosaicing: download
4th-order Markov chain:
LFCC synthesis: download
Cluster mosaicing: download
8th-order Markov chain:
LFCC synthesis: download
Cluster mosaicing: download
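The chain-fitting and generation steps behind the clips above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code that produced the clips. The maximum-likelihood Nth-order chain is simply the table of observed next-state counts for each length-N context, and sampling in proportion to those counts is the same as sampling from the normalized transition probabilities. The restart rule for dead-end contexts and the diagonal-Gaussian emission model are assumptions, and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_markov_chain(state_seq, order):
    """Count next-state occurrences for every length-`order` context."""
    counts = defaultdict(Counter)
    for i in range(order, len(state_seq)):
        context = tuple(state_seq[i - order:i])
        counts[context][state_seq[i]] += 1
    return counts

def generate_states(counts, order, length, rng):
    """Sample a new state sequence from the fitted chain."""
    contexts = list(counts)
    out = list(rng.choice(contexts))
    while len(out) < length:
        successors = counts.get(tuple(out[-order:]))
        if not successors:
            # Assumed fallback: a context seen only at the very end of the
            # training sequence has no successor, so restart elsewhere.
            out.extend(rng.choice(contexts))
            continue
        states, weights = zip(*successors.items())
        out.append(rng.choices(states, weights=weights)[0])
    return out[:length]

def sample_features(states, means, stds, rng):
    """Draw one feature vector per state (assumed diagonal Gaussian emissions)."""
    return [[rng.gauss(m, s) for m, s in zip(means[z], stds[z])] for z in states]
```

A higher order makes the generated sequence copy longer stretches of the training sequence before it can branch, since long contexts tend to have only one observed successor.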
N-gram Markov Chain Synthesis Trained on Multiple Songs: The following
longer clip was produced by training a hierarchical HDP-HMM on four songs from
the album Anniemal by Annie: Chewing Gum, Heartbeat,
Helpless Fool for Love, and The Greatest Hit. The state sequences
from the four songs were then combined and used to derive a
maximum-likelihood 7th-order Markov chain. That chain was used to generate a
sequence of 10,000 LFCC feature vectors, which were synthesized into 3 minutes
and 52 seconds of audio. The result stitches together bits and pieces of each
of the four songs in a way that is sometimes (not always) surprisingly smooth.
download
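Two details of the multi-song setup can be illustrated with a short Python sketch. First, one plausible reading of "combined" is that transition counts are pooled across songs without letting any context straddle a song boundary; that reading is an assumption, as are the function names. Second, the figures quoted above imply a frame spacing of about 23.2 ms, which at an assumed sample rate of 44.1 kHz is very close to a hop of 1024 samples.

```python
from collections import Counter

def context_counts(state_seq, order):
    """Count (context, next_state) pairs within a single song."""
    counts = Counter()
    for i in range(order, len(state_seq)):
        counts[(tuple(state_seq[i - order:i]), state_seq[i])] += 1
    return counts

def pooled_counts(song_sequences, order):
    """Sum per-song counts so that no context spans two songs (assumed)."""
    total = Counter()
    for seq in song_sequences:
        total.update(context_counts(seq, order))
    return total

# Frame spacing implied by 10,000 feature vectors over 3 min 52 s.
n_frames = 10_000
duration_s = 3 * 60 + 52              # 232 seconds
hop_s = duration_s / n_frames         # 0.0232 seconds per frame
ASSUMED_SAMPLE_RATE = 44_100          # assumption, not stated on this page
hop_samples = hop_s * ASSUMED_SAMPLE_RATE
print(round(hop_s * 1000, 1), "ms per frame")      # 23.2 ms per frame
print(round(hop_samples, 2), "samples per frame")  # 1023.12 samples per frame
```

Pooling in this way only makes sense because the states are shared across songs, so that a context learned in one song can be continued with material from another.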