Nonmonotonic Multicontext Models
Abstract:
We introduce three new techniques for statistical language models:
multicontextual modeling, nonmonotonic contexts, and the divergence
heuristic. Together these techniques result in language models that
have few states, even fewer parameters, and low message entropies.
For example, our techniques achieve a message entropy of 1.97
bits/char on the Brown corpus using only 94,352 parameters. By
modestly increasing the number of model parameters in a principled
manner, our techniques are able to further reduce the message entropy
of the Brown corpus to 1.92 bits/char. In contrast, the character
quadgram model requires more than 236 times as many parameters yet
achieves a message entropy of only 2.59 bits/char. Given the
logarithmic nature of codelengths, a savings of 0.62 bits/char is
quite significant.
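
To make the final comparison concrete, the short sketch below (not part of the paper) converts the 0.62 bits/char gap into a total codelength saving and restates the parameter comparison. The corpus size of roughly 6 million characters and all constant names are assumptions introduced only for illustration.

    # Rough illustration of the abstract's comparison.
    # ASSUMPTION: the Brown corpus is taken to be ~6 million characters;
    # this figure is not stated in the abstract.
    CORPUS_CHARS = 6_000_000            # assumed approximate corpus size
    MULTICONTEXT_BITS_PER_CHAR = 1.97   # reported, with 94,352 parameters
    QUADGRAM_BITS_PER_CHAR = 2.59       # reported for the character quadgram model
    MULTICONTEXT_PARAMS = 94_352
    QUADGRAM_PARAM_RATIO = 236          # quadgram model uses >236x as many parameters

    saving_bits_per_char = QUADGRAM_BITS_PER_CHAR - MULTICONTEXT_BITS_PER_CHAR
    total_saving_bytes = saving_bits_per_char * CORPUS_CHARS / 8

    print(f"saving per char : {saving_bits_per_char:.2f} bits")
    print(f"total saving    : about {total_saving_bytes:,.0f} bytes over the corpus")
    print(f"quadgram params : more than {QUADGRAM_PARAM_RATIO * MULTICONTEXT_PARAMS:,}")

Under these assumptions, the 0.62 bits/char gap amounts to roughly 465,000 bytes over the whole corpus, while the quadgram model would need over 22 million parameters.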