Topic Modeling Code and Browsers
Much of my research is in topic modeling: building hierarchical probabilistic models of documents and other media to uncover latent structure in their contents. As an example of this research, here are slides from a recent talk on dynamic and correlated topic models applied to the journal Science. (Here is a video of the talk.)
The structure uncovered by topic models can be used to explore an otherwise unorganized collection, for example by dividing documents according to their topics and by using the hidden structure to determine similarity between documents. The following are browsers of large collections of documents built with topic models:
- A 100-topic browser of the dynamic topic model fit to Science (1882-2001).
- A 100-topic browser of the correlated topic model fit to Science (1980-2000).
- A 50-topic browser of latent Dirichlet allocation fit to the 2006 arXiv.
- A 20-topic browser of latent Dirichlet allocation fit to The American Political Science Review.
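As a rough illustration of how per-document topic proportions can support this kind of exploration, here is a minimal Python sketch. It assumes a topic model has already been fit and that each document is summarized by a vector of topic proportions. The function names and the toy proportions are hypothetical, and the Hellinger distance is one reasonable choice for comparing proportions, not necessarily the measure used in the browsers above.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def dominant_topic(theta):
    """Index of the topic with the largest proportion in one document."""
    return max(range(len(theta)), key=lambda k: theta[k])


def group_by_topic(thetas):
    """Divide documents according to their dominant topic."""
    groups = {}
    for doc_id, theta in enumerate(thetas):
        groups.setdefault(dominant_topic(theta), []).append(doc_id)
    return groups


def most_similar(thetas, query, n=2):
    """Return the n documents whose topic proportions are closest to the query."""
    others = [d for d in range(len(thetas)) if d != query]
    return sorted(others, key=lambda d: hellinger(thetas[query], thetas[d]))[:n]


# Hypothetical topic proportions: four documents, three topics (illustrative only).
doc_topics = [
    [0.80, 0.15, 0.05],
    [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80],
    [0.05, 0.90, 0.05],
]

groups = group_by_topic(doc_topics)
neighbors = most_similar(doc_topics, query=0)
print(groups)
print(neighbors)
```

Grouping by dominant topic is the simplest way to divide a collection; a browser could equally list every document that gives a topic more than some threshold proportion.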
The topic models mailing list is a good forum for discussing topic modeling.
I maintain some code for topic modeling. Questions, comments, and suggestions about this code should be posted to the topic models mailing list.
There is other code that I do not maintain but still want to share. I don't have time to answer questions about this code, but I hope that you will find it useful.
Some of my students have released code as well:
- Supervised topic modeling for classification (Chong Wang)
- R package implementing many models (Jonathan Chang)