NEW TECHNIQUES FOR LEARNING AND INFERENCE IN BAYESIAN MODELS
A common theme in machine learning is succinct modeling of distributions over large domains. Probabilistic graphical
models are one of the most expressive frameworks for doing this. The two major tasks involving graphical models are
learning and inference. Learning is the task of calculating the best fit graphical model from raw data, while inference is
the task of answering probabilistic queries for a known graphical model (for example what is the marginal distribution
of one of the variables or what is the distribution of a subset of variables, after conditioning on the values of some
other subset of variables). Learning can be thought of as finding a graphical model that explains the raw data, while
the inference queries extract the knowledge the graphical model contains.
This thesis introduces new provable techniques for performing both of these tasks, in the context of both latent-variable
models – in which a portion of the variables in the graphical model are not observed, as well as fully-observable
undirected graphical models (Markov Random Fields). Chapters 2 and 3 will focus on learning latent-variable models,
while Chapter 4 will focus on inference in Markov Random Fields.
In Chapter 2, I will contribute the first provable results for analyzing variational Bayes: a family of alternatingminimization
style algorithms which is very popular in practice for learning latent-variable models. Despite it’s popularity
with practitioners, the only theoretical guarantees prior to this work concerned convergence to local minima.
We will prove that under reasonable assumptions, in the context of topic models, these algorithms will converge to the
Subsequently, in Chapter 3, we will use the method-of-moments along with new techniques in tensor decomposition
and constrained matrix factorization to derive algorithms for learning noisy-OR networks – the textbook example of
a probabilistic model for causal relationships. Importantly, these techniques were only applicable to linear latentvariable
models – which noisy-OR is not.
In Chapter 4, I will contribute a new understanding of a class of variational methods for calculating partition functions
in Markov Random Fields. The key technical ingredient is a connection to convex programming hierarchies – a recent
area of interest in combinatorial optimization, along with approximations of the entropy of a distribution based on
low-order moment information