Published on *Computer Science Department at Princeton University* (http://www.cs.princeton.edu)

Report ID:

TR-009-17

Authors:

Risteski, Andrej [1]

Date:

October 13, 2017

Pages:

236

Download Formats:

[PDF [2]]

A common theme in machine learning is succinct modeling of distributions over large domains. Probabilistic graphical

models are one of the most expressive frameworks for doing this. The two major tasks involving graphical models are

learning and inference. Learning is the task of calculating the best fit graphical model from raw data, while inference is

the task of answering probabilistic queries for a known graphical model (for example what is the marginal distribution

of one of the variables or what is the distribution of a subset of variables, after conditioning on the values of some

other subset of variables). Learning can be thought of as finding a graphical model that explains the raw data, while

the inference queries extract the knowledge the graphical model contains.

This thesis introduces new provable techniques for performing both of these tasks, in the context of both latent-variable

models – in which a portion of the variables in the graphical model are not observed, as well as fully-observable

undirected graphical models (Markov Random Fields). Chapters 2 and 3 will focus on learning latent-variable models,

while Chapter 4 will focus on inference in Markov Random Fields.

In Chapter 2, I will contribute the first provable results for analyzing variational Bayes: a family of alternatingminimization

style algorithms which is very popular in practice for learning latent-variable models. Despite it’s popularity

with practitioners, the only theoretical guarantees prior to this work concerned convergence to local minima.

We will prove that under reasonable assumptions, in the context of topic models, these algorithms will converge to the

global minimum.

Subsequently, in Chapter 3, we will use the method-of-moments along with new techniques in tensor decomposition

and constrained matrix factorization to derive algorithms for learning noisy-OR networks – the textbook example of

a probabilistic model for causal relationships. Importantly, these techniques were only applicable to linear latentvariable

models – which noisy-OR is not.

In Chapter 4, I will contribute a new understanding of a class of variational methods for calculating partition functions

in Markov Random Fields. The key technical ingredient is a connection to convex programming hierarchies – a recent

area of interest in combinatorial optimization, along with approximations of the entropy of a distribution based on

low-order moment information

**Links**

[1] http://www.cs.princeton.edu/research/techreps/author/833

[2] ftp://ftp.cs.princeton.edu/techreports/2017/009.pdf