|
Research Interest
|
I'm interested in applying techniques from theoretical computer science to problems in machine learning, with the hope of developing new algorithms with provable guarantees. The meta-problem is: machine learning practitioners have been applying heuristic algorithms, such as local search or EM-type algorithms, to find solutions to many hard problems. Are these problems really easy or hard? If they are easy, can we use algorithms with provable guarantees? If they are hard, is there any reasonable model in which they become easy, or in which the known heuristic algorithms can be proved to work?
|
Publications
|
Towards a better approximation for sparsest cut? Arxiv
with Sanjeev Arora and Ali Kemal Sinop. To appear in FOCS 2013
[Abstract]
We give a new $(1+\epsilon)$-approximation for the sparsest cut problem on graphs where small sets expand significantly more than the sparsest cut (sets of size $n/r$ expand by a factor $\sqrt{\log n \log r}$ more, for some small $r$; this condition holds for many natural graph families). We give two different algorithms. One involves Guruswami-Sinop rounding on the level-$r$ Lasserre relaxation. The other is combinatorial and involves a new notion called Small Set Expander Flows (inspired by the expander flows of ARV), which we show exist in the input graph. Both algorithms run in time $2^{O(r)} \mathrm{poly}(n)$. We also give similar approximation algorithms for graphs of genus $g$ under an analogous local expansion condition. This is the first algorithm we know of that achieves a $(1+\epsilon)$-approximation on such a general family of graphs.
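To make the expansion condition above concrete, here is a minimal sketch of the edge expansion of a vertex set, the quantity the small-set condition constrains. It assumes an unweighted graph given as a 0/1 adjacency matrix; the function name and the exact normalization are illustrative (the paper's sparsity measure differs in normalization):

import numpy as np

def edge_expansion(adj, S):
    """Edge expansion |E(S, V-S)| / |S| of a vertex set S in an
    undirected graph given by a 0/1 adjacency matrix adj."""
    S = np.asarray(S)
    rest = np.setdiff1d(np.arange(adj.shape[0]), S)
    cut_edges = adj[np.ix_(S, rest)].sum()  # edges leaving S
    return cut_edges / len(S)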
|
A Tensor Approach to Learning Mixed Membership Community Models Arxiv
with Anima Anandkumar, Daniel Hsu and Sham M. Kakade. In COLT 2013
[Abstract]
Modeling community formation and detecting hidden communities in networks is a well studied problem. However, theoretical analysis of community detection has been mostly limited to models with non-overlapping communities, such as the stochastic block model. In this paper, we remove this restriction and consider a family of probabilistic network models with overlapping communities, termed the mixed membership Dirichlet model, first introduced by Airoldi et al. This model allows nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. We propose a unified approach to learning these models via a tensor spectral decomposition method. Our estimator is based on a low-order moment tensor of the observed network, consisting of 3-star counts. Our learning method is fast and is based on simple linear algebra operations, e.g., singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters, and present a careful finite sample analysis of our learning method. Additionally, our results match the best known scaling requirements in the special case of the stochastic block model.
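To illustrate the 3-star count moments the estimator is built from, here is a minimal sketch, assuming the nodes have been partitioned into four disjoint sets X, A, B, C as in a multi-view setup; the partitioning and normalization details are simplifications, not the paper's implementation:

import numpy as np

def three_star_tensor(adj, X, A, B, C):
    """T[i, j, k] is the fraction of nodes x in X that are adjacent
    to all of A[i], B[j] and C[k], for a 0/1 adjacency matrix adj."""
    GA = adj[np.ix_(X, A)].astype(float)
    GB = adj[np.ix_(X, B)].astype(float)
    GC = adj[np.ix_(X, C)].astype(float)
    # Average the rank-one outer products GA[x], GB[x], GC[x] over x in X.
    return np.einsum('xi,xj,xk->ijk', GA, GB, GC) / len(X)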
|
A Practical Algorithm for Topic Modeling with Provable Guarantees Arxiv
with Sanjeev Arora, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu and Michael Zhu. In ICML 2013
[Abstract]
Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model inference have been based on a maximum likelihood objective. Efficient algorithms exist that approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds, but these algorithms are not practical because they are inefficient and not robust to violations of model assumptions. In this paper we present an algorithm for topic model inference that is both provable and practical. The algorithm produces results comparable to the best MCMC implementations while running orders of magnitude faster.
|
Tensor decompositions for learning latent variable models Arxiv
with Anima Anandkumar, Daniel Hsu, Sham M. Kakade and Matus Telgarsky
[Abstract]
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation, which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
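As a rough sketch of the power-iteration approach mentioned above (simplified relative to the paper's robust version, which adds random restarts and a careful perturbation analysis; it assumes the input tensor is symmetric and orthogonally decomposable):

import numpy as np

def tensor_power_method(T, n_iters=100, seed=0):
    """Recover the components of T = sum_i lam_i v_i (x) v_i (x) v_i
    (orthonormal v_i) by repeated power updates and deflation."""
    rng = np.random.default_rng(seed)
    k = T.shape[0]
    eigvals, eigvecs = [], []
    for _ in range(k):
        v = rng.normal(size=k)
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            # Power update: v <- T(I, v, v) / ||T(I, v, v)||.
            v = np.einsum('ijk,j,k->i', T, v, v)
            v /= np.linalg.norm(v)
        lam = float(np.einsum('ijk,i,j,k->', T, v, v, v))
        eigvals.append(lam)
        eigvecs.append(v)
        # Deflate before searching for the next component.
        T = T - lam * np.einsum('i,j,k->ijk', v, v, v)
    return np.array(eigvals), np.array(eigvecs)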
|
Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders Arxiv
with Sanjeev Arora, Ankur Moitra and Sushant Sachdeva. In NIPS 2012
[Abstract]
We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees. In particular, suppose we are given samples of the form $y = Ax + \eta$, where $A$ is an unknown $n \times n$ matrix, $x$ is a random variable whose components are independent and have a fourth moment strictly less than that of a standard Gaussian random variable, and $\eta$ is an $n$-dimensional Gaussian random variable with unknown covariance $\Sigma$. We give an algorithm that provably recovers $A$ and $\Sigma$ up to an additive $\epsilon$, and whose running time and sample complexity are polynomial in $n$ and $1/\epsilon$. To accomplish this, we introduce a novel "quasi-whitening" step that may be useful in other contexts in which the covariance of Gaussian noise is not known in advance. We also give a general framework for finding all local optima of a function (given an oracle for approximately finding just one); this is a crucial step in our algorithm, one that has been overlooked in previous attempts, and it allows us to control the accumulation of error when we find the columns of $A$ one by one via local search.
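For flavor, here is the classical kurtosis-based fixed-point iteration (FastICA-style) for finding one column of the mixing matrix. This is the standard heuristic, not the paper's algorithm, and it assumes noiseless, already-whitened samples rather than the unknown-Gaussian-noise setting handled above:

import numpy as np

def fastica_one_column(Y, n_iters=200, seed=0):
    """One column of the orthogonal mixing matrix from whitened
    samples Y of shape (n_samples, n), via the fourth-moment
    fixed point."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=Y.shape[1])
    u /= np.linalg.norm(u)
    for _ in range(n_iters):
        proj = Y @ u
        # Fixed-point update for the objective E[(u^T y)^4].
        u = (Y * proj[:, None] ** 3).mean(axis=0) - 3.0 * u
        u /= np.linalg.norm(u)
    return u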
|
Learning Topic Models --- Going Beyond SVD
with Sanjeev Arora and Ankur Moitra. In FOCS 2012
[Abstract]
In this work we revisit the topic modeling problem, which is used for automatic comprehension and classification of data in a variety of settings. A number of foundational works both in machine learning and in theory have suggested a probabilistic model for documents, whereby documents arise as a convex combination of (i.e., distribution on) a small number of topic vectors, each topic vector being a distribution on words (i.e., a vector of word frequencies).
Theoretical studies of topic modeling focus on learning the model's parameters assuming the data is actually generated from it. Existing approaches for the most part rely on Singular Value Decomposition (SVD), and consequently have one of two limitations: these works need to either assume that each document contains only one topic, or else can only recover the span of the topic vectors instead of the topic vectors themselves.
This paper formally justifies Nonnegative Matrix Factorization (NMF) as a main tool in this context; NMF is an analog of SVD in which all vectors are required to be nonnegative. Using this tool we give the first polynomial-time algorithm for learning topic models without the above two limitations. The algorithm uses a fairly mild assumption about the underlying topic matrix called separability, which is usually found to hold in real-life data.
We hope that this paper will motivate further theoretical results that use NMF as a replacement for SVD, just as NMF has come to replace SVD in many applications.
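Separability says each topic has an "anchor" word occurring only in that topic; geometrically, the rows of a suitably normalized word co-occurrence matrix that correspond to anchor words are the extreme points of the convex hull of all rows. Below is a minimal sketch of finding such extreme rows by greedy farthest-point selection; the matrix Q, its normalization, and the function name are assumptions, simplified from the anchor-word literature:

import numpy as np

def find_anchor_rows(Q, k):
    """Greedily pick k rows of Q that are (approximately) extreme
    points of the convex hull of Q's rows."""
    first = int(np.argmax(np.linalg.norm(Q, axis=1)))
    anchors = [first]
    R = Q - Q[first]  # translate so the first anchor is the origin
    for _ in range(k - 1):
        idx = int(np.argmax(np.linalg.norm(R, axis=1)))
        anchors.append(idx)
        b = R[idx] / np.linalg.norm(R[idx])
        R = R - np.outer(R @ b, b)  # project out the direction found
    return anchors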
|
Computing a Nonnegative Matrix Factorization --- Provably Arxiv
with Sanjeev Arora, Ravi Kannan and Ankur Moitra. In STOC 2012
[Abstract]
The nonnegative matrix factorization problem asks for factorizing an $n \times m$ matrix $M$ into $AW$, where $A$ is $n \times r$ and $W$ is $r \times m$, and the entries of the two matrices $A$ and $W$ are required to be nonnegative. This problem has an extremely broad range of applications, from machine learning to communication complexity. We give algorithms for exact and approximate versions of the problem when $r$ is small (which is the case for most applications), and a hardness result that shows our results cannot be substantially improved.
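For contrast with the provable algorithms in the paper, the following is the classical Lee-Seung multiplicative-update heuristic, the kind of local-search method with no global guarantees that these results complement. A minimal sketch for the Frobenius objective; the parameter choices are illustrative:

import numpy as np

def nmf_multiplicative(M, r, n_iters=500, eps=1e-10, seed=0):
    """Approximately factor a nonnegative n x m matrix M as A @ W
    with A (n x r) and W (r x m) nonnegative, reducing ||M - AW||_F."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    A = rng.random((n, r))
    W = rng.random((r, m))
    for _ in range(n_iters):
        # Multiplicative updates preserve nonnegativity by construction.
        W *= (A.T @ M) / (A.T @ A @ W + eps)
        A *= (M @ W.T) / (A @ W @ W.T + eps)
    return A, W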
|
Finding Overlapping Communities in Social Networks: Toward a Rigorous Approach Arxiv
with Sanjeev Arora, Sushant Sachdeva and Grant Schoenebeck. In EC 2012
[Abstract]
A community in a social network is usually understood to be a group of nodes more densely connected with each other than with the rest of the network. Many existing methods for finding them use heuristic algorithms where the solution concept is defined in terms of an NP-complete problem such as CLIQUE or HIERARCHICAL CLUSTERING. We introduce a more formal approach whereby we make rigorous definitions that sidestep the pitfall of high computational complexity and allow efficient algorithms for finding communities.
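As a small illustration of the informal notion above, one can compare a candidate community's internal edge density with the density of its edges to the rest of the graph. This helper is purely illustrative and is not the paper's formal solution concept:

import numpy as np

def density_gap(adj, S):
    """Return (internal density, external density) of a node set S,
    for a 0/1 adjacency matrix adj; S is community-like when the
    first clearly exceeds the second."""
    S = np.asarray(S)
    rest = np.setdiff1d(np.arange(adj.shape[0]), S)
    internal = adj[np.ix_(S, S)].sum() / max(len(S) * (len(S) - 1), 1)
    external = adj[np.ix_(S, rest)].sum() / max(len(S) * len(rest), 1)
    return internal, external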
|
Computational Complexity and Information Asymmetry in Financial Products PDF
with Sanjeev Arora, Boaz Barak and Markus Brunnermeier. In ICS 2010
[More Info]
For some frequently asked questions about the paper, see our FAQ page.
A short version appeared in Communications of the ACM, 2011 Issue 5. Full text PDF
See some blog comments on this paper: Intractability Center, Freedom to Tinker, Boing Boing, Daily Kos, In Theory, Healthy Algorithms, Lipton's blog.
We put forward the issue of Computational Complexity in the analysis of Financial Derivatives, and show that there is a fundamental difficulty in pricing financial derivatives even in very simple settings.
This is still a working paper. The latest version (updated Jan. 2012) can be found here: Computational Complexity and Information Asymmetry in Financial Products.
|
New Tools for Graph Coloring
with Sanjeev Arora. In APPROX 2011
[Abstract]
We explore the possibility of better graph coloring algorithms using the new concept of threshold rank. We are able to analyze high-level Lasserre relaxations for low threshold rank graphs and distance transitive graphs, and our algorithm makes significantly better progress (towards an O(log n) coloring) in these cases compared to previous algorithms (where the number of colors is polynomial). We also propose a plausible conjecture that, if true, would yield a good sub-exponential time algorithm for graph coloring.
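To make the notion concrete: the (two-sided) threshold rank of a graph at threshold tau is the number of eigenvalues of its normalized adjacency matrix with absolute value above tau. A minimal sketch; the normalization and the one-sided vs. two-sided choice vary between papers, and this is one common variant:

import numpy as np

def threshold_rank(adj, tau=0.5):
    """Number of eigenvalues of D^{-1/2} A D^{-1/2} with |lambda| > tau."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return int(np.sum(np.abs(np.linalg.eigvalsh(norm_adj)) > tau))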
|
New Algorithms for Learning in Presence of Errors
with Sanjeev Arora. In ICALP 2011
[Abstract]
We give new algorithms for a variety of randomly-generated instances of computational problems, using a linearization technique that reduces them to solving a system of linear equations. Our techniques can be applied to the low-noise case of the well-known LWE (Learning With Errors) problem, giving a slightly subexponential time algorithm which suggests that known hardness results are almost tight.
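A toy sketch of the linearization idea, assuming a simplified LWE variant with errors $e \in \{0,1\}$ (the paper handles more general low noise): a sample $(a, b)$ with $b = \langle a, s \rangle + e \bmod q$ satisfies $(b - \langle a, s \rangle)(b - \langle a, s \rangle - 1) \equiv 0 \pmod q$, which is linear in the monomials $s_j$ and $s_j s_k$; with enough samples the stacked linear system pins down the secret:

def linearized_row(a, b, n, q):
    """Expand (b - <a,s>)(b - <a,s> - 1) = 0 (mod q) into a linear
    equation over the monomials s_j (n of them) and s_j*s_k (j <= k)."""
    row = [((1 - 2 * b) * int(a[j])) % q for j in range(n)]
    for j in range(n):
        for k in range(j, n):
            c = int(a[j]) * int(a[k]) * (1 if j == k else 2)
            row.append(c % q)
    return row, (b - b * b) % q

def solve_mod_q(A, rhs, q):
    """Gaussian elimination modulo a prime q; assumes the stacked
    system has full column rank (i.e., enough samples)."""
    A = [r[:] for r in A]
    rhs = rhs[:]
    m, n = len(A), len(A[0])
    piv_of_col, r = {}, 0
    for c in range(n):
        p = next((i for i in range(r, m) if A[i][c]), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        rhs[r], rhs[p] = rhs[p], rhs[r]
        inv = pow(A[r][c], q - 2, q)  # modular inverse (q prime)
        A[r] = [x * inv % q for x in A[r]]
        rhs[r] = rhs[r] * inv % q
        for i in range(m):
            if i != r and A[i][c]:
                f = A[i][c]
                A[i] = [(x - f * y) % q for x, y in zip(A[i], A[r])]
                rhs[i] = (rhs[i] - f * rhs[r]) % q
        piv_of_col[c] = r
        r += 1
    # One solution, with free variables set to 0.
    return [rhs[piv_of_col[c]] if c in piv_of_col else 0 for c in range(n)]

The secret $s$ is then read off the first $n$ coordinates of the recovered monomial vector.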
|
New Results on Simple Stochastic Games
with Decheng Dai. In ISAAC 2009
[Abstract]
We give a new algorithm for Simple Stochastic Games when the number of RANDOM vertices is small. We also show that coarse approximation is as hard as fine approximation for Simple Stochastic Games.
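For background, here is a minimal sketch of the textbook value-iteration baseline for Simple Stochastic Games; this is the standard iteration, not the paper's algorithm for few RANDOM vertices, and the uniform choice at RANDOM vertices is the usual convention:

def ssg_value_iteration(vtype, succ, values, n_iters=1000):
    """vtype[v] in {'max', 'min', 'random', 'sink'}; succ[v] lists v's
    successors; values[v] is fixed for sinks (0 or 1) and an initial
    guess otherwise. Returns approximate optimal values."""
    v = dict(values)
    for _ in range(n_iters):
        for node, t in vtype.items():
            if t == 'max':       # MAX player picks the best successor
                v[node] = max(v[s] for s in succ[node])
            elif t == 'min':     # MIN player picks the worst successor
                v[node] = min(v[s] for s in succ[node])
            elif t == 'random':  # RANDOM vertex averages uniformly
                v[node] = sum(v[s] for s in succ[node]) / len(succ[node])
    return v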
|
Contact
|
Email: rongge AT cs DOT princeton DOT edu
Office: 216 Computer Science Building
Phone: (609) 258-5389 (office)
Mail: Department of Computer Science
Princeton University
35 Olden Street
Princeton, NJ 08540-5233