Graphical models (GMs) offer a powerful language to elegantly define expressive distributions, and a generic computational framework to support reasoning under uncertainty in a wide range of problems. Popular paradigms for training GMs include the maximum likelihood estimation, and more recently the max-margin learning, each enjoys some advantages, as well as weaknesses. For example, the maximum margin structured prediction model such as M3N lacks a straightforward probabilistic interpretation of the learning scheme and the prediction rule. Therefore its unique advantages such as support vector sparsity and kernel tricks cannot be easily conjoined with the merits of a probabilistic model such as Bayesian regularization, model averaging, and ability to model hidden variables.
In this talk, I present a new general framework called Maximum Entropy Discrimination Markov Networks (MEDN), which integrates the margin-based and likelihood-based approaches and combines and extends their merits. This new learning paradigm naturally facilitates integration of the generative and discriminative principles under a unified framework, and the basic strategies can be generalized to learn arbitrary GMs, such as the generative Bayesian networks, models with structured hidden variables, and even nonparametric Bayesian models, with a desirable maximum margin effect on structured or unstructured predictions. I will discuss a number of theoretical properties of this approach, and show applications of MEDN to learning a wide range of GMs including: fully supervised structured i/o model, max-margin structured i/o models with hidden variables, a max-margin LDA-style model for jointly discovering “discriminative” latent topics and predicting document label/score of text documents, or total scene and objective categories in natural images, etc. Our empirical results strongly suggest that, for any GM with structured or unstructured labels, MEDN always leads to a more accurate predictive GM than the one trained under either MLE or Max Margin.
Joint work with Jun Zhu.
Dr. Eric Xing is an associate professor in the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology; especially for solving problems involving automated learning, reasoning, and decision-making in high-dimensional and dynamic possible worlds; and for building quantitative models and predictive understandings of biological systems. Professor Xing received a Ph.D. in Molecular Biology from Rutgers University, and another Ph.D. in Computer Science from UC Berkeley. His current work involves, 1) foundations of statistical learning, including theory and algorithms for estimating time/space varying-coefficient models, sparse structured input/output models, and nonparametric Bayesian models; 2) computational and statistical analysis of gene regulation, genetic variation, and disease associations; and 3) application of statistical learning in social networks, computer vision, and natural language processing. Professor Xing has published over 140 peer-reviewed papers, and is an associate editor of the Annals of Applied Statistics, the PLoS Journal of Computational Biology, and an Action Editor of the Machine Learning journal. He is a recipient of the NSF Career Award, the Alfred P. Sloan Research Fellowship in Computer Science, and the United States Air Force Young Investigator Award.