Learning Predictive Structures from Data using Supervised Dimensionality Reduction and Sparse Graphical Models

Irina Rish
IBM

In many applications of statistical learning the objective is not simply to construct an accurate predictive model but rather to discover underlying predictive structures in the data that have meaningful scientific interpretation. This is especially important in biological and neuroscientific applications such as the reconstruction of brain-activation patterns from functional MRI (fMRI) data or the reverse-engineering of gene regulatory networks. In this talk we will discuss our recent work on two particular approaches to predictive structure discovery: supervised dimensionality reduction and sparse Markov network learning.

Supervised dimensionality reduction (SDR) combines dimensionality reduction with learning a predictive model, discovering a low-dimensional representation that captures information about the class while ignoring high-dimensional noise. Herein, we propose a general SDR-GLM framework [1] that views both features and class labels as exponential-family random variables (PCA-like dimensionality reduction is included as the particular case of Gaussian data) and learns data- and class-appropriate generalized linear models (GLMs). SDR-GLM handles both classification and regression, with both discrete and real-valued data, within a single unifying framework. Besides its generality, the main advantage of our framework is its simplicity and computational efficiency: using appropriate auxiliary functions (lower bounds on the objective), we derive simple closed-form update rules that are applied at each iteration of alternating minimization instead of calling optimization subroutines. Empirical results on synthetic high-dimensional data (with known ground-truth structure) demonstrate that SDR-GLM can discover the underlying predictive low-dimensional representation, yielding predictions more accurate than those of SVM, of SVDM (a state-of-the-art SDR method), and of unsupervised dimensionality reduction followed by learning a predictor. On real-life applications including fMRI, proteomics and sensor network data, SDR-GLM again outperforms unsupervised DR by far, while matching or outperforming SVM and SVDM.
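To make the alternating scheme concrete, the following is a minimal sketch (Python/NumPy) of the all-Gaussian special case of SDR-GLM, in which every alternating step is an exact least-squares problem with a closed-form solution. The function name sdr_gaussian, the trade-off weight alpha, and the random initialization are illustrative assumptions made for this sketch; the general closed-form updates for other exponential-family links, derived from auxiliary-function bounds in [1], are not reproduced here.

    import numpy as np

    def sdr_gaussian(X, Y, k, alpha=1.0, n_iter=100, seed=0):
        # Gaussian special case of SDR-GLM: minimize
        #   ||X - U V||_F^2 + alpha * ||Y - U W||_F^2
        # over the low-dimensional representation U (n x k) and the
        # feature and label loadings V (k x d) and W (k x c).
        rng = np.random.default_rng(seed)
        n, d = X.shape
        Y = Y.reshape(n, -1)
        U = rng.standard_normal((n, k))
        for _ in range(n_iter):
            # Given U, both loading updates are exact least-squares solutions.
            G = np.linalg.pinv(U.T @ U)
            V = G @ U.T @ X
            W = G @ U.T @ Y
            # Given V and W, U also has an exact closed-form minimizer.
            U = (X @ V.T + alpha * Y @ W.T) @ np.linalg.pinv(V @ V.T + alpha * W @ W.T)
        return U, V, W

In this Gaussian case each update is an exact minimizer of the joint objective, so no iteration can increase it; for non-Gaussian links the paper instead optimizes quadratic auxiliary bounds, which preserves the same closed-form, monotone structure.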

Another type of structure one may want to extract from data is the set of interactions and dependencies among the variables. Probabilistic graphical models, such as Markov networks (Markov random fields), provide a model of multivariate data distributions that is both predictive and interpretable, since the network encodes conditional-independence relations. In particular, sparse Markov network learning has become a topic of active research in the past few years. Recently proposed $l_1$-regularized maximum-likelihood methods for learning sparse Markov networks yield convex problems that can be solved optimally and efficiently. However, the accuracy of such methods can be very sensitive to the choice of the regularization parameter, and optimal selection of this parameter remains an open problem. Herein, we propose a Bayesian approach [2] that investigates the effects of placing a prior on the regularization parameter. We advocate using a non-uniform prior and present encouraging empirical results on both synthetic data and real-life applications such as brain-imaging (fMRI) data. Our method compares favorably to previous approaches, often achieving higher accuracy and a more balanced trade-off between false positives and false negatives.
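As a concrete illustration of the idea, the sketch below (Python) scores candidate values of the regularization weight under a joint MAP objective log p(X | C) + log p(C | lam) + log p(lam), where C is the precision matrix of a Gaussian Markov network. The grid search, the use of scikit-learn's GraphicalLasso as the l1-penalized solver, and the Laplace form assumed for p(C | lam) are choices made for this sketch only; they are not the exact formulation or optimization procedure of [2].

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    def map_regularization(X, lambdas, prior_logpdf):
        # For each candidate lam, fit an l1-penalized precision matrix C and
        # score log p(X | C) + log p(C | lam) + log p(lam); keep the maximizer.
        n, p = X.shape
        S = np.cov(X, rowvar=False)
        best_score, best_lam, best_C = -np.inf, None, None
        for lam in lambdas:
            C = GraphicalLasso(alpha=lam, max_iter=200).fit(X).precision_
            _, logdet = np.linalg.slogdet(C)
            loglik = 0.5 * n * (logdet - np.trace(S @ C))  # Gaussian log-likelihood, up to constants
            logprior_C = p * p * np.log(lam / 2.0) - lam * np.abs(C).sum()  # i.i.d. Laplace prior on entries
            score = loglik + logprior_C + prior_logpdf(lam)
            if score > best_score:
                best_score, best_lam, best_C = score, lam, C
        return best_lam, best_C

A non-uniform prior such as an exponential density p(lam) = a * exp(-a * lam) corresponds to prior_logpdf = lambda lam: np.log(a) - a * lam; a flat prior reduces the procedure to plain penalized-likelihood model selection over the grid.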

[1] Irina Rish, Genady Grabarnik, Guillermo Cecchi, Francisco Pereira, Geoffrey J. Gordon (2008). Closed-form Supervised Dimensionality Reduction with Generalized Linear Models. In Proceedings of ICML 2008, Helsinki, Finland.

[2] Narges Bani Asadi, Irina Rish, Katya Scheinberg, Dimitri Kanevsky, Bhuvana Ramabhadran (2008). A Bayesian Approach to Learning Sparse Gaussian Markov Networks. Submitted; currently available as an IBM Technical Report.