Date and Time

Wednesday, November 6, 2013 - 4:30pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

CS Department Colloquium Series

Speaker

Host

Olga Troyanskaya

In genomic sciences, the amount of data has grown faster than the statistical methodologies needed to analyze them. Furthermore, the complex underlying structure of these data means that simple, unstructured statistical models do not perform well. We consider the problem of identifying allelic heterogeneity, that is, multiple functionally independent, co-localized genetic regulators of gene transcription. Sparse regression techniques have been critical to the discovery of allelic heterogeneity because of their computational tractability in large data settings. However, these traditional models are hindered by the substantial correlation between genetic variants induced by linkage disequilibrium. I describe a new model for Bayesian structured sparse regression. This model uses positive definite covariance matrices to incorporate the arbitrarily complex structure of the predictors directly into a Gaussian field to yield structure-aware sparse regression coefficients. This broadly applicable model of Bayesian structured sparsity enables more efficient parameter estimation techniques than models assuming independence would allow. On simulated data, we find that our approach substantially outperforms state-of-the-art models and methods. We applied this model to a large study of expression quantitative trait loci and found that our approach yields highly interpretable, robust solutions for allelic heterogeneity, particularly when the interactions between genetic variants are well approximated by an additive model.
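As a rough illustration of the idea in the abstract, the following Python sketch shows how a positive definite covariance matrix over predictors can shape a Gaussian field so that correlated predictors tend to be selected together. It is not the speaker's model: the AR(1)-style covariance (standing in for linkage disequilibrium between neighboring variants), the thresholding step, and the names `ar1_covariance` and `sample_structured_inclusion` are all assumptions made for this example.

```python
import numpy as np


def ar1_covariance(p, rho):
    """Hypothetical stand-in for predictor structure: entry (i, j) is rho**|i - j|.

    For |rho| < 1 this matrix is symmetric and positive definite.
    """
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def sample_structured_inclusion(cov, threshold, rng):
    """Draw one latent Gaussian field with covariance `cov` and threshold it.

    Assumption for illustration only: a predictor is treated as "included"
    when its latent field value exceeds `threshold`.
    """
    # Cholesky factorization fails (LinAlgError) if cov is not positive definite.
    chol = np.linalg.cholesky(cov)
    field = chol @ rng.standard_normal(cov.shape[0])
    return field, field > threshold


rng = np.random.default_rng(0)
cov = ar1_covariance(p=50, rho=0.9)
field, included = sample_structured_inclusion(cov, threshold=1.0, rng=rng)

# Neighboring predictors share most of their latent signal, so inclusion
# indicators tend to appear in contiguous runs rather than independently.
print(included.astype(int))
```

Setting `rho=0` reduces the covariance to the identity, which corresponds to the independence assumption the abstract contrasts against: each predictor is then included or excluded without regard to its neighbors.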