The Engelhardt Group develops statistical models and methods to elucidate the biological mechanisms of complex phenotypes and disease. Measurements of biological systems contain both noise and systematic bias, and the analytical goal is often to identify low-dimensional substructure within a high-dimensional space. Model-based analyses are well suited to these properties, but the high dimension and scale of biological data make parameter estimation in sophisticated models challenging. We address these challenges by developing hierarchical statistical models and approximate parameter estimation methods that give access to interesting biological phenomena.
Epistatic QTLs. Although it is straightforward to determine
whether a SNP impacts transcription of a gene, it is less clear how to
test whether a SNP regulates transcription of a gene differently in
the presence of a chemical modifier. With collaborators from the
Children's Hospital Oakland Research Institute (CHORI), I am applying a
Bayesian test based on regression with multiple correlated responses
to determine whether statins change how a SNP modulates
transcription. So far we have found several differential eQTLs
affecting genes in a cholesterol pathway, along with thousands of
eQTLs; one differential eQTL was shown to protect against a toxic
side effect of statins in two clinical cohorts. We are currently
developing methods for considering different types of epistasis beyond
Publications: [Mangravite, Engelhardt, et al., 2013]
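The idea of a differential eQTL can be illustrated with a much simpler stand-in for the Bayesian multi-response test described above: an ordinary least-squares regression with a SNP-by-treatment interaction term. This is only a minimal sketch under stated assumptions. The function name `interaction_t_stat`, the simulated data, the effect sizes, and the treatment of exposed and unexposed samples as independent are all hypothetical and not taken from the study.

```python
import numpy as np

def interaction_t_stat(genotype, treated, expression):
    """Return the OLS t-statistic for the SNP-by-treatment interaction term."""
    genotype = np.asarray(genotype, dtype=float)
    treated = np.asarray(treated, dtype=float)
    expression = np.asarray(expression, dtype=float)
    # Design matrix columns: intercept, SNP dosage, treatment, interaction.
    X = np.column_stack(
        [np.ones_like(genotype), genotype, treated, genotype * treated]
    )
    beta, _, _, _ = np.linalg.lstsq(X, expression, rcond=None)
    resid = expression - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    # Covariance of the coefficient estimates under homoscedastic noise.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[3] / np.sqrt(cov[3, 3])

# Simulated (hypothetical) data: the SNP effect is larger under treatment.
rng = np.random.default_rng(0)
n = 400
genotype = rng.integers(0, 3, size=n)   # minor-allele dosage 0, 1, or 2
treated = rng.integers(0, 2, size=n)    # 1 = exposed to the modifier
noise = rng.normal(0.0, 1.0, size=n)
expression = 0.2 * genotype + 0.1 * treated + 0.8 * genotype * treated + noise

# Same main effects but no interaction: an ordinary (non-differential) eQTL.
expression_null = 0.2 * genotype + 0.1 * treated + rng.normal(0.0, 1.0, size=n)

t_diff = interaction_t_stat(genotype, treated, expression)
t_null = interaction_t_stat(genotype, treated, expression_null)
print(f"interaction t-statistic, differential eQTL: {t_diff:.2f}")
print(f"interaction t-statistic, ordinary eQTL:     {t_null:.2f}")
```

A large interaction statistic indicates that the SNP's effect on expression differs between exposed and unexposed samples, which is the defining property of a differential eQTL. A test with multiple correlated responses would model the exposed and unexposed measurements jointly rather than as independent samples.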
For SNPs associated with complex traits and disease to be medically actionable, it is essential that we understand how they work. As part of the GTEx consortium, and in collaboration with Casey Brown, we conducted large-scale replication studies across eleven studies in seven tissue types. We have overlaid these results onto regulatory element data to gain a deeper mechanistic understanding of eQTLs by studying where eQTLs and cell-type-specific eQTLs are co-located with specific cis-regulatory elements. In collaboration with Tim Reddy, we studied long intergenic non-coding RNAs (lincRNAs) and, using protein-coding RNAs as a control, found no evidence that lincRNAs ubiquitously affect gene transcription, in contrast to their protein-coding counterparts.
We are currently developing statistical models for understanding eQTLs and variants that influence mRNA isoform levels in RNA-seq data. We are also working on predictive models for eQTLs across tissue types and models that consider replication in trans-eQTLs.
Publications: [Brown, Mangravite, Engelhardt 2013], [McDowell et al. 2015]
Sparse latent factor models applied to genomic data can recover interpretable latent linear structure. When applied to genotype data from individuals with discrete population structure, these models recover the underlying ancestral populations; when applied to individuals with continuous population structure, they recapitulate the individuals' geographic ancestry.
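As a rough illustration of how latent linear structure in genotype data can reflect ancestry, the sketch below simulates genotypes from two populations and extracts a single latent factor. It deliberately swaps in plain principal component analysis (via a singular value decomposition) for the sparse latent factor models described above. The allele frequencies, sample sizes, and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_pop, n_snps = 50, 500

# Hypothetical allele frequencies for two diverged ancestral populations.
freq_a = rng.uniform(0.1, 0.9, size=n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, size=n_snps), 0.05, 0.95)

# Genotypes are minor-allele dosages (0, 1, or 2) drawn per individual and SNP.
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
]).astype(float)
labels = np.repeat([0, 1], n_per_pop)  # true population of each individual

# Center each SNP, then take the leading singular vector as one latent factor.
centered = geno - geno.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
factor1 = u[:, 0] * s[0]  # per-individual score on the first factor

# Split individuals by the sign of the factor score. The sign of a singular
# vector is arbitrary, so agreement is measured up to label swapping.
assigned = (factor1 > 0).astype(int)
agreement = max(np.mean(assigned == labels), np.mean(assigned != labels))
print(f"fraction of individuals grouped with their true population: {agreement:.2f}")
```

PCA produces dense loadings over all SNPs; a sparsity-inducing prior on the loadings would instead concentrate each factor on a subset of SNPs, which is what makes the recovered factors easier to interpret.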
We developed latent factor models for application to gene expression
data, adapting flexible continuous sparsity-inducing priors to support
an overcomplete representation and recovering a large number of sparse
latent components. We also added a two-component mixture model to
support recovery of non-sparse, low-rank structure, which captures
variance due to confounders such as population structure and
technical effects. Using this general framework, we have developed
canonical correlation analysis and group factor analysis models to
jointly reduce dimension across multiple data observations (e.g.,
genotype and gene expression data) and biclustering models with
sparsity on both the genes and the samples. By interpreting the latent
structure as regularized covariance matrix estimation, we build
ubiquitous, subset-specific, and subset-differential Gaussian
graphical models (Gaussian Markov random fields, gene co-expression
We have validated these approaches by recovering trans-eQTLs that cannot be detected using standard methods. We are extending this work in a number of ways.
Publications: [Engelhardt & Stephens, 2010], [Gao, Brown, Engelhardt 2013], [Zhao et al. 2014], [Srivastava, Engelhardt, Dunson 2014], [Gao et al. 2014]
Publications: [Zhang et al. 2015]