Targeted analyses of very large genome-wide data collections

February 18, 2016

Genome-scale experiments provide an overwhelming amount of molecular information
for biologists. New computational methods are needed for the targeted analysis and
interpretation of such high-dimensional data. Here we take advantage of massive
public repositories to quantify the tissue-specific signals in gene expression profiles,
characterize distinctive molecular features of human diseases, deconvolve the latent
cell-type-specific factors in mixed clinical samples, and automatically integrate heterogeneous
data sources in the context of a specific genome-wide dataset. First, we
describe URSA (Unveiling RNA Sample Annotation), which incorporates known
tissue/cell-type relationships to better estimate the specific signal in any given gene
expression profile. Our ontology-aware method combines independent discriminative
classifiers in a Bayesian framework, outperforming other machine learning methods.
We provide a molecular interpretation for the tissue and cell-type models learned
by URSA, enabling a data-driven view of molecular processes specific to particular
tissues and cell types. Then, we extend this work to human diseases. We use thousands
of clinical disease-specific expression profiles in public repositories to quantify
distinctive functional and anatomical characteristics of human diseases. Through our
data-driven analysis, we explore the complexity of the human disease landscape and
propose exploratory hypotheses for drug repurposing, even for rare diseases with no
prior genetic knowledge. Lastly, we describe YETI (Your Evidence Tailored Integration)
for targeted integration of heterogeneous genome-wide data sources. Biomedical
researchers generate genome-wide datasets for data-driven exploration of specific
questions, but such analyses are disconnected from big public data collections. YETI is
the first automatic integration method that effectively constructs functional networks
specific to a genome-scale dataset. We show that the resulting integration reflects the
biological context of the user-provided dataset while providing accurate predictions
of functional interactions.
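
The abstract says URSA combines independent per-term classifiers in a Bayesian framework that respects the tissue/cell-type ontology. The toy sketch below illustrates that general idea. It is not URSA's actual model. It assumes each ontology term has a calibrated probability from its own classifier, and it assumes the labels are independent before any constraint is applied. It then conditions those beliefs on the constraint that a sample labeled with a term must also carry that term's parent. The function names and the three-term example ontology are hypothetical.

```python
from itertools import product


def is_consistent(labels, parent):
    """True if every positively labeled term also has a positive parent term."""
    for term, y in labels.items():
        p = parent[term]
        if y and p is not None and not labels[p]:
            return False
    return True


def hierarchical_posteriors(probs, parent):
    """Hypothetical helper: posterior P(term = 1) for each term.

    Independent per-term classifier probabilities are treated as the prior.
    Only labelings consistent with the ontology (child => parent) are kept.
    Brute-force enumeration is exponential, so this only suits tiny ontologies.
    """
    terms = list(probs)
    mass = {t: 0.0 for t in terms}
    z = 0.0  # total weight of all consistent labelings
    for ys in product((0, 1), repeat=len(terms)):
        labels = dict(zip(terms, ys))
        if not is_consistent(labels, parent):
            continue
        w = 1.0
        for t in terms:
            w *= probs[t] if labels[t] else 1.0 - probs[t]
        z += w
        for t in terms:
            if labels[t]:
                mass[t] += w
    return {t: mass[t] / z for t in terms}


if __name__ == "__main__":
    # Hypothetical chain ontology: tissue <- blood <- T cell
    parent = {"tissue": None, "blood": "tissue", "T cell": "blood"}
    probs = {"tissue": 0.9, "blood": 0.3, "T cell": 0.8}
    post = hierarchical_posteriors(probs, parent)
    print({t: round(v, 3) for t, v in post.items()})
    # → {'tissue': 0.966, 'blood': 0.659, 'T cell': 0.527}
```

In this example, the confident "T cell" classifier raises the "blood" posterior from 0.3 to about 0.66. The posteriors also respect the hierarchy, so no child term ends up more probable than its parent. A realistic ontology has far too many terms for enumeration, and would instead need structured inference such as belief propagation over a Bayesian network.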

Follow us: Facebook Twitter Linkedin