Deciphering Disease Genomes in a Network Context | Computer Science Department at Princeton University

Report ID:

TR-008-19

Authors:

Hristov, Borislav

Date:

August 29, 2019

Abstract:

Despite the incredible influx of sequencing data, pinpointing the gene variants responsible for the development of heterogeneous diseases remains a particularly hard task, because the same phenotypic outcome (disease) can result from a myriad of combinations of different alterations across the genome. A promising avenue is to consider genome alterations within the context of pathways instead of individual genes, because different alterations within any of several genes comprising the same pathway can have similar consequences for disease development. Large-scale biological networks provide a helpful proxy for biological pathway knowledge, as genes that participate in the same pathway tend to interact with each other and form modules within the larger network. In this dissertation, I introduce two novel methods that further our ability to computationally highlight potential disease-causing genes by examining disease genomes in the context of biological networks.

First, in Chapter 2, I present a novel network-based approach that tackles cancer mutational heterogeneity by utilizing per-individual mutational profiles. I provide an intuitive formulation that balances the size of a connected subgraph within the larger network against covering many patients. I describe a machine learning-like scheme for selecting the value of the single required parameter, as well as both an integer linear programming framework and a fast heuristic for optimizing the objective function. I demonstrate the outstanding performance of my method in identifying cancer-relevant genes, especially those mutated at very low rates.

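To make the Chapter 2 trade-off concrete, here is a minimal Python sketch under stated assumptions; it is not the dissertation's objective, ILP, or heuristic. It assumes a hypothetical cost for a gene set S, alpha * |S| + (1 - alpha) * (number of patients with no mutation in S), where alpha plays the role of the single trade-off parameter, and it greedily grows a connected subgraph from each seed gene by adding the neighbor that most lowers that cost. The names `cost` and `greedy_connected_cover` are illustrative only.

```python
from typing import Dict, Set, Tuple


def cost(genes: Set[str], mutations: Dict[str, Set[str]],
         n_patients: int, alpha: float) -> float:
    """Hypothetical objective: alpha * |S| + (1 - alpha) * (#patients not covered by S)."""
    covered: Set[str] = set()
    for g in genes:
        covered |= mutations.get(g, set())  # patients with a mutation in gene g
    return alpha * len(genes) + (1 - alpha) * (n_patients - len(covered))


def greedy_connected_cover(graph: Dict[str, Set[str]],
                           mutations: Dict[str, Set[str]],
                           n_patients: int,
                           alpha: float) -> Tuple[Set[str], float]:
    """Greedy sketch: grow a connected gene set from every seed, keep the cheapest one."""
    best_set: Set[str] = set()
    best_cost = cost(best_set, mutations, n_patients, alpha)  # empty-set baseline
    for seed in graph:
        current = {seed}
        current_cost = cost(current, mutations, n_patients, alpha)
        while True:
            # Only neighbours of the current set are candidates, so S stays connected.
            frontier = {nb for g in current for nb in graph[g]} - current
            best_move, best_move_cost = None, current_cost
            for cand in sorted(frontier):  # sorted for deterministic tie-breaking
                c = cost(current | {cand}, mutations, n_patients, alpha)
                if c < best_move_cost:
                    best_move, best_move_cost = cand, c
            if best_move is None:  # no neighbour improves the cost
                break
            current.add(best_move)
            current_cost = best_move_cost
        if current_cost < best_cost:
            best_set, best_cost = set(current), current_cost
    return best_set, best_cost
```

Because only neighbors of the current set are considered, every returned set is connected; the price is that a greedy step never adds a zero-coverage "connector" gene on its own, which an exact integer linear program could still select when it links two high-coverage regions. Small alpha favors covering more patients, while large alpha favors smaller subgraphs.
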
Next, in Chapter 3, I propose a general computational framework that uses prior knowledge of disease-associated genes to guide a network-based search for novel ones based upon newly acquired information. I use a graph diffusion kernel to spread the signal from the set of already known disease genes and then use this signal to bias a random walk originating from the newly implicated genes toward the known ones. I demonstrate that integrating the two types of information is better than using either one alone. I show, in the context of cancer, that my method readily outperforms other network-based methods. Finally, I apply my approach to several complex diseases, thereby demonstrating its versatility in a broad range of settings.
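
The Chapter 3 pipeline (diffuse prior knowledge, then bias a walk from the new genes) can be sketched as follows. This is an illustrative assumption, not the dissertation's exact kernel or walk: it uses the heat kernel exp(-beta * L) on the graph Laplacian L = D - A, computed with a truncated Taylor series, as the diffusion step, and a random walk with restart to the newly implicated genes whose step probabilities blend a uniform move with a move proportional to the diffused prior score. All function names and the parameters `beta`, `gamma`, and `restart` are hypothetical.

```python
from typing import List, Sequence, Set

Matrix = List[List[float]]


def laplacian(adj: Matrix) -> Matrix:
    """Combinatorial Laplacian L = D - A of a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def diffusion_kernel(adj: Matrix, beta: float, terms: int = 30) -> Matrix:
    """Heat kernel exp(-beta * L) via a truncated Taylor series (toy graphs only)."""
    n = len(adj)
    m = [[-beta * x for x in row] for row in laplacian(adj)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # M^0 / 0!
    kernel = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]  # M^k / k!
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


def prior_scores(kernel: Matrix, known: Set[int]) -> List[float]:
    """Diffuse an indicator of the known disease genes; clamp tiny truncation error."""
    return [max(0.0, sum(row[j] for j in known)) for row in kernel]


def biased_rwr(adj: Matrix, prior: Sequence[float], seeds: Set[int],
               gamma: float = 0.5, restart: float = 0.3,
               iters: int = 200) -> List[float]:
    """Hypothetical biased random walk with restart to `seeds` (edges treated as unweighted)."""
    n = len(adj)
    p = [[0.0] * n for _ in range(n)]
    for u in range(n):
        nbrs = [v for v in range(n) if adj[u][v] > 0]
        if not nbrs:
            p[u][u] = 1.0  # isolated node: stay put
            continue
        total = sum(prior[v] for v in nbrs)
        for v in nbrs:
            uniform = 1.0 / len(nbrs)
            biased = prior[v] / total if total > 0 else uniform
            # Each row is a convex combination of two distributions, so it sums to 1.
            p[u][v] = (1 - gamma) * uniform + gamma * biased
    s = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]  # restart vector
    pi = s[:]
    for _ in range(iters):  # power iteration: pi <- restart * s + (1 - restart) * pi P
        pi = [restart * s[v] + (1 - restart) * sum(pi[u] * p[u][v] for u in range(n))
              for v in range(n)]
    return pi
```

Setting `gamma = 0` recovers an ordinary random walk with restart that ignores prior knowledge, while larger values pull the walk toward the known disease genes; ranking genes by the resulting visit probabilities then yields candidates. The dense pure-Python matrices keep the example dependency-free but only suit toy graphs; a real interaction network would call for sparse matrices and a library matrix exponential.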