Integrating Genomic Data to Build Networks for Proteins and Small Molecules | Computer Science Department at Princeton University

Report ID:

TR-965-13

Authors:

Bell, Ana

Date:

September 2013

Pages:

127

Abstract:

Advances in high-throughput genome-wide sequencing technologies have generated a massive amount of genomic data. Coupled with the ever-increasing performance of computing technologies, these data create the potential for a revolution in our knowledge of biology, which has driven the emergence of computational biology. A key goal of computational biology is to understand and model how biological processes work and to apply this knowledge to resolve complex human diseases. To that end, this thesis presents two separate advances in network-based analysis of the large compendium of genomic data. We apply our knowledge of algorithms to the available genomic data in order to (1) build tissue- and development-specific gene interaction networks and (2) understand drug action at the molecular level.

The difficulties inherent in sequencing and functionally analyzing biologically and economically significant organisms have recently been overcome. Arabidopsis thaliana, a versatile model organism, represents an opportunity to evaluate the predictive power of biological network inference for plant functional genomics. Functional relationship networks are powerful tools that enable rapid investigation of uncharacterized genes. We provide a compendium of functional relationship networks for A. thaliana, built by integrating microarray, physical and genetic interaction, and literature curation datasets. To our knowledge, this is the first work that includes tissue-, biological process-, and development stage-specific networks, each predicting relationships specific to an individual biological context. These networks summarize a large collection of A. thaliana data for biological examination. We found validation in the literature for many of our predicted interactions.

Functional networks and network-level pathway models thus represent an accurate and sensitive summary of the processes happening in the cell. In the second part of this thesis, we use these models to understand drug action. We integrate large amounts of heterogeneous data and build pathway-level networks that present interactions between compounds and proteins. We test our methodology in Saccharomyces cerevisiae (yeast). Our two-step integration process, in which we first predict protein-protein interaction networks for various protein-protein interaction types and then use these networks to predict protein-compound interaction networks, provides detailed insight into how pathway-level knowledge can be leveraged to predict compound-level interactions.
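
The two-step structure described above can be illustrated with a toy sketch. This is not the thesis's actual method: the abstract does not specify the integration model, so the score averaging in step 1 and the guilt-by-association scoring in step 2 are stand-in assumptions, and all names (`build_ppi_networks`, `predict_compound_links`, the proteins `P1`..`P4`, the compound `drugX`) and all scores are hypothetical.

```python
from collections import defaultdict


def build_ppi_networks(evidence):
    """Step 1 (assumed form): one weighted network per interaction type.

    `evidence` maps an interaction type to (protein_a, protein_b, score)
    records; scores reported for the same pair are simply averaged.
    """
    networks = {}
    for itype, records in evidence.items():
        totals, counts = defaultdict(float), defaultdict(int)
        for a, b, score in records:
            pair = frozenset((a, b))  # undirected edge
            totals[pair] += score
            counts[pair] += 1
        networks[itype] = {p: totals[p] / counts[p] for p in totals}
    return networks


def predict_compound_links(networks, known_targets):
    """Step 2 (assumed form): guilt-by-association over the step 1 networks.

    A protein is scored for a compound by the edge weights that connect it
    to the compound's known targets, averaged over interaction types.
    """
    scores = defaultdict(float)
    for compound, targets in known_targets.items():
        for network in networks.values():
            for pair, weight in network.items():
                if len(pair) != 2:
                    continue  # ignore self-loops
                a, b = tuple(pair)
                for target, neighbor in ((a, b), (b, a)):
                    if target in targets and neighbor not in targets:
                        scores[(compound, neighbor)] += weight / len(networks)
    return dict(scores)


# Hypothetical toy data, invented purely for illustration.
evidence = {
    "physical": [("P1", "P2", 0.8), ("P1", "P2", 0.6), ("P2", "P3", 0.4)],
    "genetic": [("P1", "P2", 0.5), ("P3", "P4", 0.9)],
}
known_targets = {"drugX": {"P1"}}

networks = build_ppi_networks(evidence)
scores = predict_compound_links(networks, known_targets)
print(scores)
```

In this toy setup only `P2` receives a score for `drugX`, because it is the only protein directly linked to the known target `P1`. A real pipeline would replace both steps with a trained integration model and evaluate the predictions against held-out interactions.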