Algorithms for Analyzing and Interrogating Protein Interaction Networks (thesis)
High-throughput experimental and computational approaches are facilitating the collection of large-scale biological networks,
consisting of proteins and the interactions among them, for a growing number of species. With appropriate computational
analysis and experimental work the potential exists for uncovering the organizational principles of the cell and, consequently,
protein functions and pathways, which are still largely unidentified. In this work, we introduce a novel framework for analyzing
protein interaction networks in order to uncover organizational units corresponding to recurring means with which diverse
biological processes are carried out. We formalize recurring patterns of interaction among different types of proteins using
“network schemas”; network schemas specify descriptions of proteins and the topology of interactions among them.
In the first part of this thesis, we develop algorithms for systematically uncovering recurring, over-represented schemas in
physical interaction networks and apply these methods to the S. cerevisiae interactome, identifying hundreds of such
organizational units of varying complexity. We establish the functional importance of these schemas by showing that they
correspond to functionally cohesive sets of proteins, are enriched in the frequency with which they have instances in the
H. sapiens interactome, and are useful for predicting protein function. In the second part of this thesis, we introduce NetGrep,
a system for searching protein interaction networks for matches to more general network schemas. NetGrep provides an
advanced graphical interface for specifying schemas and fast algorithms for extracting their matches.