Summary of Previous Research Projects
In my first several projects as a graduate student, I investigated the molecular mechanisms by which proteins accomplish their astonishing functional diversity and specificity. Interactions with other molecules play a crucial role in nearly all protein functions. These interactions are usually mediated by a small number of amino acid positions in the protein, so-called functionally important sites.
- Predicting functionally important residues from sequence conservation.
Capra JA and Singh M.
Bioinformatics, 23(15): 1875-82, 2007.
[Paper] [Supporting Code and Data]
We developed a fast, information-theoretic method for estimating sequence conservation from a multiple sequence alignment of homologs; conservation analysis is one of the most common approaches for identifying functionally important sites. Our approach provides state-of-the-art performance in several orders of magnitude less time than the previous best-performing methods. By grounding the evaluation of conservation estimation algorithms in real-world prediction tasks, we found that their performance varies significantly across a range of realistic settings.
- Characterization and prediction of residues determining protein functional specificity.
Capra JA and Singh M.
Bioinformatics, 24(13): 1473-1480, 2008.
[Paper] [Supporting Code and Data]
My next project considered the prediction of a type of functionally important site that cannot be identified by evolutionary conservation alone: positions that determine substrate specificity within a family of homologous proteins. We combined sequence, structure, and experimental data to build the first large dataset of these specificity determining positions (SDPs). This dataset enabled us to characterize SDPs and to evaluate the many sequence-based methods developed for predicting them. We found significant differences between the physicochemical properties of SDPs and those of other functional sites, and demonstrated that a simple method we developed provides state-of-the-art performance.
- Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure.
Capra JA, Singh M, and Funkhouser TA.
Submitted.
Next, I focused on predicting small-molecule binding sites by integrating data from protein sequence and structure. The resulting method, ConCavity, directly combines estimates of sequence conservation with structure-based surface pocket finding and significantly improves on previous methods. ConCavity makes highly specific predictions of both 3D pockets likely to contain ligands and protein residues likely to bind ligands. Across a diverse set of structures, ConCavity's first predicted residue is in contact with a bound ligand for nearly 80% of proteins. This project also yielded several additional insights into the relationship among sequence conservation, structure, and function.
- G-quadruplex DNA sequences are evolutionarily conserved and associated with distinct genomic features in Saccharomyces cerevisiae.
Capra JA*, Paeschke K*, Singh M, and Zakian VA.
Submitted. * co-first authors
- The integration of new proteins into protein interaction networks.
Capra JA and Singh M.
In Preparation.
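To make the idea of an information-theoretic conservation estimate from the first project concrete, here is a minimal sketch that scores a single alignment column by the Jensen-Shannon divergence between its amino acid distribution and a background distribution. This is an illustration under stated assumptions, not the published implementation: the uniform background, the tiny pseudocount, and the gap penalty (scaling by the fraction of non-gap sequences) are all simplifications chosen here, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_distribution(column, pseudocount=1e-6):
    """Amino acid frequencies in one alignment column; gap characters ('-') are ignored."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] bits."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def conservation_score(column, background=None):
    """
    Score one column by how far its residue distribution diverges from a
    background distribution. ASSUMPTIONS (illustrative only): a uniform
    background and a gap penalty that scales the score by the non-gap fraction.
    """
    if background is None:
        background = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    p = column_distribution(column)
    non_gap_fraction = 1 - column.count("-") / len(column)
    return js_divergence(p, background) * non_gap_fraction

# Usage: each string is one alignment column (one residue per homolog).
print(conservation_score("CCCCCCCC"))  # fully conserved: high score
print(conservation_score("ACDEFGHI"))  # highly variable: lower score
print(conservation_score("CCCC----"))  # conserved but half gaps: penalized
```

A divergence from background rewards columns dominated by a few residues, and the gap penalty keeps poorly aligned columns from looking conserved; a real scorer would likely use an empirical background and sequence weighting instead of these simplifications.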
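The second project notes that specificity determining positions cannot be found by conservation alone. The hypothetical score below illustrates why, assuming each sequence has already been assigned a specificity-group label: it compares average residue similarity within groups against average similarity between groups. It uses plain residue identity as the similarity measure, which is an assumption made for brevity (a substitution matrix would be a natural refinement), and it is not presented as the method from the paper.

```python
from itertools import combinations

def pair_similarity(a, b):
    """Hypothetical similarity measure: 1 for identical residues, 0 otherwise."""
    return 1.0 if a == b else 0.0

def specificity_score(column, groups):
    """
    column: residues at one alignment position, one per sequence.
    groups: specificity-group label for each sequence, in the same order.
    Returns mean within-group similarity minus mean between-group similarity.
    Pairs involving a gap ('-') are skipped.
    """
    within, between = [], []
    for i, j in combinations(range(len(column)), 2):
        if column[i] == "-" or column[j] == "-":
            continue
        sim = pair_similarity(column[i], column[j])
        (within if groups[i] == groups[j] else between).append(sim)
    if not within or not between:
        return 0.0
    return sum(within) / len(within) - sum(between) / len(between)

# Usage: two sequences from each of two hypothetical specificity groups.
groups = ["A", "A", "B", "B"]
print(specificity_score("KKEE", groups))  # conserved within groups, different between: 1.0
print(specificity_score("CCCC", groups))  # fully conserved everywhere: 0.0
```

A fully conserved column scores zero here even though a conservation measure would rank it highly, which is the distinction the project draws between SDPs and conserved functional sites in general.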
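The ConCavity project combines sequence conservation with structure-based pocket finding and reports how often the first predicted residue contacts a bound ligand. The sketch below illustrates both ideas at the residue level only. The `Protein` container, the per-residue pocket scores, and the multiplicative combination rule are hypothetical simplifications assumed here; the published method works with 3D pocket predictions, and this is not its algorithm.

```python
from dataclasses import dataclass

@dataclass
class Protein:
    """Hypothetical per-protein inputs; residues are identified by integer index."""
    conservation: dict[int, float]  # residue -> sequence conservation score in [0, 1]
    pocket: dict[int, float]        # residue -> structure-based pocket score in [0, 1]
    ligand_contacts: set[int]       # residues observed in contact with a bound ligand

def rank_residues(protein):
    """Rank residues by conservation * pocket score (an assumed combination rule)."""
    combined = {r: protein.conservation.get(r, 0.0) * s for r, s in protein.pocket.items()}
    return sorted(combined, key=combined.get, reverse=True)

def top1_hit_rate(proteins):
    """Fraction of proteins whose top-ranked residue contacts a bound ligand."""
    if not proteins:
        return 0.0
    hits = 0
    for protein in proteins:
        ranking = rank_residues(protein)
        if ranking and ranking[0] in protein.ligand_contacts:
            hits += 1
    return hits / len(proteins)

# Usage with toy scores (not real data).
p1 = Protein(conservation={1: 0.9, 2: 0.2, 3: 0.8},
             pocket={1: 0.1, 2: 0.9, 3: 0.7},
             ligand_contacts={3})
p2 = Protein(conservation={1: 0.5, 2: 0.9},
             pocket={1: 0.6, 2: 0.2},
             ligand_contacts={2})
print(rank_residues(p1))         # [3, 2, 1]
print(top1_hit_rate([p1, p2]))   # 0.5
```

In the first toy protein, pocket scores alone would pick residue 2 and conservation alone would pick residue 1, while the combined score picks the ligand-contacting residue 3; this is the kind of complementarity that motivates integrating the two signals.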