Bioinformatics of protein domains: new computational approaches for the detection of protein domains and their interactions

Maricel Kann

Biological Sciences, University of Maryland

With complete genome sequencing now routine, biology faces at least two fundamental problems: the large-scale automatic annotation of gene function, and the reconstruction of protein interaction networks. The most powerful approach for inferring the function of a new protein sequence is to transfer annotation from similar proteins using sequence comparison methods. Because proteins are composed of basic units called domains, a gene can be annotated against a domain database by aligning domains to the gene's protein sequence. In the past, such studies have been hampered by the lack of accurate statistics for the expected scores generated by semi-global alignment tools. I will introduce a new approach for semi-global alignment (GLOBAL) that I developed at the NCBI and that, unlike other semi-global tools, provides extremely accurate score statistics. The heuristic acceleration used in local alignment tools can be incorporated into GLOBAL to further increase search speed, making it an ideal tool for high-throughput protein domain searches with accurate p-values. To address the second problem, if time allows, I will introduce a novel computational approach for predicting protein domain interactions from the co-evolution of conserved regions.
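As a rough illustration of what "semi-global" means here, the sketch below scores an alignment in which the whole domain must be aligned while unaligned residues at either end of the protein are free. It is a minimal textbook dynamic program, not the GLOBAL algorithm: the function name `semi_global_score`, the match/mismatch/gap values, and the linear gap penalty are all assumptions made for the example, and it computes no score statistics or p-values.

```python
def semi_global_score(domain, protein, match=2, mismatch=-1, gap=-2):
    """Best score for aligning ALL of `domain` to any stretch of `protein`.

    Illustrative scoring only (assumed values, linear gap penalty);
    a real tool would use a substitution matrix and affine gaps.
    """
    m = len(protein)
    # Row 0: skipping any prefix of the protein costs nothing.
    prev = [0] * (m + 1)
    for i in range(1, len(domain) + 1):
        # Column 0: domain residues aligned to nothing are penalised,
        # because the domain must be covered end to end.
        curr = [prev[0] + gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if domain[i - 1] == protein[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align domain[i-1] with protein[j-1]
                prev[j] + gap,      # domain residue against a gap
                curr[j - 1] + gap,  # protein residue against a gap
            )
        prev = curr
    # Last row: skipping any suffix of the protein is also free.
    return max(prev)


if __name__ == "__main__":
    # Toy strings, not real sequences.
    print(semi_global_score("ACD", "XXACDXX"))
```

The free ends apply only to the protein, which is what distinguishes this from a local alignment (where the domain could also be truncated) and from a global alignment (where the protein's flanking residues would be penalised).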

Selected publications:

Kann MG: Protein interactions and disease: computational approaches to uncover the etiology of diseases. Brief Bioinform 2007, 8(5):333-346.

Kann MG, Jothi R, Cherukuri PF, Przytycka TM: Predicting protein domain interactions from coevolution of conserved regions. Proteins 2007, 67(4):811-820.

Kann MG, Sheetlin SL, Park Y, Bryant SH, Spouge JL: The identification of complete domains within protein sequences using accurate E-values for semi-global alignment. Nucleic Acids Res 2007, 35(14):4678-4685.