Tissue Expression Profile Similarity Searches for Gene Discovery and Functional Prediction

Fabien Campagne

Department of Physiology and Biophysics, Weill Medical College, Cornell University


Our laboratory previously developed TissueInfo, a computational approach that calculates tissue expression profiles for the transcripts of a genome using information in a curated version of dbEST (see http://icb.med.cornell.edu/crt/tissueinfo/index.xml) [1, 2]. This talk will present an extension of TissueInfo that performs tissue expression profile similarity searches (TEPSS).

Expressed Sequence Tags (ESTs) have proven extremely useful for gene discovery efforts. Systems that organize EST data to facilitate data integration and mining generally also offer the ability to query for clusters of ESTs or for genes by expression pattern. When a pattern of tissue expression can be defined and associated with a phenotype of interest, filtering EST clusters or genes by their expression pattern may help prioritize gene candidates. However, defining an appropriate tissue expression pattern can be non-trivial, which limits the use of this gene discovery strategy. We present and validate Tissue Expression Profile Similarity Searches (TEPSS), a computational approach to identify transcripts whose tissue expression profiles are similar to those of one or more transcripts in a group of interest. TEPSS offers tissue expression profile scoring methods (scorers) and a search engine. We evaluated various TEPSS scorers for their ability to discriminate between pairs of transcripts coding for interacting proteins (or proteins that participate in the same metabolic pathway) and non-interacting pairs.
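The idea of a scorer plus a search engine can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the TEPSS scorers, so cosine similarity stands in for them, profiles are assumed to be per-tissue vectors of expression values, and the names `cosine_scorer` and `search` are hypothetical.

```python
import math


def cosine_scorer(profile_a, profile_b):
    """Hypothetical scorer: cosine similarity between two tissue profiles.

    Each profile is a list of non-negative expression values, one per tissue,
    with tissues in the same order in both lists (an assumption).
    """
    dot = sum(x * y for x, y in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(x * x for x in profile_a))
    norm_b = math.sqrt(sum(y * y for y in profile_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a transcript with no expression evidence matches nothing
    return dot / (norm_a * norm_b)


def search(query_ids, profiles, scorer=cosine_scorer, top=10):
    """Rank transcripts by mean similarity to a group of query transcripts.

    profiles maps a transcript identifier to its tissue profile.
    Query transcripts themselves are excluded from the results.
    """
    query_set = set(query_ids)
    queries = [profiles[q] for q in query_ids]
    hits = []
    for transcript_id, profile in profiles.items():
        if transcript_id in query_set:
            continue
        score = sum(scorer(q, profile) for q in queries) / len(queries)
        hits.append((transcript_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top]


# Toy data: three tissues, three transcripts (values are made up).
toy_profiles = {
    "T1": [1.0, 0.0, 0.0],
    "T2": [0.9, 0.1, 0.0],
    "T3": [0.0, 0.0, 1.0],
}
ranked = search(["T1"], toy_profiles)
```

Averaging the score over the query group is one simple way to handle "one or more transcripts in a group of interest"; other aggregation rules (maximum, or a score against a pooled profile) would fit the same interface.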

We found that ordering protein-protein pairs by TEPSS score produces a set of pairs significantly enriched in reported protein-protein interactions (reported interacting versus random pairs, OR=157.57, 95% CI [36.81-375.51] at 1% coverage of a large set of reported interactions). The enrichment is also significant at 50% coverage (OR=4.73, 95% CI [3.24-6.90]).
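For readers unfamiliar with this kind of evaluation, the sketch below shows one way an odds ratio at a given coverage could be computed from scored pairs. It is an illustration under stated assumptions, not the procedure used for the figures above: the score threshold is taken as the rank at which the requested fraction of reported interactions has been recovered, the confidence interval uses the standard log (Woolf) approximation, and the function name `odds_ratio_at_coverage` is hypothetical.

```python
import math


def odds_ratio_at_coverage(scored_pairs, coverage, z=1.96):
    """Odds ratio and approximate 95% CI for top-ranked pairs.

    scored_pairs: list of (score, is_reported_interaction) tuples.
    coverage: fraction of reported interactions to include above the cutoff.
    Ties in score are broken arbitrarily by the sort (a simplification).
    """
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, is_pos in ranked if is_pos)
    total_neg = len(ranked) - total_pos
    # Number of reported interactions that must fall above the cutoff.
    target = max(1, math.ceil(coverage * total_pos - 1e-9))

    a = b = 0  # a: interacting above cutoff, b: random above cutoff
    for _, is_pos in ranked:
        if a >= target:
            break
        if is_pos:
            a += 1
        else:
            b += 1
    c = total_pos - a  # interacting below cutoff
    d = total_neg - b  # random below cutoff

    # Haldane-Anscombe correction avoids division by zero in sparse tables.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))

    odds_ratio = (a * d) / (b * c)
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * std_err)
    upper = math.exp(math.log(odds_ratio) + z * std_err)
    return odds_ratio, lower, upper


# Toy data (made up): four reported interactions, four random pairs.
toy_pairs = [
    (0.9, True), (0.85, False), (0.8, True), (0.5, False),
    (0.4, False), (0.3, True), (0.2, True), (0.1, False),
]
toy_or, toy_lower, toy_upper = odds_ratio_at_coverage(toy_pairs, coverage=0.5)
```

An interval whose lower bound stays above 1 indicates enrichment of reported interactions among the top-scoring pairs, which is how the intervals quoted above can be read.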

We used the TEPSS approach to prioritize SNO protein candidates by similarity to the tissue expression profile of known SNO proteins [3]. Preliminary analysis of the top candidates ranked by TEPSS suggests that the approach successfully identifies proteins that participate in SNO pathways. Application of TEPSS to gene discovery projects in the fields of cancer and neurodegenerative diseases will be discussed.

References
1. Skrabanek L, Campagne F: TissueInfo: high-throughput identification of tissue expression profiles and specificity. Nucleic Acids Res 2001, 29(21):E102-102.
2. Campagne F, Skrabanek L: Mining expressed sequence tags identifies cancer markers of clinical interest. BMC Bioinformatics 2006, 7:481.
3. Hao G, Derakhshan B, Shi L, Campagne F, Gross SS: SNOSID, a proteomic method for identification of cysteine S-nitrosylation sites in complex protein mixtures. Proc Natl Acad Sci U S A 2006, 103(4):1012-1017.