Identification of Novel Structured RNAs Using Local Multiple Alignment and Homology Search
Zasha Weinberg
Yale University
Discoveries over the last 5-10 years have shown that structured non-coding RNAs play a more significant role in biology than previously realized. To assist in discovering additional structured RNAs, we developed an automated comparative-genomics pipeline that identifies conserved cis-regulatory RNA motifs in bacteria. I will describe our approach, as well as a homology search algorithm that the pipeline uses. This algorithm accelerates searches that use covariance models, a type of stochastic context-free grammar. It achieves roughly 100-fold acceleration, and its accuracy is comparable to that of a highly tuned algorithm specialized for tRNAs. I will also present results from the pipeline: it recovers most known RNAs in bacteria, and we have used it to identify 29 novel classes of structured RNAs, including 5 riboswitches supported by experimental evidence.
I will discuss the following work:
- Weinberg & Ruzzo (2006), "Sequence-based heuristics for faster annotation of non-coding RNA families", Bioinformatics, 22(1):35-9.
- Yao, Weinberg & Ruzzo (2006), "CMfinder--a covariance model based RNA motif finding algorithm", Bioinformatics, 22(4):445-52.
- Yao, Barrick, Weinberg, Neph, et al. (2007), "A Computational Pipeline for High-Throughput Discovery of cis-Regulatory Noncoding RNA in Prokaryotes", PLoS Comp Bio, 3(7):e126.
- Weinberg, Barrick, Yao, Neph, et al. (2007), "Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline"