Computational approaches for the DNA sequencing data deluge

Date and Time
Tuesday, March 6, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Second-generation DNA sequencers are improving rapidly and can now sequence hundreds of billions of nucleotides in about a week for a few thousand dollars. Consequently, sequencing has become a common tool in many fields of life science. But with these developments comes a problem: growth in per-sequencer throughput is drastically outpacing growth in computer speed. As this throughput gap widens, the crucial research bottlenecks are increasingly computational: computing, storage, labor, and power.

Along these lines, I will discuss collaborative scientific projects in epigenetics and gene expression profiling for which I provided novel computational methods in areas such as read alignment, text indexing, and data-intensive computing. I will also discuss a new set of methods for very time- and space-efficient alignment of sequencing reads: Bowtie and Bowtie 2. These tools build on the insight that the Burrows-Wheeler Transform and the FM Index, previously used for data compression and exact string matching, can be extended to facilitate fast and memory-efficient alignment of DNA sequences to long reference genomes such as the human genome.
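As a rough illustration of the exact-matching idea mentioned above, the following minimal Python sketch builds a Burrows-Wheeler Transform and counts occurrences of a pattern with FM Index backward search. It is a toy under stated assumptions, not Bowtie's implementation: the function names (bwt, build_fm_index, count_occurrences) and the example reference string are hypothetical, the transform is built by sorting all rotations (practical only for tiny inputs), the occurrence table is stored in full rather than sampled, and mismatches and gaps are not handled.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler Transform via sorted rotations (toy inputs only)."""
    assert "$" not in text, "'$' is reserved as the end-of-text sentinel"
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str):
    """Return (first, occ) tables for backward search.

    first[c]  = number of characters in bwt_str that sort before c
    occ[c][i] = number of occurrences of c in bwt_str[:i]
    """
    counts = Counter(bwt_str)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ


def count_occurrences(pattern, first, occ, n):
    """Count exact matches of pattern, scanning it right to left."""
    lo, hi = 0, n  # half-open range of sorted rotations still matching
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGTGACG"  # hypothetical toy "genome"
transformed = bwt(reference)
first, occ = build_fm_index(transformed)
hits = count_occurrences("ACG", first, occ, len(transformed))
```

Each step of the search costs two table lookups per pattern character, independent of the reference length, which is what makes this family of indexes attractive for aligning short reads to a long genome. A practical aligner would additionally need a compressed occurrence table, a sampled suffix array to recover match positions, and a strategy for inexact matching.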

Ben Langmead is a Research Associate in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. He completed his Ph.D. in Computer Science in February 2012 at the University of Maryland, advised by Steven L. Salzberg. His research addresses problems at the intersection of computer science and genomics, and he is the author of several open-source software tools for analysis of high-throughput genomics data, including Bowtie, Bowtie 2, Crossbow, and Myrna. His paper describing Bowtie won the Genome Biology award for outstanding paper published in 2009. At Johns Hopkins, he collaborates with biostatisticians, biomedical engineers, biologists, and other computer scientists to develop methods for analyzing second-generation DNA sequencing data.
