Analysis and Visualization of Large-Scale Gene Expression Microarray Compendia (thesis)
Over the past decade, gene expression microarray data has become one of the most important tools available for biologists to understand molecular processes and mechanisms on the whole-genome scale.
Microarray data provides a window into the inner workings of the transcriptional process that is vital for cellular maintenance, development, biological regulation, and disease progression. While an exponentially increasing amount of microarray data is being generated for a wide variety of organisms, there is a severe lack of methods designed to utilize the vast amount of data currently available. In my work, I explore several techniques to meaningfully harness large-scale collections of microarray data both to provide biologists with a greater ability to explore data repositories, and to computationally utilize these repositories to discover novel biology.
First, effective search and analysis techniques are required to guide researchers and enable their effective use of large-scale compendia.
I will present a user-driven similarity search algorithm designed both to quickly locate relevant datasets in a collection and then to identify novel players related to the user’s query. Second, I will discuss techniques for visualization-based analysis of microarray data that incorporate statistical measures into visualization schemes and utilize alternative views of data to reveal previously obscure patterns. Third, I will focus on novel methods that allow users to simultaneously view multiple datasets with the goal of providing a larger biological context within which to understand these data.
Finally, I will discuss how we have used these approaches to discover novel biology, including successfully directing a large-scale experimental investigation of S. cerevisiae mitochondrial organization and biogenesis.
The combination of visualization-based analysis methods and exploratory algorithms such as those presented here is vital to future systems biology research. As data collections continue to grow and as new forms of data are generated, it will become increasingly important to develop methods and techniques that will allow experts to intelligently sift through the available information to make new discoveries.