Genomic data integration for regulatory and functional network inference

Curtis Huttenhower
Lewis-Sigler Institute for Integrative Genomics, Princeton University

Metazoan genomic data of many types are readily available, but the complexity and scale of molecular biology in multicellular systems make it difficult to integrate these data, understand them at a systems level, and apply them to the study of specific pathways or genetic disorders. An investigator could best explore a particular protein, pathway, or disease if given a functional map summarizing the data and interactions most relevant to his or her area of interest. I will discuss two probabilistic data integration systems that extract useful biological information from very large genomic data collections in this manner. The first, HEFalMp, focuses on humans and provides maps of functional activity and interactions in over 200 areas of human cellular biology, each drawing on ~30,000 genome-scale experiments pertaining to ~25,000 human genes. The second, COALESCE, uses large-scale Bayesian integration of multiple data types to predict coregulated gene modules, the conditions under which they are coregulated, and the consensus binding motifs responsible for their regulation. Functional maps and regulatory modules thus represent two of the many systems-level areas that are becoming increasingly accessible through large-scale genomic data analysis.