Integration and Functional Analysis of Microarray Data Sets

Curtis Huttenhower

Computer Science Department, Princeton University

 

Microarray data sets examining a variety of organisms, tissue types, and cellular conditions are now widely available, representing a rich opportunity for biological data mining. While many methods exist for analyzing individual data sets, extracting biological information from multiple data sets simultaneously has the potential to yield a broader, more holistic picture of the biological functionality captured by coexpression. We have developed a Bayesian approach, the Microarray Experiment Functional Integration Technology (MEFIT), for rapidly analyzing many microarray data sets in tandem. MEFIT predicts functionally related gene pairs in the context of specific biological functions: each gene pair is assigned a probability of interacting within each biological function of interest (as specified by a biologist or by existing catalogues such as the Gene Ontology). Furthermore, MEFIT produces a relevance score for each input data set indicating how predictive it is of each biological function of interest. We discuss the ramifications of this function-specific evaluation and present a simpler clustering algorithm, Nearest Neighbor Networks (NNN), which produces results with high precision and similar functional breadth within individual data sets.
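
As a rough illustration of the kind of Bayesian integration described above, the following Python sketch combines binned coexpression evidence from several data sets into a single probability that a gene pair is functionally related within one biological function. It is a generic naive Bayes toy under stated assumptions, not MEFIT's actual implementation: the function names, the two-bin discretization, the prior of 0.1, the example likelihood tables, and the use of total variation distance as a stand-in "relevance score" are all hypothetical choices made for illustration.

```python
def posterior_related(observed_bins, likelihoods, prior=0.1):
    """Naive Bayes posterior that a gene pair is functionally related.

    observed_bins: {dataset_name: bin_index} for one gene pair, where the bin
                   is a discretized coexpression value (hypothetical encoding).
    likelihoods:   {dataset_name: (p_bin_given_related, p_bin_given_unrelated)},
                   each a list of probabilities over bins.
    prior:         assumed prior probability that a random pair is related.
    """
    p_rel = prior
    p_unrel = 1.0 - prior
    for name, bin_index in observed_bins.items():
        rel, unrel = likelihoods[name]
        # Data sets are treated as conditionally independent (naive Bayes).
        p_rel *= rel[bin_index]
        p_unrel *= unrel[bin_index]
    return p_rel / (p_rel + p_unrel)


def dataset_relevance(likelihoods):
    """Illustrative per-data-set relevance score (an assumption, not MEFIT's).

    Uses total variation distance between the 'related' and 'unrelated'
    bin distributions: 0 means the data set carries no information for
    this function, 1 means it separates the two classes perfectly.
    """
    return {
        name: 0.5 * sum(abs(r - u) for r, u in zip(rel, unrel))
        for name, (rel, unrel) in likelihoods.items()
    }


# Made-up likelihood tables for one biological function and two data sets.
# Bin 0 = low coexpression, bin 1 = high coexpression.
example_likelihoods = {
    "heat_shock": ([0.2, 0.8], [0.7, 0.3]),  # informative for this function
    "noise": ([0.5, 0.5], [0.5, 0.5]),       # uninformative for this function
}

# One gene pair that is highly coexpressed in both data sets.
example_posterior = posterior_related(
    {"heat_shock": 1, "noise": 1}, example_likelihoods
)
example_relevance = dataset_relevance(example_likelihoods)
```

In this toy setting the uninformative data set leaves the posterior unchanged and receives a relevance of zero, which mirrors the idea that a data set can be predictive for one biological function and irrelevant for another.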