Learning of predictive models of global transcriptional dynamics with application to the extremophile Halobacterium NRC-1.

Richard Bonneau

Assistant Professor, Departments of Biology and Computer Science, New York University

Our system for network inference and modeling consists of three major components: cMonkey (a method for learning co-regulated biclusters and pathways), the Inferelator (a method for regulatory network inference) and the Gaggle (a system for visualizing and managing both the input data and the results of the analysis). These three components have been described individually; we have also shown their integration into a functioning system by applying them to Halobacterium and several other model organisms. This effort represents one of the first coordinated functional genomics efforts in archaea, and in particular under hypersaline conditions.

cMonkey: We have developed an integrative biclustering algorithm, cMonkey, which groups genes and conditions into biclusters on the basis of 1) coherence of expression across subsets of experimental conditions, 2) co-occurrence of putative cis-acting regulatory motifs in the regulatory regions of bicluster members and 3) the presence of highly connected subgraphs in metabolic and functional association networks. We describe the algorithm and the results of extensive comparisons with several previously described methods, showing that cMonkey has several advantages in the context of regulatory network inference.
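The three evidence types that cMonkey integrates can be illustrated with a toy per-gene membership cost. The sketch below is hypothetical: it assumes each evidence type is reduced to a simple cost (mean squared deviation from the bicluster's mean expression profile, fraction of bicluster motifs absent upstream of the gene, fraction of bicluster members the gene is not linked to) and that the costs are combined as a weighted sum with made-up weights `w_expr`, `w_motif` and `w_net`. cMonkey's own scoring is probabilistic and more involved; the function names and the precomputed `motif_hits` input are assumptions for illustration only.

```python
import numpy as np

# Hypothetical per-gene costs for cMonkey-style bicluster membership (lower = better).

def expression_cost(expr, genes, conds, gene):
    """Mean squared deviation of `gene` from the mean profile of the other
    bicluster members, over the bicluster's conditions only."""
    others = [g for g in genes if g != gene]
    profile = expr[np.ix_(others, conds)].mean(axis=0)
    return float(np.mean((expr[gene, conds] - profile) ** 2))

def motif_cost(motif_hits, gene):
    """`motif_hits[g]` is an assumed precomputed fraction (0..1) of the
    bicluster's motifs found in the upstream region of gene g."""
    return 1.0 - float(motif_hits[gene])

def network_cost(adj, genes, gene):
    """Fraction of other bicluster members NOT linked to `gene` in an
    association network given as a 0/1 adjacency matrix."""
    others = [g for g in genes if g != gene]
    return 1.0 - float(adj[gene, others].mean())

def membership_cost(expr, motif_hits, adj, genes, conds, gene,
                    w_expr=1.0, w_motif=0.5, w_net=0.5):
    """Weighted sum of the three evidence types (weights are made up)."""
    return (w_expr * expression_cost(expr, genes, conds, gene)
            + w_motif * motif_cost(motif_hits, gene)
            + w_net * network_cost(adj, genes, gene))

# Toy data: genes 0-2 form a coherent, motif-sharing, connected bicluster.
expr = np.array([[1.0, 2.0, 3.0],
                 [1.1, 2.1, 2.9],
                 [0.9, 1.9, 3.1],
                 [3.0, 1.0, -1.0]])
motif_hits = np.array([1.0, 1.0, 0.5, 0.0])
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
genes, conds = [0, 1, 2], [0, 1, 2]

cost_member = membership_cost(expr, motif_hits, adj, genes, conds, gene=0)
cost_outsider = membership_cost(expr, motif_hits, adj, genes, conds, gene=3)
print(cost_member, cost_outsider)  # the member scores far lower than gene 3
```

Combining the costs in one score means a gene with only moderately coherent expression can still join a bicluster if it shares the bicluster's motifs and network neighbours, which is the motivation for integrating the three data types rather than clustering on expression alone.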

The Inferelator: We have described a network inference algorithm, the Inferelator, which infers regulatory influences on genes and/or gene clusters from mRNA and/or protein expression levels. The procedure can model equilibrium and time-course expression levels simultaneously, so that the resulting models predict both kinetic and equilibrium expression levels. Through the explicit inclusion of time and gene-knockout information, the method can learn causal relationships. It also includes a novel solution to the problem of encoding interactions between predictors. We discuss the results of an initial application of this method to the halophilic archaeon Halobacterium NRC-1. We have found the network to be predictive of 130 newly collected microarray datasets, and we have also validated parts of the network using ChIP-chip. This network offers a means of deciphering how this organism maintains homeostasis and responds to a wide variety of metabolic, genetic and environmental states.
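One way equilibrium and time-course measurements can share a single regression is sketched below, assuming the kinetic form tau * dy/dt = -y + b^T Z, where y is the target's expression and Z holds predictor levels. At equilibrium dy/dt = 0, so the response is simply y; for consecutive time points t_m and t_{m+1}, a finite difference gives the response tau * (y_{m+1} - y_m) / (t_{m+1} - t_m) + y_m, regressed on Z(t_m). Interactions are encoded here as an AND-like min(x_j, x_k) column, and coefficients are fit with an L1-penalized (lasso) coordinate-descent solver. The linear link, the time constant `tau`, the penalty `lam`, the helper names and the synthetic data are all assumptions for illustration, not the Inferelator's actual implementation.

```python
import numpy as np

def kinetic_response(y, X, t, tau):
    """Finite-difference response tau*(y[m+1]-y[m])/(t[m+1]-t[m]) + y[m] for one
    time course, paired with predictor levels X[m] at the earlier time point."""
    r = tau * np.diff(y) / np.diff(t) + y[:-1]
    return r, X[:-1]

def add_min_interaction(X, j, k):
    """Append an AND-like interaction column min(x_j, x_k)."""
    return np.column_stack([X, np.minimum(X[:, j], X[:, k])])

def lasso_cd(Z, r, lam, n_sweeps=500):
    """Coordinate-descent lasso: minimise (1/2n)||r - b0 - Z b||^2 + lam*||b||_1,
    with an unpenalised intercept b0 handled by centring."""
    n, p = Z.shape
    z_mean, r_mean = Z.mean(axis=0), r.mean()
    Zc, resid = Z - z_mean, r - r_mean      # resid = centred r - Zc @ b (b starts at 0)
    col_sq = (Zc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += Zc[:, j] * b[j]          # drop predictor j's current contribution
            rho = Zc[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            resid -= Zc[:, j] * b[j]          # restore with the updated coefficient
    return r_mean - z_mean @ b, b

# Synthetic target driven only by regulator x1: tau * dy/dt = -y + 2 * x1.
rng = np.random.default_rng(0)
tau = 10.0                                    # assumed time constant
t = np.arange(0.0, 60.0, 3.0)                 # 20 sample times
X_tc = rng.uniform(0.0, 1.0, size=(t.size, 2))
y_tc = np.zeros(t.size)
for m in range(t.size - 1):                   # forward-Euler integration
    y_tc[m + 1] = y_tc[m] + (t[m + 1] - t[m]) / tau * (-y_tc[m] + 2.0 * X_tc[m, 0])
X_ss = rng.uniform(0.0, 1.0, size=(15, 2))    # steady-state conditions
y_ss = 2.0 * X_ss[:, 0]                       # equilibrium: y = 2 * x1

r_tc, Z_tc = kinetic_response(y_tc, X_tc, t, tau)
Z = add_min_interaction(np.vstack([Z_tc, X_ss]), 0, 1)   # columns: x1, x2, min(x1, x2)
r = np.concatenate([r_tc, y_ss])
b0, b = lasso_cd(Z, r, lam=1e-3)
print(np.round(b, 3), round(float(b0), 3))
```

Because the synthetic target is driven by x1 alone, the fit should return a weight near 2 for x1 and weights near zero for x2 and the min(x1, x2) interaction; the L1 penalty shrinks the x1 weight slightly below 2 and keeps irrelevant predictors at exactly zero.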

For background on our integrated system, see:

http://genomebiology.com/2006/7/5/R36

http://www.biomedcentral.com/1471-2105/7/280

http://www.biomedcentral.com/1471-2105/7/176