Improving Mathematical Programming Approaches for Motif Finding
The motif finding problem is to locate a collection of mutually similar subsequences within a given set of DNA sequences. This is an important problem, as such shared motifs often correspond to regulatory elements. We study a combinatorial framework for the motif finding problem, where the goal is to find a minimum (or maximum) weighted clique in a k-partite graph. Previous approaches to find these cliques have relied on graph pruning and divide-and-conquer techniques. Recently, it has been shown that mathematical programming is a promising approach for motif finding. Here, we describe a novel integer linear programming formulation for the problem. A key observation driving our formulation is that the weights on the edges in the graph come from a small set of possibilities. We show that our new formulation leads to a method that is highly effective in practice on instances arising from biological sequence data. We are able to solve these problems to optimality often many times faster than the existing mathematical programming approach.