Reconciliation is the process of resolving disagreement between gene and species trees by invoking gene duplications and losses to explain topological incongruence. The resulting inferred duplication histories are a valuable source of information for a broad range of biological applications, including inferring the functions of newly discovered genes, estimating gene duplication times, and rooting and correcting gene trees. Reconciliation for binary trees is a tractable and well-studied problem. However, a striking proportion of species trees are non-binary. For example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and deep coalescence. We present the first formal algorithm for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Using a space-efficient mapping from gene to species tree, our algorithm infers the minimum number of duplications and losses in O(|V_G| (k_S + h_S)) time, where |V_G| is the number of nodes in the gene tree, h_S is the height of the species tree and k_S is the width of its largest multifurcation. Our algorithms have been implemented in Notung, a robust, production-quality tree-fitting program, which provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
Notung is freely available at http://www.cs.cmu.edu/~durand/Notung
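To make the idea of reconciliation concrete, the following is a minimal Python sketch of the classical LCA-based duplication count for a binary gene tree and a binary species tree. It is not the non-binary algorithm presented in this work and it is not Notung's implementation; it only illustrates the binary case that the abstract calls tractable. The tree encodings, the names `depth`, `lca` and `reconcile`, and the example data are all assumptions made for illustration.

```python
# Illustrative sketch only: classical LCA reconciliation for BINARY trees.
# The species tree is a child -> parent dictionary; the root has no entry.
# A gene tree is either a leaf name (str) or a pair (left, right).
# All names and example data below are hypothetical.

def depth(parent, node):
    """Number of edges from node up to the root of the species tree."""
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d


def lca(parent, a, b):
    """Least common ancestor of species nodes a and b (naive upward walk)."""
    da, db = depth(parent, a), depth(parent, b)
    while da > db:
        a, da = parent[a], da - 1
    while db > da:
        b, db = parent[b], db - 1
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene_tree, leaf_species, parent):
    """Return (species node the gene node maps to, number of duplications)."""
    if isinstance(gene_tree, str):
        return leaf_species[gene_tree], 0
    left, right = gene_tree
    map_left, dups_left = reconcile(left, leaf_species, parent)
    map_right, dups_right = reconcile(right, leaf_species, parent)
    mapped = lca(parent, map_left, map_right)
    # Binary species tree assumption: a gene node is a duplication when it
    # maps to the same species node as at least one of its children.
    is_duplication = mapped == map_left or mapped == map_right
    return mapped, dups_left + dups_right + (1 if is_duplication else 0)


# Hypothetical species tree ((A,B)AB,C)ABC and gene-to-species assignment.
species_parent = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC"}
leaf_species = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}

congruent = (("a1", "b1"), "c1")
incongruent = (("a1", "b1"), ("a2", "c1"))
```

Calling `reconcile(incongruent, leaf_species, species_parent)` maps the gene tree root to the species root and reports one duplication, while the congruent tree needs none. Each LCA query in this naive version walks up at most the height of the species tree. Applying the same duplication test to a species tree with multifurcations is exactly where, as noted above, duplications are overestimated.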
References:
Reconciliation with Non-Binary Species Trees. B. Vernot, M. Stolzer, A. Goldman, D. Durand, 2007. In Computational Systems Bioinformatics: CSB2007 Conference Proceedings, Imperial College Press: 441-452.
A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction. D. Durand, B. V. Halldorsson, B. Vernot, 2006. Journal of Computational Biology, 13 (2): 320-335.
Notung: A Program for Dating Gene Duplications and Optimizing Gene Family Trees. K. Chen, D. Durand and M. Farach-Colton, 2000. Journal of Computational Biology, 7 (3/4): 429-447.