Groupe d’études et de recherche en analyse des décisions

G-97-18

BIONJ: An Improved Version of the NJ Algorithm Based on a Simple Model of Sequence Data

We propose an improved version of the neighbor-joining algorithm (NJ) of Saitou and Nei (1987), which is one of the most popular distance methods to reconstruct phylogenies. This new algorithm, BIONJ, follows the same agglomerative scheme as NJ, which consists in iteratively picking a pair of taxa, creating a new node which represents the cluster of these taxa, and reducing the distance matrix by replacing both taxa by this node. Moreover, BIONJ uses a simple, first-order model of the variances and covariances of evolutionary distance estimates. At each step, this model permits the selection of the reduction which minimizes the variance of the new distance matrix, from the class of admissible reductions. In this way, we obtain better estimates to choose the pair of taxa to be agglomerated during the next steps. Moreover, in comparison with NJ's estimates, these estimates become better and better as the algorithm proceeds. BIONJ retains the good properties of NJ - especially its low run time. Computer simulations have been preformed, with 12-taxon model trees, to determine BIONJ's efficiency. When the substitution rates are low (maximum pairwise divergence A 0.1 substitutions per site) or when they are constant among lineages, BIONJ clearly has a better topological accuracy. Then, for the model trees and the conditions of evolution tested, the topological error reduction is on the average around 20%. With highly varying-rate trees and with high substitution rates (maximum pairwise divergence A 1.0 substitutions per site), the error reduction may even rise above 50%, while the probability of finding the correct tree may be augmented as much as 15%.

, 35 pages