BiBiServ2 - ClustalW

ClustalW is based on ClustalV and contains some improvements. The approach used in ClustalV is a modified version of the method of Feng and Doolittle (1987), who aligned sequences in progressively larger groups according to the branching order of an initial phylogenetic tree. This approach offers a very useful combination of computational tractability and sensitivity.

The positions of gaps introduced in early alignments are kept through later stages. This is justified because gaps that arise from comparing closely related sequences should not be moved when those sequences are later aligned with more distantly related ones. At each alignment stage, two groups of already aligned sequences are aligned with each other. This is done with a dynamic programming algorithm in which the residues of every sequence at each alignment position contribute to the alignment score. For protein comparisons, a Dayhoff (1978) PAM matrix is used.
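The group-to-group step can be illustrated with a minimal Python sketch. It assumes a hypothetical toy substitution score (+2 for identical residues, -1 otherwise) in place of a PAM matrix, a linear gap penalty, and an unweighted average over all residue pairs between two columns; the names `sub_score`, `column_score` and `align_profiles` are illustrative and do not come from ClustalV. Each input alignment is treated as a sequence of whole columns, so gaps already present inside either group are carried along unchanged and new gaps are only ever inserted as complete gap columns.

```python
from itertools import product

GAP = "-"


def sub_score(x: str, y: str) -> float:
    # Hypothetical toy scores standing in for a Dayhoff PAM matrix.
    return 2.0 if x == y else -1.0


def column_score(col_a: str, col_b: str) -> float:
    """Average substitution score over all residue pairs taken one from each
    column; any pair involving a gap is skipped."""
    pairs = [(x, y) for x, y in product(col_a, col_b) if GAP not in (x, y)]
    if not pairs:
        return 0.0
    return sum(sub_score(x, y) for x, y in pairs) / len(pairs)


def align_profiles(prof_a, prof_b, gap=-2.0):
    """Global alignment of two alignments (lists of equal-length rows).
    Returns the merged rows and the optimal score."""
    cols_a = ["".join(row[i] for row in prof_a) for i in range(len(prof_a[0]))]
    cols_b = ["".join(row[j] for row in prof_b) for j in range(len(prof_b[0]))]
    n, m = len(cols_a), len(cols_b)

    # score[i][j]: best score for the first i columns of A vs the first j of B.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def diag(i, j):
        # Score of pairing column i of A with column j of B.
        return score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(diag(i, j), score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback: existing columns are copied whole, so old gaps never move.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == diag(i, j):
            merged.append((cols_a[i - 1], cols_b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            merged.append((cols_a[i - 1], GAP * len(prof_b)))
            i -= 1
        else:
            merged.append((GAP * len(prof_a), cols_b[j - 1]))
            j -= 1
    merged.reverse()
    rows_a = ["".join(ca[k] for ca, _ in merged) for k in range(len(prof_a))]
    rows_b = ["".join(cb[k] for _, cb in merged) for k in range(len(prof_b))]
    return rows_a + rows_b, score[n][m]
```

For example, `align_profiles(["ACGT", "ACGT"], ["AGT"])` returns the rows `ACGT`, `ACGT`, `A-GT` with a score of 4.0: the single sequence receives a gap opposite the `C` column, while the two-sequence group is left untouched.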

The details of the algorithm used in ClustalV were published in Higgins and Sharp (1989). This was an improved version of an earlier algorithm published in Higgins and Sharp (1988). First, a crude similarity score is calculated between every pair of sequences, using the fast, approximate alignment algorithm of Wilbur and Lipman (1983). These scores are then used to calculate a "guide tree", or dendrogram, which tells the multiple alignment stage in which order to align the sequences for the final multiple alignment. The guide tree is calculated with the UPGMA method of Sneath and Sokal (1973). UPGMA (unweighted pair group method with arithmetic mean) is a fancy name for one type of average linkage cluster analysis, invented by Sokal and Michener (1958).
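A compact sketch of UPGMA guide-tree construction, assuming the pairwise distances have already been computed (for example, derived from the pairwise similarity scores; that conversion is not shown here). The function name `upgma` and its input format are illustrative. At each step the two closest clusters are joined, and the distance from the new cluster to every remaining cluster is the size-weighted average of the distances from its two parts; the list of joins gives the order in which groups would be aligned.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA tree.

    labels: list of n sequence names.
    dist:   n x n symmetric matrix of pairwise distances.
    Returns (tree, joins): tree is a nested tuple of labels; joins lists the
    subtrees in the order they were formed (the alignment order).
    """
    # Active clusters: integer id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    joins = []
    while len(clusters) > 1:
        # Closest pair; ties broken by the smaller id pair for determinism.
        i, j = min(d, key=lambda p: (d[p], p))
        del d[(i, j)]
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        new_tree = (tree_i, tree_j)
        joins.append(new_tree)
        # Average linkage: size-weighted mean of the distances to both parts.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = (new_tree, n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, joins
```

With four sequences where A and B are closest (distance 2), C is 6 from both, and D is 10 from everything, A and B are joined first, then C, then D, giving the tree `("D", ("C", ("A", "B")))`.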

Once the dendrogram has been calculated, the sequences are aligned in larger and larger groups. At each alignment stage, we use the algorithm of Myers and Miller (1988) to compute the optimal alignments. This algorithm is a very memory-efficient variation of Gotoh's algorithm (Gotoh, 1982), and it is what allows ClustalV to run on microcomputers. Each of these steps aligns two existing alignments, using what we call "profile alignments".
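The memory saving in Myers and Miller's algorithm comes from Hirschberg's divide-and-conquer technique: compute only the last row of the dynamic programming matrix for the first half of one sequence, do the same in reverse for the second half, split the other sequence where the two partial scores sum to the maximum, and recurse on the two halves. The sketch below is a simplification, assuming plain sequences rather than profiles, hypothetical match/mismatch scores, and a linear gap penalty; the real algorithm also handles Gotoh's affine gap costs, which this code does not. The names `nw_last_row` and `hirschberg` are illustrative.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-2):
    """Last row of the Needleman-Wunsch score matrix for a vs b, computed
    while keeping only two rows in memory: O(len(b)) space."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def hirschberg(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment of a and b in linear space (divide and conquer)."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: pair the single residue of a with its best column in b,
        # or leave it unpaired if even the best pair scores below two gaps.
        scores = [match if a == c else mismatch for c in b]
        j = max(range(len(b)), key=scores.__getitem__)
        if scores[j] >= 2 * gap:
            return "-" * j + a + "-" * (len(b) - j - 1), b
        return a + "-" * len(b), "-" + b
    mid = len(a) // 2
    # Scores of a[:mid] against every prefix of b, and of a[mid:] against
    # every suffix of b (computed on the reversed strings).
    left = nw_last_row(a[:mid], b, match, mismatch, gap)
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    # Split b where the combined score of the two halves is maximal.
    k = max(range(len(b) + 1), key=lambda t: left[t] + right[len(b) - t])
    a1, b1 = hirschberg(a[:mid], b[:k], match, mismatch, gap)
    a2, b2 = hirschberg(a[mid:], b[k:], match, mismatch, gap)
    return a1 + a2, b1 + b2
```

Under these parameters, `hirschberg("ACGT", "AGT")` returns `("ACGT", "A-GT")`, which scores 1, the same optimum the full quadratic-space matrix gives; only one or two rows of that matrix are ever held in memory at a time.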