ClustalW is based on ClustalV and adds several improvements. The approach used in
ClustalV is a modified version of the method of Feng and Doolittle (1987), who aligned
the sequences in larger and larger groups according to the branching order of an initial
phylogenetic tree. This approach offers a very useful combination of computational
tractability and sensitivity.
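The branching order of the tree determines which groups are aligned and when. The following minimal Python sketch lists the group-to-group alignments in post-order, so every subtree is aligned before its parent. It assumes the tree is given as nested pairs of sequence names, a representation chosen only for illustration. The functions `leaves` and `merge_order` are hypothetical names, not part of Clustal.

```python
from typing import List, Tuple, Union

# Hypothetical representation: a leaf is a sequence name, an internal
# node is a pair (left subtree, right subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Return the sequence names under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def merge_order(tree: Tree) -> List[Tuple[List[str], List[str]]]:
    """List the group-to-group alignments in the order a progressive
    aligner performs them: both children before their parent (post-order)."""
    if isinstance(tree, str):
        return []
    left, right = tree
    steps = merge_order(left) + merge_order(right)
    steps.append((leaves(left), leaves(right)))
    return steps


guide = ((("A", "B"), "C"), ("D", "E"))
for step in merge_order(guide):
    print(step)
```

For these five sequences the sketch yields four merges in this order: A with B, AB with C, D with E, and finally ABC with DE.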
The positions of gaps introduced in early alignments are kept through later stages.
This is justified because gaps that arise from comparing closely related sequences
should not be moved by a later alignment with more distantly related sequences. At each
alignment stage, you align two groups of already aligned sequences. This is done with a
dynamic programming algorithm in which the residues of every sequence at each alignment
position contribute to the alignment score. A Dayhoff (1978) PAM matrix is used for
protein comparisons.
The details of the algorithm used in ClustalV were published in Higgins and Sharp
(1989). It was an improved version of an earlier algorithm published in Higgins and
Sharp (1988). First, you calculate a crude similarity measure between every pair of
sequences. This is done using the fast, approximate alignment algorithm of Wilbur and
Lipman (1983). These scores are then used to calculate a "guide tree", or dendrogram,
which tells the multiple alignment stage the order in which to align the sequences for
the final multiple alignment. The guide tree is calculated using the UPGMA method of
Sneath and Sokal (1973). UPGMA is a fancy name for one type of average linkage cluster
analysis, invented by Sokal and Michener (1958).
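The following Python sketch covers both steps.

* **Similarity.** `ktuple_similarity` is only a simplified stand-in for the Wilbur and Lipman algorithm. It counts the distinct k-tuples (here, pairs of residues) that two sequences share, relative to the smaller set.
* **Guide tree.** The hypothetical `upgma` function turns similarities into distances (1 minus similarity). It then repeatedly merges the two closest clusters. Each merged cluster gets the size-weighted average of its members' distances, which is what makes UPGMA an average linkage method.

Both function names and the distance definition are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical tree representation: leaf = name, node = (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def ktuple_similarity(s: str, t: str, k: int = 2) -> float:
    """Crude similarity: distinct k-tuples shared by s and t, relative to
    the smaller set. A simplified stand-in, NOT Wilbur-Lipman itself."""
    a = {s[i:i + k] for i in range(len(s) - k + 1)}
    b = {t[i:i + k] for i in range(len(t) - k + 1)}
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def upgma(seqs: Dict[str, str], k: int = 2) -> Tree:
    """Build a guide tree by UPGMA (average linkage) on 1 - similarity."""
    names = list(seqs)
    # cluster id -> (subtree, number of sequences in the cluster)
    clusters: Dict[int, Tuple[Tree, int]] = {i: (name, 1) for i, name in enumerate(names)}
    dist: Dict[FrozenSet[int], float] = {}
    for i, j in combinations(range(len(names)), 2):
        dist[frozenset((i, j))] = 1.0 - ktuple_similarity(seqs[names[i]], seqs[names[j]], k)
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        pair = min(dist, key=dist.get)
        i, j = sorted(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del dist[pair]
        # Average linkage: size-weighted mean of the members' distances.
        for c in clusters:
            d_ic = dist.pop(frozenset((i, c)))
            d_jc = dist.pop(frozenset((j, c)))
            dist[frozenset((next_id, c))] = (ni * d_ic + nj * d_jc) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


print(upgma({"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTTTGGGG"}))
```

In this example, the first two sequences share all their 2-tuples, so they are joined first. The third sequence is added last.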
Once the dendrogram has been calculated, the sequences are aligned in larger and larger
groups. At each alignment stage, we use the algorithm of Myers and Miller (1988) to find
the optimal alignment. This algorithm is a very memory-efficient variation of Gotoh's
algorithm (Gotoh, 1982), and it is this memory efficiency that allows ClustalV to run on
microcomputers. Each of these alignments consists of aligning two existing alignments,
using what we call "profile alignments".
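The Myers and Miller algorithm itself is not reproduced here. The following minimal sketch shows only the memory idea it builds on. It runs Gotoh's affine-gap recurrences for global alignment while keeping just one previous row per state. Memory therefore grows with the length of one sequence, not with the product of both lengths.

The sketch makes several assumptions:

* It aligns two plain sequences, not profiles.
* The match/mismatch values and gap penalties are hypothetical.
* A gap of length k costs `gap_open + k * gap_extend`.

Myers and Miller combine passes like this with a divide-and-conquer step to recover the alignment itself. That step is omitted here, so `gotoh_score` (a hypothetical name) returns only the optimal score.

```python
NEG_INF = float("-inf")


def gotoh_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap_open: int = 3, gap_extend: int = 1) -> float:
    """Optimal global alignment score with affine gaps, using O(len(b))
    memory: only the previous row of each state matrix is kept."""
    m = len(b)
    # Row 0: M = ends in a match, X = ends with a[i] against a gap,
    # Y = ends with b[j] against a gap (leading gaps in `a`).
    M = [0.0] + [NEG_INF] * m
    X = [NEG_INF] * (m + 1)
    Y = [NEG_INF] + [-(gap_open + j * gap_extend) for j in range(1, m + 1)]
    for i in range(1, len(a) + 1):
        new_m = [NEG_INF] * (m + 1)
        new_x = [NEG_INF] * (m + 1)
        new_y = [NEG_INF] * (m + 1)
        new_x[0] = -(gap_open + i * gap_extend)  # a[:i] against leading gaps
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            new_m[j] = s + max(M[j - 1], X[j - 1], Y[j - 1])
            # Open a new gap (pay gap_open + gap_extend) or extend one.
            new_x[j] = max(M[j] - gap_open - gap_extend,
                           X[j] - gap_extend,
                           Y[j] - gap_open - gap_extend)
            new_y[j] = max(new_m[j - 1] - gap_open - gap_extend,
                           new_y[j - 1] - gap_extend,
                           new_x[j - 1] - gap_open - gap_extend)
        M, X, Y = new_m, new_x, new_y
    return max(M[m], X[m], Y[m])


print(gotoh_score("ACGT", "ACGT"))
print(gotoh_score("ACGT", "AT"))
```

With these penalties, aligning AT against ACGT is cheapest as A--T. Two matches score 4, and one gap of length 2 costs 5, for a total of -1.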