In the last section we gained some experience with
pairwise alignment tools. Those tools are well suited to
finding global and local similarities between exactly two
sequences, which lets you find conserved regions
distributed over the whole sequence. If we want to align
more than two sequences, we have to use multiple
alignment tools. Methods that generalize the pairwise
dynamic programming approach to multiple alignments are
limited to small numbers of short sequences, because
their running time and memory grow exponentially with
the number of sequences: the problem becomes
computationally infeasible for much more than 10 or so
proteins of average length. Therefore, all of the
methods below make use of heuristics.
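
To get a feeling for this growth, the following minimal Python sketch counts
the cells of the full dynamic programming lattice for k sequences. The
function names `dp_cells` and `predecessors` are chosen here for illustration
only, and the sequence length of 300 is an assumed "average" protein length.

```python
def dp_cells(lengths):
    """Number of cells in the full dynamic programming lattice.

    For sequences of lengths n1, ..., nk the lattice has
    (n1 + 1) * ... * (nk + 1) cells.
    """
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells


def predecessors(k):
    """Number of predecessor cells examined per lattice cell.

    In every step each of the k sequences either contributes a residue
    or a gap, except that not all of them may contribute a gap.
    """
    return 2 ** k - 1


if __name__ == "__main__":
    # Assumed average protein length of 300 residues.
    for k in (2, 3, 5, 10):
        print(k, dp_cells([300] * k), predecessors(k))
```

For two sequences the lattice is a small table, but every additional sequence
multiplies its size by roughly the sequence length.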
Each tool has different restrictions in handling
sequence data. In general it is better to align only
sequences of similar length: as a rule of thumb, the
shortest and longest sequence should not differ by more
than one hundred bases in length.
A really big advantage of multiple alignment tools is
that you see global similarities between several
sequences much better than within lots of pairwise
sequence alignments on several sheets of paper spread
all over your desk.
|
 |
|
ClustalW [Thompson et al. 1994] is the most widely known
multiple sequence alignment tool for DNA or proteins. It
uses the fact that homologous sequences are
evolutionarily related and builds up the alignment
progressively by a series of pairwise alignments,
following the branching order in a phylogenetic tree. The
most closely related sequences are aligned first, and
then the more distant ones are added gradually.
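
The order in which a progressive method joins sequences can be sketched as
follows. This is not ClustalW's actual procedure: the distance used here (one
minus the Jaccard similarity of 3-mer sets) and the average-linkage clustering
are simplifying assumptions, and the names `kmer_distance` and `merge_order`
are hypothetical. The sketch only derives the joining order and does not
compute the alignments themselves.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """Crude distance: 1 - Jaccard similarity of the k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def merge_order(seqs):
    """Return the order in which clusters of sequences are joined.

    seqs maps a sequence name to its sequence. The closest pair of
    clusters (average distance over all member pairs) is joined first.
    """
    dist = {
        frozenset((a, b)): kmer_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }

    def cluster_distance(c1, c2):
        values = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(values) / len(values)

    clusters = [(name,) for name in seqs]
    order = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(*pair),
        )
        order.append((c1, c2))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return order


if __name__ == "__main__":
    example = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTGGGGCC"}
    for left, right in merge_order(example):
        print(left, "+", right)
```

In the small example the two nearly identical sequences are joined first, and
the unrelated third sequence is added last.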
|
 |
|
Divide-and-Conquer Multiple
Sequence Alignment (DCA) [Stoye et al. 1997] is a
program for producing fast, high-quality simultaneous
multiple sequence alignments of amino acid, RNA, or DNA
sequences. The program is based on the DCA algorithm, a
heuristic approach to sum-of-pairs (SP) optimal
alignment.
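
The sum-of-pairs idea can be illustrated with a short Python sketch: the score
of a multiple alignment is the sum of the scores of all pairwise alignments it
induces. The scoring values below (match 1, mismatch -1, gap -2) are arbitrary
assumptions for illustration; real programs use substitution matrices and
their own gap models, and DCA's actual scoring is not reproduced here.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols taken from a single alignment column."""
    if x == "-" and y == "-":
        return 0  # two gaps facing each other are ignored
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment, **scores):
    """Sum-of-pairs score of a multiple alignment.

    alignment is a list of equally long strings, with '-' as gap symbol.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            total += pair_score(x, y, **scores)
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["ACGT", "A-GT", "ACGA"]))
```

An SP-optimal alignment is one that maximizes this score (or, equivalently,
minimizes a corresponding cost) over all possible alignments of the sequences.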
|
 |
|
While standard alignment
methods rely on comparing single residues and imposing
gap penalties, DIALIGN [Morgenstern 1999] constructs
pairwise and multiple alignments by comparing whole
segments of the sequences. No gap penalty is used. This
approach is especially efficient where sequences are not
globally related but share only local similarities, as
is the case with genomic DNA and with many protein
families.
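
The notion of a gap-free segment pair can be sketched in Python as follows.
This is a strong simplification and not DIALIGN's algorithm: it only reports
runs of identical residues of a minimum length, whereas DIALIGN weights
segment pairs statistically and then selects a consistent subset. The function
name `matching_segments` and the parameter `min_len` are hypothetical.

```python
def matching_segments(a, b, min_len=4):
    """Find gap-free runs of identical residues between two sequences.

    Returns a list of (start_in_a, start_in_b, length) tuples. Each
    diagonal of the comparison matrix is identified by d = j - i.
    """
    segments = []
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)
        j = i + d
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    segments.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run may extend to the end of the diagonal.
        if run >= min_len:
            segments.append((i - run, j - run, run))
    return segments


if __name__ == "__main__":
    for start_a, start_b, length in matching_segments(
        "TTTTACGTAC", "ACGTACGGGG"
    ):
        print(start_a, start_b, length)
```

The example yields two segment pairs that cannot both be part of one
alignment, which shows why a selection step for a consistent set of segments
is needed.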
|
|
|
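MSA implements the Carrillo-Lipman heuristic, which uses
the scores of the pairwise alignments to restrict the
part of the multidimensional dynamic programming lattice
that has to be searched for an optimal sum-of-pairs
alignment.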
|
 |
|
In the exercises you have used
three different alignment programs to align the same set
of sequences. Which one generates the "best" alignment?
This question is difficult to answer, as we do not know
the evolutionary history of the sequences. In fact, we
are trying to reconstruct that history from the
sequences we can obtain today. BAliBASE [Thompson et
al. 1999] is a database of manually refined multiple
sequence alignments specifically designed for the
evaluation and comparison of multiple sequence alignment
programs. The alignments are categorised by sequence
length, similarity, and presence of insertions and
N/C-terminal extensions.
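
One common way to compare a program's output with a reference alignment is to
count how many residue pairs aligned in the reference are also aligned in the
test alignment. The Python sketch below shows this idea under the assumption
that both alignments contain the same sequences in the same order. The names
`aligned_pairs` and `pair_recovery` are hypothetical, and the scoring program
distributed with BAliBASE may differ in its details.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs that share a column in the alignment.

    Each residue is identified by (sequence index, residue index),
    where the residue index counts non-gap symbols only.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        residues = []
        for s, symbol in enumerate(column):
            if symbol != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def pair_recovery(candidate, reference):
    """Fraction of reference residue pairs reproduced by the candidate."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(candidate) & ref) / len(ref)


if __name__ == "__main__":
    reference = ["AC-GT", "ACTGT"]
    candidate = ["ACG-T", "ACTGT"]
    print(pair_recovery(candidate, reference))
```

A value of 1.0 means that every residue pair of the reference is reproduced.
Lower values indicate where the candidate alignment deviates.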