Multiple Sequence Alignment

Next: Pairwise Sequence Alignment Up: Examples Previous: Examples

Multiple Sequence Alignment

Suppose you want to align three strains auf S. aureus. You follow instructions a) in Section 3 and build an index named 3staph:

mkvtree -dna -lcp -suf -tis -db NC_002745.fna NC_002758.fna MRSA.dbs\
 -indexname 3staph

You are now able to run mga and create your alignments. You decide to turn on verbose output and to always recurse into gaps. Using these two options is always a good idea for multiple sequences, though in this example, the second one does not make a difference. The output will be written to files whose names start with example.

mga -v -l 1000 20 -always -o example 3staph

The file example.summary contains the command line parameters of mga. Also, the description of the sequences as present in their FASTA headers is given. Then, some simple statistics follow: more than 90% of the sequences are aligned.

The program call arguments were

-v -l 1000 20 -always -o example 3staph 

Sequence description:

Seq 1: gi|15925705|ref|NC_002745.1| Staphylococcus aureus subsp. aureus N315, \
complete genome
Seq 2: gi|15922990|ref|NC_002758.1| Staphylococcus aureus strain Mu50, \
complete genome
Seq 3: Staphylococcus aureus (EMRSA-16) chromosome 2,902,619 bp

Number of matches / aligned / unaligned gaps is 23054 / 22960 / 93

        length / all al. = matches + aligned / unaligned gaps

Seq 1: 2813641 / 2654636 = 2348504 +  306132 /  159005
Seq 2: 2878040 / 2652323 = 2348504 +  303819 /  225717
Seq 3: 2902619 / 2653076 = 2348504 +  304572 /  249543

Avg. : 2864766 / 2653345 = 2348504 +  304841 /  211421

Coverage in percent

Seq 1: 2813641 /    94.3 =    83.5 +    10.9 /     5.7
Seq 2: 2878040 /    92.2 =    81.6 +    10.6 /     7.8
Seq 3: 2902619 /    91.4 =    80.9 +    10.5 /     8.6

Avg. : 2864766 /    92.6 =    82.0 +    10.6 /     7.4

Please cite the following paper:

            Michael Hoehl, Stefan Kurtz, Enno Ohlebusch
            Efficient Multiple Genome Alignment
            Bioinformatics, Vol. 18 (S1):S312-S320, 2002

If mga is run as above the file example.align contains only the structure of the alignment, not the (aligned) sequence data itself. You can see a multiMEM in the first line, starting at position zero in all three sequences, then a gap which is five bases short, etc.:

61 0 0 0
5:61-65 5:61-65 5:61-65 
85 66 66 66
72:151-222 71:151-221 72:151-222 
131 223 222 223

When mga is run with the additional parameter -match it outputs the sequence data of matches. Recall that a match consists of bases that are identical in all sequences. Therefore, the bases are shown only once. The last line from the previous example is displayed as follows:

131 223 222 223
Exact: attagaaattacacacaaagttatactatttttagcaacatattcacaggtatttgacat        60
Exact: atagagaactgaaaaagtataattgtgtggataagtcgtccaactcatgattttataagg       120
Exact: atttatttatt                                                        131

When your are only interested in, say, 20 bases next to a gap, you can get this context information by supplying mga with parameter -bl 20 instead:

131 223 222 223
Exact: attagaaattacacacaaag                                                20
       ...skipping bases...
Exact: tttataaggatttatttatt                                                20

You might want to know why the match starts one base earlier in sequence two. The preceding gap is one base shorter in that sequence. You can view the aligned sequence data by calling mga with parameter -clustalw. This outputs the ClustalW alignment of short gaps:

72:151-222 71:151-221 72:151-222 
Seq 1: tcactaacagatattctatagaaggaaaagttatccacttatgcacatttatagttttca        60
Seq 2: ...............................atg.....agca..-g.............
Seq 3: a............c.................................c.....c....t.

Seq 1: gaattgtggata                                                        72
Seq 2: ..........ct
Seq 3: ............

Next: Pairwise Sequence Alignment Up: Examples Previous: Examples

Jan Krueger 2012-12-14