Suppose you want to align three strains of S. aureus. You follow instructions a) in Section 3 and build an index named 3staph:
mkvtree -dna -lcp -suf -tis -db NC_002745.fna NC_002758.fna MRSA.dbs -indexname 3staph
You are now able to run mga and create your alignments. You decide to turn on verbose output and to always recurse into gaps. Using these two options is always a good idea for multiple sequences, though in this example, the second one does not make a difference. The output will be written to files whose names start with example.
mga -v -l 1000 20 -always -o example 3staph
The file example.summary contains the command line parameters of mga. It also lists the description of each sequence as given in its FASTA header. Some simple statistics follow: more than 90% of each sequence is aligned.
The program call arguments were
-v -l 1000 20 -always -o example 3staph

Sequence description:
Seq 1: gi|15925705|ref|NC_002745.1| Staphylococcus aureus subsp. aureus N315, \
       complete genome
Seq 2: gi|15922990|ref|NC_002758.1| Staphylococcus aureus strain Mu50, \
       complete genome
Seq 3: Staphylococcus aureus (EMRSA-16) chromosome 2,902,619 bp

Number of matches / aligned / unaligned gaps is
23054 / 22960 / 93

        length / all al. = matches + aligned / unaligned gaps
Seq 1: 2813641 / 2654636 = 2348504 + 306132 / 159005
Seq 2: 2878040 / 2652323 = 2348504 + 303819 / 225717
Seq 3: 2902619 / 2653076 = 2348504 + 304572 / 249543
Avg. : 2864766 / 2653345 = 2348504 + 304841 / 211421

Coverage in percent
Seq 1: 2813641 / 94.3 = 83.5 + 10.9 / 5.7
Seq 2: 2878040 / 92.2 = 81.6 + 10.6 / 7.8
Seq 3: 2902619 / 91.4 = 80.9 + 10.5 / 8.6
Avg. : 2864766 / 92.6 = 82.0 + 10.6 / 7.4

Please cite the following paper:
Michael Hoehl, Stefan Kurtz, Enno Ohlebusch
Efficient Multiple Genome Alignment
Bioinformatics, Vol. 18 (S1):S312-S320, 2002
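The relation between the two tables in the summary can be checked by hand. The following is a minimal Python sketch, not part of mga, that recomputes the "Coverage in percent" row for Seq 1 from the absolute numbers. The dictionary keys and the function name are chosen here for illustration only.

```python
# Absolute numbers for Seq 1, copied from the example.summary listing.
SEQ1 = {"length": 2813641, "matches": 2348504, "aligned": 306132, "unaligned": 159005}


def coverage(stats):
    """Return (all aligned, matches, aligned gaps, unaligned gaps) in percent."""
    total = stats["length"]
    all_aligned = stats["matches"] + stats["aligned"]

    def pct(n):
        # Percentage of the sequence length, rounded to one decimal place.
        return round(100.0 * n / total, 1)

    return (pct(all_aligned), pct(stats["matches"]),
            pct(stats["aligned"]), pct(stats["unaligned"]))


print(coverage(SEQ1))
```

The four values correspond to the columns of the Seq 1 coverage row. Note also that matches, aligned gaps and unaligned gaps together add up to the sequence length.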
If mga is run as above, the file example.align contains only the structure of the alignment, not the (aligned) sequence data itself. The first line shows a multiMEM of length 61 starting at position zero in all three sequences, the second line a short gap of five bases, and so on:
61 0 0 0
5:61-65 5:61-65 5:61-65
85 66 66 66
72:151-222 71:151-221 72:151-222
131 223 222 223
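If you want to process this structure with a script, the rows are easy to read. The sketch below is a hypothetical Python helper, not something shipped with mga. It assumes the layout that can be inferred from the excerpt: a match row holds a length followed by one start position per sequence, and a gap row holds one length:first-last entry per sequence. It then checks that every row begins where the previous one ended.

```python
EXAMPLE = [
    "61 0 0 0",
    "5:61-65 5:61-65 5:61-65",
    "85 66 66 66",
    "72:151-222 71:151-221 72:151-222",
    "131 223 222 223",
]


def parse_structure(lines):
    """Classify each row as ("match", length, starts) or ("gap", entries).

    The row layout is an assumption inferred from the excerpt above.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if ":" in fields[0]:
            entries = []
            for field in fields:
                length, span = field.split(":")
                first, last = span.split("-")
                entries.append((int(length), int(first), int(last)))
            rows.append(("gap", entries))
        else:
            length, *starts = (int(f) for f in fields)
            rows.append(("match", length, starts))
    return rows


def is_contiguous(rows):
    """Check that each row starts where the previous one ended, per sequence."""
    expected = None
    for row in rows:
        if row[0] == "match":
            _, length, starts = row
            if expected is not None and starts != expected:
                return False
            expected = [s + length for s in starts]
        else:
            entries = row[1]
            if expected is not None and [e[1] for e in entries] != expected:
                return False
            # The stated gap length must agree with its first-last range.
            if any(last - first + 1 != length for length, first, last in entries):
                return False
            expected = [last + 1 for _, _, last in entries]
    return True


print(is_contiguous(parse_structure(EXAMPLE)))
```

For the five rows above the check succeeds: the second gap is 71 bases in sequence two and 72 bases in the others, which is why the last match starts at 222 there and at 223 elsewhere.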
When mga is run with the additional parameter -match it outputs the sequence data of matches. Recall that a match consists of bases that are identical in all sequences. Therefore, the bases are shown only once. The last line from the previous example is displayed as follows:
131 223 222 223
Exact: attagaaattacacacaaagttatactatttttagcaacatattcacaggtatttgacat 60
Exact: atagagaactgaaaaagtataattgtgtggataagtcgtccaactcatgattttataagg 120
Exact: atttatttatt 131
When you are only interested in, say, the 20 bases next to a gap, you can get this context information by supplying mga with parameter -bl 20 instead:
131 223 222 223
Exact: attagaaattacacacaaag 20
...skipping bases...
Exact: tttataaggatttatttatt 20
You might want to know why the match starts one base earlier in sequence two. The preceding gap is one base shorter in that sequence. You can view the aligned sequence data by calling mga with parameter -clustalw. This outputs the ClustalW alignment of short gaps:
72:151-222 71:151-221 72:151-222
Seq 1: tcactaacagatattctatagaaggaaaagttatccacttatgcacatttatagttttca 60
Seq 2: ...............................atg.....agca..-g.............
Seq 3: a............c.................................c.....c....t.
Seq 1: gaattgtggata 72
Seq 2: ..........ct
Seq 3: ............
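To recover the full bases of sequence two or three from such a listing, the dots have to be filled in. The following Python sketch is a hypothetical helper, not part of mga. It assumes the usual convention that a dot means "same base as Seq 1 in this column" and that a hyphen marks an alignment gap; the listing above is consistent with this reading, but the excerpt does not state it explicitly.

```python
def expand_row(reference, row):
    """Replace each '.' in row with the reference base in the same column.

    Assumes '.' means "identical to the reference row" (see lead-in).
    """
    if len(reference) != len(row):
        raise ValueError("rows must have the same number of columns")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


def ungapped(row):
    """Remove alignment gap characters to get the plain sequence."""
    return row.replace("-", "")


# Last block of the listing above: Seq 1 and Seq 2, columns 61 to 72.
seq1 = "gaattgtggata"
seq2 = "." * 10 + "ct"

print(expand_row(seq1, seq2))
```

Under these assumptions, sequence two differs from sequence one only in the last two columns of this block. Applying ungapped to the expanded rows of sequence two should yield 71 bases in total, one fewer than for the other two sequences.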