Next: The Program Up: Efficient Multiple Genome Alignment Previous: A Word of Caution

Preparing the Input Sequences

mga efficiently locates anchors, i.e., matches between the input sequences. To do so, the sequences must first be preprocessed. This preprocessing step has to be done only once: the resulting index is stored on secondary memory (usually a hard disk) and remains there until the user deletes it. The tool for indexing the sequences is mkvtree. If you have three or more sequences, you must proceed as described in a). If you have two sequences, you can choose between the two alternatives a) and b). The implications are:

a) The index is self-contained, is faster to use once it is built, and lets you use the recursive strategy.

b) The index uses minimal space (it contains only one sequence), so you have to supply the second sequence with the option -mem or -mum. As a consequence, you can supply a different second sequence every time you call mga, and you can compute MUMs, i.e., maximal unique matches.

Here are the two ways to index the sequence(s), assuming the sequences to be aligned are stored in the files seq1.fna, seq2.fna, and so on:

a)

mkvtree -dna -lcp -suf -tis -indexname 2ormoreseqs \
 -db seq1.fna seq2.fna [seq3.fna ...]

You can use the index by calling

mga [options] 2ormoreseqs

b)

mkvtree -dna -suf -tis -sti1 -pl 8 -db seq1.fna

You can use the index by calling

mga [options] -mem|-mum seq2.fna seq1.fna
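The choice between a) and b) can be summarised as a small decision procedure. The POSIX shell sketch below only prints the command templates documented above with your file names filled in; it does not run mkvtree or mga. The function names choose_alternative and plan_alignment, the yes/no flag for requesting MUMs, and the check that at least two sequences are given are illustrative assumptions, not part of mga.

```sh
#!/bin/sh
# Illustrative sketch (hypothetical helpers, not shipped with mga):
# decide between alternative a) and b) and print the documented
# command templates. Nothing is executed.

# choose_alternative NUM_SEQS WANT_MUMS(yes|no) -> prints "a" or "b"
choose_alternative() {
  n=$1; want_mums=$2
  if [ "$n" -lt 2 ]; then
    # Assumption: an alignment needs at least two sequences.
    echo "at least two sequences are required" >&2; return 1
  elif [ "$n" -ge 3 ]; then
    # Three or more sequences: only alternative a) is possible,
    # but MUMs (-mum) are only available with alternative b).
    if [ "$want_mums" = yes ]; then
      echo "MUMs need alternative b), which takes exactly two sequences" >&2
      return 1
    fi
    echo a
  elif [ "$want_mums" = yes ]; then
    echo b    # two sequences and MUMs requested: use -mum, i.e. b)
  else
    echo a    # two sequences: either works; a) is faster once built
  fi
}

# plan_alignment WANT_MUMS SEQ1 SEQ2 [SEQ3 ...] -> prints two commands
plan_alignment() {
  want_mums=$1; shift
  alt=$(choose_alternative "$#" "$want_mums") || return 1
  if [ "$alt" = a ]; then
    echo "mkvtree -dna -lcp -suf -tis -indexname 2ormoreseqs -db $*"
    echo "mga [options] 2ormoreseqs"
  else
    # Only the first sequence is indexed; the second is given to mga.
    echo "mkvtree -dna -suf -tis -sti1 -pl 8 -db $1"
    echo "mga [options] -mum $2 $1"
  fi
}
```

For example, plan_alignment yes seq1.fna seq2.fna prints the two commands of alternative b). The sketch picks b) only when MUMs are requested; choosing b) with -mem to save space is equally valid for two sequences.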

This is all you need to know to get started. For more information on mkvtree, consult [Kur02].


Jan Krueger 2012-12-14