Next: A Word of Caution Up: Efficient Multiple Genome Alignment Previous: About This Document

Introduction

mga is a software tool for efficiently aligning two or more sufficiently similar genome-sized sequences [HKO02]. It belongs to the category of anchor-based multiple alignment methods. mga uses multiMEMs (with MEMs and MUMs as special cases) to anchor the alignment. In essence, these are (hopefully long) stretches of identical bases occurring in all input sequences.
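To make the notion of an anchor concrete, here is a minimal brute-force sketch in Python. It is not how mga computes its matches, and it is far too slow for genome-sized input. It assumes that a match is "maximal" when it cannot be extended by one base to the left or to the right in all sequences at once. The function name multi_mems and the (length, starts) representation are hypothetical and chosen only for this illustration.

```python
from itertools import product

def multi_mems(seqs, min_len):
    """Naively list maximal exact matches that occur in every sequence.

    Returns (length, starts) tuples, where starts holds one 0-based
    start position per input sequence.
    """
    result = []
    for starts in product(*(range(len(s)) for s in seqs)):
        # Skip candidates that could be extended one base to the left
        # in all sequences simultaneously (not left-maximal).
        if all(p > 0 for p in starts):
            if len({s[p - 1] for s, p in zip(seqs, starts)}) == 1:
                continue
        # Extend to the right for as long as all sequences agree.
        length = 0
        while all(p + length < len(s) for s, p in zip(seqs, starts)):
            if len({s[p + length] for s, p in zip(seqs, starts)}) != 1:
                break
            length += 1
        if length >= min_len:
            result.append((length, starts))
    return result
```

For the toy input ["AACGTC", "GACGTT", "ACGTA"] with min_len set to 3, the only match reported is the shared stretch ACGT.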

The first step in producing an alignment is to find multiMEMs (also referred to simply as matches). To ensure a consistent alignment, these matches have to be chained, that is, selected so that they appear in the same order in every sequence. Without going into detail, a chaining algorithm picks, out of all matches found, the chain that is optimal according to some criterion. The chained (and therefore selected) matches are considered aligned.
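The idea of chaining can be sketched with a simple quadratic-time dynamic program. This is an illustration only, not the algorithm mga uses. It assumes that the optimality criterion is the total length of the chained matches and that one match may follow another only if it starts at or after the end of the other in every sequence. The name chain_matches and the (length, starts) representation of a match are hypothetical.

```python
def chain_matches(matches):
    """Return a chain of matches with maximal total length.

    Each match is a (length, starts) tuple with length >= 1 and one
    0-based start position per sequence.
    """
    if not matches:
        return []
    # Sorting by start positions guarantees that any valid predecessor
    # of a match appears before it in the list.
    order = sorted(matches, key=lambda m: m[1])
    best = [length for length, _ in order]   # best chain weight ending here
    prev = [None] * len(order)               # predecessor in that chain
    for j, (len_j, starts_j) in enumerate(order):
        for i in range(j):
            len_i, starts_i = order[i]
            follows = all(a + len_i <= b for a, b in zip(starts_i, starts_j))
            if follows and best[i] + len_j > best[j]:
                best[j] = best[i] + len_j
                prev[j] = i
    # Trace back from the match that ends the heaviest chain.
    j = max(range(len(order)), key=best.__getitem__)
    chain = []
    while j is not None:
        chain.append(order[j])
        j = prev[j]
    chain.reverse()
    return chain
```

In the example matches [(5, (0, 0)), (4, (10, 3)), (6, (6, 8)), (3, (14, 16))], the match starting at (10, 3) conflicts with the order of the others in the second sequence, so the sketch drops it and keeps the remaining three.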

The regions between these matches (and possibly at the start and end of the sequences) are gaps. If the gaps are "long", mga can recursively repeat the first step, searching for matches of shorter but still significant length. This directly results in a higher number of aligned matches. As a side effect, the gaps become more numerous and shorter, so the second step described next can be applied to more of them.

In the second step, if the remaining gaps are "short", a conventional alignment method is used to align these non-identical regions. The meaning of "long" and "short" is sequence-dependent, of course. Still, at the end, some long gaps may remain unaligned. This makes sense biologically: since these gaps contain no matches exceeding the length threshold, there is no detectable similarity, and the gaps are not forced into an alignment. This way, mga can cope with long insertions, deletions, etc., while retaining its overall efficiency.
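The handling of gaps described above can be sketched as follows. The code extracts the regions between consecutive chained matches and labels each one "short" or "long". It is a simplified illustration, not mga's implementation: the names gaps_between and classify_gap, the (length, starts) representation of a match, and the single threshold short_max are all assumptions made for this example.

```python
def gaps_between(chain, seq_lengths):
    """List the gaps around chained matches.

    Each gap is a list with one half-open (start, end) interval per
    sequence. Gaps that are empty in every sequence are omitted.
    """
    gaps = []
    cursor = [0] * len(seq_lengths)          # end of the previous match
    for length, starts in chain:
        gap = [(c, p) for c, p in zip(cursor, starts)]
        if any(end > start for start, end in gap):
            gaps.append(gap)
        cursor = [p + length for p in starts]
    # Region after the last match, up to the end of each sequence.
    tail = [(c, n) for c, n in zip(cursor, seq_lengths)]
    if any(end > start for start, end in tail):
        gaps.append(tail)
    return gaps

def classify_gap(gap, short_max):
    """Label a gap by its longest interval (threshold is hypothetical)."""
    longest = max(end - start for start, end in gap)
    return "short" if longest <= short_max else "long"
```

Under these assumptions, a "short" gap would be passed to a conventional alignment method, while a "long" gap would either be searched again with a lower match length threshold or, if nothing is found, left unaligned.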


Jan Krueger 2012-12-14