Next: Examples Up: Efficient Multiple Genome Alignment Previous: Output

Format of Textual Output

In the following, the textual output of mga is shown schematically for every option that influences it. Variables are enclosed in `<' and `>'. The input consists of $k \geq 2$ sequences. By default, that is, when option -absolute is not given, positions are relative and start at base zero.

Matches are output in the following format:

<length> <position 1> ... <position k>
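As an illustration, a match line of this form could be read as follows. This is only a sketch: the names Match and parse_match_line are hypothetical and not part of mga, and it assumes the fields are separated by whitespace and that the line contains nothing else.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """One match line: a length followed by one position per sequence."""
    length: int
    positions: List[int]


def parse_match_line(line: str, k: int) -> Match:
    """Parse '<length> <position 1> ... <position k>'.

    Hypothetical helper; assumes whitespace-separated integer fields.
    """
    fields = line.split()
    if len(fields) != k + 1:
        raise ValueError(f"expected {k + 1} fields, got {len(fields)}")
    values = [int(field) for field in fields]
    return Match(length=values[0], positions=values[1:])
```

For example, with k = 3 the line "25 100 240 17" would yield a length of 25 and the positions 100, 240 and 17.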

Option -match displays the complete sequence data once, as it is the same in all sequences.

Exact: <sequence data>                                  <length>

Option -bl displays the first and last bases of the sequence data once.

Exact: <sequence data>                                  <length>
       ...skipping bases...
Exact: <sequence data>                                  <length>

Every gap consists of k parts: one for each of the k sequences. Depending on its length, a particular part of an aligned gap is displayed in one of three different formats. There is one format for the general case and two additional formats for special cases. The general case occurs when a part is at least two bases long. Then the length and the start position of that part are shown. For better readability, the end position is given as well.

... <length i>:<start position i>-<end position i> ...

The first special case occurs when a particular part consists of exactly one base. This part is considered a putative SNP. It is output using a special format to simplify extracting it.

... !<SNP position i> ...

The second special case occurs when a particular part has length zero. This means that the other (non-empty) parts are insertions: there is no corresponding sequence data in this sequence. The empty part is indicated by a dash.

... - ...
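The three formats for a part of an aligned gap can be told apart mechanically. The following sketch classifies a single part; GapPart and parse_gap_part are hypothetical names, and it assumes that each part is one token without embedded whitespace. It does not interpret the end position beyond reading it.

```python
import re
from typing import NamedTuple, Optional


class GapPart(NamedTuple):
    kind: str             # "general", "snp" or "empty"
    length: int
    start: Optional[int]  # None for an empty part
    end: Optional[int]    # only set in the general case


# <length i>:<start position i>-<end position i>
_GENERAL = re.compile(r"^(\d+):(\d+)-(\d+)$")
# !<SNP position i>
_SNP = re.compile(r"^!(\d+)$")


def parse_gap_part(token: str) -> GapPart:
    """Classify one part of an aligned gap (hypothetical helper)."""
    if token == "-":
        # Second special case: no sequence data in this sequence.
        return GapPart("empty", 0, None, None)
    snp = _SNP.match(token)
    if snp:
        # First special case: exactly one base, a putative SNP.
        return GapPart("snp", 1, int(snp.group(1)), None)
    general = _GENERAL.match(token)
    if general:
        length, start, end = (int(g) for g in general.groups())
        return GapPart("general", length, start, end)
    raise ValueError(f"unrecognized gap part: {token!r}")
```

Because putative SNPs carry a leading `!', they can be extracted by keeping only the parts whose kind is "snp".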

An unaligned gap is always indicated as follows. The actual output also contains a short explanation, which is omitted here so that the line fits on the page; it is displayed in the corresponding example in the next section.

Gap <gap number>: <min. length> + <difference> = <max. length>
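A line of this form could be picked apart with a regular expression, as sketched below. GapHeader and parse_gap_header are hypothetical names. Since the actual output carries an additional explanation whose exact wording and placement are not shown here, the pattern is deliberately not anchored and ignores any surrounding text; that tolerance is an assumption.

```python
import re
from typing import NamedTuple


class GapHeader(NamedTuple):
    number: int
    min_length: int
    difference: int
    max_length: int


# Gap <gap number>: <min. length> + <difference> = <max. length>
_GAP_HEADER = re.compile(r"Gap\s+(\d+):\s*(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)")


def parse_gap_header(line: str) -> GapHeader:
    """Extract the four numbers of an unaligned-gap line (hypothetical helper)."""
    found = _GAP_HEADER.search(line)
    if found is None:
        raise ValueError(f"not an unaligned-gap line: {line!r}")
    return GapHeader(*(int(group) for group in found.groups()))
```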

The sequence data of aligned gaps from two sequences, as computed by options -greedy and -xdrop, can contain dashes `-' to indicate gap characters. Mismatches are indicated by exclamation marks `!'. <length 1> and <length 2> are identical, as both refer to the length of the (same) line.

Sbjct: <sequence data 1 with gaps>                    <length 1>
       <mismatches>
Query: <sequence data 2 with gaps>                    <length 2>
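To make the role of the middle line concrete, the sketch below builds a marker line for two gapped sequences of equal length. It is an illustration, not mga's code: mismatch_markers is a hypothetical name, and it assumes that only columns holding two different bases are flagged, while columns containing a gap character are left blank. The actual treatment of gap columns is not specified here.

```python
def mismatch_markers(seq1: str, seq2: str, gap: str = "-") -> str:
    """Return a line with '!' under every mismatching column.

    Assumption: columns where either sequence has a gap character
    are not flagged.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "!" if a != b and a != gap and b != gap else " "
        for a, b in zip(seq1, seq2)
    )
```

The equal-length check mirrors the remark that both length fields refer to the same line.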

The sequence data of aligned gaps as computed by option -clustalw can contain dashes `-' to indicate gap characters. The first sequence is always shown completely. The other sequences are displayed differently: to enhance readability, bases that match the first sequence are replaced by dots `.'. The remaining sequence data is shown as usual.

Seq <1>: <sequence data 1 with gaps>                    <length>
...
Seq <k>: <mismatches of sequence data k with gaps>
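The dot notation can be reproduced with a few lines of code. The following sketch takes gapped sequences of equal length and masks every base that agrees with the first sequence. dot_view is a hypothetical name; the sketch assumes a case-sensitive comparison and that gap characters are never replaced by dots, even where the first sequence also has a gap.

```python
from typing import List


def dot_view(aligned: List[str], gap: str = "-") -> List[str]:
    """Show the first sequence in full and the others relative to it.

    Assumption: gap characters are always kept as they are.
    """
    reference = aligned[0]
    rows = [reference]
    for seq in aligned[1:]:
        if len(seq) != len(reference):
            raise ValueError("aligned sequences must have the same length")
        rows.append("".join(
            "." if base == ref and base != gap else base
            for base, ref in zip(seq, reference)
        ))
    return rows
```

For the input "ACG-T", "ATGAT", "AC--T" this yields "ACG-T", ".T.A." and "..--.", so only the deviations from the first sequence remain visible.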


Jan Krueger 2012-12-14