Startup possibilities
UniMoG can be started in console modus or with a graphical
user interface.
For using the console modus two parameters have to be
provided:
- The scenario: 1 = DCJ, 2 = rDCJ, 3 = HP, 4 = Inversion, 5 =
Translocatin, 6 = All scenarios at once
- The path to a single file containing all genomes which
should be analyzed
Example: java -jar UniMoG 5 path\to\genomefile.txt
path\to\genomefile2.txt
If no correct parameters are povided, UniMoG starts with the
graphical user interface.
Genome Representation
The loaded files have to contain at least two, but can contain
several genomes.
Every genome starts with a > followed by the genome
name.
In the following line(s) come the genes. 'Newlines' are
ignored.
Each chromosome is concluded by a closing parenthesis ')' or a
vertical bar '|'. A ) concludes a circular chromosome, a |
concludes a linear chromosome and can be omitted for the last
chromosome of a genome, leaving it linear. The gene names can
contain all signs except whitespaces. A - before a gene name
means that the gene direction is reverse.
Each pairwise comparison takes into account only genes that occur
in both genomes of the pair.
Every line that starts with '//' has no function and is treated
as a comment. Everything before the first genome is a comment and
leading whitespaces of a genome name are ignored.
As an input example consider the three genomes:
>Genome A
2 -1 3 |
5 -4 6 ) | |
---|
> Genome B
2 1 3 |
6 5 4 | | |
---|
>Genome C
2 3 1 |
5 6 4 7 8 | | |
---|
A simple example file can be found here.
A more complex example file that shows the possibilities can be
found here.
Optimal DCJ, rDCJ, HP, Inversion and Translocation Distance
and Sorting
The program can compute five different genomic distances and
optimal sorting scenarios: The DCJ, restricted DCJ, HP
(Hannenhalli & Pevzner), Inversion and Translocation distance
and an optimal corresponding sorting. All five distance and
sorting algorithms were implemented under the aspect of
efficiency and in most cases feature the most efficient
algorithms known until today. Links to the corresponding papers
can be found below.
Output example:
The output consists of the three tabs 'Genomes', 'Adjacencies'
and 'Graphics'.
The 'Genomes' tab either starts with informative messages, if a
genome comparison fails or singleton genes have to be removed
from a genome, or directly starts with the distance results as a
distance matrix, if the input consists of more than two genomes.
Otherwise the distances are returned in their context:
or if only Genome B and Genome C are compared with the HP
model:
A '-' in the distance results means that the distances for
this pair could not be calculated. Certain rules for the
different distance models, explained below under 'Rules', have to
be obeyed.
The graphical output will look like this for each pairwise
comparison:
The fragments emerging from the two cuts (red vertical ticks
in the chromosome) of each step are highlighted by colored lines,
each with a different color. The lines for the current step are
located below the chromosome, while the lines depicting the
fragments of the previous step are located above the chromosome
of the current step.
Detailed explanations of the DCJ model and how it can be used
for calculating the rDCJ, HP, inversion and translocation
distances are given in the following papers:
- The DCJ model is explicated in [13] and improved and simplified in
[3].
- The restricted DCJ model is introduced in [9].
- How DCJ serves as a basis for HP, inversion and
translocation distances is explained in [4], while [5] provides the correct distance
formula. The running time of all distance computations is thus
linear.
The other models and the sorting algorithms are described
here:
- The HP model was first introduced in [6] and further studied in [10]. The sorting algorithm used in
this implementation is composed of: [12] as a basis for the
preprocessing including the correction provided by [8]. As suggested in [12] the sorting itself is then
carried out by the inversion sorting algorithm referred to in
the next item. Therefore, the sorting process runs in quadratic
time, which is determined by the used inversion sorting
algorithm.
- The inversion model was first solved in polynomial time in
[7] and the most efficient,
subquadratic sorting algorithm is described in [11]. Here, we implement the second
most efficient, quadratic time algorithm, also described in
[11], with the aid of the
data structures from [1].
- The translocation model and the implemented cubic sorting
algorithm are described in [2]. This algorithm was chosen
because in practice its running time is almost always
linear.
Rules
For a successful genome comparison the following rules have to
be obeyed:
- The DCJ distance calculation can handle all kinds of
genomes without restrictions
- The restricted DCJ model works for linear chromosomes and
directly reincoporates all emerging circular chromosomes in the
next step (modelling block interchanges and
transpositions).
- The HP distance calculation requires linear
chromosomes
- The translocation distance requires linear chromosomes with
the same telomeres.
- The inversion distance requires one linear chromosome with
the same telomeres in both genomes.
Program Usage - Step by Step
- First click on and choose at least one file that contains at
least two genomes. Or alternatively click to work with a
small example file.
- Then choose the genomes to compare in the lists below and
select one of the five possible scenarios. The 'All'
scenario means that all four scenarios are calculated at once,
if applicable.
- By pressing the Button the genomes will be compared. The results are
shown in the tabs in the lower part of the window.
-
- The first tab first shows the distance in a matrix for more
than two genomes or otherwise in context with the chosen
scenario and afterwards the stepwise transformation for all
pairs of genomes.
- The second tab shows the adjacencies (junctions) between
the genes. The suffix '_h' represents the head of a gene
and '_t' the tail of a gene. Chromosomes are separated by
'{content1} , {content2}' including the whitespaces for an
easier localization of a chromosome end.
- The third tab contains a graphical representation of the
transformation of all pairs of genomes with highlighted
operations.
- Before running the comparison:
-
- You can decide if only the distances are of interest
and the sorting scenarios are not needed (makes the
calculation also much faster). In this case deselect the
checkbox, otherwise keep it selected.
- You can choose a coloring method for the graphic:
If you select each chromosome has a distinct
color, otherwise they have a white background.
- You can also adjust the size of the output using the size
slider with three different sizes.
- A graphic result can be saved as jpg image by clicking on
.
- The whole textual results can be saved in a txt file by
clicking on .
- All values and results can be cleared by the button.
- The button
terminates the program.
- If you want to look at the help directly from the program,
click on .