RNAforester2 - Manual

RNAforester calculates the similarity between a pair of RNA secondary structures or among multiple RNA secondary structures. Note that the scoring scheme for pairwise and multiple alignments differs slightly (see below).

Input

The input sequences/structures have to be given in Vienna (dot-bracket) format. The first line of a sequence/structure block starts with a '>' character followed by an id (first word) and an optional description. The next line contains the sequence, and the last line of a block contains the structure, where matching brackets symbolize base pairs and unpaired bases are represented by dots. An example is given below:

>id1 description
accaguuacccauucgggaaccggu
.((..(((...)))..((..)))).
>id2 description
...

Global, local and small-in-large alignment

Local similarity means finding the maximal similarity between substructures of RNA secondary structures. If these substructures are extended, the score decreases. This requires a scoring scheme that balances positive and negative scoring contributions. Otherwise, the similarity of the complete structures would always achieve the maximum score. It is generally assumed that an alignment of two empty structures scores zero. Local distance makes no sense, as empty forests always have the lowest possible distance of zero.

Substructures of RNA secondary structures in RNAforester are contiguous and "closed" by hairpin loops. In the accompanying figure, the blue region shows a valid substructure. The green part of the structure is not closed because the closing hairpin is missing. The red part is not considered a local structure for the same reason. This is less obvious, since only the U, which is a child of the root of this subtree, is not included. If the top-level P node were not included in the red substructure, this part would correspond to a closed subforest. The yellow part does not correspond to a closed subforest since the subtrees are not consecutive siblings.
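The block layout described under Input can be checked before running an alignment. The following is a minimal sketch, not part of RNAforester: the function names `parse_vienna` and `base_pairs` are hypothetical, and the sketch assumes that every block consists of exactly three non-empty lines and that sequence and structure have the same length.

```python
from typing import Iterator, List, Tuple


def parse_vienna(text: str) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (id, description, sequence, structure) for each input block.

    Assumption: each block is exactly three non-empty lines
    (header, sequence, structure); blank lines are ignored.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("each block needs a header, a sequence and a structure line")
    for i in range(0, len(lines), 3):
        header, seq, struct = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"expected a '>' header line, got {header!r}")
        parts = header[1:].split(None, 1)
        if not parts:
            raise ValueError("header line has no id")
        ident = parts[0]                              # first word is the id
        desc = parts[1] if len(parts) > 1 else ""     # rest is the description
        if len(seq) != len(struct):
            raise ValueError(f"{ident}: sequence and structure lengths differ")
        yield ident, desc, seq, struct


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return the 0-based (open, close) positions of all matching brackets."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

Applied to the id1 block from the example, `base_pairs` reports seven base pairs, the outermost one joining positions 1 and 23 (counting from 0).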
Scoring models

Pairwise and multiple alignment
The sequence edit operations base match, base mismatch and base deletion are the same for pairwise and multiple alignment. A base-pair breaking means the deletion of a base-pair bond. A base-pair deletion is the composition of a base-pair breaking and two base deletions. A base-pair altering is treated likewise, but only one base deletion is involved. The structural edit operation base-pair replacement has a different effect for pairwise and multiple alignment. In pairwise alignment mode, the pairing bases are treated as a unit. In multiple alignment mode, the base-pair replacement score is the score for matching any base pair plus the score for matching or mismatching the bases that pair. Thus, it is not possible to construct a base-pair dependent scoring for this model. The RIBOSUM scoring scheme consists of empirically derived base-pair and single-base substitution scores; it is available in pairwise alignment mode.

Linear and affine gap score model

The default forest alignment is based on a linear gap score model. Each gap contributes a fixed score, given by parameter -pd for pair indels and -bd for base indels. A continuous gap receives a score that is linear in the gap length. With option -a, the affine scoring scheme is invoked. In this scoring scheme, a continuous gap receives an affine score with a higher gap opening cost and a lower gap extension cost. The gap opening cost is given by parameter -pdo for opening a pair indel, and -bdo for opening a base indel. The gap extension cost is given by parameter -pd for pair indel extension and by parameter -bd for base indel extension.

Speedup by anchored alignment

The option --anchor invokes the anchored alignment mode of RNAforester. It has to be used together with option -t (top down computation). The anchored alignment mode is a heuristic alignment algorithm based on a given set of anchors for the input structures that should be aligned.
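To make the difference between the linear and the affine gap score model concrete, the sketch below scores a single continuous gap under both models. It is an illustration only, not RNAforester's implementation: the function names and the numeric values are hypothetical, and it assumes the common convention that the opening score covers the first gap position and every further position receives the extension score.

```python
def linear_gap_score(length: int, indel: int) -> int:
    """Linear model: every gap position contributes the same indel score
    (the role played by -bd for bases or -pd for pairs)."""
    return length * indel


def affine_gap_score(length: int, gap_open: int, gap_extend: int) -> int:
    """Affine model, assuming the opening score (role of -bdo / -pdo) pays
    for the first position and each further position is an extension."""
    if length == 0:
        return 0
    return gap_open + (length - 1) * gap_extend


# Hypothetical similarity scores (more negative means more costly).
BASE_INDEL = -5        # stands in for a base indel / extension score
BASE_INDEL_OPEN = -10  # stands in for a base indel opening score

for n in (1, 2, 4, 8):
    print(n, linear_gap_score(n, BASE_INDEL),
          affine_gap_score(n, BASE_INDEL_OPEN, BASE_INDEL))
```

With these illustrative values the affine model charges a constant extra penalty per gap, so an alignment with one long gap scores better than one that spreads the same number of gap positions over several short gaps.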
RNAforester is able to derive anchoring information via abstract shape analysis from the input structures. The input structures have to have the same shape (such as the output blocks of RNAcast for each shape).

Output