Multiple sequences usually carry more information than any single sequence alone. For RNA, compensatory base mutations are strong evidence for putative base pairs: if a putative G-C base pair in one sequence appears as an A-U base pair at the same positions in a related sequence, these positions are likely to pair.
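
The compensatory-mutation argument can be illustrated with a small check on two columns of an alignment. This is only a sketch: the toy alignment, the function names `column_pair_support` and `is_compensatory`, and the decision rule (every sequence forms a canonical pair, and at least two different pair types occur) are assumptions made for illustration, not part of any particular tool.

```python
# Canonical RNA base pairs, including the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_support(alignment, i, j):
    """Summarise how alignment columns i and j behave across all sequences."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    paired = [p for p in pairs if p in CANONICAL]
    return {
        "sequences": len(pairs),
        "canonical": len(paired),
        "distinct_pair_types": len(set(paired)),
    }


def is_compensatory(alignment, i, j):
    """Assumed rule: all sequences pair canonically and the pair type varies."""
    s = column_pair_support(alignment, i, j)
    return s["canonical"] == s["sequences"] and s["distinct_pair_types"] >= 2


# Hypothetical gap-free toy alignment (0-based column indices).
alignment = [
    "GGGAAACCC",
    "GAGAAACUC",
    "GCGAAACGC",
]

print(is_compensatory(alignment, 1, 7))  # G-C, A-U, C-G: covarying pair
print(is_compensatory(alignment, 0, 8))  # always G-C: conserved, no covariation
```

Columns 1 and 7 change together while staying complementary, which supports a base pair. Columns 0 and 8 are compatible with a pair but, being fully conserved, provide no covariation evidence.
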
There are essentially four ways to use multiple sequences for structure prediction:
- Plan-A: First align the multiple sequences, then fold the multiple sequence alignment into one secondary structure.
  - Pros
    - uses established multiple sequence alignment methods.
    - fast.
  - Cons
    - cannot guarantee to find the global optimum.
- Plan-B: Simultaneously align the multiple sequences and find their optimal consensus secondary structure.
  - Pros
  - Cons
    - NP-hard problem, thus very slow.
- Plan-C: First predict a secondary structure for each sequence and find one that is common to all sequences. Then structurally align the secondary structures and derive a multiple structure/sequence alignment.
  - Pros
    - focuses on structures, not on sequences as Plan-A does.
  - Cons
    - cannot guarantee to find the global optimum.
- Plan-D: Use covariance or mutual information.
  - Pros
    - stochastically sound method.
  - Cons
    - needs many examples for the training phase.
    - cannot guarantee to find the global optimum.
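
To make the mutual information idea of Plan-D concrete, the following sketch computes the mutual information (in bits) between two columns of an alignment from observed frequencies. The toy alignment and the function name `mutual_information` are assumptions for illustration. Real covariance-based methods add refinements such as gap handling, sequence weighting and background corrections that are omitted here.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (bits) between alignment columns i and j.

    Frequencies are plain counts over the sequences; no pseudocounts,
    gap handling or sequence weighting is applied in this sketch.
    """
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)
    freq_j = Counter(seq[j] for seq in alignment)
    freq_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical gap-free toy alignment (0-based column indices).
alignment = [
    "GGGAAACCC",
    "GAGAAACUC",
    "GCGAAACGC",
    "GUGAAACAC",
]

print(mutual_information(alignment, 1, 7))  # 2.0: columns covary perfectly
print(mutual_information(alignment, 0, 8))  # 0.0: conserved columns carry no signal
```

With only four sequences the estimate is very noisy, which illustrates why this approach needs many examples: the frequencies are only reliable for a large and diverse alignment.
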
Many genes are controlled by small regulatory motifs in their 5' and 3' UTRs. For some of these motifs, it is not the sequence but the secondary structure that is important for proper function. Because their sequence conservation is low, these small motifs are often difficult to find with sequence-based methods. A more appropriate approach is to compute a local RNA structure alignment, i.e. to search for the most similar substructures within larger RNA secondary structures.
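
The "local" part of this idea can be sketched with a deliberately simplified stand-in: a Smith-Waterman local alignment over dot-bracket strings. This is an assumption made for illustration only. Actual local structure alignment methods compare tree or forest representations of the structures and respect which bracket pairs with which, whereas this sketch treats the structure as a plain string. The function name `local_align_score`, the scoring values and the example structures are hypothetical.

```python
def local_align_score(s, t, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: best local alignment score of two strings.

    Returns (score, (end_in_s, end_in_t)) with 1-based end positions.
    Only two rows of the dynamic programming table are kept in memory.
    """
    best, best_end = 0, (0, 0)
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, best_end = cur[j], (i, j)
        prev = cur
    return best, best_end


# A small hairpin motif embedded in two different, larger structures.
MOTIF = "((((...))))"
structure_a = ".." + MOTIF + "......"
structure_b = "(((.." + MOTIF + "..)))"

score, end = local_align_score(structure_a, structure_b)
print(score, end)
```

Because the shared hairpin aligns position by position, the best local score is at least `2 * len(MOTIF)`, regardless of how different the surrounding structures are.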