KnotInFrame - Supplementary data

This page contains additional details on the analysis of the PRFdb as described in our paper.

Top predictions with KnotInFrame:

KnotInFrame's 100 best predictions for the complete yeast genome: Results


Comparison with the PRFdb

The PRFdb contains computer-generated frameshift candidates, of which 1679 were classified as strong candidates in Jacobs et al., NAR (2006).
We found that their method and ours have a surprisingly small overlap in positively predicted frameshift signals.
A first look at the structures in the PRFdb revealed that the majority do not contain a pseudoknot at all and thus do not resemble the widely accepted frameshift pseudoknot consensus.
To create a subset of the PRFdb in which structures closely resemble the pseudoknot consensus, we filtered the database for pseudoknotted structures, which yields 163 structures. This list still contains many structures that violate the consensus frameshift signal. Here, we list a few examples of how the consensus is violated. These structures were filtered out for our further analysis. In the end, only 74 structures remain that could in principle be predicted by either method.

  1. Spacer too long (should be shorter than 12 nucleotides; here 28 nucleotides)
    >SGDID:S0006076 | pos: 282
    -11-----1122222-------22222-3333333-----44444-3-333333-----44444---
    ACATATACTGGGTCAATCACTGTGACCATCCGGCCGAAACCACGGAGCGTTGGAACTTCCCGTGACC
    .((.....))(((((.......))))).(((((((.....[[[[[.).)))))).....]]]]]...
    
  2. Loop 1 > 10 nucleotides, Stem 2 < 4 nucleotides
    >SGDID:S0004513 | pos: 972
    -----111111--11111222222222-----222222222--33--11111111111---444----44433-----------
    TGTCATAGGCTTCGAAAATCATATTCTGGTAAAGGATATGAACGGATTTTTCAGCCTGAAAGTGGAAATACCCAAAAGATCTAT
    .....((((((..((((((((((((((.....)))))))))..[[..)))))))))))...(((....)))]]...........
    
  3. Examples where Loop 1 is too long and harbors an additional structure
    >SGDID:S0002824 | pos: 810
    ---1112222222-----2222222--3333-44--------44555555-------555555111---3333666-----666--
    TTGGCCAGCCTAGAAAAACTAGGTTTAGATAATCAATATGAGGAGTTCATGAGGCAAATGAATGGCATATATCCCGACAAATGGCT
    ...[[[(((((((.....)))))))..((((.((........))((((((.......))))))]]]...))))(((.....)))..
    
    >SGDID:S0002814 | pos: 1080
    --1111--2222222--222----2222-----2222-222-222222233--444441111---44444---33-
    CCAGGCTGACATTGGAAAGACCGCGGCTACTGTGGCCATCTATCAATGTTCTCAGGATGCTTATGATCTTTTTGAT
    ..[[[[..(((((((..(((....((((.....)))).))).)))))))((..(((((]]]]...)))))...)).
    
  4. Complex pseudoknot (have a look at the 2D plot)
    >SGDID:S0000212 | pos: 228
    111111111---222222-3333---444111111111---444---555333366---666-----666--66-555-222222--
    TTTCAGGGTGGATTGGAACGGCCCCAGTGATCCTGAGAACCCACAAAACTGGCCCCTACTGAAAAAATCATTGGTAGTATTCCAAAT
    [[[[[[[[[...((((((.((((...(((]]]]]]]]]...)))...[[[))))((...(((.....)))..)).]]].))))))..
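
The structure lines above use dot-bracket notation, with round and square brackets marking base pairs that may cross. As an illustration only, here is a minimal Python sketch of how such a string could be checked for a pseudoknot and how the number of nucleotides upstream of the pseudoknot could be measured. The function names are hypothetical and are not part of KnotInFrame or the PRFdb. The sketch assumes that only the bracket types "()" and "[]" occur, and that the structure window starts directly downstream of the slippery site, which this page does not state explicitly.

```python
# Illustrative sketch only; not the filter used by KnotInFrame or the PRFdb.

# Structure line of example 1 (SGDID:S0006076, pos: 282), copied from above.
EXAMPLE_1 = ".((.....))(((((.......))))).(((((((.....[[[[[.).)))))).....]]]]]..."


def parse_pairs(structure):
    """Return the sorted (i, j) base pairs of a dot-bracket string (0-based)."""
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}  # one stack per bracket type
    pairs = []
    for i, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return all pairs that cross another pair, i.e. i < k < j < l."""
    crossing = set()
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                crossing.add((i, j))
                crossing.add((k, l))
    return crossing


def has_pseudoknot(structure):
    """True if at least two base pairs of the structure cross."""
    return bool(crossing_pairs(parse_pairs(structure)))


def pseudoknot_start(structure):
    """0-based index of the first base in a crossing pair, or None.

    Under the assumption stated in the lead-in, this equals the number
    of nucleotides between the window start and the pseudoknot.
    """
    crossing = crossing_pairs(parse_pairs(structure))
    return min(i for i, _ in crossing) if crossing else None


if __name__ == "__main__":
    print(has_pseudoknot(EXAMPLE_1))    # True
    print(pseudoknot_start(EXAMPLE_1))  # 28
```

For example 1, the first base involved in a crossing pair sits at offset 28, which is consistent with the spacer length of 28 nucleotides noted for that entry. The two hairpins upstream of it do not cross any other pair and are therefore not counted as part of the pseudoknot.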
    
Note that in our analysis we do not make a statement on frameshifts not caused by a pseudoknot. If we rule out a particular location in our analysis, we only rule out a pseudoknot as the cause of a frameshift there. A possible frameshift caused by, say, a hairpin structure would go unnoticed.