Families of structurally related RNA molecules are modelled with stochastic context-free grammars
(SCFGs). These are a generalization of the HMMs that we use for sequence families, reflecting the fact
that RNA structures are not sequential but have a branching, tree-like shape.
An SCFG is built on a conventional context-free grammar (as widely used in computer science for the
definition of programming languages), essentially a set of rules which can be applied to derive sequences.
These rules reflect the properties of RNA: for example, two bases that form a base pair must be derived
within the same rule. The stochastic aspect comes from probabilities associated with the rules of the
grammar. They are learned from data (a structure-annotated alignment of family members) using the program
cmbuild from the Infernal package.
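
The idea of rules with probabilities can be made concrete with a small sketch. The grammar below is a
hypothetical toy with a single nonterminal S and hand-picked probabilities; it is not the grammar or the
parameterization that Infernal uses. It only illustrates that both bases of a pair come from one rule
(S -> x S y) and that a bifurcation rule (S -> S S) gives the branching shape.

```python
import random

# Toy assumptions: Watson-Crick pairs only, uniform choice among bases/pairs.
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
BASES = "ACGU"

# Hand-picked rule probabilities (hypothetical, they sum to 1):
P_PAIR = 0.40   # S -> x S y   both bases of a pair in the same rule
P_LEFT = 0.25   # S -> x S     one unpaired base, then continue
P_BIF = 0.10    # S -> S S     bifurcation: two independent substructures
# remaining 0.25: S -> x       one unpaired base ends this branch


def derive(rng):
    """Sample (sequence, dot-bracket structure) by expanding S recursively."""
    r = rng.random()
    if r < P_PAIR:
        x, y = rng.choice(PAIRS)
        seq, struct = derive(rng)
        return x + seq + y, "(" + struct + ")"
    if r < P_PAIR + P_LEFT:
        seq, struct = derive(rng)
        return rng.choice(BASES) + seq, "." + struct
    if r < P_PAIR + P_LEFT + P_BIF:
        left_seq, left_struct = derive(rng)
        right_seq, right_struct = derive(rng)
        return left_seq + right_seq, left_struct + right_struct
    return rng.choice(BASES), "."


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        seq, struct = derive(rng)
        print(seq)
        print(struct)
```

Every sampled structure is a balanced dot-bracket string, and every bracket pair sits over complementary
bases, because the grammar has no other way to produce a pair. In a trained model the probabilities would
be estimated from the family alignment instead of being fixed by hand.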
To search a database of family models with a given sequence, the sequence is parsed using each family's
SCFG, with the probability parameters trained from the family members. This is implemented by the program
cmscan from the Infernal package; its counterpart cmsearch handles the converse task of searching a
sequence database with a single family model.
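
Parsing a sequence with an SCFG can be sketched with the CYK algorithm, which finds the score of the most
probable derivation by dynamic programming over subsequences. The sketch below uses a hypothetical toy
grammar with one nonterminal S and two made-up parameter sets standing in for "family models"; the names
stem_rich, unstructured, cyk_score and best_family are illustrative assumptions, not part of Infernal.

```python
import math

# Watson-Crick pairs only; an assumption of this toy example.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Two hypothetical "family models" that differ only in rule probabilities.
# pair: S -> x S y, left: S -> x S, bif: S -> S S, end: S -> x
MODELS = {
    "stem_rich":    {"pair": 0.50, "left": 0.20, "bif": 0.05, "end": 0.25},
    "unstructured": {"pair": 0.05, "left": 0.60, "bif": 0.05, "end": 0.30},
}


def cyk_score(seq, p):
    """Log-probability of the most probable parse of seq under parameters p."""
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    # Emissions are uniform: each rule type is split over 4 pairs or 4 bases.
    lp_pair = math.log(p["pair"] / 4)
    lp_left = math.log(p["left"] / 4)
    lp_end = math.log(p["end"] / 4)
    lp_bif = math.log(p["bif"])
    # best[i][j] is the best score for the subsequence seq[i:j] (j exclusive).
    best = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        best[i][i + 1] = lp_end                          # S -> x
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cand = lp_left + best[i + 1][j]              # S -> x S
            if length >= 3 and (seq[i], seq[j - 1]) in PAIRS:
                cand = max(cand, lp_pair + best[i + 1][j - 1])  # S -> x S y
            for k in range(i + 1, j):                    # S -> S S
                cand = max(cand, lp_bif + best[i][k] + best[k][j])
            best[i][j] = cand
    return best[0][n]


def best_family(seq, models=MODELS):
    """Name of the model under which seq has the highest CYK score."""
    return max(models, key=lambda name: cyk_score(seq, models[name]))


if __name__ == "__main__":
    for query in ("GGGGAAACCCC", "AAAAAAAAAAA"):
        print(query, best_family(query))
```

The bifurcation loop makes the running time cubic in the sequence length, which is the usual cost of CYK
for grammars with a branching rule. The sketch compares raw log-probabilities between models; a real
search tool would normally report a log-odds score against a background model, which is omitted here.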
The Rfam database is currently the most important resource for RNA family models. It is maintained at
the Sanger Centre, using Infernal and a large body of available structural data to build the family models.
