In this section you will learn about one
of the most important tasks in molecular biology: comparing a
sequence obtained in the lab (nucleotide or protein) with all known
sequences collected in a database. This procedure is often
referred to as homology search. The search results, sequences that
are similar to the query sequence, may give an indication of the
function of a newly sequenced gene.
NCBI's non-redundant (NR) protein database
contained 2.5 million sequences with almost 850 million amino acids
(as of June 2005). Because a full pairwise alignment takes time
proportional to the product of the two sequence lengths, a database
of this size precludes the direct approach of aligning the
query sequence with each sequence in the database. Instead,
efficient filtering or indexing methods are used to cut down the
running time. These methods are not guaranteed to find the
best match, but they are nevertheless invaluable tools in a
molecular biologist's daily life.
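
To make the idea of indexing concrete, here is a minimal sketch of a
word (k-mer) index used as a filter. It is an illustration only, not
the algorithm of any particular search tool: the function names, the
toy sequences, and the parameters k and min_shared are all assumptions
chosen for this example. The index is built once for the database;
a query is then matched against it, and only sequences that share
enough words with the query would be passed on to a full alignment.

```python
from collections import defaultdict


def build_kmer_index(database, k):
    """Map every length-k word to the ids of the sequences that contain it."""
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def candidate_hits(query, index, k, min_shared=2):
    """Return ids of sequences sharing at least min_shared distinct words with the query."""
    query_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    counts = defaultdict(int)
    for word in query_words:
        # .get() avoids creating empty entries for words absent from the index
        for seq_id in index.get(word, ()):
            counts[seq_id] += 1
    return {seq_id for seq_id, n in counts.items() if n >= min_shared}


# Toy database with made-up sequences (hypothetical, for illustration only)
database = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "MKTAYIAKQRGGGGGGGGGGGG",
    "seqC": "PPPPPPPPPPWWWWWWWWWW",
}

index = build_kmer_index(database, k=3)
hits = candidate_hits("TAYIAKQR", index, k=3)
print(sorted(hits))
```

Such a filter can miss a true match whose shared words are all
disrupted by substitutions, which is one reason methods of this kind
do not guarantee finding the best match.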