EMBL entries (such as the example below) are structured so that they can be read by humans as well as by computer programs. Each entry in the database is composed of lines. Different line types, each with its own format, record the various kinds of data that make up the entry. Some entries do not contain all of the line types, and some line types occur many times in a single entry.
Each entry begins with an identification line (ID) and ends with a terminator line (//). Consult the EMBL user manual for a more comprehensive guide.
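Because every entry ends with a terminator line, a file holding several entries can be cut into entries by watching for `//`. The following is a minimal Python sketch of that idea, not an official parser; the function name `split_entries` and the two placeholder entries are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, List


def split_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each entry as a list of lines, without the '//' terminator."""
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "//":
            # Terminator line: the entry collected so far is complete.
            if current:
                yield current
            current = []
        elif line.strip():
            current.append(line)
    # Lines after the last '//' are dropped in this sketch;
    # a stricter reader might raise an error instead.


# Hypothetical two-entry input; the ID fields are placeholders, not real data.
sample = [
    "ID   DEMO1 (placeholder fields)",
    "AC   X00001;",
    "//",
    "ID   DEMO2 (placeholder fields)",
    "AC   X00002;",
    "//",
]

entries = list(split_entries(sample))
print(len(entries))        # 2
print(entries[0][0][:2])   # ID
```

The sketch does not check that an entry starts with an ID line; a reader that needs this guarantee could test `entry[0].startswith("ID")` on each yielded list.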
- ID (IDentification line): always the first line of an entry. The general form of the ID line is:
ID, entryname, dataclass, molecule, division, sequencelength
- XX: contains no data or comments. It is used instead of blank lines to avoid confusion with the sequence data lines.
- AC (Accession Number): lists the accession numbers associated with this entry.
- SV (Sequence Version): contains the new format of the nucleotide sequence identifier.
- DT (DaTe): shows when an entry first appeared in the database and when it was last updated.
- DE (DEscription): contains general descriptive information about the sequence stored.
- KW (KeyWord): provides information which can be used to generate cross-reference indexes of the sequence entries based on functional, structural, or other categories deemed important. The keywords chosen for each entry serve as a subject reference for the sequence, and will be expanded as work with the database continues. Often several KW lines are necessary for a single entry.
- OS (Organism Species): specifies the preferred scientific name of the organism which was the source of the stored sequence.
- OC (Organism Classification): contains the taxonomic classification of the source organism.
- RN (Reference Number): gives a unique number to each reference citation within an entry.
- RC (Reference Comment): optional line type which appears if the reference has a comment.
- RP (Reference Position): optional line type which appears if one or more contiguous base spans of the presented sequence can be attributed to the reference in question.
- RX (Reference Cross-reference): optional line type which contains a cross-reference to an external citation or abstract database.
- RA (Reference Author): lists the authors of the paper (or other work) cited.
- RT (Reference Title): gives the title of the paper (or other work).
- RL (Reference Location): contains the conventional citation information for the reference.
- DR (Database Cross-Reference): cross-references other databases which contain information related to the entry in which the DR line appears.
- CC (Comments): contains free-text comments about the entry; it may be used to convey any sort of information thought to be useful.
- FH (Feature Header): present only to improve readability of an entry when it is printed or displayed on a terminal screen. The lines contain no data and may be ignored by computer programs.
- FT (Feature Table): provides a mechanism for the annotation of the sequence data. Regions or sites in the sequence which are of interest are listed in the table.
A complete and definitive description of the feature table is given in the DDBJ/EMBL/GenBank Feature Table Definition.
- SQ (SeQuence header): marks the beginning of the sequence data and gives a summary of its content.
- Sequence data lines: each has a line code consisting of two blanks. The sequence is written 60 bases per line, in groups of 10 bases separated by a blank character, beginning in position 6 of the line. The direction listed is always 5' to 3'.
- The // (terminator) line also contains no data or comments. It designates the end of an entry.
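The line types listed above lend themselves to a simple reading strategy: take the first two characters of each line as the line code, collect the remaining text under that code, and treat everything between the SQ line and `//` as sequence data. The sketch below illustrates this in Python under a few assumptions: `parse_entry` is a hypothetical helper name, the input is an abbreviated excerpt of the example entry on this page (only the first and last sequence lines are kept), and sequence lines are split on whitespace rather than read from fixed columns so that the code also works on text whose spacing has been collapsed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_entry(lines: List[str]) -> Tuple[Dict[str, List[str]], str]:
    """Group data lines by two-character line code; return (fields, sequence)."""
    fields: Dict[str, List[str]] = defaultdict(list)
    sequence_parts: List[str] = []
    in_sequence = False
    for line in lines:
        if line.strip() == "//":
            break  # terminator line: end of the entry
        if in_sequence:
            # Keep the base groups, drop the running base count at the end.
            sequence_parts.extend(t for t in line.split() if not t.isdigit())
            continue
        code, data = line[:2], line[2:].strip()
        if code == "XX":
            continue  # spacer line, carries no data
        fields[code].append(data)
        if code == "SQ":
            in_sequence = True  # all following lines are sequence data
    return dict(fields), "".join(sequence_parts)


# Abbreviated excerpt of the example entry (most sequence lines omitted).
entry = [
    "AC U22421;",
    "XX",
    "OS Mus musculus (house mouse)",
    "OC Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muridae;",
    "OC Murinae; Mus.",
    "XX",
    "SQ Sequence 2235 BP; 568 A; 571 C; 547 G; 549 T; 0 other;",
    "atgtgctgga gacccctgtg tcggttcctg tggctttggt cctatctgtc ttatgttcaa 60",
    "agccctgaat gctga 2235",
    "//",
]

fields, sequence = parse_entry(entry)
print(sorted(fields))     # ['AC', 'OC', 'OS', 'SQ']
print(len(fields["OC"]))  # 2
print(len(sequence))      # 75
```

Line types that occur several times, such as OC, are kept as lists so that no line is lost. Multi-line structures such as the feature table or the RN/RA/RT/RL reference blocks would need further grouping that this sketch does not attempt.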
XX
AC U22421;
XX
SV U22421.1
XX
DT 13-APR-1995 (Rel. 43, Created)
DT 17-APR-2005 (Rel. 83, Last updated, Version 3)
XX
DE Mus musculus obesity protein (ob) gene, complete cds.
XX
KW .
XX
OS Mus musculus (house mouse)
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muridae;
OC Murinae; Mus.
XX
RN [1]
RP 1-2235
RA Chehab F.F., Lim M.E.;
RT "Genomic organization and sequence of the mouse obesity gene";
RL Unpublished.
XX
RN [2]
RP 1-2235
RA Chehab F.F., Lim M.E.;
RT ;
RL Submitted (09-MAR-1995) to the EMBL/GenBank/DDBJ databases.
RL Farid F. Chehab, Laboratory Medicine, University of California, San
RL Francisco, 505 Parnassus Avenue, San Francisco, CA 94143-0134, USA
XX
DR MGI; 104663; Lep.
XX
FH Key Location/Qualifiers
FH
FT source 1..2235
FT /chromosome="6"
FT /db_xref="taxon:10090"
FT /mol_type="genomic DNA"
FT /organism="Mus musculus"
FT /strain="C57BL/6J"
FT CDS join(1..144,1876..2235)
FT /codon_start=1
FT /db_xref="GOA:P41160"
FT /db_xref="HSSP:1AX8"
FT /db_xref="InterPro:IPR000065"
FT /db_xref="InterPro:IPR009079"
FT /db_xref="UniProt/Swiss-Prot:P41160"
FT /gene="ob"
FT /product="obesity protein"
FT /protein_id="AAA64213.1"
FT /translation="MCWRPLCRFLWLWSYLSYVQAVPIQKVQDDTKTLIKTIVTRINDI
FT SHTQSVSAKQRVTGLDFIPGLHPILSLSKMDQTLAVYQQVLTSLPSQNVLQIANDLENL
FT RDLLHLLAFSKSCSLPQTSGLQKPESLDGVLEASLYSTEVVALSRLQGSLQDILQQLDV
FT SPEC"
FT intron 145..1875
FT repeat_region 449..585
FT misc_feature 1876..1879
FT /note="slippage of acceptor site results in inclusion or
FT exclusion of glutamine at amino acid position 49"
FT /gene="ob"
XX
SQ Sequence 2235 BP; 568 A; 571 C; 547 G; 549 T; 0 other;
atgtgctgga gacccctgtg tcggttcctg tggctttggt cctatctgtc ttatgttcaa 60
gcagtgccta tccagaaagt ccaggatgac accaaaaccc tcatcaagac cattgtcacc 120
aggatcaatg acatttcaca cacggtagga gtcttatggg gggacaaaga tgtaggacta 180
gaaccagagt ctgagaaaca tgtcatgcac ctcctagaag ctgagagttt ataagcctcg 240
agtgtacatt atctctggtc atggctcttg tcactgctgc ctgctgaaat acagggctga 300
gtggttccat ttctaaaccc agcatctaga ctgctcagct gtactgccag tatcgcatga 360
ttctaatcct aagccacctt agggaattta acttctctct tatactccca ttaagaaacc 420
ataaggtgtc gggcgtggtg gcacatgccc tctaatccca gaactcggga ggcagaggca 480
ggtggatttc tgagttcaag gccagcctgg tctacaaaat gagttccagg acagccaggg 540
ctatacagag aaaccctgtc tcgaaaaacc aaaaaagaag ccataaggtt ctttgatatc 600
ataaggccat gctcattttc cctctgccac aggaaaccca gcccttggtg gctagctgag 660
catgtaaggt acacatcaga cctgggagaa cctgggttcc tccctgcttc cacagaccac 720
cctctcccct tccttagccc cctgtttctg cctctctcat tctctttcat ccatgaaact 780
acttccttga atttagtacc cagggcataa gaatccctaa aggtcatggt gtcccattga 840
cacgtggaca gcttcccagc agtgtctcta ctgggcagga ggagcagtag gtttctaatg 900
gtttttagct acagcttctg cccaccgctc acccactttt caaagtcaca cagaaaacaa 960
cctttccctc ctttacaacc agtccttgtg tagctgctga tagtggtcgg tgcccaccat 1020
gttcttcctc cgaggcccag cagcctacat cttcagccat ttcctcagat gtatctaagc 1080
tatgtgcata tcaccatatc tgcttcccat ctgcaagatc ttaggccagt tctccggtgg 1140
gttttaaacc ttgattttac catcttgatg agggagacat catatcatat caccaagttg 1200
ttctaaggct taaatggggt gtagtgaaag actttcttgt ggagccatct ggagactact 1260
atgtctcctg accagtgtgc gtgtctcaca gtgtggcctt ggcagctagg agaagtcaga 1320
tattcagaat caagggacag cttaatataa gagacttatg cggagaaagt tctcatcatc 1380
tctcgacaag agtcatcagg gctgcacatg gagaggccca actacccaaa tgtgggtgga 1440
aatgagagga agccagtggg gaaagccctt cctggtaacc agactcagca gagtgggggg 1500
ggggggcacg gctttgaccc taatgaggga gaaccacaga agagtatgac taggagggag 1560
agatctgata agggcaggag gctagagaga atataaggaa taaagagcta tggctggttc 1620
ttcacggata tcattggaga aaggaattac tcaagactaa tcagaagtga agggtggagt 1680
gactcggaat gatcagaaag tccgggagac cagctccgtg gcttccagtc agctgatgac 1740
aggaagtaag gacctggacc aggaaggtga gaaggaagga ggtagcccag gttcacagat 1800
gtaatgtaga gctctggagc ccgatgctcc ctgccacttg ctaaaacacc tcttgttctt 1860
cttcctcctc catatcagtc ggtatccgcc aagcagaggg tcactggctt ggacttcatt 1920
cctgggcttc accccattct gagtttgtcc aagatggacc agactctggc agtctatcaa 1980
caggtcctca ccagcctgcc ttcccaaaat gtgctgcaga tagccaatga cctggagaat 2040
ctccgagacc tcctccatct gctggccttc tccaagagct gctccctgcc tcagaccagt 2100
ggcctgcaga agccagagag cctggatggc gtcctggaag cctcactcta ctccacagag 2160
gtggtggctt tgagcaggct gcagggctct ctgcaggaca ttcttcaaca gttggatgtt 2220
agccctgaat gctga 2235
//
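Two parts of the example entry can be cross-checked with a little arithmetic: the base counts in the SQ line should add up to the stated sequence length, and the CDS location gives the length of the coding region. The Python sketch below shows one way to do this. The helper names `parse_sq_header` and `parse_location` are hypothetical, and the location parser is deliberately limited: it only extracts plain `start..end` spans and ignores other location forms such as `complement(...)` or single-base positions.

```python
import re
from typing import Dict, List, Tuple


def parse_sq_header(data: str) -> Tuple[int, Dict[str, int]]:
    """Parse 'Sequence 2235 BP; 568 A; ...' into (length, base composition)."""
    parts = [p.strip() for p in data.split(";") if p.strip()]
    length = int(parts[0].split()[1])  # 'Sequence 2235 BP' -> 2235
    composition: Dict[str, int] = {}
    for part in parts[1:]:
        count, label = part.split(None, 1)  # '568 A' -> ('568', 'A')
        composition[label] = int(count)
    return length, composition


def parse_location(location: str) -> List[Tuple[int, int]]:
    """Extract (start, end) spans from simple locations such as join(a..b,c..d)."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


# Values copied from the SQ and CDS lines of the example entry.
length, composition = parse_sq_header(
    "Sequence 2235 BP; 568 A; 571 C; 547 G; 549 T; 0 other;"
)
print(length, sum(composition.values()))  # 2235 2235

spans = parse_location("join(1..144,1876..2235)")
cds_length = sum(end - start + 1 for start, end in spans)
print(spans)       # [(1, 144), (1876, 2235)]
print(cds_length)  # 504
```

The CDS length of 504 bases corresponds to 168 codons, which is consistent with the 167-residue /translation qualifier plus a stop codon, and the intron feature at 145..1875 fills the gap between the two joined spans.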