The wide variety of data resources described in the previous chapters has been developed
to support biological research. Unfortunately, different databases use different data formats,
which makes it difficult to interconnect their information computationally.
The Extensible Markup Language (XML) offers a way to describe and serve data, for example on the web,
in a uniform and automatically parseable format. Its basic representation is plain text, so it provides
an open, platform-independent means of exchanging data between programs written in different
programming languages, such as Java, Perl, or C/C++.

"XML documents are made up of storage units called entities,
which contain either parsed or unparsed data. Parsed data is made up of characters, some
of which form character data, and some of which form markup. Markup encodes a description
of the document's storage layout and logical structure. XML provides a mechanism to impose
constraints on the storage layout and logical structure." [W3C] For a detailed specification see http://www.w3.org/TR/REC-xml.

Different databases and programs use different XML representations of their data. See Paul Gordon's XML for Molecular Biology web page for an overview of bioinformatics-specific XML definitions. The following example shows a nucleotide sequence record in NCBI's TSeq format:
<?xml version="1.0"?>
<!DOCTYPE TSeq PUBLIC "-//NCBI//NCBI TSeq/EN"
  "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>726296</TSeq_gi>
  <TSeq_accver>U22421.1</TSeq_accver>
  <TSeq_taxid>10090</TSeq_taxid>
  <TSeq_orgname>Mus musculus</TSeq_orgname>
  <TSeq_defline>Mus musculus obesity protein...</TSeq_defline>
  <TSeq_length>2235</TSeq_length>
  <TSeq_sequence>ATGTGCTGGAGACCCCTGTGTCGGTTC...</TSeq_sequence>
</TSeq>
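
To illustrate what "automatically parseable" means in practice, the record above can be read with a few lines of code. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name parse_tseq and the decision to strip the TSeq_ prefix from the element names are assumptions made for this illustration and are not part of any NCBI tool. The element names are the markup, while the element text and the value attribute hold the data.

```python
import xml.etree.ElementTree as ET

# The abbreviated TSeq record shown above; the "..." truncations are kept as-is.
RECORD = """<?xml version="1.0"?>
<!DOCTYPE TSeq PUBLIC "-//NCBI//NCBI TSeq/EN"
  "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>726296</TSeq_gi>
  <TSeq_accver>U22421.1</TSeq_accver>
  <TSeq_taxid>10090</TSeq_taxid>
  <TSeq_orgname>Mus musculus</TSeq_orgname>
  <TSeq_defline>Mus musculus obesity protein...</TSeq_defline>
  <TSeq_length>2235</TSeq_length>
  <TSeq_sequence>ATGTGCTGGAGACCCCTGTGTCGGTTC...</TSeq_sequence>
</TSeq>"""


def parse_tseq(xml_text):
    """Return the fields of one TSeq record as a dict (hypothetical helper)."""
    # ElementTree parses the markup only; the DTD named in the DOCTYPE
    # is neither downloaded nor used for validation.
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        # Drop the "TSeq_" prefix so that the keys are shorter (illustrative choice).
        key = child.tag.replace("TSeq_", "", 1)
        # Empty elements such as TSeq_seqtype carry their data in the
        # "value" attribute instead of in character data.
        fields[key] = child.text if child.text is not None else child.get("value")
    return fields


record = parse_tseq(RECORD)
print(record["accver"], record["orgname"], record["seqtype"])
```

All values are returned as strings, so numeric fields such as length would still have to be converted with int() before use. Because the sequence in this example is truncated, its string length does not match the declared length of 2235.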