Bielefeld University Center of Biotechnology Institute of Bioinformatics BiBiServ
 
GenBank Data Format - Exercise 1
grep, agrep, cut, sort and wc are very useful UNIX tools for getting a quick overview of text files or extracting specific information from them. Use these tools to answer the following questions:
  1. Browse to the Taxonomy Database at NCBI and download a GenBank formatted file of all nucleotide sequences from Rat-kangaroos (Potoroidae).
  2. How many sequences are available? Did the download succeed? (Sometimes huge batch downloads break, so it is always a good idea to check if all sequences were retrieved!)
  3. How many different genera and species are represented in the file?
  4. Write a small shell script that counts the number of sequences for each genus available in a GenBank file. Apply your script to the file above and count the genera.
  5. How many coding sequences and corresponding GenBank entries are in the file you downloaded? Extract the protein IDs to a file and use NCBI's Batch Entrez to download the protein sequences corresponding to the entries in your file.
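As a starting point for questions 2 to 5, here is a minimal sketch rather than a reference solution. It assumes the downloaded file follows the standard GenBank flat-file layout: every entry starts with a LOCUS line and ends with a // line, the organism name starts in column 13 of the ORGANISM line, and feature keys such as CDS start in column 6. The function names and the file name potoroidae.gb are made up for illustration. uniq is used in addition to the tools listed above.

```shell
# Hypothetical helper functions; each takes a GenBank file as its only argument.

# Number of entries: every record begins with a LOCUS line.
count_records() {
    grep -c '^LOCUS' "$1"
}

# Number of record terminators: every complete record ends with a line "//".
# If this differs from count_records, the download was probably truncated.
count_terminators() {
    grep -c '^//' "$1"
}

# Organism names, one per record (assumes the name starts in column 13).
organisms() {
    grep '^  ORGANISM' "$1" | cut -c13-
}

# Sequences per genus: the genus is the first word of the organism name.
count_per_genus() {
    organisms "$1" | cut -d' ' -f1 | sort | uniq -c
}

# Number of distinct organism names. This is only an approximation of the
# species count, since names may include subspecies or "sp." placeholders.
count_species() {
    organisms "$1" | sort -u | wc -l | tr -d ' '
}

# Number of CDS features (assumes the feature key starts in column 6).
count_cds() {
    grep -c '^     CDS ' "$1"
}

# Protein IDs, one per line, taken from /protein_id="..." qualifiers.
protein_ids() {
    grep '/protein_id=' "$1" | cut -d'"' -f2
}
```

After loading these functions into a shell, count_records potoroidae.gb and count_terminators potoroidae.gb should print the same number if the download is complete; that number can then be compared with the count shown on the NCBI page. Running protein_ids potoroidae.gb > protein_ids.txt would produce a list of IDs, one per line, of the kind that can be uploaded to Batch Entrez.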