|
grep, agrep, cut, sort and wc are very useful UNIX tools to get a
quick overview of text files or to extract certain information from
them. Use these tools to answer the following questions:
- Browse to the Taxonomy Database at NCBI and download a GenBank-formatted file of all nucleotide sequences from rat-kangaroos (Potoroidae).
- How many sequences are available? Did the download
succeed? (Sometimes huge batch downloads break, so it is
always a good idea to check if all sequences were retrieved!)
- How many different genera and species are represented in the
file?
- Write a small shell script that counts the number of sequences
for each genus available in a GenBank file. Apply your script
to the file above and report the count for each genus.
- How many coding sequences and corresponding GenBank entries are
in the file you downloaded? Extract the
protein IDs to a file and use NCBI's Batch Entrez
to download the protein sequences corresponding to the entries
in your file.
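
As a possible starting point for the questions above, here is a minimal sketch rather than a reference solution. It assumes the standard GenBank flat-file layout: each record starts with a LOCUS line and ends with a line beginning with "//", the organism name follows the ORGANISM keyword, and CDS features are indented by five spaces. The function names are made up for illustration, and uniq and tr are used in addition to the tools listed above. Organism names with more than two words (subspecies, "sp." entries) are cut down to their first two words, which may or may not be what you want.

```shell
#!/bin/sh
# Sketch only: helper functions for summarizing a GenBank flat file.
# All function names are hypothetical; the file name is passed as $1.

# Count records two ways. After a complete download the number of
# LOCUS lines should equal the number of "//" record terminators.
count_records() {
    echo "LOCUS lines: $(grep -c '^LOCUS' "$1")"
    echo "terminators: $(grep -c '^//' "$1")"
}

# Print the first two words of each organism name ("Genus species").
# tr -s squeezes runs of spaces so that cut sees single delimiters;
# field 1 is empty, field 2 is the keyword ORGANISM.
organisms() {
    grep '^ *ORGANISM' "$1" | tr -s ' ' | cut -d' ' -f3,4
}

# Number of sequences per genus (the first word of the organism name).
genus_counts() {
    organisms "$1" | cut -d' ' -f1 | sort | uniq -c
}

# Number of distinct "Genus species" names.
species_count() {
    organisms "$1" | sort -u | wc -l
}

# Number of CDS features (this counts features, not entries).
cds_count() {
    grep -c '^     CDS ' "$1"
}

# Protein IDs, one per line: the text between the first pair of
# double quotes on every /protein_id= qualifier line.
protein_ids() {
    grep '/protein_id=' "$1" | cut -d'"' -f2
}
```

After loading these functions into your shell, a call such as `genus_counts potoroidae.gb` (a hypothetical file name) prints one line per genus, and `protein_ids potoroidae.gb > protein_ids.txt` writes a list that can be uploaded to Batch Entrez. Counting the entries that contain at least one CDS, as opposed to the CDS features themselves, is left open here.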
|
|