BiBiServ2 - ConCysFind

ConCysFind is a pipeline tool that searches for conserved amino acids in protein sequences of the plant kingdom.

ConCysFind was developed on behalf of the department "Plant Biochemistry and Physiology" at the University of Bielefeld, with support from the department "Computational Metagenomics". It is based on a pipeline developed by A. Sahm, which searches for conserved cysteines in transcription factors from the Plant Transcription Factor Database (PlantTFDB) and was previously able to predict the conservation of several transcription factors. Earlier versions considered only conserved cysteines. ConCysFind can also search for conservation of other post-translationally modified amino acids: tryptophan, serine, threonine, tyrosine and methionine.

The search for conserved amino acids is limited to the plant kingdom. For this purpose, 21 plant species were chosen that represent high evolutionary diversity and are evenly spread among the different plant taxa, with consideration of one proxy species per species. A phylogenetic tree of these species, based on the Tree of Life Web Project (Maddison et al., 2007), can be viewed here: phylogenetic tree created with the Tree of Life Web Project. All protein sequences of these plant species were downloaded from UniProt in FASTA format, and the tool works with this database.

Working steps:

1. Find homologues of the query sequence with the included tool blastp, then filter the best homologue from each species to represent the conservation across all species.
2. Build a multiple alignment of all filtered homologues (one homologue per species) with a heuristic progressive method (greedy algorithm).
3. Calculate a degree of conservation (Cysteine Score and P-Value) based on the computed multiple alignment.
4. Additionally, build a phylogenetic guide tree from the multiple alignment with the Neighbor Joining algorithm (Saitou and Nei 1987).
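
The filtering and conservation steps above can be illustrated with a minimal Python sketch. It is not the ConCysFind implementation. It assumes BLAST tabular output (outfmt 6, bit score in the twelfth column) and UniProt-style subject IDs such as sp|P12345|NAME_ARATH, where the suffix after the last underscore is taken as the species code. The function names best_hit_per_species and conserved_fraction are hypothetical, and the plain column fraction shown here is only a stand-in for the tool's Cysteine Score and P-Value, which are not specified in this description.

```python
def best_hit_per_species(blast_lines):
    """Keep the highest-bit-score hit per species from BLAST tabular lines.

    Assumption: subject IDs look like 'db|ACCESSION|ENTRY_SPECIES'.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        sseqid = fields[1]
        bitscore = float(fields[11])
        # Species code = text after the last underscore of the entry name.
        species = sseqid.split("|")[-1].rsplit("_", 1)[-1]
        if species not in best or bitscore > best[species][1]:
            best[species] = (sseqid, bitscore)
    return best


def conserved_fraction(alignment, query_index=0, residue="C"):
    """Map each query position holding `residue` to the fraction of
    aligned sequences that carry the same residue in that column.

    `alignment` is a list of equal-length gapped strings ('-' = gap).
    Positions are 1-based and refer to the ungapped query sequence.
    """
    query = alignment[query_index]
    fractions = {}
    position = 0
    for column, char in enumerate(query):
        if char == "-":
            continue  # gaps do not advance the query position
        position += 1
        if char == residue:
            matches = sum(1 for seq in alignment if seq[column] == residue)
            fractions[position] = matches / len(alignment)
    return fractions


if __name__ == "__main__":
    hits = [
        "q\tsp|P1|A_ARATH\t90\t100\t0\t0\t1\t100\t1\t100\t1e-50\t200",
        "q\tsp|P2|B_ARATH\t95\t100\t0\t0\t1\t100\t1\t100\t1e-60\t250",
        "q\ttr|P3|C_ORYSJ\t80\t100\t0\t0\t1\t100\t1\t100\t1e-40\t150",
    ]
    print(best_hit_per_species(hits))
    print(conserved_fraction(["MC-AC", "MCTAS", "MC-GC", "MCTAC"]))
```

Keeping only one hit per species means that every species contributes equally to the alignment, which matches the "one homologue per species" rule in step 2.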

ConCysFind has two functions:

The first input function accepts a .tsv file (tab-separated values) containing multiple sequences, with the UniProt ID in the first column and the protein description in the second column. Alternatively, several queries with descriptions can be pasted, one query per line. For each query, a multiple alignment is generated, and an amino acid score and P-Value are calculated for each detected amino acid. Finally, a phylogenetic tree is generated that represents the conservation of the query protein and the searched amino acid.
The second input function accepts a single amino acid sequence (in single-letter amino acid code), either entered directly or uploaded as a file containing only one sequence. Note: in the parameter settings, you can search for conservation of all found amino acids (by setting the searched position to 0) or of one particular position (by giving the amino acid position). If a particular position is given, the program repeats working steps 2-4 with further homologues found (max. 9 iterations). This accounts for amino acids that are conserved in some homologues but were lost during evolution in others, e.g., following gene duplication and neofunctionalization.
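
To make the two input formats concrete, here is a small Python sketch of how such input could be read. It is an illustration under assumptions, not the server's code. The function names read_queries and select_positions are hypothetical, and so is the choice to raise an error when the given position does not hold the searched amino acid.

```python
import csv
import io


def read_queries(tsv_text):
    """Parse tab-separated text: UniProt ID in column 1, description in column 2."""
    queries = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip empty lines
        uniprot_id = row[0].strip()
        description = row[1].strip() if len(row) > 1 else ""
        queries.append((uniprot_id, description))
    return queries


def select_positions(sequence, residue="C", position=0):
    """Return the 1-based positions to examine in a single-letter sequence.

    position == 0 selects every occurrence of `residue`;
    any other value selects only that position.
    """
    sequence = sequence.upper()
    if position == 0:
        return [i for i, aa in enumerate(sequence, start=1) if aa == residue]
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != residue:
        # Assumed behaviour: reject a position that holds another residue.
        raise ValueError(f"position {position} is not {residue}")
    return [position]


if __name__ == "__main__":
    print(read_queries("P12345\tExample transcription factor\nQ9XYZ1\tAnother protein\n"))
    print(select_positions("MCACW", residue="C"))
    print(select_positions("MCACW", residue="C", position=4))
```

Positions are counted from 1, so that 0 remains free to mean "all found amino acids", as in the parameter description above.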