BiBiServ2 - ConCysFind

This manual describes all parameters and the input and output data supported by the ConCysFind online service. The set of parameters, their permitted ranges, and the input/output types may differ from those of the standalone program.

TSV input function

This function accepts TSV (tab-separated values) input with a UniProt ID in the first column and a protein description in the second column. You can enter several queries, one query per line, or upload a TSV file. For every query, a multiple alignment is computed, and for every occurrence of the searched amino acid in the query sequence, an amino acid score, a P-value, and a phylogenetic tree are calculated.

Sequence input function

The second function accepts a single amino acid sequence (without a UniProt ID). You can also upload a file containing only one sequence.
Note: In the parameter view that follows, you can choose whether to test amino acid conservation at all found positions (by setting Searched position to 0) or only at one particular position. If a particular position is given, the program repeats working steps 2-4 with further homologues found (at most 9 iterations). The aim is to account for possible isoforms, mutations, or sequencing errors in the protein sequence, and to find a possibly higher amino acid score among the next most similar homologues.
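The iteration for a single searched position can be pictured as a loop that re-runs the scoring with further homologues and keeps the highest amino acid score seen. The sketch below assumes that each round yields one score and that the best one is retained; `score_iteration` is a hypothetical callback standing in for working steps 2-4 and is not part of ConCysFind.

```python
from typing import Callable, Optional, Tuple

MAX_ITERATIONS = 9  # upper bound stated in this manual


def best_score_over_iterations(
    score_iteration: Callable[[int], Optional[float]],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[Optional[float], Optional[int]]:
    """Return (best_score, iteration_number) over at most max_iterations rounds.

    score_iteration(i) is a hypothetical callback that runs one pass with the
    i-th batch of further homologues and returns the amino acid score, or
    None when no further homologues are available.
    """
    best_score: Optional[float] = None
    best_iter: Optional[int] = None
    for i in range(1, max_iterations + 1):
        score = score_iteration(i)
        if score is None:  # assumption: stop once homologues run out
            break
        if best_score is None or score > best_score:
            best_score, best_iter = score, i
    return best_score, best_iter
```

For example, if three rounds yield scores 0.41, 0.58 and 0.52 before homologues run out, the sketch keeps 0.58 from round 2.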

Input/Output values

INPUT :: Input tsv file

This input is tab-separated, with the UniProt ID in the first column and the protein description in the second column. The TSV format is required to parse UniProt IDs correctly. You can enter several queries, one query per line, or upload a TSV file. Descriptions longer than 100 characters are abbreviated.
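The expected input layout can be illustrated with a small parser that splits each line at the first tab and shortens long descriptions. This is a sketch only: the function name is hypothetical, and plain truncation to 100 characters is an assumption, since the service's exact abbreviation scheme is not documented here.

```python
from typing import List, Tuple

MAX_DESCRIPTION_CHARS = 100  # longer descriptions are abbreviated


def parse_query_tsv(text: str) -> List[Tuple[str, str]]:
    """Parse 'UniProt ID <TAB> description' lines into (id, description) pairs."""
    queries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t", 1)  # first tab separates ID from description
        uniprot_id = fields[0].strip()
        description = fields[1].strip() if len(fields) > 1 else ""
        if len(description) > MAX_DESCRIPTION_CHARS:
            # Assumption: simple truncation; the service may abbreviate differently.
            description = description[:MAX_DESCRIPTION_CHARS]
        queries.append((uniprot_id, description))
    return queries
```

A two-query input such as `P04406<TAB>Glyceraldehyde-3-phosphate dehydrogenase` followed by `P68871<TAB>Hemoglobin subunit beta` yields one (ID, description) pair per line.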

INPUT :: Input amino acid sequence

The single input amino acid sequence must contain only amino acid letters and no other characters. Its length must satisfy 20 <= LENGTH <= 10,000. You can also upload an input file containing a single sequence.
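A client-side pre-check of these rules might look like the sketch below. It assumes the 20 standard one-letter amino acid codes in upper case (the service may accept others, such as X) and, as a convenience assumption, removes whitespace and line breaks before checking; `validate_sequence` is a hypothetical helper.

```python
import re

MIN_LENGTH, MAX_LENGTH = 20, 10_000
# Assumption: only the 20 standard upper-case one-letter codes are allowed.
_AMINO_ACIDS = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]*")


def validate_sequence(raw: str) -> str:
    """Return the cleaned sequence or raise ValueError if it breaks the rules."""
    seq = "".join(raw.split())  # assumption: whitespace/line breaks are dropped
    if not _AMINO_ACIDS.fullmatch(seq):
        raise ValueError("sequence contains characters other than amino acids")
    if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
        raise ValueError(f"sequence length {len(seq)} outside {MIN_LENGTH}..{MAX_LENGTH}")
    return seq
```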

OUTPUT :: Logfile

Logs generated during program execution.

Parameter

Name

Description

Protein sequence description

Description of your single input protein sequence. This setting is optional. Descriptions longer than 100 characters are abbreviated.

Searched position

Use this parameter to test conservation at one particular position of your single query sequence. Make sure that the searched amino acid is actually located at the position you set. The pipeline iterates at most 9 times to obtain an optimized amino acid score for the amino acid at that position. Set 0 if the program should calculate conservation for all occurrences of the searched amino acid. Default is 0. Maximum is 10,000, because query sequences longer than 10,000 residues are not allowed.
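Because the searched amino acid must sit at the chosen position, it can help to check the position before submitting. The sketch below assumes 1-based positions (the usual convention for protein sequences, not confirmed by this manual) and uses a hypothetical helper name; 0 is accepted as "all positions".

```python
MAX_POSITION = 10_000


def check_searched_position(sequence: str, position: int, residue: str = "C") -> None:
    """Raise ValueError if `position` cannot be used for `residue` in `sequence`."""
    if not 0 <= position <= MAX_POSITION:
        raise ValueError(f"searched position must be between 0 and {MAX_POSITION}")
    if position == 0:
        return  # 0 means: evaluate every occurrence of the residue
    if position > len(sequence):
        raise ValueError(f"position {position} exceeds sequence length {len(sequence)}")
    found = sequence[position - 1]  # assumption: positions are 1-based
    if found != residue:
        raise ValueError(f"position {position} holds {found!r}, not {residue!r}")
```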

Maximal count of BLAST results

Maximum number of complete BLAST results considered for filtering. Boundary is 500 <= VALUE <= 100,000. Default is 20,000.

BLAST E-Value threshold

E-value threshold for BLAST results. Boundary is 0.0 < VALUE <= 10.0. Default is 1E-10 (0.0000000001).
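How the two BLAST parameters could interact is sketched below: at most `max_results` hits are considered, and only hits whose E-value does not exceed the threshold are kept. The order of these steps, the inclusive comparison, and the `(identifier, E-value)` hit representation are assumptions for illustration, not a description of the pipeline's internals.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # hypothetical (hit identifier, E-value) pair


def select_hits(
    hits: List[Hit], max_results: int = 20_000, evalue_threshold: float = 1e-10
) -> List[str]:
    """Return identifiers of hits that pass the E-value threshold."""
    if not 500 <= max_results <= 100_000:
        raise ValueError("max_results must be between 500 and 100,000")
    if not 0.0 < evalue_threshold <= 10.0:
        raise ValueError("E-value threshold must be in (0.0, 10.0]")
    considered = hits[:max_results]  # assumption: hits arrive in BLAST's ranked order
    # Assumption: a hit passes when its E-value is <= the threshold.
    return [hit_id for hit_id, evalue in considered if evalue <= evalue_threshold]
```

With the default threshold of 1E-10, a hit with E-value 1E-12 is kept, while one with 1E-5 is discarded.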

Gap open penalty

Gap open penalty for the multiple alignment. Boundary is 1 < VALUE <= 100. Default is 12.

Gap extension penalty

Gap extension penalty for the multiple alignment. Boundary is 1 < VALUE <= 100. Default is 1.
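Gap open and gap extension penalties together form an affine gap cost: opening a gap is expensive, and lengthening it is cheap. The sketch below uses the convention open + (length - 1) * extension; this is an assumption, since some aligners charge open + length * extension instead, and the manual does not say which one ConCysFind's aligner uses.

```python
def affine_gap_cost(length: int, gap_open: float = 12.0, gap_extend: float = 1.0) -> float:
    """Penalty for one contiguous gap of `length` residues (affine model).

    Assumption: the first gap position costs gap_open and each further
    position costs gap_extend.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend
```

With the defaults, one gap of length 3 costs 12 + 2 * 1 = 14, whereas three separate gaps of length 1 cost 3 * 12 = 36, so the alignment favors fewer, longer gaps.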

Amino acid score threshold

Threshold for the amino acid score: the amino acid in the query is probably conserved when its amino acid score is higher than this threshold. Boundary is 0.1 <= VALUE <= 0.9. Default is 0.5.

P-Value threshold

Threshold for the P-value: the amino acid in the query is probably conserved when its corresponding P-value is lower than this threshold. Boundary is 0.1 <= VALUE <= 0.9. Default is 0.5.
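The two thresholds can be read as a simple decision rule. The sketch below assumes that a position counts as probably conserved only when both criteria hold; the manual states each criterion separately, so combining them with a logical AND is an assumption. The function name is hypothetical.

```python
def is_probably_conserved(
    score: float,
    p_value: float,
    score_threshold: float = 0.5,
    p_value_threshold: float = 0.5,
) -> bool:
    """Apply the score and P-value thresholds (assumption: both must be met)."""
    for name, value in (("score_threshold", score_threshold),
                        ("p_value_threshold", p_value_threshold)):
        if not 0.1 <= value <= 0.9:
            raise ValueError(f"{name} must be between 0.1 and 0.9")
    # Strict comparisons, matching "higher than" and "lower than" in the manual.
    return score > score_threshold and p_value < p_value_threshold
```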

Searched amino acid

The amino acid whose conservation is calculated. Available amino acids are cysteine, tryptophan, serine, threonine, tyrosine, and methionine. The search is limited to these because they tend to play functional roles in the cell. Default is cysteine.
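With Searched position set to 0, every occurrence of the chosen amino acid in the query is evaluated. The sketch below maps the six selectable amino acids to their standard one-letter codes and lists their positions; 1-based numbering and the helper name are assumptions for illustration.

```python
from typing import List

# Standard one-letter codes for the amino acids selectable in ConCysFind.
SEARCHABLE_AMINO_ACIDS = {
    "cysteine": "C",
    "tryptophan": "W",
    "serine": "S",
    "threonine": "T",
    "tyrosine": "Y",
    "methionine": "M",
}


def find_positions(sequence: str, amino_acid: str = "cysteine") -> List[int]:
    """Return the 1-based positions of `amino_acid` in `sequence`."""
    code = SEARCHABLE_AMINO_ACIDS[amino_acid.lower()]  # KeyError if not selectable
    return [i for i, residue in enumerate(sequence, start=1) if residue == code]
```

For the toy sequence `MCSCWC`, the cysteine positions are 2, 4 and 6.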

Substitution BLOSUM matrix

Substitution matrix (BLOSUM matrix) for the multiple alignment. You can choose from blosum30 to blosum100. Lower-numbered matrices are suited to sequences expected to share little similarity; higher-numbered matrices are suited to sequences expected to be highly similar. Default matrix is blosum62.

Additional Output

The ConCysFind result is offered as a zipped archive for download. The archive contains the following files:

RESULT_ALIGNMENTS.txt - the computed multiple alignment(s)
RESULT_SCORES.xls - a spreadsheet with query information and, in particular, the scores and P-values for evaluating the result
Phylogenetic Trees folder - one tree for every found amino acid position. Leaves represent the species of a homologue: if the amino acid is conserved in that homologue's sequence, the leaf is marked green, otherwise red. The query leaf is marked blue.
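After downloading, a quick check that the archive holds the expected files can be scripted with Python's standard `zipfile` module. This is a sketch: the two result file names come from the list above, but the layout of the tree folder inside the archive is not documented here, so recognizing tree files by a parent directory whose name contains "tree" is an assumption, and both helper names are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath
from typing import Dict, List

EXPECTED_FILES = ("RESULT_ALIGNMENTS.txt", "RESULT_SCORES.xls")


def summarize_entries(names: List[str]) -> Dict[str, List[str]]:
    """Report missing result files and likely tree files among archive entries."""
    basenames = {PurePosixPath(name).name for name in names}
    tree_files = [
        name for name in names
        if not name.endswith("/")  # skip directory entries
        # Assumption: tree files live in a folder whose name contains "tree".
        and "tree" in PurePosixPath(name).parent.as_posix().lower()
    ]
    missing = [f for f in EXPECTED_FILES if f not in basenames]
    return {"missing": missing, "tree_files": tree_files}


def summarize_result_archive(path: str) -> Dict[str, List[str]]:
    """Open a downloaded result archive and summarize its contents."""
    with zipfile.ZipFile(path) as archive:
        return summarize_entries(archive.namelist())
```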