BiBiServ2 - mmfind

Usage

Type "mmfind" or "mmfind -h" to display information about the program and its options:

mmfind [options] <multiple_fasta_file>

Evaluates an alignment in multiple FASTA format for mismatches. Quality scores are taken into account if a file with scores is supplied (the file must have the same name but a "qual" extension; example: test.fa and test.qual). Scores should be in FASTA format and will be mapped onto the aligned sequences. mmfind is a command-line tool written in Python. It has been tested with Python 2.6, 2.7 and 3.1.
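The naming rule for the score file can be illustrated with a few lines of Python. This is only a sketch of the convention described above, not code taken from mmfind; the function name qual_path_for is hypothetical.

```python
import os


def qual_path_for(fasta_path):
    """Return the path where the matching score file is expected.

    Hypothetical helper: it only swaps the file extension for ".qual",
    following the "same name, qual extension" rule.
    """
    base, _extension = os.path.splitext(fasta_path)
    return base + ".qual"


print(qual_path_for("test.fa"))  # test.qual
```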

FILTERING OPTIONS:

-a <integer>	(aligned:) minimal number of aligned sequences (default:2). The alignment will not be processed if it contains fewer sequences (error code: -602).
-L <integer>	(length:) minimal alignment length (default:200, error code: -605).
-p <integer>	(polymorphism cutoff:) ignore alignments whose percentage of mismatches per alignment length exceeds the given value (default:3, error code: -656).
-l <integer>	(length:) maximal length of mismatches to be reported (default:3).
-b <integer>	(border distance:) minimal distance of a mismatch from the alignment ends for it to be reported (default:80).
-s <integer>	(score:) minimal average score of the bases of a mismatch (default:20).
-n <integer>	(neighborhood scores:) minimal average score of the 10 neighboring bases of a mismatch (5 upstream, 5 downstream) (default:15).
-A	(all mismatches:) prevent filtering and display all mismatches (default: use default filtering, see above).
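The two score filters (-s and -n) can be pictured with a short Python sketch. It is an illustration under stated assumptions, not mmfind's actual code: the function names, the 0-based positions, and the handling of alignment ends (flanks are simply truncated) are all hypothetical.

```python
def average(values):
    """Arithmetic mean of a list of scores; 0.0 for an empty list."""
    return float(sum(values)) / len(values) if values else 0.0


def site_and_neighbor_scores(scores, start, length, flank=5):
    """Return (site average, neighborhood average) for one mismatch.

    scores: per-base quality scores of one aligned sequence (assumed input)
    start:  0-based position of the first mismatching base (assumption)
    length: number of mismatching bases
    flank:  neighboring bases taken on each side (5 upstream, 5 downstream)
    """
    site = scores[start:start + length]
    upstream = scores[max(0, start - flank):start]
    downstream = scores[start + length:start + length + flank]
    return average(site), average(upstream + downstream)


# One low-quality base surrounded by better-scoring neighbors.
example_scores = [30] * 5 + [10] + [20] * 5
print(site_and_neighbor_scores(example_scores, 5, 1))  # (10.0, 25.0)
```

With the default cutoffs, this mismatch would pass the neighborhood test (25 >= 15) but fail the site test (10 < 20).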

OUTPUT OPTIONS:

-o <basename_of_outfiles>	(outfile:) files to which the reports should be appended (default: <basename_of_infile>.alignments.csv and <basename_of_infile>.mismatches.csv).
-d	(description:) write a descriptive headline to the report files (default: no headline).

Output files

columns in the alignments file

ID	Alignment ID.
ALN_LEN	Length of the aligned sequences including the gaps.
MISM	All differences between the aligned sequences.
SNVS	Single Nucleotide Variations = single-base SNPs.
SNV_B	Single Nucleotide Variation bases.
MNVS	Multiple Nucleotide Variations = multi-base SNPs.
MNV_B	Multiple Nucleotide Variation bases.
S_IND	Single-base InDels.
S_IND_B	Single-base InDel bases.
M_IND	Multi-base InDels.
M_IND_B	Multi-base InDel bases.
P_PERC	Percentage of polymorphic bases among the aligned bases.
STATUS	If the alignment is OK the status is 1, otherwise 0.
ERROR	Error code. -42: sequences of the multiple FASTA file are not equal in length; -605: alignment too short; -602: not enough sequences in the alignment; -656: fraction of mismatches too high.
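How the STATUS and ERROR columns relate to the filtering options can be sketched in Python. This is a minimal illustration, not mmfind's implementation: the function name, the order of the checks, the way the mismatch count is supplied, and the use of None as the error value for a valid alignment are assumptions.

```python
def check_alignment(seqs, n_mismatches, min_seqs=2, min_len=200, max_perc=3):
    """Return (status, error) for one alignment.

    seqs:         list of aligned sequences (strings, gaps included)
    n_mismatches: number of mismatching positions (assumed to be known)
    The defaults mirror the documented defaults of -a, -L and -p.
    """
    if len(seqs) < min_seqs:
        return 0, -602  # not enough sequences in the alignment
    if len(set(len(s) for s in seqs)) > 1:
        return 0, -42   # sequences are not equal in length
    aln_len = len(seqs[0])
    if aln_len < min_len:
        return 0, -605  # alignment too short
    if 100.0 * n_mismatches / aln_len > max_perc:
        return 0, -656  # fraction of mismatches too high
    return 1, None      # alignment is OK


print(check_alignment(["ACGT", "ACG"], 0))  # (0, -42)
```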

columns in the mismatches file

ID	Alignment ID.
TYPE	Mismatch type (SNV, MNV, InDel, Mixed).
ALN_S	Position of the first base of a polymorphism in the alignment.
PS_CONS	Consensus sequence of the polymorphism represented in ambiguity code.
PS_LEN	Length of the polymorphism.
CONS	Consensus sequence of 100 bp upstream and 100 bp downstream of the polymorphism.
MINBORD	Minimal distance of the polymorphism to the start or the end of the alignment.
PSAVGSC	Average score of the polymorphic site.
NGAVGSC	Average score of the neighboring 2x5 bases.
N_COUNT	Number of N's in the consensus sequence.
STATUS	Combined result of 3 tests, each of which sets one bit. 1: length of the polymorphism <= maximal polymorphism length (binary 1 = decimal 1); 2: distance to the start or the end of the alignment > minimal border distance (binary 10 = decimal 2); 3: average score of the neighboring 2x5 bases >= cutoff and average score of the polymorphic site >= cutoff (binary 100 = decimal 4). If all 3 tests pass, the status is binary 111 = decimal 7.
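Because STATUS in the mismatches file is a bit field, individual test results can be recovered with bitwise operations. The Python sketch below is illustrative only: the constant and function names are hypothetical, and the default cutoffs are taken from the documented defaults of -l, -b, -s and -n.

```python
LENGTH_OK = 1  # binary 001: polymorphism length <= maximal length
BORDER_OK = 2  # binary 010: border distance > minimal border distance
SCORES_OK = 4  # binary 100: site and neighborhood scores >= cutoffs


def encode_status(ps_len, minbord, psavgsc, ngavgsc,
                  max_len=3, min_border=80, min_score=20, min_neighbor=15):
    """Combine the three test results into one status value (0 to 7)."""
    status = 0
    if ps_len <= max_len:
        status |= LENGTH_OK
    if minbord > min_border:
        status |= BORDER_OK
    if ngavgsc >= min_neighbor and psavgsc >= min_score:
        status |= SCORES_OK
    return status


def decode_status(status):
    """Split a status value into the three individual test results."""
    return {
        "length": bool(status & LENGTH_OK),
        "border": bool(status & BORDER_OK),
        "scores": bool(status & SCORES_OK),
    }


print(encode_status(1, 100, 30, 25))  # 7
```

For example, a status of 5 (binary 101) means that the length and score tests passed but the polymorphism lies too close to an alignment end.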

Examples

Processing one multiple alignment (e.g. "example.mfa" and "example.qual" in the directory "test1" in the downloaded archive):

mmfind -d test1/example.mfa

or

cd test1
mmfind -d example.mfa

In both cases, two files ("example.alignments.csv" and "example.mismatches.csv") are written to the directory from which the script is called. The -d option is not required; it adds a descriptive header line to the output files.

Processing of multiple alignments (e.g. files in the directory "test2" in the downloaded archive):

mmfind -d -o test2/summary test2

With the -o option, the result of the evaluation is not written to separate files for each alignment, but summarized in the two files "summary.alignments.csv" and "summary.mismatches.csv". Only FASTA files with one of the following extensions are evaluated: fa, fasta, fas or mfa.
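To check in advance which files of a directory would be picked up, the extension rule can be reproduced in a few lines of Python. This is a sketch, not mmfind's own code: the function names are hypothetical, and matching the extension case-sensitively is an assumption.

```python
import os

# Extensions named in the documentation above.
FASTA_EXTENSIONS = {".fa", ".fasta", ".fas", ".mfa"}


def is_alignment_file(name):
    """True if the file name ends in one of the accepted extensions."""
    return os.path.splitext(name)[1] in FASTA_EXTENSIONS


def list_alignment_files(directory):
    """Return the sorted paths of all candidate alignment files."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if is_alignment_file(name)
    )


print(is_alignment_file("example.mfa"))   # True
print(is_alignment_file("example.qual"))  # False
```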

For a more user-friendly alignment format, use the -c option. For each multiple FASTA alignment, a corresponding Clustal-like version will be written.