mmfind

Usage

Type "mmfind" or "mmfind -h" to display information about the program and the options that modify its behaviour:

mmfind [options] <multiple_fasta_file>

Evaluates an alignment in multiple FASTA format for mismatches. Quality scores are taken into account if a score file is supplied (it should have the same name as the alignment file but the extension "qual"; example: test.fa and test.qual). The scores should be in FASTA format and are mapped onto the aligned sequences. mmfind is a command-line tool written in Python and has been tested with Python 2.6, 2.7 and 3.1.
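Mapping quality scores onto aligned sequences can be pictured with the following minimal Python sketch. It assumes that the .qual file holds one record per sequence (a ">" header followed by whitespace-separated integer scores, one per unaligned base) and that gaps are written as "-". The function names are hypothetical and not part of mmfind; gap positions simply receive no score (None).

```python
def read_fasta_scores(lines):
    """Parse FASTA-formatted quality scores into {sequence name: [scores]}.

    Assumes '>' header lines followed by whitespace-separated integers.
    """
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].extend(int(tok) for tok in line.split())
    return records


def map_scores_to_alignment(aligned_seq, scores, gap="-"):
    """Return one entry per alignment column: the score of the base in that
    column, or None where the aligned sequence has a gap."""
    n_bases = sum(1 for c in aligned_seq if c != gap)
    if n_bases != len(scores):
        raise ValueError("expected %d scores, got %d" % (n_bases, len(scores)))
    remaining = iter(scores)
    return [None if c == gap else next(remaining) for c in aligned_seq]


qual = read_fasta_scores([">seq1", "30 32 28", "35"])
print(map_scores_to_alignment("AC-GT", qual["seq1"]))  # [30, 32, None, 28, 35]
```

Each score is consumed in order as soon as a non-gap character is seen, so the scores stay aligned with their bases however the gaps are distributed.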

FILTERING:
-a <integer> (aligned:) minimal number of aligned sequences (default: 2). Alignments with fewer sequences are not processed (error code: -602).
-L <integer> (length:) minimal alignment length (default: 200, error code: -605).
-p <integer> (polymorphism cutoff:) ignore alignments in which the percentage of mismatches per alignment length exceeds the given value (default: 3, error code: -656).
-l <integer> (length:) maximal length of a mismatch to be reported (default: 3).
-b <integer> (border distance:) minimal distance between a mismatch and the alignment ends for the mismatch to be reported (default: 80).
-s <integer> (score:) minimal average score of the bases of a mismatch (default: 20).
-n <integer> (neighborhood scores:) minimal average score of the 10 bases neighboring a mismatch (5 upstream, 5 downstream) (default: 15).
-A (all mismatches:) disable filtering and report all mismatches (default: the filtering described above is applied).
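The three alignment-level filters (-a, -L, -p) and their error codes can be sketched as follows. This is a hypothetical reimplementation, not mmfind's source: the order of the checks, the inclusion of the unequal-length case (error code -42, listed under Output files) and the definition of a polymorphic position (an alignment column whose characters, gaps included, are not all identical when compared case-insensitively) are assumptions.

```python
def count_polymorphic_columns(seqs):
    """Count alignment columns whose characters are not all identical
    (case-insensitive; a gap '-' counts as a character). Assumed definition."""
    return sum(1 for col in zip(*seqs) if len(set(c.upper() for c in col)) > 1)


def check_alignment(seqs, min_seqs=2, min_len=200, max_poly_perc=3):
    """Return (status, error_code): (1, None) if the alignment passes,
    otherwise (0, code) using the error codes from the manual."""
    if len(set(len(s) for s in seqs)) > 1:
        return 0, -42                     # sequences differ in length
    if len(seqs) < min_seqs:
        return 0, -602                    # -a: not enough sequences
    aln_len = len(seqs[0]) if seqs else 0
    if aln_len == 0 or aln_len < min_len:
        return 0, -605                    # -L: alignment too short
    poly_perc = 100.0 * count_polymorphic_columns(seqs) / aln_len
    if poly_perc > max_poly_perc:
        return 0, -656                    # -p: too many mismatches
    return 1, None


ref = "ACGT" * 50                         # 200 columns
alt = "T" + ref[1:]                       # one mismatch -> 0.5 %
print(check_alignment([ref, alt]))        # (1, None)
print(check_alignment([ref]))             # (0, -602)
```

Since -p speaks of values "exceeding" the cutoff, the sketch uses a strict comparison: exactly 3 % still passes with the default setting.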
OUTPUT OPTIONS:
-o <basename_of_outfiles> (outfile:) basename of the files to which the reports are appended (default: <basename_of_infile>.alignments.csv and <basename_of_infile>.mismatches.csv).
-d (description:) write a descriptive header line to the report files (default: no header line).

Output files

Columns in the alignments file
ID: Alignment ID.
ALN_LEN: Length of the aligned sequences, including gaps.
MISM: Number of all differences between the aligned sequences.
SNVS: Number of Single Nucleotide Variations (single-base SNPs).
SNV_B: Number of bases in Single Nucleotide Variations.
MNVS: Number of Multiple Nucleotide Variations (multi-base SNPs).
MNV_B: Number of bases in Multiple Nucleotide Variations.
S_IND: Number of single-base InDels.
S_IND_B: Number of bases in single-base InDels.
M_IND: Number of multi-base InDels.
M_IND_B: Number of bases in multi-base InDels.
P_PERC: Percentage of polymorphic positions among the aligned bases.
STATUS: 1 if the alignment is OK, otherwise 0.
ERROR: Error code:
  -42: the sequences in the multiple FASTA file are not equal in length.
  -605: alignment too short.
  -602: not enough sequences in the alignment.
  -656: fraction of mismatches too high.
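One way mismatch columns might be grouped into polymorphisms and counted for the columns above is shown in this hypothetical Python sketch. The classification rule is an assumption, not taken from mmfind: adjacent polymorphic columns form one polymorphism; without gaps it is an SNV (length 1) or an MNV (longer); if every column contains a gap it is an InDel (single- or multi-base); otherwise it is Mixed. In this sketch Mixed polymorphisms are counted only in MISM.

```python
def polymorphic_columns(seqs):
    """Indices of alignment columns whose characters are not all identical
    (case-insensitive; a gap '-' counts as a character)."""
    return [i for i, col in enumerate(zip(*seqs))
            if len(set(c.upper() for c in col)) > 1]


def group_runs(columns):
    """Merge sorted column indices into (start, length) runs of adjacent columns."""
    runs = []
    for i in columns:
        if runs and runs[-1][0] + runs[-1][1] == i:
            runs[-1][1] += 1              # extend the current run
        else:
            runs.append([i, 1])           # start a new run
    return [tuple(r) for r in runs]


def classify(seqs, start, length, gap="-"):
    """Assumed rule: no gaps -> SNV/MNV, gaps in every column -> InDel,
    gaps in only some columns -> Mixed."""
    has_gap = [any(s[i] == gap for s in seqs) for i in range(start, start + length)]
    if not any(has_gap):
        return "SNV" if length == 1 else "MNV"
    if all(has_gap):
        return "InDel"
    return "Mixed"


def summarize(seqs):
    """Count polymorphisms per type, roughly mirroring the alignments-file columns."""
    counts = dict.fromkeys(["MISM", "SNVS", "SNV_B", "MNVS", "MNV_B",
                            "S_IND", "S_IND_B", "M_IND", "M_IND_B"], 0)
    poly = polymorphic_columns(seqs)
    for start, length in group_runs(poly):
        kind = classify(seqs, start, length)
        counts["MISM"] += 1
        if kind == "SNV":
            counts["SNVS"] += 1
            counts["SNV_B"] += length
        elif kind == "MNV":
            counts["MNVS"] += 1
            counts["MNV_B"] += length
        elif kind == "InDel":
            key = "S_IND" if length == 1 else "M_IND"
            counts[key] += 1
            counts[key + "_B"] += length
        # "Mixed" polymorphisms are only counted in MISM in this sketch
    aln_len = len(seqs[0]) if seqs else 0
    counts["ALN_LEN"] = aln_len
    counts["P_PERC"] = 100.0 * len(poly) / aln_len if aln_len else 0.0
    return counts


# One SNV at column 2 and one 2-base InDel at columns 6-7 -> MISM == 2
print(summarize(["ACGTACGTAC", "ACTTAC--AC"]))
```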
Columns in the mismatches file
ID: Alignment ID.
TYPE: Mismatch type (SNV, MNV, InDel, Mixed).
ALN_S: Position of the first base of the polymorphism in the alignment.
PS_CONS: Consensus sequence of the polymorphism in ambiguity code.
PS_LEN: Length of the polymorphism.
CONS: Consensus sequence of the 100 bp upstream and 100 bp downstream of the polymorphism.
MINBORD: Minimal distance of the polymorphism to the start or end of the alignment.
PSAVGSC: Average score of the polymorphic site.
NGAVGSC: Average score of the 2x5 neighboring bases.
N_COUNT: Number of N's in the consensus sequence.
STATUS: Result of 3 tests, encoded as the sum of the values of the passed tests:
  1: length of the polymorphism <= maximal polymorphism length (-l) (= binary 1 = decimal 1),
  2: distance to the start or end of the alignment > minimal border distance (-b) (= binary 10 = decimal 2),
  3: average score of the 2x5 neighboring bases >= cutoff (-n) and average score of the polymorphic site >= cutoff (-s) (= binary 100 = decimal 4).
If all 3 tests pass, the status is binary 111 = decimal 7.
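Because STATUS is a bit field, it can be computed by OR-ing together the value of each passed test. The following Python sketch is hypothetical and rests on several assumptions: 0-based column positions; MINBORD measured as the number of alignment columns before or after the polymorphism; one per-column score list per sequence with None at gap positions; the 5 upstream and 5 downstream neighbors taken as alignment columns; and the score test failing when no scores are available. The default cutoffs are those of -l, -b, -s and -n.

```python
def mean(values):
    """Average of the non-None values; 0.0 if there are none (assumption)."""
    vals = [v for v in values if v is not None]
    return sum(vals) / float(len(vals)) if vals else 0.0


def column_scores(score_tracks, columns):
    """Scores of all sequences in the given columns; columns outside the
    alignment are ignored, gap entries (None) are kept for mean() to skip."""
    return [track[i] for track in score_tracks for i in columns
            if 0 <= i < len(track)]


def mismatch_status(start, length, aln_len, score_tracks, max_len=3,
                    min_border=80, min_site_score=20, min_neighbor_score=15,
                    flank=5):
    """Return (MINBORD, STATUS) for a polymorphism covering the columns
    start .. start + length - 1."""
    site = range(start, start + length)
    neighbors = (list(range(start - flank, start)) +
                 list(range(start + length, start + length + flank)))
    min_bord = min(start, aln_len - (start + length))
    status = 0
    if length <= max_len:                                   # test 1 (-l)
        status |= 1
    if min_bord > min_border:                               # test 2 (-b)
        status |= 2
    if (mean(column_scores(score_tracks, neighbors)) >= min_neighbor_score
            and mean(column_scores(score_tracks, site)) >= min_site_score):
        status |= 4                                         # test 3 (-n, -s)
    return min_bord, status


tracks = [[30] * 300, [30] * 300]           # two sequences, all scores 30
print(mismatch_status(150, 1, 300, tracks))  # (149, 7)
```

A status of 5, for example, would mean tests 1 and 3 passed but the polymorphism lies too close to an alignment end.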

Examples

Processing a single multiple alignment (e.g. "example.mfa" and "example.qual" in the directory "test1" of the downloaded archive):
mmfind -d test1/example.mfa
or
cd test1
mmfind -d example.mfa

In both cases, two files ("example.alignments.csv" and "example.mismatches.csv") are written to the directory from which the script is called. The -d option is optional; it adds a descriptive header line to the output files.
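The file naming described above can be summarized in a short Python sketch. The function names are hypothetical; the sketch only encodes what this manual states: the reports are named <basename>.alignments.csv and <basename>.mismatches.csv, the default basename is the input file name without directory and extension (so the files end up in the current working directory), -o replaces that basename, and the score file shares the alignment file's location and name with the extension "qual".

```python
import os


def report_files(infile, outbase=None):
    """Names of the two report files for an input alignment.

    Without an -o value, the input's base name (no directory, no extension)
    is used, so the files are created in the current working directory.
    """
    if outbase is None:
        outbase = os.path.splitext(os.path.basename(infile))[0]
    return outbase + ".alignments.csv", outbase + ".mismatches.csv"


def quality_file(infile):
    """Expected score file: same path and name, extension 'qual'."""
    return os.path.splitext(infile)[0] + ".qual"


print(report_files("test1/example.mfa"))  # ('example.alignments.csv', 'example.mismatches.csv')
print(quality_file("test1/example.mfa"))  # test1/example.qual
```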

Processing multiple alignments (e.g. the files in the directory "test2" of the downloaded archive):
mmfind -d -o test2/summary test2

With the -o option, the results are not written to separate files for each alignment but collected in the two files "summary.alignments.csv" and "summary.mismatches.csv". Only FASTA files with one of the following extensions are evaluated: fa, fasta, fas or mfa.
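Which files in a directory are picked up can be sketched in Python as follows. The names are hypothetical, and since this manual does not say whether mmfind matches extensions case-insensitively or descends into subdirectories, the sketch assumes case-sensitive matching and looks only at the top level of the directory.

```python
import os

# Extensions accepted for alignment files, as listed in the manual
FASTA_EXTENSIONS = (".fa", ".fasta", ".fas", ".mfa")


def select_fasta(names):
    """Keep the file names whose extension is fa, fasta, fas or mfa
    (case-sensitive in this sketch)."""
    return sorted(n for n in names if n.endswith(FASTA_EXTENSIONS))


def fasta_files_in(directory):
    """Paths of the FASTA files directly inside `directory` (non-recursive)."""
    names = [n for n in os.listdir(directory)
             if os.path.isfile(os.path.join(directory, n))]
    return [os.path.join(directory, n) for n in select_fasta(names)]


print(select_fasta(["b.fasta", "a.fa", "a.qual", "c.mfa", "notes.txt", "d.fas"]))
# ['a.fa', 'b.fasta', 'c.mfa', 'd.fas']
```

Note that the ".qual" score files are not selected themselves; they are only used as companions of the matching alignment files.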

For a more readable alignment format, use the -c option: for each multiple FASTA alignment, a corresponding Clustal-like version is written.