BiBiServ2 - acdc

Uploading data and parameter setting

Input files to acdc are genome assemblies, i.e. fasta files. In the first step of the acdc web application, please upload the file you wish to check for contamination. The next step lets you choose between a simple and an expert operation mode. Both modes allow you to choose the Kraken database to be used. The (preselected) Mini-kraken database needs fewer resources and therefore decreases scheduling and running time. In expert mode, it is possible to set three additional parameters:

Minimum contig length: Set this to a value greater than zero if you have a large number of very small contigs in your assembly. This reduces computation time by not including contigs smaller than the specified length.
Target number of data points: Number of data points for estimating the window parameters. A larger number of data points will result in a smaller window length and vice versa. This gives control over the granularity of oligonucleotide signatures. The specified number is only a target; the actual number of data points will be close to it but may not match it exactly.
Aggressive threshold: This threshold controls the minimum size (in bp) of detected clusters. All clusters smaller than the specified threshold are treated as outliers and are not considered contamination.
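
The effect of the three expert parameters can be illustrated with a small Python sketch. This is not acdc's actual code: the function names are hypothetical, and the window length rule (total assembly length divided by the target number of data points) is only an assumed heuristic that matches the behaviour described above.

```python
from typing import Dict, List, Tuple


def filter_contigs(contigs: Dict[str, str], min_length: int) -> Dict[str, str]:
    """Drop contigs shorter than min_length (0 keeps every contig)."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


def estimate_window_width(contigs: Dict[str, str], target_points: int) -> int:
    """Assumed heuristic: spread the total length over the target number of points."""
    total_bp = sum(len(seq) for seq in contigs.values())
    return max(1, total_bp // target_points)


def split_outliers(cluster_sizes: Dict[str, int],
                   threshold_bp: int) -> Tuple[List[str], List[str]]:
    """Clusters smaller than threshold_bp are treated as outliers."""
    kept = [c for c, size in cluster_sizes.items() if size >= threshold_bp]
    outliers = [c for c, size in cluster_sizes.items() if size < threshold_bp]
    return kept, outliers


# Toy assembly: three contigs of 5000, 300 and 2700 bp.
contigs = {"c1": "A" * 5000, "c2": "C" * 300, "c3": "G" * 2700}

# Minimum contig length: the 300 bp contig is excluded.
kept_contigs = filter_contigs(contigs, min_length=500)

# Target number of data points: more points give a shorter window.
width_coarse = estimate_window_width(kept_contigs, target_points=10)
width_fine = estimate_window_width(kept_contigs, target_points=100)

# Aggressive threshold: the 700 bp cluster is treated as an outlier.
kept, outliers = split_outliers({"A": 7000, "B": 700}, threshold_bp=1000)
```

In this toy example, raising the target from 10 to 100 data points shortens the window by a factor of ten, which is the trade-off between granularity and window length described above.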

Results

Results are provided through an integrated web interface as well as an archive containing the same information. The result interface is divided into two sections. The table on the left contains all assemblies for which acdc was run. The web application always shows exactly one sample; for a batch computation, please use the command line application. On the right side of the interface, the visualization plot of the sample is shown. A click on any of the cells in the table will update the visualization accordingly.

The table is divided into four or five columns. The first and second columns show the sample ID and sample file name. The third and fourth columns show detection confidences as calculated by connected components and the dip statistic, respectively. An optional fifth column shows the output of the external sequence classification tool Kraken. Please refer to the paper for details.

The visualization shows data points, where one data point corresponds to one window at a specific position of a contig. Depending on the selected table cell, hovering over a data point will show either the corresponding contig name or its classification. An orange star in the scatter plot indicates a detected 16S rRNA gene. Click on it to display the gene sequence.
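
To make the relation between windows and data points concrete, the following Python sketch cuts a contig into windows and computes one oligonucleotide frequency vector per window. It is an illustration only: the function name, the choice of tetranucleotides (k = 4), and the step size are assumptions, and the sketch does not reproduce how acdc places the points in the scatter plot.

```python
from collections import Counter
from itertools import product
from typing import Iterator, List, Tuple

# All 256 tetranucleotides in a fixed order (k = 4 is an assumption).
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def window_points(name: str, seq: str, width: int,
                  step: int) -> Iterator[Tuple[str, int, List[float]]]:
    """Yield one (contig name, window start, frequency vector) per window."""
    seq = seq.upper()
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        counts = Counter(window[i:i + 4] for i in range(len(window) - 3))
        # Only k-mers made of A, C, G, T contribute to the signature.
        total = sum(counts[k] for k in KMERS)
        freqs = [counts[k] / total if total else 0.0 for k in KMERS]
        yield name, start, freqs


# A 200 bp toy contig, windows of 100 bp shifted by 50 bp.
points = list(window_points("contig_1", "ACGT" * 50, width=100, step=50))
```

Each element of `points` carries the contig name and the window position, which is the kind of information shown when hovering over a data point.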

Exporting individual clusters is possible by clicking any of the colored circles below the plot. The colors correspond to the colors of the clusters in the plot. A click on a circle will download the corresponding fasta file of the cluster.

The top half of the visualization contains a number of controls over the display of the plot which should be self-explanatory. The paper provides further information.