Uploading data and parameter setting
Input files to acdc are genome assemblies in FASTA format. In the first step of the acdc web
application, upload the file you want to check for contamination. The next step lets you
choose between a simple and an expert operation mode. Both modes allow you to choose the Kraken database
to be used. The preselected Mini-kraken database needs fewer resources and therefore decreases
scheduling and running time. In expert mode, you can set three additional parameters:
- Minimum contig length: Set this to a value greater than zero if you have a large number of very small
contigs in your assembly. This reduces computation time by excluding contigs shorter than the
specified length.
- Target number of data points: Number of data points for estimating the window parameters.
A larger number of data points will result in a smaller window length and vice versa. This gives
control over the granularity of oligonucleotide signatures. The specified number is only a target;
the actual number of data points will be close to it but may differ.
- Aggressive threshold: This threshold controls the minimum size (in bp) of detected clusters. All
clusters smaller than the specified threshold are treated as outliers and are not considered
contamination.
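The effect of the three expert parameters can be illustrated with a small sketch. It is not acdc's actual implementation: the function names are hypothetical, and the window rule (total assembly length divided by the target number of data points, non-overlapping windows) is an assumption chosen only to show why a larger target gives shorter windows and why the actual number of data points only approximates the target.

```python
# Illustrative sketch only; all names and the window rule are assumptions,
# not acdc's real code.

def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def filter_contigs(records, min_length):
    """Minimum contig length: drop contigs shorter than min_length."""
    return [(n, s) for n, s in records if len(s) >= min_length]


def estimate_window_length(records, target_points):
    """Assumed rule: total length divided by the target number of points."""
    total = sum(len(s) for _, s in records)
    return max(1, total // max(1, target_points))


def count_data_points(records, window_length):
    """Non-overlapping windows, at least one data point per contig."""
    return sum(max(1, len(s) // window_length) for _, s in records)


def split_outliers(cluster_sizes_bp, threshold_bp):
    """Aggressive threshold: clusters below threshold_bp become outliers."""
    kept = {c: n for c, n in cluster_sizes_bp.items() if n >= threshold_bp}
    outliers = {c: n for c, n in cluster_sizes_bp.items() if n < threshold_bp}
    return kept, outliers


if __name__ == "__main__":
    demo = ">c1\nACGTACGTACGT\n>c2\nACG\n>c3\nACGTACGTAC\n"
    contigs = filter_contigs(read_fasta(demo), min_length=5)
    for target in (4, 10):
        window = estimate_window_length(contigs, target)
        print(target, window, count_data_points(contigs, window))
```

In this sketch, raising the target from 4 to 10 shortens the window, and the resulting number of data points lands near the target rather than exactly on it.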
Results
Results are provided as an integrated web interface as well as an archive containing the same
information. The result interface is divided into two sections. The table on the left lists all
assemblies for which acdc was run. The web application always shows exactly one sample;
for batch computation, please use the command line application. The right side of the
interface shows the visualization plot of the sample. Clicking any cell in the table
updates the visualization accordingly.
The table has four or five columns. The first and second columns show the sample ID and sample
file name. The third and fourth columns show detection confidences as calculated by connected
components and the dip statistic, respectively. An optional fifth column shows the output of Kraken,
an external sequence classification tool. Please refer to the paper for details.
The visualization shows data points, where each data point corresponds to one window at a specific
position of a contig. Depending on the selected table cell, hovering over a data point shows either
the corresponding contig name or its classification. An orange star in the scatter plot indicates
a detected 16S rRNA gene. Click on it to display the gene sequence.
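To make the relation between windows and data points concrete, the following sketch computes one oligonucleotide frequency vector per window of a contig. It is an illustration under stated assumptions, not acdc's method: the function name `window_signatures`, the default k-mer size of 4, and the use of non-overlapping windows are all hypothetical choices.

```python
from collections import Counter
from itertools import product


def window_signatures(name, sequence, window_length, k=4):
    """Yield (contig name, start position, k-mer frequencies) per window.

    Assumptions: non-overlapping windows, k-mers over A/C/G/T only,
    and a contig shorter than one window still yields one data point.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = sequence.upper()
    last_start = max(1, len(seq) - window_length + 1)
    for start in range(0, last_start, window_length):
        window = seq[start:start + window_length]
        counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
        # k-mers containing other symbols (e.g. N) are ignored in the total.
        total = sum(counts[m] for m in kmers)
        freqs = [counts[m] / total if total else 0.0 for m in kmers]
        yield name, start, freqs


if __name__ == "__main__":
    for contig, start, freqs in window_signatures("c1", "ACGTACGTACGT", 5, k=2):
        print(contig, start, [round(f, 2) for f in freqs])
```

Each yielded tuple corresponds to one point in the plot: the contig name is what a hover label could display, and the frequency vector is the kind of signature that would be projected into the scatter plot.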
To export an individual cluster, click one of the colored circles below the plot. The
colors correspond to the colors of the clusters in the plot. Clicking a circle downloads the
corresponding FASTA file of the cluster.
The top half of the visualization contains controls for adjusting the display of the plot, which
should be self-explanatory. The paper provides further information.