SLiMSuite REST Server


taxolotl V0.1.1

Taxolotl genome assembly taxonomy summary and assessment tool

Module: taxolotl
Description: Taxolotl genome assembly taxonomy summary and assessment tool
Version: 0.1.1
Last Edit: 25/11/21

Copyright © 2021 Richard J. Edwards - See source code for GNU License Notice

Imported modules: rje rje_db rje_gff rje_obj rje_rmd rje_seqlist rje_kat saaga

See SLiMSuite Blog for further documentation. See rje for general commands.


Taxolotl combines MMseqs2 easy-taxonomy with GFF parsing to perform a taxonomic analysis of a genome assembly (and of any subsets given by taxsubsets=LIST) using an annotated proteome. Taxonomic assignments are mapped onto genes, onto assembly scaffolds and, if assembly=FILE is given, onto contigs.
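As a rough illustration of mapping gene-level assignments onto scaffolds, the sketch below counts taxa per assembly sequence. It is not Taxolotl's implementation. It assumes two hypothetical inputs: a gene-to-taxon dictionary (as might be parsed from MMseqs2 easy-taxonomy output) and a gene-to-sequence dictionary (as might be parsed from the GFF). The majority-vote `best_taxon` helper is likewise an illustrative assumption.

```python
from collections import Counter, defaultdict


def summarise_by_sequence(gene_taxon, gene_seq):
    """Count taxon assignments per assembly sequence (hypothetical helper).

    gene_taxon: dict of gene ID -> assigned taxon name
    gene_seq:   dict of gene ID -> scaffold/sequence name from the GFF
    """
    per_seq = defaultdict(Counter)
    for gene, seqname in gene_seq.items():
        # Genes with no taxonomic assignment are counted as 'unclassified'
        taxon = gene_taxon.get(gene, "unclassified")
        per_seq[seqname][taxon] += 1
    return dict(per_seq)


def best_taxon(counts):
    """Return the most frequent taxon for one sequence (ties broken by name)."""
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]


gene_taxon = {"g1": "Chordata", "g2": "Chordata", "g3": "Proteobacteria"}
gene_seq = {"g1": "scaffold1", "g2": "scaffold1", "g3": "scaffold2", "g4": "scaffold2"}
summary = summarise_by_sequence(gene_taxon, gene_seq)
```

Here `scaffold2` has one Proteobacteria gene and one unclassified gene, so its "best" taxon depends entirely on the tie-break rule. This is why a real tool needs an explicit policy for such cases.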

See <> for details. General SLiMSuite run documentation can be found at <>.

Taxolotl is available as part of SLiMSuite, or via a standalone GitHub repo at <>.


Input/Output options

seqin=FILE : Protein annotation file to assess [annotation.faa]
gffin=FILE : Protein annotation GFF file [annotation.gff]
cdsin=FILE : Optional transcript annotation file for renaming and/or longest isoform extraction [annotation.fna]
assembly=FILE : Optional genome fasta file (required for some outputs) [None]
basefile=X : Prefix for output files [$SEQBASE]
gffgene=X : Label for GFF gene feature type ['gene']
gffcds=X : Label for GFF CDS feature type ['CDS']
gffmrna=X : Label for GFF mRNA feature type ['mRNA']
taxlevels=LIST : List of taxonomic levels to report (* for superkingdom and below) ['*']
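The gffgene=X, gffmrna=X and gffcds=X options set which GFF feature types are read as genes, mRNAs and CDS features. Below is a minimal sketch of that idea for standard nine-column GFF3 lines. The function name and returned structure are hypothetical, and Taxolotl itself uses its own rje_gff module rather than this code.

```python
def parse_gff(lines, gffgene="gene", gffmrna="mRNA", gffcds="CDS"):
    """Group GFF3 features by type using configurable feature labels (sketch)."""
    features = {gffgene: [], gffmrna: [], gffcds: []}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        if ftype not in features:
            continue  # ignore feature types that were not requested
        # Column 9 holds key=value pairs separated by semicolons
        attributes = dict(
            field.strip().split("=", 1) for field in attrs.split(";") if "=" in field
        )
        features[ftype].append({
            "seqid": seqid,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "id": attributes.get("ID"),
            "parent": attributes.get("Parent"),
        })
    return features
```

Passing, for example, `gffcds="cds"` would make the sketch collect lower-case `cds` features instead, mirroring how the option relabels the expected feature type.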

Run mode options

dochtml=T/F : Generate HTML Taxolotl documentation (*.docs.html) instead of main run [False]

Taxolotl options

taxdb=FILE : MMseqs2 taxonomy database for taxonomy assignment [seqTaxDB]
taxbase=X : Output prefix for taxonomy output [$SEQBASE.$TAXADB]
taxorfs=T/F : Whether to generate ORFs from assembly if no seqin=FILE given [True]
taxbyseq=T/F : Whether to parse and generate taxonomy output for each assembly (GFF) sequence [True]
taxbycontig=T/F : Whether to generate taxonomy output for each contig if the assembly is loaded [True]
taxbyseqfull=T/F : Whether to generate full easy-taxonomy report outputs for each assembly (GFF) sequence [False]
taxsubsets=FILELIST : Files (fasta/id) with sets of assembly input sequences (matching GFF) to summarise []
taxwarnrank=X : Taxonomic rank (and above) at which to warn when deviating from consensus [family]
bestlineage=T/F : Whether to enforce a single lineage for best taxa ratings [True]
mintaxnum=INT : Minimum gene count in main dataset to keep taxon, else merge with higher level [2]
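One way to read mintaxnum=INT ("else merge with higher level") is that any taxon with too few genes is folded into its parent taxon. The sketch below shows that interpretation. It is an assumption, not Taxolotl's code. Lineages are represented as hypothetical tuples ordered from the highest rank down, and the merge is repeated until every remaining entry meets the threshold or has reached the top level.

```python
from collections import Counter


def merge_rare_taxa(lineage_counts, mintaxnum=2):
    """Fold lineages with fewer than mintaxnum genes into their parent (sketch).

    lineage_counts maps a lineage tuple, highest rank first, e.g.
    ("Eukaryota", "Chordata", "Aves"), to a gene count.
    """
    counts = Counter(lineage_counts)
    changed = True
    while changed:
        changed = False
        # Check the deepest lineages first so counts move upwards level by level
        for lineage in sorted(counts, key=len, reverse=True):
            if counts[lineage] < mintaxnum and len(lineage) > 1:
                parent = lineage[:-1]
                counts[parent] += counts.pop(lineage)
                changed = True
                break  # restart the scan because counts has been modified
    return counts


genes = Counter({
    ("Eukaryota", "Chordata"): 2,
    ("Eukaryota", "Chordata", "Aves"): 1,
    ("Eukaryota", "Chordata", "Mammalia"): 4,
})
merged = merge_rare_taxa(genes, mintaxnum=2)
```

The single Aves gene is reassigned to Chordata, while Mammalia is kept. The total gene count is unchanged, so merging only changes the resolution at which genes are reported.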

TabReport options

tabreport=FILE : Convert MMseqs2 report into taxonomy table with counts (if True use taxbase=X) [None]
taxhigh=X : Highest taxonomic level for tabreport [class]
taxlow=X : Lowest taxonomic level for tabreport [species]
taxpart=T/F : Whether to output entries with partial taxonomic levels to tabreport [False]
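MMseqs2 taxonomy reports use a Kraken-style layout with six tab-separated columns: clade percentage, clade count, direct count, rank, taxonomy ID and an indented scientific name. The sketch below turns such a report into rows spanning taxhigh to taxlow. It treats any row with an empty level as "partial" for taxpart. That definition, the fixed rank list and the function name are assumptions made for illustration and may not match the real tabreport output.

```python
RANKS = ["superkingdom", "kingdom", "phylum", "class",
         "order", "family", "genus", "species"]


def tabulate_report(lines, taxhigh="class", taxlow="species", taxpart=False):
    """Convert Kraken-style report lines into lineage rows with clade counts (sketch)."""
    hi, lo = RANKS.index(taxhigh), RANKS.index(taxlow)
    keep = RANKS[hi:lo + 1]
    lineage = {}
    rows = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue
        clade_count, rank, name = int(cols[1]), cols[3], cols[5].strip()
        if rank not in RANKS:
            continue  # skip 'no rank', 'clade' and other intermediate ranks
        level = RANKS.index(rank)
        # Keep ancestors only, dropping deeper ranks left over from the previous branch
        lineage = {r: n for r, n in lineage.items() if RANKS.index(r) < level}
        lineage[rank] = name
        if level < hi or level > lo:
            continue  # outside the requested taxhigh..taxlow window
        row = [lineage.get(r, "") for r in keep]
        if not taxpart and "" in row:
            continue  # partial lineage: only reported when taxpart=True
        rows.append(row + [clade_count])
    return rows
```

Relying on rank order rather than name indentation keeps the parser simple, at the cost of ignoring non-standard ranks such as "clade".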

System options

forks=X : Number of parallel sequences to process at once [0]
killforks=X : Number of seconds of no activity before killing all remaining forks. [36000]
forksleep=X : Sleep time (seconds) between cycles of forking out more processes [0]
tmpdir=PATH : Temporary directory path for running mmseqs2 [./tmp/]

Module Version History

    # 0.0.0 - Initial Compilation.
    # 0.1.0 - Added tabreport function.
    # 0.1.1 - Fix bug with contig output. Added seqname, start and end to contig summary.

© 2015 RJ Edwards. Contact: