taxolotl V0.1.1

Taxolotl genome assembly taxonomy summary and assessment tool

Module: taxolotl
Description: Taxolotl genome assembly taxonomy summary and assessment tool
Version: 0.1.1
Last Edit: 25/11/21

Copyright © 2021 Richard J. Edwards - See source code for GNU License Notice

Imported modules: rje rje_db rje_gff rje_obj rje_rmd rje_seqlist rje_kat saaga

See SLiMSuite Blog for further documentation. See rje for general commands.


Taxolotl combines MMseqs2 easy-taxonomy with GFF parsing to perform taxonomic analysis of a genome assembly (and any subsets given by taxsubsets=FILELIST) using an annotated proteome. Taxonomic assignments are mapped onto genes, onto assembly scaffolds and, if assembly=FILE is given, onto contigs.

See <> for details. General SLiMSuite run documentation can be found at <>.

Taxolotl is available as part of SLiMSuite, or via a standalone GitHub repo at <>.
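
The mapping of protein-level assignments onto genes and assembly sequences can be pictured with a minimal Python sketch. This is not Taxolotl's own code: the function names, the toy GFF lines and the taxon labels are hypothetical, and it assumes GFF3-style ID=/Parent= attributes, protein IDs that match mRNA IDs, and one protein per gene.

```python
from collections import Counter, defaultdict

def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)

def map_transcripts(gff_lines, gffgene="gene", gffmrna="mRNA"):
    """Map each mRNA ID to its (gene ID, assembly sequence name)."""
    gene_seq, mrna_gene = {}, {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a standard 9-column feature line
        attrs = parse_attributes(cols[8])
        if cols[2] == gffgene:
            gene_seq[attrs.get("ID")] = cols[0]
        elif cols[2] == gffmrna:
            mrna_gene[attrs.get("ID")] = attrs.get("Parent")
    return {mrna: (gene, gene_seq.get(gene)) for mrna, gene in mrna_gene.items()}

def taxa_by_sequence(mapping, protein_taxa):
    """Count taxon assignments per assembly sequence (assumed: protein ID == mRNA ID)."""
    counts = defaultdict(Counter)
    for protein, taxon in protein_taxa.items():
        if protein in mapping:
            _gene, seqname = mapping[protein]
            counts[seqname][taxon] += 1
    return counts

# Toy data for illustration only
gff = [
    "##gff-version 3",
    "scaf1\ttoy\tgene\t1\t900\t.\t+\t.\tID=g1",
    "scaf1\ttoy\tmRNA\t1\t900\t.\t+\t.\tID=t1;Parent=g1",
    "scaf2\ttoy\tgene\t50\t700\t.\t-\t.\tID=g2",
    "scaf2\ttoy\tmRNA\t50\t700\t.\t-\t.\tID=t2;Parent=g2",
]
mapping = map_transcripts(gff)
per_seq = taxa_by_sequence(mapping, {"t1": "TaxonA", "t2": "TaxonB"})
```

The gffgene and gffmrna arguments mirror the feature-type labels described under Input/Output options, so annotation files with different feature names could be handled by changing them.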


Input/Output options

seqin=FILE : Protein annotation file to assess [annotation.faa]
gffin=FILE : Protein annotation GFF file [annotation.gff]
cdsin=FILE : Optional transcript annotation file for renaming and/or longest isoform extraction [annotation.fna]
assembly=FILE : Optional genome fasta file (required for some outputs) [None]
basefile=X : Prefix for output files [$SEQBASE]
gffgene=X : Label for GFF gene feature type ['gene']
gffcds=X : Label for GFF CDS feature type ['CDS']
gffmrna=X : Label for GFF mRNA feature type ['mRNA']
taxlevels=LIST : List of taxonomic levels to report (* for superkingdom and below) ['*']
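
As a rough illustration of how the options above combine into a run, the Python sketch below assembles a key=value command line. The script name taxolotl.py, the helper name build_command, the file names and the comma-separated form for LIST values are assumptions, not taken from this documentation.

```python
def build_command(script="taxolotl.py", **options):
    """Return an argument list of key=value options (script name is an assumption)."""
    argv = ["python", script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"  # T/F switches as listed in the options
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)  # assumed LIST separator
        argv.append(f"{key}={value}")
    return argv

cmd = build_command(
    seqin="annotation.faa",
    gffin="annotation.gff",
    assembly="assembly.fasta",  # hypothetical file name
    basefile="myrun",           # hypothetical output prefix
    taxlevels=["family", "genus", "species"],
)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run; building a list rather than a single string avoids shell quoting problems with file paths.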

Run mode options

dochtml=T/F : Generate HTML Taxolotl documentation (*.docs.html) instead of main run [False]

Taxolotl options

taxdb=FILE : MMseqs2 taxonomy database for taxonomy assignment [seqTaxDB]
taxbase=X : Output prefix for taxonomy output [$SEQBASE.$TAXADB]
taxorfs=T/F : Whether to generate ORFs from assembly if no seqin=FILE given [True]
taxbyseq=T/F : Whether to parse and generate taxonomy output for each assembly (GFF) sequence [True]
taxbycontig=T/F : Whether to generate taxonomy output for each contig if the assembly is loaded [True]
taxbyseqfull=T/F : Whether to generate full easy-taxonomy report outputs for each assembly (GFF) sequence [False]
taxsubsets=FILELIST : Files (fasta/id) with sets of assembly input sequences (matching GFF) to summarise []
taxwarnrank=X : Taxonomic rank (and above) at which to warn when deviating from consensus [family]
bestlineage=T/F : Whether to enforce a single lineage for best taxa ratings [True]
mintaxnum=INT : Minimum gene count in main dataset to keep taxon, else merge with higher level [2]
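
The effect of mintaxnum=INT can be pictured with a short Python sketch: taxa with fewer genes than the threshold have their counts passed up to the next higher level. This is one possible reading of the option, not Taxolotl's implementation; the function name, the lineage-tuple data structure and the toy counts are hypothetical.

```python
def merge_rare_taxa(counts, mintaxnum=2):
    """Merge taxa below mintaxnum genes into their parent lineage.

    counts maps a lineage tuple (highest rank first) to the number of
    genes assigned at exactly that taxon.
    """
    merged = dict(counts)
    max_depth = max((len(lineage) for lineage in merged), default=0)
    # Work from the deepest level upwards so merged counts can cascade
    for depth in range(max_depth, 1, -1):
        for lineage in [lin for lin in merged if len(lin) == depth]:
            if merged[lineage] < mintaxnum:
                parent = lineage[:-1]
                merged[parent] = merged.get(parent, 0) + merged.pop(lineage)
    return merged

# Toy gene counts for illustration only
counts = {
    ("KingdomA", "PhylumA", "ClassA"): 10,
    ("KingdomA", "PhylumA", "ClassB"): 1,
    ("KingdomB", "PhylumB"): 1,
}
kept = merge_rare_taxa(counts, mintaxnum=2)
```

In this sketch the top-level taxa are never merged further, so a single gene assigned to KingdomB is still reported at that level.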

TabReport options

tabreport=FILE : Convert MMseqs2 report into taxonomy table with counts (if True use taxbase=X) [None]
taxhigh=X : Highest taxonomic level for tabreport [class]
taxlow=X : Lowest taxonomic level for tabreport [species]
taxpart=T/F : Whether to output entries with partial taxonomic levels to tabreport [False]
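
A minimal Python sketch of the tabreport idea follows. It assumes a Kraken-style report layout (tab-separated: percentage, clade count, taxon count, rank, taxon ID, name indented by two spaces per level) and interprets taxpart as controlling whether rows with missing intermediate levels are kept. The function name, rank list and toy report are hypothetical, and Taxolotl's actual table may differ.

```python
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]

def report_to_table(lines, taxhigh="class", taxlow="species", taxpart=False):
    """Return one row per taxlow-rank taxon: names from taxhigh to taxlow, then clade count."""
    levels = RANKS[RANKS.index(taxhigh):RANKS.index(taxlow) + 1]
    path = []  # (depth, rank, name) entries from the root to the current taxon
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        clade_count, rank, raw_name = int(fields[1]), fields[3], fields[5]
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        while path and path[-1][0] >= depth:
            path.pop()  # climb back up to this taxon's parent
        path.append((depth, rank, raw_name.strip()))
        if rank != taxlow:
            continue
        lineage = {r: name for _depth, r, name in path if r in levels}
        if len(lineage) < len(levels) and not taxpart:
            continue  # partial lineage, skipped unless taxpart is set
        rows.append([lineage.get(level, "") for level in levels] + [clade_count])
    return rows

# Toy report for illustration only: (depth, rank, name, clade count)
toy = [
    (0, "no rank", "root", 12),
    (1, "superkingdom", "KingdomA", 12),
    (2, "class", "ClassA", 11),
    (3, "order", "OrderA", 11),
    (4, "family", "FamilyA", 11),
    (5, "genus", "GenusA", 11),
    (6, "species", "SpeciesA1", 9),
    (6, "species", "SpeciesA2", 2),
    (2, "class", "ClassB", 1),
    (3, "species", "SpeciesB1", 1),
]
report = [f"0.0\t{n}\t0\t{rank}\t0\t{'  ' * depth}{name}" for depth, rank, name, n in toy]
strict = report_to_table(report)
loose = report_to_table(report, taxpart=True)
```

Here strict holds only the two species with a complete class-to-species lineage, while loose also keeps SpeciesB1 with empty order, family and genus columns.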

System options

forks=X : Number of parallel sequences to process at once [0]
killforks=X : Number of seconds of no activity before killing all remaining forks. [36000]
forksleep=X : Sleep time (seconds) between cycles of forking out more processes [0]
tmpdir=PATH : Temporary directory path for running mmseqs2 [./tmp/]

Module Version History

    # 0.0.0 - Initial Compilation.
    # 0.1.0 - Added tabreport function.
    # 0.1.1 - Fix bug with contig output. Added seqname, start and end to contig summary.

