|
SAAGA V0.7.7
Summarise, Annotate & Assess Genome Annotations
Copyright © 2020 Richard J. Edwards - See source code for GNU License Notice

Imported modules:

See SLiMSuite Blog for further documentation.

Function

SAAGA is a tool for summarising, annotating and assessing genome annotations, with a particular focus on annotations generated by GeMoMa. The core of SAAGA is reciprocal MMseqs2 searches of the annotation and reference proteomes. These are used to identify the best hits for protein product identification and to assess annotations based on query and hit coverage. SAAGA will also generate annotation summary statistics, and extract the longest protein from each gene for a representative non-redundant proteome (e.g. for BUSCO analysis).

Run modes

assess = Assess annotation using reference annotation (e.g. a reference organism proteome)

Commandline

Input/Output options
Run mode options
Search and filter options
Precomputed MMseqs2 options
Batch Run options
Taxonomy options
System options
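
The longest-protein selection mentioned under Function can be illustrated with a minimal Python sketch. This is not SAAGA's own code: the function name `longest_per_gene`, the `(gene_id, protein_id, sequence)` input tuples and the tie-breaking rule (first protein seen wins) are all assumptions made for illustration, and the example identifiers are made up.

```python
def longest_per_gene(proteins):
    """Pick one representative protein per gene by sequence length.

    proteins: iterable of (gene_id, protein_id, sequence) tuples.
    This input shape is hypothetical; SAAGA itself works from GFF and fasta input.
    Returns a dict mapping gene_id -> protein_id of the longest protein.
    """
    best = {}  # gene_id -> (protein_id, length)
    for gene_id, protein_id, seq in proteins:
        # Ignore trailing stop-codon symbols when measuring length.
        length = len(seq.rstrip("*"))
        # Assumption: on a tie, the first protein encountered is kept.
        if gene_id not in best or length > best[gene_id][1]:
            best[gene_id] = (protein_id, length)
    return {gene: pid for gene, (pid, _) in best.items()}


# Made-up example records: two isoforms for gene g1, one for gene g2.
records = [
    ("g1", "g1.t1", "MKT"),
    ("g1", "g1.t2", "MKTAY*"),
    ("g2", "g2.t1", "MA"),
]
representatives = longest_per_gene(records)
```

Keeping a single protein per gene avoids counting isoforms of the same gene as duplicates when the resulting proteome is passed to a completeness tool such as BUSCO.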
History

Module Version History

# 0.0.0 - Initial Compilation.
# 0.1.0 - Initial working version. Needs improved documentation.
# 0.2.0 - Added extra annotation/longest output for CDS and GFF.
# 0.2.1 - Renamed to SAAGA and tidied some documentation.
# 0.3.0 - Added some additional hit info to annotation and reworked to allow multiple query-hit pairs.
# 0.3.1 - Fixed assess bug and sped up GFF parsing.
# 0.4.0 - Added tophits=X [250] and minglobid=X [40.0] options, plus globid and hitnum to output.
# 0.5.0 - Added definitions for gffgene=X, gffcds=X and gffmrna=X. Modified output.
# 0.5.1 - Tidied some of the code and added some identifier checks for GFF and Fasta input.
# 0.5.2 - Fixed issue with swapped transcript and exon feature identifiers following v0.5.1 tidying.
# 0.5.3 - Added pident compatibility with updated MMseqs2. Updated documentation. Modified some stats calculations.
# 0.5.4 - Added restricted feature parsing from GFF. Fixed GFF type input bug.
# 0.6.0 - Added more graceful failure if no sequences loaded. Added GFF renaming output field options. Fixed GFF output bug.
# 0.7.0 - Added taxonomy mode for taxonomic summaries and contamination checks.
# 0.7.1 - Added taxorfs setting to generate ORFs in absence of GFF or protein file.
# 0.7.2 - Updated docstring. Added rating to lca_genes. Added batchrun for matching seqin/gffin pairs. Added GFF output.
# 0.7.3 - Fixed lca_genes rating and added taxbycontig=T/F taxonomy output for each contig if the assembly is loaded.
# 0.7.4 - Updated some of the outputs to Taxolotl rather than SAAGA.
# 0.7.5 - Added bestlineage=T/F : Whether to enforce a single lineage for best taxa ratings [True]
# 0.7.6 - Fixed GFF output.
# 0.7.7 - Fixed contig output for Taxolotl.

SAAGA REST Output formats

Run with &rest=docs for program documentation and options. A plain text version is accessed with &rest=help. &rest=OUTFMT can be used to retrieve individual parts of the output, matching the tabs in the default (&rest=format) output.
Individual OUTFMT elements can also be parsed from the full (&rest=full) server output, which is formatted as follows:

###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
... contents for OUTFMT section ...

Available REST Outputs

There is currently no specific help available on REST output for this program.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au. |
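
Parsing individual OUTFMT elements out of the full (&rest=full) output could look something like the Python sketch below. It assumes each section starts with a `###~~~...~~~###` divider line followed by a `# OUTFMT:` header line on its own line, with the section contents running until the next divider. The function name `split_rest_output` and the section names in the example (`log`, `example`) are hypothetical and not taken from the SAAGA server.

```python
import re

# Assumed layout: a divider line of '###', tildes, '###', then a '# NAME:' header line.
DIVIDER = re.compile(r"^###~+###\s*$")
HEADER = re.compile(r"^#\s*(\S+):\s*$")


def split_rest_output(text):
    """Split full REST output text into a dict of {OUTFMT name: section contents}."""
    sections = {}
    current = None
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        # A divider only starts a new section if a header line follows it.
        if DIVIDER.match(lines[i]) and i + 1 < len(lines):
            header = HEADER.match(lines[i + 1])
            if header:
                current = header.group(1)
                sections[current] = []
                i += 2
                continue
        # Lines before the first section header are ignored.
        if current is not None:
            sections[current].append(lines[i])
        i += 1
    return {name: "\n".join(body) for name, body in sections.items()}


# Made-up example text with two hypothetical sections.
demo = (
    "###~~~~~~~~###\n"
    "# log:\n"
    "line one\n"
    "line two\n"
    "###~~~~~~~~###\n"
    "# example:\n"
    "a\tb\n"
)
parsed = split_rest_output(demo)
```

Requiring a header line after each divider means a divider-like line inside a section's contents is kept as content rather than starting an empty section.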