SLiMSuite REST Server

PAGSAT V1.12.0

Pairwise Assembled Genome Sequence Analysis Tool

Module: PAGSAT
Description: Pairwise Assembled Genome Sequence Analysis Tool
Version: 1.12.0
Last Edit: 26/09/16

Copyright © 2015 Richard J. Edwards - See source code for GNU License Notice

Imported modules: rje rje_db rje_genbank rje_html rje_obj rje_seqlist rje_sequence rje_synteny rje_tree rje_tree_group rje_xref rje_blast_V2 rje_dismatrix_V3 gablam snapper

See SLiMSuite Blog for further documentation. See rje for general commands.


This module assesses an assembled genome against a suitable reference. For optimal results, the reference genome should be nearly identical to the genome being assembled. Comparative analyses should still be useful when different assemblies are run against a related genome: although 100% coverage and accuracy cannot be expected in that case, inaccuracies should still make an assembly less similar to the reference.

Main input for PAGSAT is an assembled genome in fasta format (assembly=FILE) and a reference genome in fasta format (refgenome=FILE or reference=FILE), with a corresponding *.gb or *.gbk GenBank download for feature extraction.
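A minimal sketch of how the key=value options listed on this page could be assembled into a command line. The script name (pagsat.py) and the file names are assumptions for illustration only; the option names (assembly, refgenome, rgraphics) and the T/F convention are taken from the option listings.

```python
import shlex


def build_pagsat_command(script, options):
    """Return an argument list of the form: python SCRIPT key=value ..."""
    args = ["python", script]
    for key, value in options.items():
        # Booleans follow the T/F convention used in the option listings.
        if isinstance(value, bool):
            value = "T" if value else "F"
        args.append(f"{key}={value}")
    return args


cmd = build_pagsat_command(
    "pagsat.py",  # hypothetical path to the PAGSAT script
    {
        "assembly": "assembly.fasta",    # hypothetical file name
        "refgenome": "reference.fasta",  # hypothetical file name
        "rgraphics": False,
    },
)
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run() once the script path matches the local installation.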


Main output is a number of delimited text files and PNG graphics made with R. Details to follow.


Input/Setup Options

assembly=FILE : Fasta file of assembled contigs to assess [None]
refgenome=FILE : Fasta file of reference genome for assessment (also *.gb for full functionality) [None]
spcode=X : Species code for reference genome (if not already processed by rje_genbank) [None]
minqv=X : Minimum mean QV score for assembly contigs (read from *.qv.csv) [20]
mincontiglen=X : Minimum contig length to retain in assembly (QV filtering only) [1000]
casefilter=T/F : Whether to filter leading/trailing lower case (low QV) sequences [True]
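The filtering options above can be illustrated with a short sketch. This is not the PAGSAT implementation: the function names, the dictionary inputs, the order of the filters, and the choice to drop contigs with no QV score are all assumptions. Only the option names and defaults (minqv=20, mincontiglen=1000, casefilter=True) come from the listing.

```python
def trim_lowercase_ends(seq):
    """Remove leading and trailing lower-case bases, keeping internal ones."""
    start = 0
    end = len(seq)
    while start < end and seq[start].islower():
        start += 1
    while end > start and seq[end - 1].islower():
        end -= 1
    return seq[start:end]


def filter_contigs(contigs, mean_qv, minqv=20, mincontiglen=1000, casefilter=True):
    """Filter contigs by mean QV and length.

    contigs: dict of contig name -> sequence string
    mean_qv: dict of contig name -> mean QV score (assumed already parsed)
    """
    kept = {}
    for name, seq in contigs.items():
        if casefilter:
            seq = trim_lowercase_ends(seq)
        # Assumption: a contig without a QV score is treated as failing the filter.
        if mean_qv.get(name, 0) < minqv:
            continue
        if len(seq) < mincontiglen:
            continue
        kept[name] = seq
    return kept


contigs = {
    "ctg1": "ac" + "A" * 1200 + "gt",  # passes after end trimming
    "ctg2": "A" * 500,                 # too short
    "ctg3": "A" * 2000,                # mean QV too low
}
mean_qv = {"ctg1": 35, "ctg2": 40, "ctg3": 12}
kept = filter_contigs(contigs, mean_qv)
print(sorted(kept))
```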

Reference vs Assembly Options

minlocid=X : Minimum percentage identity for local hits mapping to chromosome coverage [95.0]
minloclen=X : Minimum length for local hits mapping to chromosome coverage [250]
genesummary=T/F : Whether to include reference gene searches in summary data [True]
protsummary=T/F : Whether to include reference protein searches in summary data [True]
tophitbuffer=X : Percentage identity difference to keep best hits for reference genes/proteins. [1.0]
diploid=T/F : Whether to treat assembly as a diploid [False]
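As a rough illustration of how minlocid, minloclen and tophitbuffer might act on a set of hits, the sketch below filters local hits and then keeps those within a fixed identity difference of the best hit. The hit representation (dictionaries with "identity" and "length" keys) and the function names are assumptions; the actual logic in PAGSAT may differ.

```python
def filter_local_hits(hits, minlocid=95.0, minloclen=250):
    """Keep local hits meeting both the identity and the length thresholds."""
    return [h for h in hits if h["identity"] >= minlocid and h["length"] >= minloclen]


def top_hits(hits, tophitbuffer=1.0):
    """Keep hits whose percentage identity is within tophitbuffer of the best hit."""
    if not hits:
        return []
    best = max(h["identity"] for h in hits)
    return [h for h in hits if h["identity"] >= best - tophitbuffer]


hits = [
    {"name": "hitA", "identity": 99.5, "length": 1200},
    {"name": "hitB", "identity": 98.7, "length": 900},
    {"name": "hitC", "identity": 97.0, "length": 3000},
    {"name": "hitD", "identity": 99.9, "length": 100},  # too short
    {"name": "hitE", "identity": 90.0, "length": 5000},  # identity too low
]
local = filter_local_hits(hits)
best = top_hits(local)
print([h["name"] for h in local], [h["name"] for h in best])
```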

Output Options

basefile=X : Basename for output files and directories. [assembly+ref]
chromalign=T/F : Whether to perform crude chromosome-contig alignment [True]
rgraphics=T/F : Whether to generate PNG graphics using R. (Needs R installed and setup) [True]
dotplots=T/F : Whether to use gablam.r to output dotplots for all ref vs assembly. [False]
report=T/F : Whether to generate HTML report [True]
genetar=T/F : Whether to tar and zip the GeneHits/ and ProtHits/ folders (if generated & Mac/Linux) [True]

Comparison Options

compare=FILES : Compare assemblies selected using a list of *.Summary.tdt files (wildcards allowed). []
fragcov=LIST : List of coverage thresholds to count min. local BLAST hits (checks integrity) [50,90,95,99]
chromcov=LIST : Report no. of chromosomes covered by a single contig at different %globID (GABLAM table) [95,98,99]
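One possible reading of fragcov is "the fewest local hits needed to reach each coverage threshold". The sketch below implements that reading under simplifying assumptions: hits are treated as non-overlapping, so coverage is the sum of hit lengths, and the largest hits are used first. The function name and inputs are hypothetical, and PAGSAT may calculate this differently.

```python
def min_hits_for_coverage(hit_lengths, chrom_len, fragcov=(50, 90, 95, 99)):
    """Return {threshold: fewest hits needed}, or None where it is never reached."""
    ordered = sorted(hit_lengths, reverse=True)  # largest hits first
    result = {}
    for threshold in fragcov:
        needed = chrom_len * threshold / 100.0
        covered = 0
        count = None
        for i, length in enumerate(ordered, start=1):
            covered += length  # assumes hits do not overlap
            if covered >= needed:
                count = i
                break
        result[threshold] = count
    return result


counts = min_hits_for_coverage([300, 600, 20, 60], chrom_len=1000)
print(counts)
```

Under this reading, a low count at a high threshold indicates that a chromosome is covered by a few long hits rather than many fragments.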

Assembly Tidy/Edit Options

tidy=T/F : Execute semi-automated assembly tidy/edit mode to complete draft assembly [False]
newacc=X : New base for edited contig accession numbers (None will keep old accnum) [None]
newchr=X : Code to replace "chr" in new sequence names for additional PAGSAT compatibility [ctg]
orphans=T/F : Whether to include and process orphan contigs [True]
chrmap=X : Contig:Chromosome mapping mode for assembly tidy (unique/align) [unique]
joinsort=X : Whether to sort potential chromosome joins by Length or Identity [Identity]
joinmerge=X : Merging mode for joining chromosomes (consensus/end) [end]
joinmargin=X : Number of extra bases allowed to still be considered an end local BLAST hit [10]
snapper=T/F : Run Snapper on ctidX/haploid output following PAGSAT Tidy. (Re-Quiver recommended first.) [False]

Module Version History

    # 1.0.0 - Initial working version based on rje_pacbio assessment=T.
    # 1.1.0 - Fixed bug with gene and protein summary data. Removed gene/protein reciprocal searches. Added compare mode.
    # 1.1.1 - Added PAGSAT output directory for tidiness!
    # 1.1.2 - Renamed the PacBio class PAGSAT.
    # 1.2.0 - Tidied up output directories. Added QV filter and Top Gene/Protein hits output.
    # 1.2.1 - Added casefilter=T/F  : Whether to filter leading/trailing lower case (low QV) sequences [True]
    # 1.3.0 - Added tophitbuffer=X and initial synteny analysis for keeping best reference hits.
    # 1.4.0 - Added chrom-v-contig alignment files along with *.ordered.fas.
    # 1.4.1 - Made default chromalign=T.
    # 1.4.2 - Fixed casefilter=F.
    # 1.5.0 - Added diploid=T/F : Whether to treat assembly as a diploid [False]
    # 1.6.0 - Added mincontiglen=X : Minimum contig length to retain in assembly [1000]
    # 1.6.1 - Added diploid=T/F to R PNG call.
    # 1.7.0 - Added tidy=T/F option. (Development)
    # 1.7.1 - Updated tidy=T/F to include initial assembly.
    # 1.7.2 - Fixed some bugs introduced by changing gablam fragment output.
    # 1.7.3 - Added circularise sequence generation.
    # 1.8.0 - Added orphan processing and non-chr naming of Reference.
    # 1.9.0 - Modified the join sorting and merging. Added better tracking of positions when trimming.
    # 1.9.1 - Added joinmargin=X    : Number of extra bases allowed to still be considered an end local BLAST hit [10]
    # 1.10.0 - Added weighted tree output and removed report warning.
    # 1.10.1 - Fixed issue related to having Description in GABLAM HitSum tables.
    # 1.10.2 - Tweaked haploid core output.
    # 1.10.3 - Fixed tidy bug for RevComp contigs and switched joinsort default to Identity. (Needs testing.)
    # 1.10.4 - Added genetar option to tidy out genesummary and protsummary output. Incorporated rje_synteny.
    # 1.10.5 - Set gablamfrag=1 for gene/protein hits.
    # 1.11.0 - Consolidated automated tidy mode and cleaned up some excess code.
    # 1.11.1 - Added option for running self-PAGSAT of ctidX contigs versus haploid set. Replaced ctid "X" with "N".
    # 1.11.2 - Fixed Snapper run choice bug.
    # 1.11.3 - Added reference=FILE as alias for refgenome=FILE. Fixed orphan delete bug.
    # 1.12.0 - Tidying up and documenting outputs. Changed default minloclen=250 and minlocid=95. (LTR identification.)

PAGSAT REST Output formats

Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format for more user-friendly formatted output. Individual outputs can be identified/parsed using &rest=OUTFMT for:

coverage = main results table
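A hedged sketch of how a request for a single output might be composed, assuming options are appended to the tool address as &key=value pairs, as the &rest=help notation above suggests. The base address used here is a placeholder, not the real server URL, and the function name and file names are hypothetical.

```python
from urllib.parse import quote


def rest_call(base, options, rest="coverage"):
    """Join a base address, key=value options and a final rest=OUTFMT with '&'."""
    parts = [base]
    for key, value in options.items():
        # Percent-encode values so that spaces or '&' cannot break the request.
        parts.append(f"{key}={quote(str(value), safe='')}")
    parts.append(f"rest={rest}")
    return "&".join(parts)


url = rest_call(
    "http://example.org/pagsat",  # placeholder address, not the real server
    {"assembly": "assembly.fasta", "refgenome": "reference.fasta"},  # hypothetical files
)
print(url)
```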

© 2015 RJ Edwards.