Pairwise Assembled Genome Sequence Analysis Tool
Copyright © 2015 Richard J. Edwards - See source code for GNU License Notice
See SLiMSuite Blog for further documentation. See
This module assesses an assembled genome against a suitable reference. For optimal results, the reference genome should be close to identical to the genome being assembled. Comparative analyses should still be useful when different assemblies are run against a related genome: although there is no longer the same expectation of 100% coverage and accuracy, inaccuracies would still be expected to make an assembly less similar to the reference.
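To make the coverage and accuracy expectation concrete, here is a minimal Python sketch of how the two numbers could be computed for one reference sequence from a set of local alignment hits. It is an illustration only and not PAGSAT's own implementation: the function names, the hit tuple layout `(ref_start, ref_end, identical_bases)`, and the choice to measure accuracy as identity over aligned positions are all assumptions.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def coverage_and_identity(ref_len, hits):
    """Return (percent coverage, percent identity) for one reference sequence.

    hits is a list of hypothetical (ref_start, ref_end, identical_bases) tuples.
    Coverage counts each reference base once, however many hits span it.
    Identity is pooled over all hits, so overlapping hits are counted twice.
    """
    covered = sum(end - start + 1
                  for start, end in merge_intervals([(h[0], h[1]) for h in hits]))
    aligned = sum(h[1] - h[0] + 1 for h in hits)
    identical = sum(h[2] for h in hits)
    coverage = 100.0 * covered / ref_len
    identity = 100.0 * identical / aligned if aligned else 0.0
    return coverage, identity


if __name__ == "__main__":
    # Made-up hits against a 1000 bp reference, for illustration only.
    example_hits = [(1, 400, 400), (301, 700, 396), (901, 1000, 100)]
    cov, ident = coverage_and_identity(1000, example_hits)
    print(f"coverage={cov:.1f}% identity={ident:.2f}%")
```

With a near-identical reference, both values would be expected to approach 100%. With a related reference they will fall short, but they should still rank competing assemblies sensibly.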
Main input for PAGSAT is an assembled genome in fasta format (
Reference Sequence Naming
PAGSAT expects a particular naming format for assembly sequences, which is a bit more constrained than most programs.
Version 2.6 introduced a
Main output is a number of delimited text files and PNG graphics made with R. Details to follow.
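Until the output details are documented, the delimited text files can be loaded generically. The sketch below is a hypothetical helper, not part of PAGSAT: the tab delimiter default, the function name and the example file name are assumptions, so check the actual files before relying on them.

```python
import csv


def read_delimited(path, delimiter="\t"):
    """Read a delimited text file with a header row into a list of dicts.

    The delimiter defaults to a tab (an assumption); pass "," for
    comma-separated files. All values are returned as strings.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


if __name__ == "__main__":
    import sys

    # Usage: python read_table.py some_output_table.tdt   (file name is hypothetical)
    if len(sys.argv) > 1:
        for row in read_delimited(sys.argv[1]):
            print(row)
```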
NOTE: Snapper is now used for the underlying Reference vs Assembly GABLAM searches (unless
If the assembly has
Reference vs Assembly Options
Assembly Tidy/Edit Options
History Module Version History
# 1.0.0 - Initial working version based on rje_pacbio assessment=T.
# 1.1.0 - Fixed bug with gene and protein summary data. Removed gene/protein reciprocal searches. Added compare mode.
# 1.1.1 - Added PAGSAT output directory for tidiness!
# 1.1.2 - Renamed the PacBio class PAGSAT.
# 1.2.0 - Tidied up output directories. Added QV filter and Top Gene/Protein hits output.
# 1.2.1 - Added casefilter=T/F : Whether to filter leading/trailing lower case (low QV) sequences [True]
# 1.3.0 - Added tophitbuffer=X and initial synteny analysis for keeping best reference hits.
# 1.4.0 - Added chrom-v-contig alignment files along with *.ordered.fas.
# 1.4.1 - Made default chromalign=T.
# 1.4.2 - Fixed casefilter=F.
# 1.5.0 - diploid=T/F : Whether to treat assembly as a diploid [False]
# 1.6.0 - mincontiglen=X : Minimum contig length to retain in assembly
# 1.6.1 - Added diploid=T/F to R PNG call.
# 1.7.0 - Added tidy=T/F option. (Development)
# 1.7.1 - Updated tidy=T/F to include initial assembly.
# 1.7.2 - Fixed some bugs introduced by changing gablam fragment output.
# 1.7.3 - Added circularise sequence generation.
# 1.8.0 - Added orphan processing and non-chr naming of Reference.
# 1.9.0 - Modified the join sorting and merging. Added better tracking of positions when trimming.
# 1.9.1 - Added joinmargin=X : Number of extra bases allowed to still be considered an end local BLAST hit
# 1.10.0 - Added weighted tree output and removed report warning.
# 1.10.1 - Fixed issue related to having Description in GABLAM HitSum tables.
# 1.10.2 - Tweaked haploid core output.
# 1.10.3 - Fixed tidy bug for RevComp contigs and switched joinsort default to Identity. (Needs testing.)
# 1.10.4 - Added genetar option to tidy out genesummary and protsummary output. Incorporated rje_synteny.
# 1.10.5 - Set gablamfrag=1 for gene/protein hits.
# 1.11.0 - Consolidated automated tidy mode and cleaned up some excess code.
# 1.11.1 - Added option for running self-PAGSAT of ctidX contigs versus haploid set. Replaced ctid "X" with "N".
# 1.11.2 - Fixed Snapper run choice bug.
# 1.11.3 - Added reference=FILE as alias for refgenome=FILE. Fixed orphan delete bug.
# 1.12.0 - Tidying up and documenting outputs. Changed default minloclen=250 and minlocid=95. (LTR identification.)
# 2.0.0 - Major overhaul of outputs to improve consistency and clarity. Added Snapper to main run.
# 2.1.0 - Added localSAM output.
# 2.1.1 - Fixed the case of some output files.
# 2.1.2 - Fixed some issues with reverse hits in Snapper and application of minlocid.
# 2.2.0 - Added mapout=T, which is recommended for first run if going to subsequently tidy. (Run tidy on mapfile.)
# 2.2.1 - Tried to fix covplot bug in compare=FILES mode.
# 2.2.2 - Cleaned up *.map.* output for SAMPhaser output files. Added tidy/mapfas option selection.
# 2.2.3 - Added #NOTE to tidy and fixed makesnp=T bug.
# 2.2.4 - Fixed `fragrevcomp=F` bug for Gene and Protein TopHits.
# 2.2.5 - Hopefully really fixed makesnp=T bug now!
# 2.2.6 - Fixed Haploid tidy sequence output naming bug.
# 2.2.7 - Fixed Compare File path bug & dropped some empty outputs.
# 2.3.0 - Minor bug fixes and extra tidy options (join gaps and multi-deletes).
# 2.3.1 - Minor bug fixes.
# 2.3.2 - Updated the synteny mappings to be m::n instead of m:n for Excel compatibility.
# 2.3.3 - Fixed bad assembly sequence name bug.
# 2.3.4 - Fixed full.fas request bug.
# 2.4.0 - Added PAGSAT compile mode to generate comparisons of reference chromosomes across assemblies.
# 2.5.0 - Reduced the executed code when mapfas=T assessment=F. (Recommended first run.) Added renaming.
# 2.5.1 - Added recognition of *.gbff for genbank files.
# 2.6.0 - Added mapper=X : Program to use for mapping files against each other (blast/minimap) [blast]
# 2.6.1 - Switch failure to find key report files to a long warning, not program exit.
# 2.6.2 - Fixed bugs with mapper=minimap mode and started adding more internal documentation.
# 2.6.3 - Fixed default behaviour to run report=T mode.
# 2.6.4 - Fixed summary table merge bug.
# 2.6.5 - Fixed compile path bug.
# 2.6.6 - Fixed BLAST LocalIDCut error for GABLAM and QAssemble stat filtering.
# 2.6.7 - Generalised compile path bug fix.
# 2.6.8 - Added ChromXcov fields to PAGSAT Compare.
# 2.6.9 - Fixed renamed assembly bug when basefile not set.
# 2.7.0 - Added BAM generation for assembly if reads given.
# 2.7.1 - Fixed bug that caused assembly PNGs to disappear.
# 2.8.0 - Added keepchr=T/F : Keep the existing chromosome assignments during tidy if found [False]
# 2.8.1 - Fixed bug that caused too many assembly PNGs to disappear!
PAGSAT REST Output formats
Run with
for more user-friendly formatted output. Individual outputs can be identified/parsed using
© 2015 RJ Edwards. Contact: email@example.com.