Pairwise Assembled Genome Sequence Analysis Tool
Copyright © 2015 Richard J. Edwards - See source code for GNU License Notice
This module is for the assessment of an assembled genome versus a suitable reference. For optimal results, the reference genome should be nearly identical to the genome being assembled. However, comparative analyses should still be useful when different assemblies are compared against a related genome: although 100% coverage and accuracy can no longer be expected, assembly errors should still make an assembly less similar to the reference.
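To make "coverage" and "accuracy" concrete, the sketch below computes both from local alignments of an assembly against a reference. It is a minimal illustration under stated assumptions, not PAGSAT's own implementation: hits are assumed to be given as 1-based inclusive reference start/end coordinates, accuracy is approximated as pooled identity over (aligned length, identical positions) pairs, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical: 1-based, inclusive reference coordinates


def merge_intervals(hits: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent reference intervals from local hits."""
    merged: List[List[int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def coverage(ref_len: int, hits: List[Interval]) -> float:
    """Fraction of reference bases covered by at least one hit."""
    covered = sum(e - s + 1 for s, e in merge_intervals(hits))
    return covered / ref_len


def identity(alignments: List[Tuple[int, int]]) -> float:
    """Pooled identity from (aligned_length, identical_positions) pairs."""
    aligned = sum(length for length, _ in alignments)
    identical = sum(ident for _, ident in alignments)
    return identical / aligned if aligned else 0.0


print(coverage(1000, [(1, 400), (300, 600), (801, 1000)]))  # 0.8
print(identity([(500, 495), (500, 490)]))  # 0.985
```

Merging intervals before summing avoids double-counting reference bases hit by more than one contig, which is why coverage against a related (rather than identical) reference can fall below 100% even for a good assembly.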
Main input for PAGSAT is an assembled genome in fasta format (
Main output is a number of delimited text files and PNG graphics made with R. Details to follow.
Reference vs Assembly Options
Assembly Tidy/Edit Options
Module Version History
# 1.0.0 - Initial working version based on rje_pacbio assessment=T.
# 1.1.0 - Fixed bug with gene and protein summary data. Removed gene/protein reciprocal searches. Added compare mode.
# 1.1.1 - Added PAGSAT output directory for tidiness!
# 1.1.2 - Renamed the PacBio class PAGSAT.
# 1.2.0 - Tidied up output directories. Added QV filter and Top Gene/Protein hits output.
# 1.2.1 - Added casefilter=T/F : Whether to filter leading/trailing lower case (low QV) sequences [True]
# 1.3.0 - Added tophitbuffer=X and initial synteny analysis for keeping best reference hits.
# 1.4.0 - Added chrom-v-contig alignment files along with *.ordered.fas.
# 1.4.1 - Made default chromalign=T.
# 1.4.2 - Fixed casefilter=F.
# 1.5.0 - diploid=T/F : Whether to treat assembly as a diploid [False]
# 1.6.0 - mincontiglen=X : Minimum contig length to retain in assembly
# 1.6.1 - Added diploid=T/F to R PNG call.
# 1.7.0 - Added tidy=T/F option. (Development)
# 1.7.1 - Updated tidy=T/F to include initial assembly.
# 1.7.2 - Fixed some bugs introduced by changing gablam fragment output.
# 1.7.3 - Added circularise sequence generation.
# 1.8.0 - Added orphan processing and non-chr naming of Reference.
# 1.9.0 - Modified the join sorting and merging. Added better tracking of positions when trimming.
# 1.9.1 - Added joinmargin=X : Number of extra bases allowed to still be considered an end local BLAST hit
# 1.10.0 - Added weighted tree output and removed report warning.
# 1.10.1 - Fixed issue related to having Description in GABLAM HitSum tables.
# 1.10.2 - Tweaked haploid core output.
# 1.10.3 - Fixed tidy bug for RevComp contigs and switched joinsort default to Identity. (Needs testing.)
# 1.10.4 - Added genetar option to tidy out genesummary and protsummary output. Incorporated rje_synteny.
# 1.10.5 - Set gablamfrag=1 for gene/protein hits.
# 1.11.0 - Consolidated automated tidy mode and cleaned up some excess code.
# 1.11.1 - Added option for running self-PAGSAT of ctidX contigs versus haploid set. Replaced ctid "X" with "N".
# 1.11.2 - Fixed Snapper run choice bug.
# 1.11.3 - Added reference=FILE as alias for refgenome=FILE. Fixed orphan delete bug.
# 1.12.0 - Tidying up and documenting outputs. Changed default minloclen=250 and minlocid=95. (LTR identification.)
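Two of the options in the history above, casefilter=T/F (1.2.1) and mincontiglen=X (1.6.0), describe simple sequence filters. The sketch below illustrates them under stated assumptions and is not PAGSAT's own code: the function names are hypothetical, contigs are held in a plain dict, only leading/trailing lower case is trimmed (internal lower case is kept), and case filtering is assumed to run before the length filter.

```python
import string
from typing import Dict


def case_filter(seq: str) -> str:
    """Trim leading/trailing lower-case (assumed low-QV) bases; internal lower case is kept."""
    # str.strip removes any run of the given characters from both ends only.
    return seq.strip(string.ascii_lowercase)


def filter_contigs(contigs: Dict[str, str], mincontiglen: int,
                   casefilter: bool = True) -> Dict[str, str]:
    """Optionally case-filter each contig, then drop contigs shorter than mincontiglen."""
    kept: Dict[str, str] = {}
    for name, seq in contigs.items():
        if casefilter:
            seq = case_filter(seq)
        # Assumption: the length threshold is applied after trimming.
        if len(seq) >= mincontiglen:
            kept[name] = seq
    return kept


contigs = {"ctg1": "acgtACGTacGTacg", "ctg2": "ACGT", "ctg3": "acgtacgt"}
print(filter_contigs(contigs, mincontiglen=5))  # {'ctg1': 'ACGTacGT'}
```

Applying the length filter after trimming means a contig made entirely of low-QV sequence (ctg3 here) is removed even though its raw length passes the threshold; with casefilter=False it would be retained.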
PAGSAT REST Output formats
Run with
for more user-friendly formatted output. Individual outputs can be identified/parsed using
© 2015 RJ Edwards. Contact: email@example.com.