This module is designed to use Minimap alignments in PAF format (with the --cs flag) to replace blastn for all-by-all
genome assembly comparisons for PAGSAT etc.
Input is a PAF file with --cs flag, produced by minimap:
minimap2 --cs -N 100 -p 0.0001 -x asm20 $REFERENCE $ASSEMBLY > $PAFILE
pafin=minimap will run minimap2 and generate $BASEFILE.paf output, which will then be processed.
Output is a series of tables that emulate GABLAM output:
hitsum.tdt = summary of hits per query sequence.
gablam.tdt = summary stats for each query-hit pair.
local.tdt = local hits table. Will include alignments if
qryunique.tdt = local hits table reduced to unique coverage of query ($ASSEMBLY) sequences (
hitunique.tdt = local hits table reduced to unique coverage of hit ($REFERENCE) sequences (
NOTE: When Snapper runs rje_paf it inverts the reference and assembly so that the query is the reference.
The default functionality is to emulate GABLAM (and thereby replace blast+ in GABLAM with the
However, excess fields can be reduced with
mockblast=T. Additional fields are output either way.
pafin=PAFFILE : PAF generated from $REFERENCE $ASSEMBLY mapping 
basefile=STR : Base for file outputs [
seqin=FASFILE : Input genome to identify variants in ($ASSEMBLY) 
reference=FILE : Fasta (with accession numbers matching Locus IDs) ($REFERENCE) 
uniqueout=T/F : Whether to output *.qryunique.tdt and *.hitunique.tdt tables of unique coverage [
uniquehit=T/F : Option to use *.hitunique.tdt table of unique coverage for GABLAM coverage stats [
alnseq=T/F : Whether to use alnseq-based processing (True) or CS-Gstring processing [
localaln=T/F : Whether to output local alignments in Local Table [
mockblast=T/F : Whether to output mock BLAST headers even when not appropriate [
Minimap2 run/mapping options
minlocid=PERC : Minimum percentage identity for aligned chunk to be kept (local %identity) [
minloclen=INT : Minimum length for aligned chunk to be kept (local hit length in bp) [
endextend=X : Extend minimap2 hits to end of sequence if query region with X bp of end [
minimap2=PROG : Full path to run minimap2 [
mapopt=CDICT : Dictionary of minimap2 options [
mapsplice=T/F : Switch default minimap2 options to
-x splice -uf -C5 [
reads=FILELIST : List of fasta/fastq read files for minimap2 mapping to BAM. Wildcard allowed. Can be gzipped. 
readtype=LIST : List of ont/pb/hifi file types matching reads for minimap2 mapping [
Coverage checking mode (dev only)
checkpos=TDTFILE: File of Locus, Start, End positions for read coverage checking [
checkfields=LIST: Fields in checkpos file to give Locus, Start and End for checking [
checkflanks=LIST: List of lengths flanking check regions that must also be spanned by reads [
spanid=X : Generate sets of read IDs that span checkpos regions, grouped by values of field X 
Variant mode (dev only)
snptableout=T/F : Generated output of filtered variants to SNP Table [
qcut=X : Min. quality score for a mapped read to be included [
minqn=X : Min. number of non-N reads meeting qcut for output (after indel filtering) [
rid=T/F : Whether to include Read ID (number) lists for each allele [
readnames=T/F : Output the read names to the RID file [
indels=T/F : Whether to include indels in "SNP" parsing [
skiploci=LIST : List of loci to exclude from pileup parsing (e.g. mitochondria) 
mincut=X : Minimum read count for minor allele (proportion if <1) [
absmincut=X : Absolute minimum read count for minor allele (used if mincut<1) [
biallelic=T/F : Whether to restrict SNPs to pure biallelic SNPs (two alleles meeting mincut) [
ignoren=T/F : Whether to exclude "N" calls from alleles [
ignoreref=T/F : If False will always keep Reference allele and call fixed change as SNP [
forks=X : Number of parallel sequences to process at once [
killforks=X : Number of seconds of no activity before killing all remaining forks. [
forksleep=X : Sleep time (seconds) between cycles of forking out more process [
History Module Version History
# 0.0.0 - Initial Compilation.
# 0.1.0 - Initial working version. Compatible with GABLAM v2.30.0 and Snapper v1.7.0.
# 0.2.0 - Added endextend=X : Extend minimap2 hits to end of sequence if with X bp 
# 0.3.0 - Added mapsplice mode for dealing with transcript mapping.
# 0.3.1 - Correct PAF splicing bug.
# 0.4.0 - Added TmpDir and forking for GABLAM conversion.
# 0.5.0 - Added uniquehit=T/F : Option to use *.hitunique.tdt table of unique coverage for GABLAM coverage stats [False]
# 0.6.0 - Added CS alignment manipulation methods.
# 0.6.1 - Added additional error-handling for CS parsing errors.
# 0.7.0 - Added alnseq=T/F : Whether to use alnseq-based processing (True) or CS-Gstring processing (dev only) [False]
# 0.7.1 - Disabled endextend due to bug.
# 0.7.2 - Fixed alnlen bug - minimap2 ignore Ns for alignment length calculation. Fixed endextend bug.
# 0.7.3 - Added minlocid and minloclen filtering to PAF parsing to speed up processing. Set default alnseq=F.
# 0.8.0 - Added developmental variant calling options.
# 0.9.0 - Added long-read mapping to BAM option.
# 0.10.0 - Added longreadMinimapPAF() and checkpos=TDTFILE options.
# 0.10.1 - Added spanid=X: Generate sets of read IDs that span checkpos regions, based on values of field X 
# 0.10.2 - Fixed formatting for Python 2.6 back compatibility for servers.
# 0.10.3 - Fixing issues of PAF files not being generated.
# 0.11.0 - Added HiFi read type.
# 0.12.0 - Added readnames=T/F : Output the read names to the RID file [False]
# 0.13.0 - Updated default -x to asm20 (5% divergence). Use minimap2 map-hifi setting for hifi read mapping.
rje_paf REST Output formats
for program documentation and options. A plain text version is accessed with
can be used to retrieve individual parts of the output, matching the tabs in the default
) output. Individual
elements can also be parsed from the full (
) server output,
which is formatted as follows:
... contents for OUTFMT section ...
Available REST Outputs
There is currently no specific help available on REST output for this program.