SLiMSuite REST Server


Links
REST Home
EdwardsLab Homepage
EdwardsLab Blog
SLiMSuite Blog
SLiMSuite
Webservers
Genomes
REST Pages
REST Status
REST Help
REST Tools
REST Alias Data
REST API
REST News
REST Sitemap

rje_paf V0.13.0

Minimap2 PAF parser and converter

Module: rje_paf
Description: Minimap2 PAF parser and converter
Version: 0.13.0
Last Edit: 01/09/21

Copyright © 2019 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje rje_forker rje_obj rje_db rje_seqlist rje_sequence


See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This module is designed to use Minimap alignments in PAF format (with the --cs flag) to replace blastn for all-by-all genome assembly comparisons for PAGSAT etc.

Input is a PAF file with --cs flag, produced by minimap:

minimap2 --cs -N 100 -p 0.0001 -x asm20 $REFERENCE $ASSEMBLY > $PAFILE

Setting pafin=minimap will run minimap2 and generate $BASEFILE.paf output, which will then be processed.

Output is a series of tables that emulate GABLAM output:

  • hitsum.tdt = summary of hits per query sequence.
  • gablam.tdt = summary stats for each query-hit pair.
  • local.tdt = local hits table. Will include alignments if localaln=T
  • qryunique.tdt = local hits table reduced to unique coverage of query ($ASSEMBLY) sequences (qryunique=T).
  • hitunique.tdt = local hits table reduced to unique coverage of hit ($REFERENCE) sequences (hitunique=T).

NOTE: When Snapper runs rje_paf it inverts the reference and assembly so that the query is the reference.

The default functionality is to emulate GABLAM (and thereby replace blast+ in GABLAM with the mapper=minimap flag). However, excess fields can be reduced with mockblast=T. Additional fields are output either way.

Commandline

Input/Output options

pafin=PAFFILE : PAF generated from $REFERENCE $ASSEMBLY mapping []
basefile=STR : Base for file outputs [PAFIN basefile]
seqin=FASFILE : Input genome to identify variants in ($ASSEMBLY) []
reference=FILE : Fasta (with accession numbers matching Locus IDs) ($REFERENCE) []
assembly=FASFILE: As seqin=FASFILE
uniqueout=T/F : Whether to output *.qryunique.tdt and *.hitunique.tdt tables of unique coverage [True]
uniquehit=T/F : Option to use *.hitunique.tdt table of unique coverage for GABLAM coverage stats [False]
alnseq=T/F : Whether to use alnseq-based processing (True) or CS-Gstring processing [False]
localaln=T/F : Whether to output local alignments in Local Table [False]
mockblast=T/F : Whether to output mock BLAST headers even when not appropriate [True]

Minimap2 run/mapping options

minlocid=PERC : Minimum percentage identity for aligned chunk to be kept (local %identity) [0]
minloclen=INT : Minimum length for aligned chunk to be kept (local hit length in bp) [0]
endextend=X : Extend minimap2 hits to end of sequence if query region with X bp of end [0]
minimap2=PROG : Full path to run minimap2 [minimap2]
mapopt=CDICT : Dictionary of minimap2 options [N:100,p:0.0001,x:asm20]
mapsplice=T/F : Switch default minimap2 options to -x splice -uf -C5 [False]
reads=FILELIST : List of fasta/fastq read files for minimap2 mapping to BAM. Wildcard allowed. Can be gzipped. []
readtype=LIST : List of ont/pb/hifi file types matching reads for minimap2 mapping [ont]

Coverage checking mode (dev only)

checkpos=TDTFILE: File of Locus, Start, End positions for read coverage checking [None]
checkfields=LIST: Fields in checkpos file to give Locus, Start and End for checking [Locus,Start,End]
checkflanks=LIST: List of lengths flanking check regions that must also be spanned by reads [0,100,1000,5000]
spanid=X : Generate sets of read IDs that span checkpos regions, grouped by values of field X []

Variant mode (dev only)

snptableout=T/F : Generated output of filtered variants to SNP Table [False]
qcut=X : Min. quality score for a mapped read to be included [0]
minqn=X : Min. number of non-N reads meeting qcut for output (after indel filtering) [10]
rid=T/F : Whether to include Read ID (number) lists for each allele [True]
readnames=T/F : Output the read names to the RID file [False]
indels=T/F : Whether to include indels in "SNP" parsing [True]
skiploci=LIST : List of loci to exclude from pileup parsing (e.g. mitochondria) []
mincut=X : Minimum read count for minor allele (proportion if <1) [0.05]
absmincut=X : Absolute minimum read count for minor allele (used if mincut<1) [2]
biallelic=T/F : Whether to restrict SNPs to pure biallelic SNPs (two alleles meeting mincut) [False]
ignoren=T/F : Whether to exclude "N" calls from alleles [True]
ignoreref=T/F : If False will always keep Reference allele and call fixed change as SNP [True]

Processing options

forks=X : Number of parallel sequences to process at once [0]
killforks=X : Number of seconds of no activity before killing all remaining forks. [36000]
forksleep=X : Sleep time (seconds) between cycles of forking out more process [0]


History Module Version History

    # 0.0.0 - Initial Compilation.
    # 0.1.0 - Initial working version. Compatible with GABLAM v2.30.0 and Snapper v1.7.0.
    # 0.2.0 - Added endextend=X : Extend minimap2 hits to end of sequence if with X bp [10]
    # 0.3.0 - Added mapsplice mode for dealing with transcript mapping.
    # 0.3.1 - Correct PAF splicing bug.
    # 0.4.0 - Added TmpDir and forking for GABLAM conversion.
    # 0.5.0 - Added uniquehit=T/F : Option to use *.hitunique.tdt table of unique coverage for GABLAM coverage stats [False]
    # 0.6.0 - Added CS alignment manipulation methods.
    # 0.6.1 - Added additional error-handling for CS parsing errors.
    # 0.7.0 - Added alnseq=T/F : Whether to use alnseq-based processing (True) or CS-Gstring processing (dev only) [False]
    # 0.7.1 - Disabled endextend due to bug.
    # 0.7.2 - Fixed alnlen bug - minimap2 ignore Ns for alignment length calculation. Fixed endextend bug.
    # 0.7.3 - Added minlocid and minloclen filtering to PAF parsing to speed up processing. Set default alnseq=F.
    # 0.8.0 - Added developmental variant calling options.
    # 0.9.0 - Added long-read mapping to BAM option.
    # 0.10.0 - Added longreadMinimapPAF() and checkpos=TDTFILE options.
    # 0.10.1 - Added spanid=X: Generate sets of read IDs that span checkpos regions, based on values of field X []
    # 0.10.2 - Fixed formatting for Python 2.6 back compatibility for servers.
    # 0.10.3 - Fixing issues of PAF files not being generated.
    # 0.11.0 - Added HiFi read type.
    # 0.12.0 - Added readnames=T/F : Output the read names to the RID file [False]
    # 0.13.0 - Updated default -x to asm20 (5% divergence). Use minimap2 map-hifi setting for hifi read mapping.

rje_paf REST Output formats

Run with &rest=docs for program documentation and options. A plain text version is accessed with &rest=help.
&rest=OUTFMT can be used to retrieve individual parts of the output, matching the tabs in the default
(&rest=format) output. Individual OUTFMT elements can also be parsed from the full (&rest=full) server output,
which is formatted as follows:
###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
... contents for OUTFMT section ...

Available REST Outputs

There is currently no specific help available on REST output for this program.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.