SLiMSuite REST Server

rje_readcore V0.8.0

Read mapping and analysis core module

Module: rje_readcore
Description: Read mapping and analysis core module
Version: 0.8.0
Last Edit: 29/01/22

Copyright © 2021 Richard J. Edwards - See source code for GNU License Notice

Imported modules: rje rje_db rje_forker rje_obj rje_seqlist

See SLiMSuite Blog for further documentation. See rje for general commands.


Run standalone, this module simply checks for a BAM file and, if it is not found, generates it from one or more long-read input files. It is primarily intended for inheritance by other RJE tools that need long-read mapping and/or wrappers for simple depth/coverage statistics. The core ReadCore object has methods for populating and checking key input files and settings for a number of other SeqSuite tools.

## Dependencies

For read mapping, minimap2 must be installed and either added to the environment $PATH or given with the minimap2=PROG setting. For depth summaries, samtools needs to be installed. Dependencies are checked when needed, and an error is raised if one is not found.
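The dependency check described above can be approximated before launching a run. This is a minimal sketch using only the Python standard library; the function name `find_missing_tools` is hypothetical and this is not the check that rje_readcore itself performs.

```python
import shutil


def find_missing_tools(tools):
    """Return the entries in `tools` that cannot be resolved to an executable.

    Each entry may be a bare program name (looked up on $PATH) or a full
    path, mirroring the minimap2=PROG and samtools=PROG style of setting.
    """
    return [tool for tool in tools if shutil.which(tool) is None]
```

For example, `find_missing_tools(["minimap2", "samtools"])` returns an empty list when both programs are available. A full path also works, because `shutil.which` tests a name containing a directory component directly rather than searching $PATH.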


Core input options

seqin=FILE : Input sequence assembly [None]
basefile=FILE : Root of output file names [$SEQIN basefile]
paf=FILE : PAF file of long reads mapped onto assembly [$BASEFILE.paf]
bam=FILE : BAM file of long reads mapped onto assembly [$BASEFILE.bam]
bamcsi=T/F : Use CSI indexing for BAM files, not BAI (needed for very long scaffolds) [False]
reads=FILELIST : List of fasta/fastq files containing reads. Wildcard allowed. Can be gzipped. []
readtype=LIST : List of ont/pb/hifi file types matching reads for minimap2 mapping [ont]
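To illustrate how the reads, readtype and bamcsi settings could translate into external calls, here is a hedged sketch that only builds argument lists. The mapping of ont/pb/hifi to the minimap2 presets `map-ont`, `map-pb` and `map-hifi`, the recycling of read types when fewer are given than read files, and the function names are all assumptions, not taken from the module source.

```python
from itertools import cycle

# Assumed correspondence between readtype values and minimap2 -x presets.
PRESETS = {"ont": "map-ont", "pb": "map-pb", "hifi": "map-hifi"}


def mapping_commands(seqin, reads, readtypes=("ont",), minimap2="minimap2"):
    """Build one minimap2 argument list (SAM output via -a) per read file.

    Read types are recycled if fewer are given than read files (assumption).
    """
    commands = []
    for read_file, readtype in zip(reads, cycle(readtypes)):
        commands.append([minimap2, "-a", "-x", PRESETS[readtype], seqin, read_file])
    return commands


def index_command(bam, bamcsi=False, samtools="samtools"):
    """Build the samtools index call; -c requests a CSI index instead of BAI."""
    command = [samtools, "index"]
    if bamcsi:
        command.append("-c")
    command.append(bam)
    return command
```

The lists can be passed to `subprocess.run` once the SAM output has been sorted and converted to BAM; those steps are left out here to keep the sketch short.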

Depth and Copy Number options

scdepth=NUM : Single copy ("diploid") read depth. If zero, will use SC BUSCO mode [0]
busco=TSVFILE : BUSCO full table [full_table_$BASEFILE.busco.tsv]
quickdepth=T/F : Whether to use samtools depth in place of mpileup (quicker but underestimates?) [False]
depfile=FILE : Precomputed depth file (*.fastdep or *.fastmp) to use [None]
regfile=FILE : File of SeqName, Start, End positions (or GFF) for read coverage checking [None]
checkfields=LIST : Fields in checkpos file to give Locus, Start and End for checking [SeqName,Start,End]
gfftype=LIST : Optional feature types to use if performing regcheck on GFF file (e.g. gene) ['gene']
depadjust=INT : Advanced R density bandwidth adjustment parameter [12]
seqstats=T/F : Whether to output CN and depth data for full sequences as well as BUSCO genes [False]
cnmax=INT : Max. y-axis value for CN plot (and mode multiplier for related depth plots) [4]
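As a rough illustration of how scdepth, regfile and checkfields relate, the sketch below estimates a copy number for each region as mean depth divided by the single-copy depth. It assumes a tab-delimited regfile with a header row, 1-based inclusive coordinates, and a plain mean. The module itself may well use a density-based depth estimate (see depadjust), so treat this as a simplification with hypothetical names throughout.

```python
import csv
import io


def region_copy_numbers(regfile_text, depths, scdepth,
                        checkfields=("SeqName", "Start", "End")):
    """Estimate copy number per region as mean depth / single-copy depth.

    `depths` maps sequence name -> list of per-base depths, where list
    index 0 holds the depth at position 1.
    """
    locus_field, start_field, end_field = checkfields
    results = []
    for row in csv.DictReader(io.StringIO(regfile_text), delimiter="\t"):
        start, end = int(row[start_field]), int(row[end_field])
        # Convert 1-based inclusive coordinates to a Python slice.
        window = depths[row[locus_field]][start - 1:end]
        mean_depth = sum(window) / len(window)
        results.append((row[locus_field], start, end, mean_depth / scdepth))
    return results
```

With a single-copy depth of 20, a region with mean depth 40 would be reported with a copy number of 2.0 under these assumptions.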

System options

forks=X : Number of parallel sequences to process at once [0]
killforks=X : Number of seconds of no activity before killing all remaining forks. [36000]
forksleep=X : Sleep time (seconds) between cycles of forking out more processes [0]
tmpdir=PATH : Path for temporary output files during forking [./tmpdir/]
minimap2=PROG : Full path to run minimap2 [minimap2]
rscript=PROG : Full path to run Rscript [Rscript]
samtools=PROG : Full path to run samtools [samtools]
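All of the options above use the same `option=VALUE` form, with T/F for booleans and the default shown in square brackets. The following sketch shows one way such arguments could be overlaid on defaults. The `DEFAULTS` table is copied from the listing for illustration and `parse_options` is a hypothetical helper, not the parser used by rje.

```python
# Illustrative defaults copied from the option listing (not the module's internals).
DEFAULTS = {
    "forks": 0, "killforks": 36000, "forksleep": 0,
    "tmpdir": "./tmpdir/", "minimap2": "minimap2",
    "rscript": "Rscript", "samtools": "samtools", "bamcsi": False,
}


def parse_options(args, defaults=DEFAULTS):
    """Overlay option=value arguments on the defaults, keeping each default's type."""
    settings = dict(defaults)
    for arg in args:
        key, sep, value = arg.partition("=")
        key = key.lower()
        if not sep or key not in defaults:
            continue  # skip anything that is not a known option=value pair
        default = defaults[key]
        if isinstance(default, bool):  # test bool first: bool is a subclass of int
            settings[key] = value.upper() in ("T", "TRUE")
        elif isinstance(default, int):
            settings[key] = int(value)
        else:
            settings[key] = value
    return settings
```

For instance, `parse_options(["forks=8", "bamcsi=T"])` returns the defaults with `forks` set to 8 and `bamcsi` set to True.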

Module Version History

    # 0.0.0 - Initial Compilation.
    # 0.1.0 - Adding forking for fastdepth file generation.
    # 0.2.0 - Added CovBases lower depthsizer estimate output based solely on mapped reads.
    # 0.2.1 - Added unique sorting of CIGAR strings for indel ratio. Fixed end padding of zero-coverage depths.
    # 0.2.2 - Fixed major flaw in indelratio calculation.
    # 0.3.0 - Add benchmark=T/F option to the genome size prediction. Tidied CovBase and MapAdjust.
    # 0.3.1 - Tweaked some input checks and log output. Replaced indelratio sort -u with uniq for speed and memory.
    # 0.4.0 - Added seqstats=T/F : Whether to output CN and depth data for full sequences as well as BUSCO genes [False]
    # 0.4.1 - Fixed bug that causes clashes with v5 full_table.bed files.
    # 0.5.0 - Add additional map adjustment variants:
    #       - MapAdjust2 = allbases, not covbases
    #       - MapBases = Use map bases, not covbases for min read volume
    #       - MapRatio = Use mapbases adjusted by indelratio
    # 0.6.0 - Added support for multiple regfiles and setting max limit for CN graphics.
    # 0.7.0 - Added passing on of gfftype=LIST option to Rscript.
    # 0.7.1 - Fixed readtype recycle bug.
    # 0.8.0 - Added bamcsi=T/F : Use CSI indexing for BAM files, not BAI (needed for v long scaffolds) [False]

© 2015 RJ Edwards. Contact: