|Description:|| Read mapping and analysis core module|
|Last Edit:|| 29/01/22|
Copyright © 2021 Richard J. Edwards - See source code for GNU License Notice
See SLiMSuite Blog for further documentation. See rje for general commands.
This module has very simple standalone functionality to check for the existence of a BAM file and generate it from
one or more long read input files if not found. It is primarily for inheritance by other RJE tools that need to
make use of long read mapping and/or simple depth/coverage stat wrapping. The core ReadCore object will have
methods for populating and checking key input files and settings for a number of other SeqSuite tools.
For read mapping, [minimap2](https://github.com/lh3/minimap2) must be installed and either added to the environment
$PATH or given with the minimap2=PROG setting. For depth summaries, [samtools](http://www.htslib.org/) needs to
be installed. Presence of dependencies will be checked when needed and an error raised if not found.
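The dependency check and "generate the BAM only if missing" behaviour described above can be sketched as follows. This is a minimal illustration, not the module's own code: the function names `find_missing` and `plan_bam_commands` are hypothetical, and the command lines assume the standard `minimap2 -ax PRESET`, `samtools sort -o` and `samtools index` (`-c` for CSI) usage.

```python
import os
import shlex
import shutil


def find_missing(programs):
    """Return the programs that cannot be found on $PATH (or at the given path)."""
    return [prog for prog in programs if shutil.which(prog) is None]


def plan_bam_commands(seqin, reads, bam, preset="map-ont", csi=False):
    """Return shell commands to create and index `bam`, or [] if it already exists.

    Hypothetical helper: it only builds the command strings and runs nothing.
    """
    if os.path.exists(bam):
        return []  # BAM already present, nothing to generate
    read_args = " ".join(shlex.quote(r) for r in reads)
    index_flag = " -c" if csi else ""  # samtools index -c writes a CSI index
    return [
        f"minimap2 -ax {preset} {shlex.quote(seqin)} {read_args}"
        f" | samtools sort -o {shlex.quote(bam)} -",
        f"samtools index{index_flag} {shlex.quote(bam)}",
    ]
```

A caller might first raise an error if `find_missing(["minimap2", "samtools"])` is non-empty, then pass each planned command to a shell.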
Core input options
seqin=FILE : Input sequence assembly [
basefile=FILE : Root of output file names [
paf=FILE : PAF file of long reads mapped onto assembly [
bam=FILE : BAM file of long reads mapped onto assembly [
bamcsi=T/F : Use CSI indexing for BAM files, not BAI (needed for very long scaffolds) [
reads=FILELIST : List of fasta/fastq files containing reads. Wildcard allowed. Can be gzipped. 
readtype=LIST : List of ont/pb/hifi file types matching reads for minimap2 mapping [
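As an illustration of how reads and readtype might be combined, the sketch below pairs each read file with a minimap2 preset. It rests on two assumptions not confirmed by the text above: that ont/pb/hifi correspond to the minimap2 presets `map-ont`, `map-pb` and `map-hifi`, and that a shorter readtype list is recycled across the read files. The name `pair_reads_with_presets` is hypothetical.

```python
from itertools import cycle

# Assumed mapping of readtype values to minimap2 -x/-ax presets
PRESETS = {"ont": "map-ont", "pb": "map-pb", "hifi": "map-hifi"}


def pair_reads_with_presets(reads, readtypes):
    """Pair each read file with a preset, recycling readtypes if the list is shorter."""
    if not readtypes:
        raise ValueError("at least one readtype is required")
    unknown = [t for t in readtypes if t.lower() not in PRESETS]
    if unknown:
        raise ValueError(f"unrecognised readtype(s): {unknown}")
    # cycle() repeats the readtype list for as long as there are read files
    return [(r, PRESETS[t.lower()]) for r, t in zip(reads, cycle(readtypes))]
```

For example, three read files with `readtypes=["ont", "hifi"]` would be assigned `map-ont`, `map-hifi` and `map-ont` in that order.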
Depth and Copy Number options
scdepth=NUM : Single copy ("diploid") read depth. If zero, will use SC BUSCO mode [
busco=TSVFILE : BUSCO full table [
quickdepth=T/F : Whether to use samtools depth in place of mpileup (quicker but underestimates?) [
depfile=FILE : Precomputed depth file (*.fastdep or *.fastmp) to use [
regfile=FILE : File of SeqName, Start, End positions (or GFF) for read coverage checking [
checkfields=LIST : Fields in checkpos file to give Locus, Start and End for checking [
gfftype=LIST : Optional feature types to use if performing regcheck on GFF file (e.g. gene) [
depadjust=INT : Advanced R density bandwidth adjustment parameter [
seqstats=T/F : Whether to output CN and depth data for full sequences as well as BUSCO genes [
cnmax=INT : Max. y-axis value for CN plot (and mode multiplier for related depth plots) [
forks=X : Number of parallel sequences to process at once [
killforks=X : Number of seconds of no activity before killing all remaining forks. [
forksleep=X : Sleep time (seconds) between cycles of forking out more processes [
tmpdir=PATH : Path for temporary output files during forking [
minimap2=PROG : Full path to run minimap2 [
rscript=PROG : Full path to run Rscript [
samtools=PROG : Full path to run samtools [
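To make the region coverage check and copy number options more concrete, here is a minimal sketch that computes mean read depth over a SeqName, Start, End region from `samtools depth` style output (tab-separated sequence name, 1-based position, depth) and converts it to a copy number estimate. It is not the module's implementation: the function names are hypothetical, and treating copy number as region depth divided by scdepth is an assumption.

```python
def mean_region_depth(depth_lines, seqname, start, end):
    """Mean depth over the 1-based inclusive region start..end of seqname.

    Positions absent from the input (samtools depth omits zero-coverage
    positions unless run with -a) are counted as zero depth.
    """
    total = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, pos, depth = fields[0], int(fields[1]), int(fields[2])
        if name == seqname and start <= pos <= end:
            total += depth
    return total / (end - start + 1)


def copy_number(region_depth, scdepth):
    """Assumed estimate: region depth relative to single-copy depth."""
    if scdepth <= 0:
        raise ValueError("scdepth must be positive")
    return region_depth / scdepth
```

With a file produced by `samtools depth`, the open file handle can be passed directly as `depth_lines`.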
Module Version History
# 0.0.0 - Initial Compilation.
# 0.1.0 - Adding forking for fastdepth file generation.
# 0.2.0 - Added CovBases lower depthsizer estimate output based solely on mapped reads.
# 0.2.1 - Added unique sorting of CIGAR strings for indel ratio. Fixed end padding of zero-coverage depths.
# 0.2.2 - Fixed major flaw in indelratio calculation.
# 0.3.0 - Add benchmark=T/F option to the genome size prediction. Tidied CovBase and MapAdjust.
# 0.3.1 - Tweaked some input checks and log output. Replaced indelratio sort -u with uniq for speed and memory.
# 0.4.0 - Added seqstats=T/F : Whether to output CN and depth data for full sequences as well as BUSCO genes [False]
# 0.4.1 - Fixed bug that causes clashes with v5 full_table.bed files.
# 0.5.0 - Add additional map adjustment variants:
# - MapAdjust2 = allbases, not covbases
# - MapBases = Use map bases, not covbases for min read volume
# - MapRatio = Use mapbases adjusted by indelratio
# 0.6.0 - Added support for multiple regfiles and setting max limit for CN graphics.
# 0.7.0 - Added passing on of gfftype=LIST option to Rscript.
# 0.7.1 - Fixed readtype recycle bug.
# 0.8.0 - Added bamcsi=T/F : Use CSI indexing for BAM files, not BAI (needed for very long scaffolds) [False]