Module: DepthCharge
Description: Genome assembly quality control and misassembly repair
Version: 0.2.0
Last Edit: 20/01/21
Copyright © 2021 Richard J. Edwards - See source code for GNU License Notice
Imported modules:
rje
rje_forker
rje_obj
rje_db
rje_rmd
rje_paf
rje_seqlist
See SLiMSuite Blog for further documentation. See rje for general commands.
Function
DepthCharge is an assembly quality control and misassembly repair program. It uses mapped long read depth of
coverage to charge through a genome assembly and identify coverage "cliffs" that may indicate a misassembly.
If appropriate, it will then blast the assembly into fragments at those misassemblies.
DepthCharge uses a genome assembly and a PAF file of mapped reads as input. If no PAF file is provided, minimap2
will be used to generate one.
For each sequence, DepthCharge starts at the beginning of the sequence and scans the read coverage from the PAF
file for positions where depth drops below the mindepth=INT threshold (default = 1 read). These positions are
marked as "bad" and compressed into regions of adjacent bad positions. Regions at the start or end of a sequence
are labelled "end". Regions overlapping gaps are labelled "gap". Otherwise, regions are labelled "bad". All
regions are output to *.depthcharge.tdt along with the length of each sequence (region type "all").
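The scan described above can be sketched in Python as follows. This is a minimal illustration and not the DepthCharge source: the function name `low_depth_regions`, the 0-based half-open input coordinates (as used for the target start and end columns of a PAF file), the 1-based inclusive output coordinates, and the choice to test for "end" before "gap" are all assumptions.

```python
def low_depth_regions(seqlen, read_spans, gaps, mindepth=1):
    """Return (start, end, type) regions for one sequence.

    read_spans and gaps are lists of (start, end) pairs, 0-based and
    end-exclusive. Output coordinates are 1-based inclusive (an assumption
    made for this sketch).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (seqlen + 1)
    for start, end in read_spans:
        start, end = max(0, start), min(seqlen, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # Walk the sequence, compressing adjacent low-depth positions into runs.
    runs = []
    depth = 0
    run_start = None
    for pos in range(seqlen):
        depth += diff[pos]
        if depth < mindepth:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            runs.append((run_start, pos))
            run_start = None
    if run_start is not None:
        runs.append((run_start, seqlen))

    # Label each run: "end" at either sequence end, "gap" if it overlaps a
    # gap, otherwise "bad". The order of these checks is assumed.
    regions = []
    for start, end in runs:
        if start == 0 or end == seqlen:
            rtype = "end"
        elif any(gstart < end and start < gend for gstart, gend in gaps):
            rtype = "gap"
        else:
            rtype = "bad"
        regions.append((start + 1, end, rtype))

    # Final row records the full sequence length (region type "all").
    regions.append((1, seqlen, "all"))
    return regions


if __name__ == "__main__":
    for region in low_depth_regions(20, [(2, 8), (12, 18)], [(9, 11)]):
        print(region)
```

The difference array keeps the scan linear in sequence length plus read count, which matters when the same pass is repeated for every sequence in an assembly.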
Future versions will fragment the assembly at "bad" regions (and at "gap" regions if breakgaps=T). If
breakmode=gap then DepthCharge will instead replace bad regions with a gap (NNNN...) of length gapsize=INT. If
breakmode=report then no additional processing of the assembly will be performed. Otherwise, the processed
assembly will be saved as *.depthcharge.fasta.
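As a rough illustration of how the three breakmode settings differ, the sketch below applies a list of labelled regions to a single sequence. It is not the DepthCharge implementation: the function name `apply_breakmode`, the 0-based half-open region coordinates, the `name.N` fragment naming, and the decision to discard the low-coverage region itself when fragmenting are all assumptions.

```python
def apply_breakmode(name, seq, regions, breakmode="fragment",
                    breakgaps=False, gapsize=100):
    """Return a list of (name, sequence) records after processing.

    regions is a list of (start, end, type) tuples with 0-based,
    end-exclusive coordinates (an assumption made for this sketch).
    """
    if breakmode == "report":
        # No additional processing of the assembly.
        return [(name, seq)]
    if breakmode not in ("gap", "fragment"):
        raise ValueError("breakmode must be report, gap or fragment")

    # "bad" regions are always processed; "gap" regions only when
    # fragmenting with breakgaps switched on.
    targets = {"bad"}
    if breakmode == "fragment" and breakgaps:
        targets.add("gap")

    # Cut out each targeted region, keeping the sequence on either side.
    pieces = []
    prev = 0
    for start, end, rtype in sorted(regions):
        if rtype in targets:
            pieces.append(seq[prev:start])
            prev = end
    pieces.append(seq[prev:])

    if breakmode == "gap":
        # Replace each removed region with a fixed-length run of Ns.
        return [(name, ("N" * gapsize).join(pieces))]

    # breakmode == "fragment": each remaining piece becomes its own record.
    pieces = [piece for piece in pieces if piece]
    if len(pieces) == 1:
        return [(name, pieces[0])]
    return [("%s.%d" % (name, i), piece) for i, piece in enumerate(pieces, 1)]


if __name__ == "__main__":
    demo = "AAAAA" + "CCC" + "GGGG"
    print(apply_breakmode("ctg1", demo, [(5, 8, "bad")], "gap", gapsize=2))
    print(apply_breakmode("ctg1", demo, [(5, 8, "bad")], "fragment"))
```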
Commandline
Main DepthCharge run options
seqin=FILE : Input sequence assembly [None]
basefile=FILE : Root of output file names [$SEQIN basefile]
paf=FILE : PAF file of long reads mapped onto assembly [$BASEFILE.paf]
breakmode=X : How to treat misassemblies (report/gap/fragment) [fragment]
breakgaps=T/F : Whether to break at gaps where coverage drops if breakmode=fragment [False]
gapsize=INT : Size of gaps to insert when breakmode=gap [100]
mindepth=INT : Minimum depth to class as OK [1]
PAF file generation options
reads=FILELIST : List of fasta/fastq files containing reads. Wildcard allowed. Can be gzipped. []
readtype=LIST : List of ont/pb/hifi file types matching reads for minimap2 mapping [ont]
minimap2=PROG : Full path to run minimap2 [minimap2]
mapopt=CDICT : Dictionary of minimap2 options [N:100,p:0.0001,x:asm5]
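To show how a mapopt=CDICT value such as the default might translate into a minimap2 call, here is a small Python sketch. The helper names `parse_cdict` and `minimap2_command` are hypothetical, and the mapping of each `key:value` pair onto a `-key value` flag is an assumption about how the dictionary is used. How readtype=LIST interacts with the `x` preset is not covered here.

```python
def parse_cdict(text):
    """Parse a 'key:value,key:value' string into an ordered dictionary."""
    options = {}
    for item in text.split(","):
        if not item.strip():
            continue
        key, _, value = item.partition(":")
        options[key.strip()] = value.strip()
    return options


def minimap2_command(assembly, reads, mapopt="N:100,p:0.0001,x:asm5",
                     minimap2="minimap2"):
    """Build an argument list for a minimap2 run (PAF is written to stdout)."""
    cmd = [minimap2]
    for key, value in parse_cdict(mapopt).items():
        # Single-letter keys become short flags, longer keys long flags.
        cmd.append(("-" if len(key) == 1 else "--") + key)
        if value:
            cmd.append(value)
    cmd.append(assembly)   # target: the genome assembly
    cmd.extend(reads)      # queries: one or more read files
    return cmd


if __name__ == "__main__":
    # File names here are placeholders for illustration only.
    print(" ".join(minimap2_command("assembly.fasta", ["reads.fastq.gz"])))
```

Returning a list keeps the command suitable for `subprocess.run` without shell quoting, with standard output redirected to the PAF file.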
Additional options
dochtml=T/F : Generate HTML DepthCharge documentation (*.docs.html) instead of main run [False]
logfork=T/F : Whether to log forking in main log [False]
tmpdir=PATH : Path for temporary output files during forking (not all modes) [./tmpdir/]
History
Module Version History
# 0.0.0 - Initial Compilation.
# 0.1.0 - Removed endbuffer and gapbuffer in favour of straight overlap assignment.
# 0.2.0 - Added HiFi read type.