SLiMSuite REST Server



DepthCharge V0.2.0

Genome assembly quality control and misassembly repair

Module: DepthCharge
Description: Genome assembly quality control and misassembly repair
Version: 0.2.0
Last Edit: 20/01/21

Copyright © 2021 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje rje_forker rje_obj rje_db rje_rmd rje_paf rje_seqlist


See SLiMSuite Blog for further documentation. See rje for general commands.

Function

DepthCharge is an assembly quality control and misassembly repair program. It uses the depth of coverage of mapped long reads to charge through a genome assembly and identify coverage "cliffs" that may indicate a misassembly. Where appropriate, it will then blast the assembly into fragments at those misassemblies.

DepthCharge takes a genome assembly (seqin=FILE) and a PAF file of reads mapped onto that assembly (paf=FILE) as input. If no PAF file is provided, minimap2 will be used to generate one from the reads given by reads=FILELIST.
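As a minimal sketch of the input side, the Python below pulls out the PAF fields needed for depth calculation. It assumes the standard PAF layout: tab-separated lines whose columns 6 to 9 give the target name, target length, and 0-based, end-exclusive target start and end. The names PafHit, parse_paf and intervals_by_target are hypothetical and are not part of rje_paf. This sketch counts every alignment line, including any secondary alignments; whether DepthCharge filters these is not stated here.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PafHit(NamedTuple):
    """Target-side coordinates of one PAF alignment (0-based, end-exclusive)."""
    read: str
    target: str
    target_len: int
    start: int
    end: int


def parse_paf(lines: Iterable[str]) -> List[PafHit]:
    """Keep the read name and target (assembly) columns from each PAF line."""
    hits = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cols = line.rstrip("\n").split("\t")
        # Mandatory PAF columns (1-based): 1 query name, 6 target name,
        # 7 target length, 8 target start, 9 target end.
        hits.append(PafHit(cols[0], cols[5], int(cols[6]), int(cols[7]), int(cols[8])))
    return hits


def intervals_by_target(hits: List[PafHit]) -> Dict[str, List[Tuple[int, int]]]:
    """Group mapped read spans by assembly sequence for depth calculation."""
    grouped: Dict[str, List[Tuple[int, int]]] = {}
    for hit in hits:
        grouped.setdefault(hit.target, []).append((hit.start, hit.end))
    return grouped
```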

For each sequence, DepthCharge starts at the beginning of the sequence and scans through the PAF file for positions where read depth drops below the mindepth=INT threshold (default = 1 read). These positions are marked as "bad" and merged into regions of adjacent bad positions. Regions at the start or end of a sequence are labelled "end", regions overlapping assembly gaps are labelled "gap", and all other regions are labelled "bad". All regions are output to *.depthcharge.tdt, along with the full length of each sequence (region type "all").
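The scan described above can be sketched in Python as follows. This is an illustrative per-base version, not the DepthCharge implementation, and rests on some assumptions: the function names are hypothetical, read spans are 0-based, end-exclusive (start, end) pairs as in PAF, assembly gaps are taken to be runs of N, and a region touching a sequence end is labelled "end" even if it also overlaps a gap. The exact column layout of *.depthcharge.tdt is not reproduced.

```python
import re
from typing import List, Tuple

Region = Tuple[str, int, int]  # (label, start, end), 0-based, end-exclusive


def depth_profile(seqlen: int, spans: List[Tuple[int, int]]) -> List[int]:
    """Per-base read depth from 0-based, end-exclusive (start, end) read spans."""
    delta = [0] * (seqlen + 1)
    for start, end in spans:
        delta[start] += 1  # read starts covering here
        delta[end] -= 1    # read stops covering here
    depth, running = [], 0
    for pos in range(seqlen):
        running += delta[pos]
        depth.append(running)
    return depth


def low_depth_regions(depth: List[int], mindepth: int = 1) -> List[Tuple[int, int]]:
    """Merge adjacent positions with depth < mindepth into (start, end) regions."""
    regions, start = [], None
    for pos, d in enumerate(depth):
        if d < mindepth and start is None:
            start = pos
        elif d >= mindepth and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


def gap_regions(seq: str) -> List[Tuple[int, int]]:
    """(start, end) spans of N runs, treated here as assembly gaps."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


def depthcharge_regions(seq: str, spans: List[Tuple[int, int]],
                        mindepth: int = 1) -> List[Region]:
    """Label low-depth regions as end/gap/bad, preceded by an 'all' row."""
    seqlen = len(seq)
    gaps = gap_regions(seq)
    rows: List[Region] = [("all", 0, seqlen)]
    for start, end in low_depth_regions(depth_profile(seqlen, spans), mindepth):
        if start == 0 or end == seqlen:
            label = "end"  # assumption: sequence ends take priority over gaps
        elif any(start < gend and gstart < end for gstart, gend in gaps):
            label = "gap"
        else:
            label = "bad"
        rows.append((label, start, end))
    return rows
```

For example, a 30 bp sequence with an N run at positions 12 to 17 and reads covering (2, 10), (5, 12), (18, 22) and (24, 30) yields one region of each type: "end" at 0 to 2, "gap" at 12 to 18 and "bad" at 22 to 24.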

Future versions will process the assembly according to breakmode=X. With the default breakmode=fragment, the assembly will be fragmented at "bad" regions (and at "gap" regions if breakgaps=T). If breakmode=gap, DepthCharge will instead replace bad regions with a gap (NNNN...) of length gapsize=INT. If breakmode=report, no additional processing of the assembly will be performed. Otherwise, the processed assembly will be saved as *.depthcharge.fasta.
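Since this breakmode processing is described as future behaviour, the following is only a hypothetical Python sketch of the three modes. It assumes regions are given as (label, start, end) tuples using the region types described above, that fragmenting removes the low-coverage region itself rather than splitting at a single point, and that fragments are named name.1, name.2 and so on. None of these details are confirmed by this page, and apply_breakmode is not a DepthCharge function.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (label, start, end), 0-based, end-exclusive


def apply_breakmode(name: str, seq: str, regions: List[Region],
                    breakmode: str = "fragment", breakgaps: bool = False,
                    gapsize: int = 100) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs after applying a hypothetical breakmode."""
    if breakmode == "report":
        return [(name, seq)]  # report only: assembly left unchanged
    if breakmode not in ("gap", "fragment"):
        raise ValueError(f"unknown breakmode: {breakmode}")
    # "gap" regions are only acted on when fragmenting with breakgaps=T.
    labels = {"bad", "gap"} if (breakmode == "fragment" and breakgaps) else {"bad"}
    targets = sorted((s, e) for label, s, e in regions if label in labels)
    if not targets:
        return [(name, seq)]  # nothing to repair in this sequence
    # Collect the sequence pieces flanking each targeted region.
    pieces, pos = [], 0
    for start, end in targets:
        pieces.append(seq[pos:start])
        pos = end
    pieces.append(seq[pos:])
    if breakmode == "gap":
        # Replace each bad region with a fixed-length run of Ns.
        return [(name, ("N" * gapsize).join(pieces))]
    # Fragment: drop each targeted region and keep non-empty flanks.
    fragments = [p for p in pieces if p]
    return [(f"{name}.{i}", frag) for i, frag in enumerate(fragments, 1)]
```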

Commandline

Main DepthCharge run options

seqin=FILE : Input sequence assembly [None]
basefile=FILE : Root of output file names [$SEQIN basefile]
paf=FILE : PAF file of long reads mapped onto assembly [$BASEFILE.paf]
breakmode=X : How to treat misassemblies (report/gap/fragment) [fragment]
breakgaps=T/F : Whether to break at gaps where coverage drops if breakmode=fragment [False]
gapsize=INT : Size of gaps to insert when breakmode=gap [100]
mindepth=INT : Minimum depth to class as OK [1]

PAF file generation options

reads=FILELIST : List of fasta/fastq files containing reads. Wildcard allowed. Can be gzipped. []
readtype=LIST : List of read types (ont/pb/hifi) matching the reads files, for minimap2 mapping [ont]
minimap2=PROG : Full path to run minimap2 [minimap2]
mapopt=CDICT : Dictionary of minimap2 options [N:100,p:0.0001,x:asm5]
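The mapopt=CDICT value is a comma-separated list of key:value pairs. A minimal sketch of turning it into a minimap2 argument list is shown below, assuming single-letter keys become single-dash flags and longer keys become --key=value. The function names are hypothetical, and how DepthCharge combines these options with readtype=LIST is not modelled. The -x preset is moved to the front because the minimap2 manual advises applying -x before other options, since options given later override the preset's values.

```python
from typing import Dict, List


def parse_cdict(text: str) -> Dict[str, str]:
    """Parse a 'key:value,key:value' string into an insertion-ordered dict."""
    opts: Dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(":")
        opts[key] = value
    return opts


def minimap2_args(assembly: str, reads: List[str],
                  mapopt: str = "N:100,p:0.0001,x:asm5",
                  minimap2: str = "minimap2") -> List[str]:
    """Build a minimap2 argument list; minimap2 writes PAF to stdout by default."""
    opts = parse_cdict(mapopt)
    args = [minimap2]
    # Put the -x preset first; the stable sort keeps the other options in order.
    for key in sorted(opts, key=lambda k: k != "x"):
        value = opts[key]
        if len(key) == 1:
            # Short options take their value as a separate argument: -N 100
            args.append(f"-{key}")
            if value:
                args.append(value)
        else:
            # Long options are passed as --key=value (e.g. --secondary=no)
            args.append(f"--{key}={value}" if value else f"--{key}")
    return args + [assembly] + list(reads)
```

The resulting list could then be passed to subprocess.run() with standard output redirected to $BASEFILE.paf.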

Additional options

dochtml=T/F : Generate HTML DepthCharge documentation (*.docs.html) instead of main run [False]
logfork=T/F : Whether to log forking in main log [False]
tmpdir=PATH : Path for temporary output files during forking (not all modes) [./tmpdir/]


Module Version History

    # 0.0.0 - Initial Compilation.
    # 0.1.0 - Removed endbuffer and gapbuffer in favour of straight overlap assignment.
    # 0.2.0 - Added HiFi read type.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.