Module:	SMRTSCAPE_V1
Description:	SMRT Subread Coverage & Assembly Parameter Estimator
Version:	1.10.1
Last Edit:	26/05/16

Imported modules: rje rje_db rje_obj rje_seqlist rje_tree rje_dismatrix_V3

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

SMRTSCAPE (SMRT Subread Coverage & Assembly Parameter Estimator) is a tool in development as part of our PacBio sequencing projects for predicting and/or assessing the quantity and quality of usable data required/produced for HGAP3 de novo whole genome assembly. The current documentation is below. Some tutorials will be developed in the future. In the meantime, please get in touch if you want to use it and anything isn't clear.

The main functions of SMRTSCAPE are:

1. Estimate Genome Coverage and required numbers of SMRT cells given predicted read outputs. NOTE: Default settings for SMRT cell output are not reliable and you should speak to your sequencing provider for their up-to-date figures.

2. Summarise the amount of sequence data obtained from one or more SMRT cells, including unique coverage (one read per ZMW).

3. Calculate predicted coverage from subread data for different length and quality cutoffs.

4. Predict HGAP3 length and quality settings to achieve a given coverage and accuracy.

SMRTSCAPE coverage=T mode can be run from the EdwardsLab server at: <http://www.slimsuite.unsw.edu.au/servers/pacbio.php>

NOTE: SMRTSCAPE Version 1 has been frozen at V1.10.1 (with the exception of bug fixes) and future development will be of SMRTSCAPE Version 2.x onwards. This is being reworked for FALCON assemblies.

Commandline

General Options

genomesize=X : Genome size (bp) [0]

Genome Coverage Options

coverage=T/F : Whether to generate coverage report [False]
avread=X : Average read length (bp) [20000]
smrtreads=X : Average assembled output of a SMRT cell [50000]
smrtunits=X : Units for smrtreads=X (reads/Gb/Mb) [reads]
errperbase=X : Error-rate per base [0.14]
maxcov=X : Maximum X coverage to calculate [100]
bysmrt=T/F : Whether to output estimated coverage by SMRT cell rather than X coverage [False]
xnlist=LIST : Additional columns giving % sites with coverage >= Xn [1+minanchorx->targetxcov+minanchorx]
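
The arithmetic behind these options can be sketched roughly as follows. This is a minimal illustration, not SMRTSCAPE's own code: the function names are hypothetical, errperbase is ignored, and the percentage of sites with coverage >= Xn assumes that per-site depth follows a Poisson distribution (uniformly random read starts), which may differ from the model the program actually uses. The xnlist default is read here as an inclusive range from 1+minanchorx to targetxcov+minanchorx, which is also an assumption.

```python
import math

def bases_per_cell(smrtreads=50000, smrtunits="reads", avread=20000):
    """Convert a smrtreads=X / smrtunits=X pair into bases per SMRT cell."""
    units = smrtunits.lower()
    if units == "reads":
        return smrtreads * avread      # read count x average read length
    if units == "mb":
        return smrtreads * 1e6
    if units == "gb":
        return smrtreads * 1e9
    raise ValueError("smrtunits must be reads, Gb or Mb, got %r" % smrtunits)

def mean_xcov(cells, genomesize, **cell_kwargs):
    """Mean X coverage from a number of SMRT cells (genomesize in bp)."""
    if genomesize <= 0:
        raise ValueError("genomesize must be a positive number of bp")
    return cells * bases_per_cell(**cell_kwargs) / genomesize

def cells_for_xcov(target_x, genomesize, **cell_kwargs):
    """Smallest whole number of SMRT cells giving at least target_x mean coverage."""
    return math.ceil(target_x * genomesize / bases_per_cell(**cell_kwargs))

def pct_sites_at_least(mean_x, xn):
    """% of sites with depth >= xn, ASSUMING Poisson-distributed depth."""
    below = sum(math.exp(-mean_x) * mean_x ** k / math.factorial(k)
                for k in range(xn))
    return 100.0 * (1.0 - below)

def default_xnlist(minanchorx=6, targetxcov=3):
    """One reading of the xnlist default: 1+minanchorx up to targetxcov+minanchorx."""
    return list(range(1 + minanchorx, targetxcov + minanchorx + 1))

if __name__ == "__main__":
    genomesize = 100000000  # hypothetical 100 Mb genome
    for cells in (1, 5, 10):
        x = mean_xcov(cells, genomesize)
        row = ["%.3f" % pct_sites_at_least(x, xn) for xn in default_xnlist()]
        print(cells, x, row)
```

With the documented defaults (50000 reads of 20000 bp), one SMRT cell yields 1 Gb in this sketch. As noted above, those defaults are not reliable, so substitute your sequencing provider's figures.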

SubRead Summary Options

summarise=T/F : Generate subread summary statistics including ZMW summary data [False]
seqin=FILE : Subread sequence file for analysis [None]
batch=FILELIST : Batch input of multiple subread fasta files (wildcards allowed) if seqin=None []
targetcov=X : Target percentage coverage for final genome [99.999]
targeterr=X : Target errors per base for preassembly [1/genome size]
calculate=T/F : Calculate X coverage and target X coverage for given seed, anchor + RQ combinations [False]
minanchorx=X : Minimum X coverage for anchor subreads [6]
minreadlen=X : Absolute minimum read length for calculations (use minlen=X to affect summary also) [500]
rq=X,Y : Minimum (X) and maximum (Y) values for read quality cutoffs [0.8,0.9]
rqstep=X : Size of RQ jumps for calculation (min 0.001) [0.01]
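
As an illustration of the kind of summary that summarise=T describes, the sketch below totals subread data and derives a "unique" figure with one read per ZMW. It assumes the usual PacBio subread naming of movie/ZMW/start_end and keeps the longest subread per ZMW. The choice of the longest read, the function names and the generic minlen cutoff are all assumptions for illustration and are not taken from the SMRTSCAPE code.

```python
def fasta_lengths(lines):
    """Yield (name, sequence length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first word of the header as the subread name.
            name, length = (line[1:].split() or [""])[0], 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length

def zmw_key(name):
    """Return (movie, ZMW) from a name ASSUMED to look like movie/ZMW/start_end."""
    movie, zmw = name.split("/")[:2]
    return movie, int(zmw)

def summarise_subreads(records, minlen=0):
    """Summarise (name, length) records, keeping the longest subread per ZMW."""
    count, total, best = 0, 0, {}
    for name, length in records:
        if length < minlen:
            continue  # below the length cutoff: ignored entirely
        count += 1
        total += length
        key = zmw_key(name)
        best[key] = max(best.get(key, 0), length)
    return {
        "subreads": count,                    # subreads passing the cutoff
        "bases": total,                       # total bases in those subreads
        "zmw": len(best),                     # number of distinct ZMWs
        "unique_bases": sum(best.values()),   # one (longest) read per ZMW
    }

if __name__ == "__main__":
    demo = [">m1/8/0_10", "ACGTACGTAC", ">m1/8/20_26", "ACGTAC", ">m2/8/0_4", "ACGT"]
    print(summarise_subreads(fasta_lengths(demo)))
```

Dividing bases or unique_bases by genomesize gives the corresponding total and unique X coverage.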

Preassembly Fragmentation analysis Options

preassembly=FILE : Preassembly fasta file to assess/correct over-fragmentation (use seqin=FILE for subreads) [None]

Assembly Parameter Options

parameters=T/F : Whether to output predicted "best" set of parameters [False]
targetxcov=X : Target 100% X Coverage for pre-assembly [3]
xmargin=X : "Safety margin" inflation of X coverage [1]
mapefficiency=X : [Adv.] Efficiency of mapping anchor subreads onto seed reads for correction [1.0]
xsteplen=X : [Adv.] Size (bp) of increasing coverage steps for calculating required depths of coverage [1e5]
parseparam=FILES : Parse parameter settings from 1+ assembly runs []
paramlist=LIST : List of parameters to retain for parseparam output (file or comma separated, blank=all) []
predict=T/F : Whether to add XCoverage prediction and efficiency estimation from parameters and subreads [False]

History

Module Version History

    # 0.0.0 - Initial Compilation.
    # 1.0.0 - Initial working version for server.
    # 1.1.0 - Added xnlist=LIST : Additional columns giving % sites with coverage >= Xn [10,25,50,100].
    # 1.2.0 - Added assessment -> now PAGSAT.
    # 1.3.0 - Added seed and anchor read coverage generator (calculate=T).
    # 1.3.1 - Deleted assessment function. (Now handled by PAGSAT.)
    # 1.4.0 - Added new coverage=T function that incorporates seed and anchor subreads.
    # 1.5.0 - Added parseparam=FILES with paramlist=LIST to parse restricted sets of parameters.
    # 1.6.0 - New SMRTSCAPE program building on PacBio v1.5.0. Added predict=T/F option.
    # 1.6.1 - Updated parameters=T to incorporate that the seed read counts as X=1.
    # 1.7.0 - Added *.summary.tdt output from subread summary analysis. Added minreadlen.
    # 1.8.0 - Added preassembly=FILE : Preassembly fasta file to assess/correct over-fragmentation (use seqin=FILE for subreads).
    # 1.9.0 - Updated empirical preassembly mapefficiency calculation.
    # 1.10.0 - Added batch processing of subread files.
    # 1.10.1 - Fixed bug in batch processing.

SMRTSCAPE_V1 REST Output formats

Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format
for more user-friendly formatted output. Individual outputs can be identified/parsed using &rest=OUTFMT for:

coverage = main results table
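
A request of this form can be assembled as in the sketch below. It assumes that options are appended to the server address as &name=value pairs in the same way as the &rest=X examples above. The base address in the usage example is a placeholder, not the real server URL, and the function name is hypothetical. Values are not URL-encoded.

```python
def rest_url(base, rest="format", **options):
    """Append &name=value options and a final &rest=X to a base REST address."""
    parts = [base]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"   # T/F switches as in the option lists
        parts.append("&%s=%s" % (name, value))
    parts.append("&rest=%s" % rest)
    return "".join(parts)

if __name__ == "__main__":
    # Placeholder base address: substitute the actual REST server location.
    print(rest_url("http://rest.example.org/smrtscape_V1",
                   rest="coverage", genomesize=12000000, coverage=True))
```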
