SLiMSuite REST Server



rje_pacbio V1.6.0

Miscellaneous Utilities for PacBio Sequencing

Module: rje_pacbio
Description: Miscellaneous Utilities for PacBio Sequencing
Version: 1.6.0
Last Edit: 05/11/15
Webserver: http://www.slimsuite.unsw.edu.au/servers/pacbio.php

Copyright © 2015 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje rje_db rje_obj rje_seqlist rje_tree rje_dismatrix_V3 gablam


See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This module estimates the % genome coverage and accuracy for different X coverage of a genome using PacBio sequencing, assuming a non-biased (random) error distribution. Calculations use binomial/Poisson distributions and assume that sites are independent. Accuracy is based on more than 50% of the reads covering a particular base having the correct call. Assuming random calls at the other positions, 25% of these "wrong" positions will be correct by chance. In reality, the proportion will be even higher than this when majority calls are used, because wrong calls will be split between the three possible incorrect bases. Accuracy is therefore a conservative estimate.
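
The coverage and accuracy model described above can be sketched in Python (3.8+) as follows. This is a minimal illustration, not the module's own code. It assumes Poisson-distributed read depth with mean equal to the X coverage, a per-read binomial error model using the errperbase=X rate, and that sites without a correct majority are still right with probability 0.25. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """P(depth == k) under a Poisson model with the given mean X coverage."""
    if mean <= 0:
        return 1.0 if k == 0 else 0.0
    # Log space keeps large depths and coverages from overflowing.
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def majority_correct(depth, err):
    """P(more than 50% of `depth` reads make the correct call), binomial model."""
    p = 1.0 - err
    return sum(math.comb(depth, j) * p ** j * err ** (depth - j)
               for j in range(depth // 2 + 1, depth + 1))


def coverage_and_accuracy(xcov, err=0.14, chance=0.25):
    """Return (%Coverage, %Accuracy) for a mean X coverage.

    Sites are independent with Poisson depth; each read errs with probability
    `err`. Sites without a correct majority are still right with probability
    `chance` (a random call), so accuracy is a conservative estimate.
    """
    covered = 1.0 - poisson_pmf(0, xcov)
    if covered <= 0:
        return 0.0, 0.0
    max_depth = int(xcov + 10 * math.sqrt(xcov) + 20)  # drop the negligible tail
    correct = 0.0
    for depth in range(1, max_depth + 1):
        maj = majority_correct(depth, err)
        correct += poisson_pmf(depth, xcov) * (maj + chance * (1.0 - maj))
    # Accuracy is expressed as a percentage of covered bases only.
    return 100.0 * covered, 100.0 * correct / covered


for x in (1, 5, 10, 30):
    cov, acc = coverage_and_accuracy(x)
    print("%dX: %%Coverage=%.3f %%Accuracy=%.3f" % (x, cov, acc))
```

Ties at even depths count as "no correct majority", which follows the strict >50% rule; as a result, accuracy in this sketch is not strictly monotonic at very low X.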

All calculations are based on *assembled* reads, and therefore using the full smrtreads=X value for SMRT cells will overestimate coverage. Note that smrtreads=X can be used to input sequence capacity in Gb (or Mb) rather than read counts by changing smrtunits=X.
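
As a rough illustration of how smrtunits=X changes the meaning of smrtreads=X, the sketch below converts between X coverage and SMRT cells. It assumes that each SMRT cell contributes smrtreads * avread assembled bases when smrtunits=reads, or smrtreads Gb/Mb otherwise. The helper names are hypothetical, and the module's own arithmetic may differ.

```python
def bases_per_smrt(smrtreads=50000, smrtunits="reads", avread=20000):
    """Assembled bases per SMRT cell implied by smrtreads=X and smrtunits=X."""
    units = smrtunits.lower()
    if units == "reads":
        return smrtreads * avread  # read count * average read length (bp)
    if units == "gb":
        return smrtreads * 1e9
    if units == "mb":
        return smrtreads * 1e6
    raise ValueError("smrtunits must be reads, Gb or Mb, not %r" % smrtunits)


def smrt_cells_for(xcov, genomesize, **kwargs):
    """Estimated SMRT cells needed to reach a mean X coverage."""
    return xcov * genomesize / bases_per_smrt(**kwargs)


def xcov_for_smrt(ncells, genomesize, **kwargs):
    """Mean X coverage expected from a number of SMRT cells (cf. bysmrt=T)."""
    return ncells * bases_per_smrt(**kwargs) / genomesize


# Hypothetical 12 Mb genome at 50X with the default settings (1 Gb per cell):
print(smrt_cells_for(50, 12e6))  # -> 0.6
```

Because the inputs describe *assembled* output, plugging in raw per-cell yields here will overestimate the coverage each cell provides.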

NOTE: This module has been superseded by SMRTSCAPE.

Output

Main output is a results table containing the following fields:

  • XCoverage = estimated average X genome coverage.
  • SMRT = estimated number of SMRT cells.
  • %Coverage = estimated percentage genome coverage.
  • %Accuracy = estimated percentage of covered bases with correct base calls.
  • %Xn = 0+ columns giving % sites with coverage >= Xn (xnlist=LIST).
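
Under the same Poisson assumption, a %Xn column is the probability that a site has depth >= Xn. The sketch below computes these columns for one row. The helper names and the reading of the xnlist=LIST default as an inclusive range are assumptions, not documented behaviour.

```python
import math


def pct_at_least(xn, xcov):
    """% of sites with read depth >= xn, assuming Poisson depth with mean xcov."""
    if xcov <= 0:
        return 100.0 if xn <= 0 else 0.0
    below = sum(math.exp(-xcov + k * math.log(xcov) - math.lgamma(k + 1))
                for k in range(int(xn)))
    return 100.0 * max(0.0, 1.0 - below)


def xn_columns(xcov, xnlist):
    """Build the %Xn columns of one results row, e.g. {"%X7": ...}."""
    return {"%%X%d" % xn: pct_at_least(xn, xcov) for xn in xnlist}


# Assumption: the xnlist default "1+minanchorx->targetxcov+minanchorx" is read
# as an inclusive integer range, i.e. 7..9 for minanchorx=6 and targetxcov=3.
minanchorx, targetxcov = 6, 3
default_xnlist = list(range(1 + minanchorx, targetxcov + minanchorx + 1))
print(xn_columns(20, default_xnlist))
```

With xn=1, pct_at_least gives the same value as %Coverage (1 - e^-X).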

Commandline

General Options

genomesize=X : Genome size (bp) [0]

Genome Coverage Options

coverage=T/F : Whether to generate coverage report [True]
avread=X : Average read length (bp) [20000]
smrtreads=X : Average assembled output of a SMRT cell [50000]
smrtunits=X : Units for smrtreads=X (reads/Gb/Mb) [reads]
errperbase=X : Error-rate per base [0.14]
maxcov=X : Maximum X coverage to calculate [100]
bysmrt=T/F : Whether to output estimated coverage by SMRT cell rather than X coverage [False]
xnlist=LIST : Additional columns giving % sites with coverage >= Xn [1+minanchorx->targetxcov+minanchorx]

SubRead Summary Options

summarise=T/F : Generate subread summary statistics including ZMW summary data [False]
seqin=FILE : Subread sequence file for analysis [None]
targetcov=X : Target percentage coverage for final genome [99.999]
targeterr=X : Target errors per base for preassembly [1/genome size]
calculate=T/F : Calculate X coverage and target X coverage for given seed, anchor + RQ combinations [False]
minanchorx=X : Minimum X coverage for anchor subreads [6]
rq=X,Y : Minimum (X) and maximum (Y) values for read quality cutoffs [0.8,0.9]
rqstep=X : Size of RQ jumps for calculation (min 0.001) [0.01]
rqmean=T/F : Whether to use mean RQ instead of min RQ for calculations [False]
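
The targetcov=X setting can be related to X coverage with the Poisson assumption used above. The sketch finds, by bisection, the smallest mean X at which at least targetcov % of sites have depth >= n. Use n=1 for plain coverage, or for example n=minanchorx for anchor subreads. It also lists the RQ cutoffs implied by rq=X,Y and rqstep=X. This is an illustrative model with hypothetical function names, not necessarily how calculate=T derives its targets. For targetcov=99.999 and n=1, solving 1 - e^-X >= 0.99999 gives X >= ln(10^5), which is about 11.51X.

```python
import math


def pct_at_least(n, xcov):
    """% of sites with read depth >= n, assuming Poisson depth with mean xcov."""
    if xcov <= 0:
        return 100.0 if n <= 0 else 0.0
    below = sum(math.exp(-xcov + k * math.log(xcov) - math.lgamma(k + 1))
                for k in range(int(n)))
    return 100.0 * max(0.0, 1.0 - below)


def min_x_for_target(targetcov=99.999, n=1, tol=1e-6):
    """Smallest mean X at which >= targetcov % of sites have depth >= n."""
    lo, hi = 0.0, 1.0
    while pct_at_least(n, hi) < targetcov:  # widen the bracket first
        hi *= 2
    while hi - lo > tol:  # bisection: P(depth >= n) rises with mean X
        mid = (lo + hi) / 2
        if pct_at_least(n, mid) >= targetcov:
            hi = mid
        else:
            lo = mid
    return hi


def rq_cutoffs(rq=(0.8, 0.9), rqstep=0.01):
    """RQ cutoffs from the rq minimum to maximum in rqstep jumps (min 0.001)."""
    step = max(rqstep, 0.001)
    rqmin, rqmax = rq
    nsteps = int(round((rqmax - rqmin) / step))
    return [round(rqmin + i * step, 3) for i in range(nsteps + 1)]


print(min_x_for_target(99.999, n=1), min_x_for_target(99.999, n=6))
print(rq_cutoffs())
```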

Assembly Parameter Options

parameters=T/F : Whether to output predicted "best" set of parameters [False]
targetxcov=X : Target 100% X Coverage for pre-assembly [3]
xmargin=X : "Safety margin" inflation of X coverage [1]
mapefficiency=X : [Adv.] Efficiency of mapping anchor subreads onto seed reads for correction [1.0]
xsteplen=X : [Adv.] Size (bp) of increasing coverage steps for calculating required depths of coverage [1e6]
parseparam=FILES: Parse parameter settings from 1+ assembly runs []
paramlist=LIST : List of parameters to retain for parseparam output (file or comma separated, blank=all) []
seqstats=T/F : Add assembly sequence stats (if *.fas and *.preassembly.fasta files found) to parseparam run [True]
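
The paramlist=LIST rule (a file or comma-separated list, blank = all) might be applied to parsed parameter settings along these lines. The helper names are hypothetical. The sketch assumes a paramlist file holds names separated by commas and/or newlines, and that parameters have already been parsed into a dict. It does not show the actual parseparam=FILES parsing.

```python
import os


def parse_paramlist(paramlist):
    """Return parameter names to keep, or None meaning 'keep all'."""
    if not paramlist:
        return None
    if os.path.isfile(paramlist):
        # Assumption: names in the file are separated by commas and/or newlines.
        with open(paramlist) as handle:
            text = handle.read().replace("\n", ",")
    else:
        text = paramlist
    names = [name.strip() for name in text.split(",") if name.strip()]
    return names or None


def filter_params(params, paramlist):
    """Restrict a dict of parsed parameter settings to paramlist, in list order."""
    keep = parse_paramlist(paramlist)
    if keep is None:
        return dict(params)
    return {name: params[name] for name in keep if name in params}


print(filter_params({"a": 1, "b": 2, "c": 3}, "c,a"))
```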


History

Module Version History

    # 0.0.0 - Initial Compilation.
    # 1.0.0 - Initial working version for server.
    # 1.1.0 - Added xnlist=LIST : Additional columns giving % sites with coverage >= Xn [10,25,50,100].
    # 1.2.0 - Added assessment -> now PAGSAT.
    # 1.3.0 - Added seed and anchor read coverage generator (calculate=T).
    # 1.3.1 - Deleted assessment function. (Now handled by PAGSAT.)
    # 1.4.0 - Added new coverage=T function that incorporates seed and anchor subreads.
    # 1.5.0 - Added parseparam=FILES with paramlist=LIST to parse restricted sets of parameters.
    # 1.6.0 - Added seqstats=T/F function to add assembly sequence stats (if files found) to parseparam run.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.