SLiMSuite REST Server



NUMTFinder V0.5.2

Nuclear mitochondrial fragment (NUMT) search tool

Module: NUMTFinder
Description: Nuclear mitochondrial fragment (NUMT) search tool
Version: 0.5.2
Last Edit: 21/03/22
Citation: Edwards RJ et al. (2021), BMC Genomics
GitHub: https://github.com/slimsuite/numtfinder

Copyright © 2021 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje rje_db rje_obj rje_rmd rje_seqlist rje_samtools gablam


See SLiMSuite Blog for further documentation. See rje for general commands.

Function

NUMTFinder uses a mitochondrial genome to search a genome assembly and identify putative NUMTs. NUMT fragments are then combined into NUMT blocks based on proximity.

The general NUMTFinder workflow is:

1. Generate a double-copy linearised mtDNA sequence from the circular genome.
2. Perform a BLAST+ blastn search of the double-copy mtDNA against the genome assembly using GABLAM.
3. Optionally filter out short NUMT hits based on length.
4. Optionally filter NUMT hits based on hit sequence name and/or high identity (e.g. to identify and remove real mtDNA).
5. Collapse nearby fragments into NUMT blocks. By default, blocks can incorporate duplications and rearrangements, including inversions. Setting stranded=T restricts blocks to fragments on the same strand.
6. Map fragments back onto the mtDNA genome and output a coverage plot.
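Step 1 can be illustrated with a minimal Python sketch. It assumes that the double copy is simply the mtDNA sequence concatenated with itself. Under that assumption, a NUMT spanning the arbitrary start/end point of the circular genome can still align as one contiguous hit, and hit positions on the double copy are folded back onto the original coordinates when mapping fragments in step 6. The function names are hypothetical; this is not NUMTFinder's own code.

```python
def double_linearise(mtseq: str) -> str:
    """Return two tandem copies of a circular mtDNA sequence.

    Assumption: the double copy is plain concatenation of the sequence
    with itself, so alignments can run across the linearisation point.
    """
    return mtseq + mtseq


def fold_position(pos: int, mtlen: int) -> int:
    """Map a 1-based position on the double-copy sequence back onto
    the original circular mtDNA (also 1-based)."""
    return (pos - 1) % mtlen + 1


mt = "ACGTTGCA"  # toy 8 bp "mtDNA" for illustration only
dbl = double_linearise(mt)
print(len(dbl))                    # 16
print(fold_position(11, len(mt)))  # 3: position 11 of the double copy is mtDNA position 3
```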

Plans for future releases include:

  • incorporation of additional search methods (LASTZ or kmers)
  • assembly masking options
  • options to restrict NUMT blocks to fully collinear hits
  • automated running of Diploidocus long-read regcheck on fragments and blocks

Commandline

Main NUMTFinder run options

seqin=FILE : Genome assembly in which to search for NUMTs []
mtdna=FILE : mtDNA reference genome to use for search []
basefile=X : Prefix for output files [numtfinder]
summarise=T/F : Whether to summarise input sequence files upon loading [True]
dochtml=T/F : Generate HTML NUMTFinder documentation (*.docs.html) instead of main run [False]
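Options are given on the command line as key=value pairs, as listed above. The sketch below assembles such a command line from a Python dictionary. The helper, the script name numtfinder.py, the interpreter name and the input file names are all assumptions for illustration; adjust them to match your installation.

```python
import shlex


# Hypothetical helper: NUMTFinder itself does not provide this function.
def numtfinder_command(options, script="numtfinder.py", python="python"):
    """Build a NUMTFinder command line as a list of arguments.

    Each option is written as key=value; Python booleans become T/F to
    match the T/F options documented above. The script and interpreter
    names are assumptions: adjust them for your installation.
    """
    args = [python, script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"
        args.append(f"{key}={value}")
    return args


cmd = numtfinder_command({
    "seqin": "assembly.fasta",  # hypothetical genome assembly file
    "mtdna": "mtdna.fasta",     # hypothetical mtDNA reference file
    "basefile": "numts",
    "dochtml": False,
})
print(shlex.join(cmd))
# python numtfinder.py seqin=assembly.fasta mtdna=mtdna.fasta basefile=numts dochtml=F
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems with unusual file names.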

NUMTFinder search options

circle=T/F : Whether the mtDNA is circular [True]
blaste=X : BLAST+ blastn evalue cutoff for NUMT search [1e-4]
minfraglen=INT : Minimum local (NUMT fragment) alignment length (sets GABLAM localmin=X) [0]
exclude=LIST : Exclude listed sequence names from search [mtDNA sequence name]
mtmaxcov=PERC : Maximum percentage coverage of mtDNA (at mtmaxid identity) to allow [99]
mtmaxid=PERC : Maximum percentage identity of mtDNA hits > mtmaxcov coverage to allow [99]
mtmaxexclude=T/F: Whether to add sequences breaching the mtmax filters to the exclude=LIST exclusion list [True]
keepblast=T/F : Whether to keep the blast results files rather than delete them [True]
forks=INT : Use multiple threads for the NUMT search [0]
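The interplay of mtmaxcov, mtmaxid and mtmaxexclude can be sketched as follows. This is a hypothetical illustration. It assumes that a hit sequence is treated as probable real mtDNA only when it breaches both cutoffs (mtDNA coverage above mtmaxcov and identity above mtmaxid, using strict comparisons). The HitSummary class, the function names and the sequence names are invented for the example and are not part of NUMTFinder or GABLAM.

```python
from dataclasses import dataclass


@dataclass
class HitSummary:
    """Hypothetical per-sequence summary of hits to the mtDNA."""
    name: str
    mt_coverage: float  # % of the mtDNA covered by hits on this sequence
    mt_identity: float  # % identity of those hits


def mtmax_breaches(hits, mtmaxcov=99.0, mtmaxid=99.0):
    """Names of sequences exceeding BOTH cutoffs (assumed strict '>')."""
    return [h.name for h in hits
            if h.mt_coverage > mtmaxcov and h.mt_identity > mtmaxid]


def update_exclude(exclude, hits, mtmaxexclude=True, mtmaxcov=99.0, mtmaxid=99.0):
    """Mimic mtmaxexclude=T by appending breaching names to exclude=LIST."""
    if not mtmaxexclude:
        return list(exclude)
    new = [n for n in mtmax_breaches(hits, mtmaxcov, mtmaxid) if n not in exclude]
    return list(exclude) + new


hits = [
    HitSummary("ctg_mito", 99.8, 99.9),  # near-complete, near-identical: likely real mtDNA
    HitSummary("chr5", 12.0, 97.5),      # short, diverged: a candidate NUMT
]
print(update_exclude(["MT"], hits))  # ['MT', 'ctg_mito']
```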

NUMTFinder block options

fragmerge=X : Maximum length of gaps between fragmented local hits to merge [8000]
stranded=T/F : Whether to only merge fragments on the same strand [False]
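The effect of fragmerge and stranded on block building can be sketched as a greedy interval merge. This is a minimal illustration under an assumption: fragments on the same genome sequence are chained into one block when the gap between consecutive fragments is at most fragmerge bases, and with stranded=True they must also share a strand. NUMTFinder's exact merging rule may differ, and the Fragment class and merge_blocks function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """Hypothetical NUMT fragment on the genome (1-based, inclusive)."""
    seqname: str
    start: int
    end: int
    strand: str  # '+' or '-'


def merge_blocks(frags: List[Fragment], fragmerge: int = 8000, stranded: bool = False):
    """Greedily merge fragments into blocks (assumed rule: gap <= fragmerge)."""
    def sort_key(f):
        # Group by strand only when stranded merging is requested.
        return (f.seqname, f.strand if stranded else "", f.start, f.end)

    blocks = []
    for f in sorted(frags, key=sort_key):
        last = blocks[-1] if blocks else None
        if (last is not None
                and last["seqname"] == f.seqname
                and (not stranded or last["strand"] == f.strand)
                and f.start - last["end"] - 1 <= fragmerge):  # overlaps give a negative gap
            last["end"] = max(last["end"], f.end)
            last["fragments"].append(f)
        else:
            blocks.append({"seqname": f.seqname,
                           "strand": f.strand if stranded else ".",
                           "start": f.start, "end": f.end, "fragments": [f]})
    return blocks


frags = [
    Fragment("chr1", 100, 600, "+"),
    Fragment("chr1", 5000, 5400, "-"),   # inverted fragment 4.4 kb downstream
    Fragment("chr1", 20000, 20300, "+"),
]
print([(b["start"], b["end"]) for b in merge_blocks(frags)])  # [(100, 5400), (20000, 20300)]
print(len(merge_blocks(frags, stranded=True)))                # 3
```

With the default stranded=F, the inverted fragment joins the first block, matching the documented behaviour that blocks may include inversions; with stranded=T, it stays separate.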

NUMTFinder output options

localgff=T/F : Whether to output GFF format files of the NUMT hits against the genome [True]
localsam=T/F : Whether to output SAM format files of the NUMT hits against the genome [True]
fasdir=PATH : Directory in which to save fasta files [numtfasta/]
fragfas=T/F : Whether to output NUMT fragments to a fasta file [True]
fragrevcomp=T/F : Whether to reverse-complement DNA fragments that are on the reverse strand relative to the query [True]
blockfas=T/F : Whether to generate a combined fasta file of NUMT block regions (positive strand) [True]
nocovfas=T/F : Whether to output the regions of mtDNA with no coverage & peak coverage [False]
depthplot=T/F : Whether to output mtDNA depth plots of sequence coverage (requires R) [True]
depthsmooth=X : Smooth out any read plateaus < X nucleotides in length [0]
peaksmooth=X : Smooth out coverage peaks < X depth difference to flanks (<1 = %Median) [0]
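The mtDNA coverage profile behind depthplot and nocovfas can be approximated with a per-base depth count. This sketch makes two assumptions: hits have already been mapped to 1-based mtDNA coordinates, and a hit with end < start wraps across the circularisation point. It applies no depthsmooth or peaksmooth smoothing, and the function names are hypothetical rather than NUMTFinder's implementation.

```python
from typing import List, Tuple


def mt_depth(hits: List[Tuple[int, int]], mtlen: int) -> List[int]:
    """Per-base depth of fragment coverage across a circular mtDNA.

    hits are 1-based inclusive (start, end) positions; end < start is
    assumed to mean the hit wraps the circularisation point.
    """
    diff = [0] * (mtlen + 1)  # difference array; index mtlen absorbs end-of-sequence decrements

    def add(s: int, e: int) -> None:
        diff[s - 1] += 1
        diff[e] -= 1

    for start, end in hits:
        if start <= end:
            add(start, end)
        else:  # split a wrapping hit into two linear pieces
            add(start, mtlen)
            add(1, end)
    depth, running = [], 0
    for i in range(mtlen):
        running += diff[i]
        depth.append(running)
    return depth


def zero_cov_regions(depth: List[int]) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where depth == 0."""
    regions, start = [], None
    for i, d in enumerate(depth, start=1):
        if d == 0 and start is None:
            start = i
        elif d != 0 and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


depth = mt_depth([(3, 8), (6, 10), (18, 2)], mtlen=20)  # toy 20 bp mtDNA
print(zero_cov_regions(depth))  # [(11, 17)]
print(max(depth))               # 2 (peak depth, at positions 6-8)
```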


Module Version History

    # 0.0.0 - Initial Compilation.
    # 0.1.0 - Added dochtml=T and modified docstring for standalone git repo.
    # 0.1.1 - Fixed bug with default fragmerge=INT. Now set to 8kb.
    # 0.2.0 - Added SAM output and depth profile of coverage across mitochondrion.
    # 0.3.0 - Added additional exclusion, flagging and filtering of possible mtDNA.
    # 0.4.0 - Added output of zero-coverage mtDNA regions, block fasta, and coverage summary.
    # 0.4.1 - Fixed bug when no NUMTs. Added a bit more documentation of output.
    # 0.4.2 - Fixed coverage output bugs for -ve strand hits over circularisation spot. Improved pickup of partial run.
    # 0.5.0 - Modified depth plot defaults to remove the smoothing.
    # 0.5.1 - Fixed bug with peak fasta output.
    # 0.5.2 - Fixed bug with circle=F mtDNA.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.