NUMTFinder uses a mitochondrial genome to search against a genome assembly and identify putative NUMTs. NUMT fragments
are then combined into NUMT blocks based on proximity.
The general NUMTFinder workflow is:
1. Generate a double-copy linearised mtDNA sequence from the circular genome.
2. Perform a BLAST+ blastn search of the double-mtDNA versus the genome assembly using GABLAM.
3. Optionally filter short NUMT hits based on length.
4. Optionally filter NUMT hits based on hit sequence name and/or high identity (e.g. to identify/remove real mtDNA).
5. Collapse nearby fragments into NUMT blocks. By default, blocks can incorporate duplications and rearrangements,
including inversions. Setting stranded=T will restrict blocks to fragments on the same strand.
6. Map fragments back on to the mtDNA genome and output a coverage plot.
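
Steps 1 and 6 of the workflow can be illustrated with a minimal Python sketch. This is not NUMTFinder's own code: the function names `double_copy` and `to_circular` are hypothetical, and the sketch assumes 1-based inclusive hit coordinates on the double-copy sequence. It shows why doubling the sequence lets a hit span the circularisation point, and how such a hit can be mapped back to single-copy coordinates.

```python
from typing import List, Tuple


def double_copy(mtdna: str) -> str:
    """Linearise a circular sequence as two copies joined end to end."""
    return mtdna + mtdna


def to_circular(start: int, end: int, mt_len: int) -> List[Tuple[int, int]]:
    """Map a 1-based inclusive hit on the double-copy sequence back to
    1-based inclusive region(s) on the single-copy genome."""
    if end - start + 1 >= mt_len:
        return [(1, mt_len)]  # hit spans a full copy or more
    s = (start - 1) % mt_len + 1
    e = (end - 1) % mt_len + 1
    if s <= e:
        return [(s, e)]
    return [(s, mt_len), (1, e)]  # hit wraps across the circularisation point


mt = "ACGTACGTAA"            # toy 10 bp "genome" for illustration only
doubled = double_copy(mt)
# Positions 8-13 of the doubled sequence cross the join between copies.
wrapped = to_circular(8, 13, len(mt))
```

In this toy case the hit at positions 8-13 of the doubled sequence corresponds to the end of the genome followed by its start, which a search against a single linear copy would split into two separate hits.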
Plans for future releases include:
- incorporation of additional search methods (LASTZ or kmers)
- assembly masking options
- options to restrict NUMT blocks to fully collinear hits
- automated running of Diploidocus long-read regcheck on fragments and blocks
Main NUMTFinder run options
seqin=FILE : Genome assembly in which to search for NUMTs 
mtdna=FILE : mtDNA reference genome to use for search 
basefile=X : Prefix for output files [
summarise=T/F : Whether to summarise input sequence files upon loading [
dochtml=T/F : Generate HTML NUMTFinder documentation (*.docs.html) instead of main run [
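
Options are given as `key=value` arguments, with T/F for booleans. The following Python sketch assembles such a command line without running it. The script path `numtfinder.py`, the input file names and the helper `numtfinder_command` are assumptions for illustration; only the option names come from the list above.

```python
import shlex
import sys


def numtfinder_command(script, **options):
    """Build an argument list of key=value options (booleans become T/F)."""
    cmd = [sys.executable, script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"
        cmd.append(f"{key}={value}")
    return cmd


# Hypothetical script location and input files.
cmd = numtfinder_command(
    "numtfinder.py",
    seqin="assembly.fasta",
    mtdna="mtdna.fasta",
    basefile="numt_run",
    summarise=True,
)
print(shlex.join(cmd))  # display the command; nothing is executed here
```

The resulting list could be passed to `subprocess.run` once the script path points at a real installation.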
NUMTFinder search options
circle=T/F : Whether the mtDNA is circular [
blaste=X : BLAST+ blastn evalue cutoff for NUMT search [
minfraglen=INT : Minimum local (NUMT fragment) alignment length (sets GABLAM
exclude=LIST : Exclude listed sequence names from search [
mtDNA sequence name]
mtmaxcov=PERC : Maximum percentage coverage of mtDNA (at mtmaxid identity) to allow [
mtmaxid=PERC : Maximum percentage identity of mtDNA hits > mtmaxcov coverage to allow [
mtmaxexclude=T/F: Whether to add sequences breaching mtmax filters to the exclude=LIST exclusion list [
keepblast=T/F : Whether to keep the blast results files rather than delete them [
forks=INT : Use multiple threads for the NUMT search [
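
One possible reading of the mtmaxcov and mtmaxid filters is sketched below in Python. The exact rule NUMTFinder applies is not spelled out here, so this is an assumption: a sequence is flagged as probable real mtDNA when its hits cover more than mtmaxcov percent of the mtDNA and their length-weighted identity exceeds mtmaxid percent. The function names and the threshold values used are illustrative only and are not the program defaults.

```python
from typing import List, Tuple

Hit = Tuple[int, int, float]  # (mtDNA start, mtDNA end, percent identity), 1-based inclusive


def mtdna_coverage(hits: List[Hit], mt_len: int) -> float:
    """Percentage of mtDNA positions covered by at least one hit."""
    covered = set()
    for start, end, _ in hits:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / mt_len


def mean_identity(hits: List[Hit]) -> float:
    """Length-weighted mean percentage identity of the hits."""
    total = sum(end - start + 1 for start, end, _ in hits)
    return sum((end - start + 1) * ident for start, end, ident in hits) / total


def breaches_mtmax(hits: List[Hit], mt_len: int, mtmaxcov: float, mtmaxid: float) -> bool:
    """Assumed rule: both the coverage and the identity thresholds are exceeded."""
    if not hits:
        return False
    return mtdna_coverage(hits, mt_len) > mtmaxcov and mean_identity(hits) > mtmaxid


# Toy data: hits per assembly sequence against a 100 bp mtDNA.
hits_by_seq = {
    "contig_mt": [(1, 60, 99.5), (50, 100, 99.0)],
    "chr1": [(10, 30, 85.0)],
}
flagged = [name for name, hits in hits_by_seq.items()
           if breaches_mtmax(hits, 100, mtmaxcov=90.0, mtmaxid=99.0)]
```

With mtmaxexclude=T, sequences flagged in this way would be the ones added to the exclusion list.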
NUMTFinder block options
fragmerge=X : Max length of gaps between fragmented local hits to merge [
stranded=T/F : Whether to only merge fragments on the same strand [
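
The effect of fragmerge and stranded can be sketched as a simple proximity merge. This Python example is an illustration under stated assumptions, not the NUMTFinder algorithm itself: the `Fragment` type and `merge_fragments` function are hypothetical, coordinates are 1-based inclusive, and a gap is counted as the number of bases between the end of the current block and the start of the next fragment.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    seqname: str
    start: int
    end: int
    strand: str


def merge_fragments(fragments: List[Fragment], fragmerge: int,
                    stranded: bool = False) -> List[List[Fragment]]:
    """Group fragments into blocks when the gap to the current block is <= fragmerge."""
    def group(frag: Fragment):
        # With stranded=True, fragments on different strands never share a block.
        return (frag.seqname, frag.strand) if stranded else (frag.seqname,)

    blocks: List[List[Fragment]] = []
    block_end = 0
    for frag in sorted(fragments, key=lambda f: (group(f), f.start, f.end)):
        same_group = bool(blocks) and group(blocks[-1][0]) == group(frag)
        if same_group and frag.start - block_end - 1 <= fragmerge:
            blocks[-1].append(frag)
            block_end = max(block_end, frag.end)
        else:
            blocks.append([frag])
            block_end = frag.end
    return blocks


frags = [
    Fragment("chr1", 100, 200, "+"),
    Fragment("chr1", 250, 400, "-"),
    Fragment("chr1", 20000, 20100, "+"),
    Fragment("chr2", 50, 90, "+"),
]
unstranded = merge_fragments(frags, fragmerge=8000)
same_strand = merge_fragments(frags, fragmerge=8000, stranded=True)
```

Here the 8000 bp threshold follows the 8kb value mentioned in the version history. Without strand restriction the first two chr1 fragments form one block despite being on opposite strands; with stranded=True they stay separate.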
NUMTFinder output options
localgff=T/F : Whether to output GFF format files of the NUMT hits against the genome [
localsam=T/F : Whether to output SAM format files of the NUMT hits against the genome [
fasdir=PATH : Directory in which to save fasta files [
fragfas=T/F : Whether to output NUMT fragments to a fasta file [
fragrevcomp=T/F : Whether to reverse-complement DNA fragments that are on the reverse strand relative to the query [
blockfas=T/F : Whether to generate a combined fasta file of NUMT block regions (positive strand) [
nocovfas=T/F : Whether to output the regions of mtDNA with no coverage & peak coverage [
depthplot=T/F : Whether to output mtDNA depth plots of sequence coverage (requires R) [
depthsmooth=X : Smooth out any read plateaus < X nucleotides in length [
peaksmooth=X : Smooth out Xcoverage peaks < X depth difference to flanks (<1 = %Median) [
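
The coverage-related outputs (depth plot, no-coverage regions) rest on a per-position depth profile of NUMT hits across the mtDNA. A minimal Python sketch of that idea follows. It assumes hits have already been mapped to 1-based inclusive single-copy mtDNA coordinates, treats the genome as linear for simplicity, and applies no smoothing; the function names are hypothetical.

```python
from typing import List, Tuple


def depth_profile(hits: List[Tuple[int, int]], mt_len: int) -> List[int]:
    """Per-position depth (index 0 is position 1) from 1-based inclusive hits."""
    diff = [0] * (mt_len + 1)
    for start, end in hits:
        diff[start - 1] += 1   # depth rises at the hit start
        diff[end] -= 1         # and falls just after the hit end
    depth, running = [], 0
    for change in diff[:mt_len]:
        running += change
        depth.append(running)
    return depth


def zero_coverage_regions(depth: List[int]) -> List[Tuple[int, int]]:
    """Return 1-based inclusive regions where depth is zero."""
    regions = []
    start = None
    for pos, d in enumerate(depth, start=1):
        if d == 0 and start is None:
            start = pos
        elif d > 0 and start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


depth = depth_profile([(2, 5), (4, 8)], mt_len=10)
uncovered = zero_coverage_regions(depth)
peak = max(depth)
```

For a circular genome, uncovered regions at the very start and very end of the sequence would be adjacent across the circularisation point, which this linear sketch does not join.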
Module Version History
# 0.0.0 - Initial Compilation.
# 0.1.0 - Added dochtml=T and modified docstring for standalone git repo.
# 0.1.1 - Fixed bug with default fragmerge=INT. Now set to 8kb.
# 0.2.0 - Added SAM output and depth profile of coverage across mitochondrion.
# 0.3.0 - Added additional exclusion, flagging and filtering of possible mtDNA.
# 0.4.0 - Added output of zero-coverage mtDNA regions, block fasta, and coverage summary.
# 0.4.1 - Fixed bug when no NUMTs. Added a bit more documentation of output.
# 0.4.2 - Fixed coverage output bugs for -ve strand hits over circularisation spot. Improved pickup of partial run.
# 0.5.0 - Modified depth plot defaults to remove the smoothing.
# 0.5.1 - Fixed bug with peak fasta output.
# 0.5.2 - Fixed bug with circle=F mtDNA.