Function
NUMTFinder uses a mitochondrial genome to search against a genome assembly and identify putative NUMTs. NUMT fragments
are then combined into NUMT blocks based on proximity.
The general NUMTFinder workflow is:
1. Generate a double-copy linearised mtDNA sequence from the circular genome.
2. Perform a BLAST+ blastn search of the double-mtDNA versus the genome assembly using GABLAM.
3. Optionally filter short NUMT hits based on length.
4. Optionally filter NUMT hits based on hit sequence name and/or high identity (e.g. identify/remove real mtDNA).
5. Collapse nearby fragments into NUMT blocks. By default, blocks can incorporate duplications and rearrangements,
including inversions. Setting stranded=T will restrict blocks to fragments on the same strand.
6. Map fragments back on to the mtDNA genome and output a coverage plot.
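Steps 1 and 6 of the workflow can be illustrated with a minimal Python sketch. It is not NUMTFinder's own code: the function names `double_copy` and `to_single_copy` are hypothetical, and coordinates are assumed to be 1-based and inclusive. Searching with two tandem copies lets a hit that spans the circularisation point align as one contiguous match, and the hit is then mapped back on to single-copy coordinates.

```python
def double_copy(mtdna: str) -> str:
    """Linearise a circular genome as two tandem copies of itself."""
    return mtdna + mtdna


def to_single_copy(start: int, end: int, mt_len: int) -> list[tuple[int, int]]:
    """Map a hit on the double copy back to single-copy coordinates.

    Assumes 1-based inclusive positions with start <= end. A hit that
    crosses the circularisation point is returned as two intervals.
    """
    if end - start + 1 >= mt_len:
        # The hit covers the whole genome at least once.
        return [(1, mt_len)]
    s = (start - 1) % mt_len + 1
    e = (end - 1) % mt_len + 1
    if s <= e:
        return [(s, e)]
    # Wrapped hit: split at the end of the single copy.
    return [(s, mt_len), (1, e)]


if __name__ == "__main__":
    genome = "ACGT"
    # "GTAC" spans the join and only exists in the double copy.
    print("GTAC" in genome, "GTAC" in double_copy(genome))
    print(to_single_copy(8, 13, 10))
```

Splitting wrapped hits into two intervals keeps every reported position inside the single-copy genome, which is what a coverage plot over the mtDNA needs.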
Plans for future releases include:
- incorporation of additional search methods (LASTZ or kmers)
- assembly masking options
- options to restrict NUMT blocks to fully collinear hits
- automated running of Diploidocus long-read regcheck on fragments and blocks
Commandline
Main NUMTFinder run options
seqin=FILE
: Genome assembly in which to search for NUMTs []
mtdna=FILE
: mtDNA reference genome to use for search []
basefile=X
: Prefix for output files [numtfinder]
summarise=T/F
: Whether to summarise input sequence files upon loading [True]
dochtml=T/F
: Generate HTML NUMTFinder documentation (*.docs.html) instead of main run [False]
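All options are given as `option=value` pairs, with `T`/`F` for boolean settings. As a minimal sketch, the helper below (a hypothetical `build_args` function, not part of NUMTFinder) turns a Python dictionary into such an argument list. The file names are placeholders, and how the resulting arguments are passed to the NUMTFinder script depends on your installation.

```python
def build_args(options: dict) -> list[str]:
    """Format a settings dictionary as option=value command-line arguments."""
    args = []
    for key, value in options.items():
        if isinstance(value, bool):
            # Boolean options are written as T or F.
            value = "T" if value else "F"
        args.append(f"{key}={value}")
    return args


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    settings = {
        "seqin": "assembly.fasta",
        "mtdna": "mtdna.fasta",
        "basefile": "mynumts",
        "stranded": True,
    }
    print(" ".join(build_args(settings)))
```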
NUMTFinder search options
circle=T/F
: Whether the mtDNA is circular [True]
blaste=X
: BLAST+ blastn evalue cutoff for NUMT search [1e-4]
minfraglen=INT
: Minimum local (NUMT fragment) alignment length (sets GABLAM localmin=X) [0]
exclude=LIST
: Exclude listed sequence names from search [mtDNA sequence name]
mtmaxcov=PERC
: Maximum percentage coverage of mtDNA (at mtmaxid identity) to allow [99]
mtmaxid=PERC
: Maximum percentage identity of mtDNA hits > mtmaxcov coverage to allow [99]
mtmaxexclude=T/F
: Whether to add sequences breaching mtmax filters to the exclude=LIST exclusion list [True]
keepblast=T/F
: Whether to keep the blast results files rather than delete them [True]
forks=INT
: Use multiple threads for the NUMT search [0]
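The interaction of mtmaxcov, mtmaxid and mtmaxexclude can be sketched as follows. This is one possible reading of the option descriptions rather than NUMTFinder's actual implementation: the function names are hypothetical, and it is an assumption that a sequence breaches the filters only when both thresholds are strictly exceeded.

```python
def breaches_mtmax(coverage_pc: float, identity_pc: float,
                   mtmaxcov: float = 99.0, mtmaxid: float = 99.0) -> bool:
    """Flag a sequence that looks like real mtDNA rather than a NUMT.

    Assumption: both the mtDNA coverage and the identity must strictly
    exceed their thresholds for the sequence to breach the filters.
    """
    return coverage_pc > mtmaxcov and identity_pc > mtmaxid


def apply_mtmax(hits: dict, exclude: set, mtmaxcov: float = 99.0,
                mtmaxid: float = 99.0, mtmaxexclude: bool = True):
    """Split sequences into kept and flagged, optionally extending exclude.

    hits maps sequence name -> (mtDNA coverage %, identity %).
    """
    flagged = {name for name, (cov, ident) in hits.items()
               if breaches_mtmax(cov, ident, mtmaxcov, mtmaxid)}
    kept = sorted(set(hits) - flagged)
    if mtmaxexclude:
        exclude = exclude | flagged
    return kept, exclude


if __name__ == "__main__":
    # Made-up values for illustration only.
    example = {"ctgA": (99.8, 99.9), "ctgB": (12.5, 88.0)}
    print(apply_mtmax(example, {"mtDNA"}))
```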
NUMTFinder block options
fragmerge=X
: Maximum length of gaps between fragmented local hits to merge [8000]
stranded=T/F
: Whether to only merge fragments on the same strand [False]
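A minimal sketch of proximity-based merging under these two options is given below. It is illustrative only and not taken from the NUMTFinder source: the `Fragment` type and `merge_fragments` function are hypothetical, coordinates are assumed to be 1-based and inclusive, and the gap is assumed to be the number of bases between the end of the current block and the start of the next fragment.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    seqname: str
    start: int
    end: int
    strand: str  # "+" or "-"


def merge_fragments(frags, fragmerge=8000, stranded=False):
    """Group fragments into blocks when the gap between them is <= fragmerge.

    With stranded=True, only fragments on the same strand can share a block.
    """
    def group(f):
        return (f.seqname, f.strand) if stranded else (f.seqname,)

    blocks = []      # each block is a list of fragments
    block_end = 0    # furthest end position in the current block
    for frag in sorted(frags, key=lambda f: (group(f), f.start, f.end)):
        same_group = bool(blocks) and group(blocks[-1][0]) == group(frag)
        if same_group and frag.start - block_end - 1 <= fragmerge:
            blocks[-1].append(frag)
            block_end = max(block_end, frag.end)
        else:
            blocks.append([frag])
            block_end = frag.end
    return blocks


if __name__ == "__main__":
    # Made-up fragments for illustration only.
    frags = [
        Fragment("chr1", 100, 200, "+"),
        Fragment("chr1", 5000, 5200, "-"),
        Fragment("chr1", 20000, 20100, "+"),
        Fragment("chr2", 50, 90, "+"),
    ]
    print(len(merge_fragments(frags)), len(merge_fragments(frags, stranded=True)))
```

Tracking the furthest end of the block, rather than the end of the last fragment added, keeps the gap calculation correct when one fragment is nested inside another.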
NUMTFinder output options
localgff=T/F
: Whether to output GFF format files of the NUMT hits against the genome [True]
localsam=T/F
: Whether to output SAM format files of the NUMT hits against the genome [True]
fasdir=PATH
: Directory in which to save fasta files [numtfasta/]
fragfas=T/F
: Whether to output NUMT fragments to a fasta file [True]
fragrevcomp=T/F
: Whether to reverse-complement DNA fragments that are on reverse strand to query [True]
blockfas=T/F
: Whether to generate a combined fasta file of NUMT block regions (positive strand) [True]
nocovfas=T/F
: Whether to output the regions of mtDNA with no coverage & peak coverage [False]
depthplot=T/F
: Whether to output mtDNA depth plots of sequence coverage (requires R) [True]
depthsmooth=X
: Smooth out any read plateaus < X nucleotides in length [0]
peaksmooth=X
: Smooth out Xcoverage peaks < X depth difference to flanks (<1 = %Median) [0]
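The coverage-related outputs (depth plot and no-coverage regions) rest on a per-base depth profile of NUMT fragments across the mtDNA. The sketch below shows one simple way to compute such a profile and extract zero-coverage regions. It is not NUMTFinder's implementation: the function names are hypothetical, intervals are assumed to be 1-based inclusive single-copy mtDNA coordinates, and no smoothing is applied.

```python
def depth_profile(intervals, mt_len):
    """Count how many fragments cover each mtDNA position.

    intervals: iterable of (start, end) pairs, 1-based and inclusive.
    Returns a list where index 0 holds the depth at position 1.
    """
    depth = [0] * mt_len
    for start, end in intervals:
        for pos in range(start - 1, end):
            depth[pos] += 1
    return depth


def zero_coverage_regions(depth):
    """Return (start, end) pairs, 1-based inclusive, where depth is zero."""
    regions = []
    region_start = None
    for pos, d in enumerate(depth, start=1):
        if d == 0 and region_start is None:
            region_start = pos
        elif d != 0 and region_start is not None:
            regions.append((region_start, pos - 1))
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(depth)))
    return regions


if __name__ == "__main__":
    # Made-up intervals on a 10 bp genome, for illustration only.
    profile = depth_profile([(2, 4), (3, 6)], mt_len=10)
    print(profile)
    print(zero_coverage_regions(profile))
```

For a circular genome, a zero-coverage region ending at the last position and another starting at position 1 are adjacent across the circularisation point, so they may be better treated as a single region.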
History
Module Version History
# 0.0.0 - Initial Compilation.
# 0.1.0 - Added dochtml=T and modified docstring for standalone git repo.
# 0.1.1 - Fixed bug with default fragmerge=INT. Now set to 8kb.
# 0.2.0 - Added SAM output and depth profile of coverage across mitochondrion.
# 0.3.0 - Added additional exclusion, flagging and filtering of possible mtDNA.
# 0.4.0 - Added output of zero-coverage mtDNA regions, block fasta, and coverage summary.
# 0.4.1 - Fixed bug when no NUMTs. Added a bit more documentation of output.
# 0.4.2 - Fixed coverage output bugs for -ve strand hits over circularisation spot. Improved pickup of partial run.
# 0.5.0 - Modified depth plot defaults to remove the smoothing.
# 0.5.1 - Fixed bug with peak fasta output.
# 0.5.2 - Fixed bug with circle=F mtDNA.