Genome-wide SNP Mapper
Copyright © 2016 Richard J. Edwards - See source code for GNU License Notice
See SLiMSuite Blog for further documentation. See
Snapper is designed to generate a table of SNPs from a BLAST comparison of two genomes, map those SNPs onto genome features, predict effects and generate a series of output tables to aid exploration of genomic differences.
A basic overview of the Snapper workflow is as follows:
1. Read/parse input sequences and reference features.
2. All-by-all BLAST of query "Alt" genome against reference using GABLAM.
3. Reduction of BLAST hits to Unique BLAST hits in which each region of a genome is mapped onto only a single region of the other genome. This is not bidirectional at this stage, so multiple unique regions of one genome may map onto the same region of the other.
4. Determine Copy Number Variation (CNV) for each region of the genome based on the unique BLAST hits. This is determined at the nucleotide level as the number of times that nucleotide maps to unique regions in the other genome, thus establishing the copy number of that nucleotide in the other genome.
5. Generate SNP Tables based on the unique local BLAST hits. Each mismatch or indel in a local BLAST alignment is recorded as a SNP.
6. Mapping of SNPs onto reference features based on SNP reference locus and position.
7. SNP Type Classification based on the type of SNP (insertion/deletion/substitution) and the feature in which it falls. CDS SNPs are further classified according to codon changes.
8. SNP Effect Classification for CDS features predicting their effects (in isolation) on the protein product.
9. SNP Summary Tables for the whole genome vs genome comparison. This includes a table of CDS Ratings based on the numbers and types of SNPs. For the
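Steps 5 and 7 of the workflow above can be illustrated with a minimal sketch. This is not Snapper's own code: the function names (`call_snps`, `classify_codon_change`), the tuple layout and the classification labels are all hypothetical, and the sketch assumes a gapped pairwise alignment given as two equal-length strings with `-` for gaps, with insertions and deletions named relative to the reference.

```python
# Illustrative sketch only; names and labels are assumptions, not Snapper's API.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def call_snps(ref_aln, alt_aln, ref_start=1):
    """Return a list of (ref_pos, ref_base, alt_base, snp_type) tuples.

    ref_aln and alt_aln are equal-length aligned strings using '-' for gaps.
    ref_pos is 1-based; for an insertion it is the position of the last
    reference base before the inserted base.
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must be the same length")
    snps = []
    ref_pos = ref_start - 1  # last reference position consumed so far
    for r, a in zip(ref_aln.upper(), alt_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == a:
            continue  # matching column (or gap in both): not a SNP
        if r == "-":
            snp_type = "insertion"
        elif a == "-":
            snp_type = "deletion"
        else:
            snp_type = "substitution"
        snps.append((ref_pos, r, a, snp_type))
    return snps


def classify_codon_change(ref_codon, alt_codon):
    """Classify a single-codon substitution by its effect on the amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "nonsynonymous"


snps = call_snps("ACGT-ACGT", "ACCTTAC-T")
```

Each mismatch or gap column becomes one record, mirroring the statement that every mismatch or indel in a local alignment is recorded as a SNP. The codon classifier considers each change in isolation, as step 8 describes.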
Version 1.1.0 introduced additional fasta output of the genome regions with zero coverage in the other genome, i.e. the regions in the *.cnv.tdt file with
Version 1.6.0 added
Version 1.7.0 added the option to use minimap2 instead of BLAST+ for speed, using
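The per-nucleotide copy number from step 4 and the zero-coverage regions behind the Version 1.1.0 fasta output can be sketched as follows. This is a minimal illustration rather than Snapper's implementation: `copy_number` and `zero_coverage_regions` are hypothetical names, and the sketch assumes the unique hits for one sequence are available as 1-based inclusive (start, end) coordinates.

```python
# Illustrative sketch only; function names and input layout are assumptions.

def copy_number(seq_len, hits):
    """Return a list whose entry i is the number of hits covering position i+1.

    hits is an iterable of (start, end) pairs, 1-based and inclusive.
    Reversed pairs (start > end) are treated as reverse-strand hits.
    """
    diff = [0] * (seq_len + 1)  # difference array: +1 at start, -1 after end
    for start, end in hits:
        if start > end:
            start, end = end, start
        start = max(start, 1)
        end = min(end, seq_len)
        if start > end:
            continue  # hit lies entirely outside the sequence
        diff[start - 1] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for delta in diff[:seq_len]:
        running += delta
        depth.append(running)
    return depth


def zero_coverage_regions(depth):
    """Return 1-based inclusive (start, end) runs where the copy number is 0."""
    regions = []
    start = None
    for pos, d in enumerate(depth, start=1):
        if d == 0 and start is None:
            start = pos
        elif d != 0 and start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


depth = copy_number(10, [(2, 4), (3, 6), (9, 8)])
no_copy = zero_coverage_regions(depth)
```

The difference array keeps the depth calculation linear in sequence length plus the number of hits, which matters for whole-genome comparisons. Positions with depth above 1 correspond to regions present in multiple copies in the other genome.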
Reference Feature Options
SNP Mapping Options
History Module Version History
# 0.0.0 - Initial Compilation.
# 0.1.0 - Tidied up with improved run pickup.
# 0.2.0 - Added FASTQ and improved CNV output along with all features.
# 0.2.1 - Fixed local output error. (Query/Qry issue - need to fix this and make consistent!) Fixed snp local table revcomp bug.
# 0.2.2 - Corrected excess CNV table output (accnum AND shortname).
# 0.2.3 - Corrected "intron" classification for first position of features. Updated FTBest defaults.
# 1.0.0 - Working version with completed draft manual. Added to SeqSuite.
# 1.0.1 - Fixed issues when features missing.
# 1.1.0 - NoCopy fasta output
# 1.2.0 - makesnp=T/F : Whether or not to generate Query vs Reference SNP tables [True]
# 1.3.0 - localsAM=T/F : Save local (and unique) hits data as SAM files in addition to TDT [False] - via GABLAM
# 1.4.0 - localidmin=PERC : Minimum local %identity of local alignment to output to local stats table [0.0]
# 1.4.1 - Modified warning for AccNum/Locus mismatch in Reference.
# 1.5.0 - Added pNS and modified the "Positive" CDS rating to be pNS < 0.05.
# 1.6.0 - filterself=T/F : Filter out self-hits prior to Snapper pipeline (e.g for assembly all-by-all) [False]
# 1.6.0 - Added renaming of alt sequences that are found in the Reference for self-comparisons.
# 1.6.1 - Fixed bug for reducing to unique-unique pairings that was over-filtering.
# 1.7.0 - Added mapper=minimap setting, compatible with GABLAM v2.30.0 and rje_paf v0.1.0.
# 1.8.0 - Added dochtml=T and modified docstring for standalone git repo.
# 1.8.1 - Bug fixing SNPMap mode.
Snapper REST Output formats
Run with
for more user-friendly formatted output. Individual outputs can be identified/parsed using
© 2015 RJ Edwards. Contact: firstname.lastname@example.org.