Module:	Snapper
Description:	Genome-wide SNP Mapper
Version:	1.8.1
Last Edit:	12/03/21

Imported modules: rje rje_db rje_obj rje_genbank rje_rmd rje_seqlist rje_sequence snp_mapper rje_blast_V2 gablam

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

Snapper is designed to generate a table of SNPs from a BLAST comparison of two genomes, map those SNPs onto genome features, predict effects and generate a series of output tables to aid exploration of genomic differences.

A basic overview of the Snapper workflow is as follows:

1. Read/parse input sequences and reference features.

2. All-by-all BLAST of query "Alt" genome against reference using GABLAM.

3. Reduction of BLAST hits to Unique BLAST hits in which each region of a genome is mapped onto only a single region of the other genome. This is not bidirectional at this stage, so multiple unique regions of one genome may map onto the same region of the other.

4. Determine Copy Number Variation (CNV) for each region of the genome based on the unique BLAST hits. This is determined at the nucleotide level as the number of times that nucleotide maps to unique regions in the other genome, thus establishing the copy number of that nucleotide in the other genome.

5. Generate SNP Tables based on the unique local BLAST hits. Each mismatch or indel in a local BLAST alignment is recorded as a SNP.

6. Mapping of SNPs onto reference features based on SNP reference locus and position.

7. SNP Type Classification based on the type of SNP (insertion/deletion/substitution) and the feature in which it falls. CDS SNPs are further classified according to codon changes.

8. SNP Effect Classification for CDS features predicting their effects (in isolation) on the protein product.

9. SNP Summary Tables for the whole genome vs genome comparison. This includes a table of CDS Ratings based on the numbers and types of SNPs. For the *.summary.tdt output, each SNP is mapped to a single feature according to the FTBest hierarchy: a SNP that maps to one feature type is removed from all feature types lower in the list. The default order is: CDS,mRNA,tRNA,rRNA,ncRNA,misc_RNA,gene,mobile_element,LTR,rep_origin,telomere,centromere,misc_feature,intergenic
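
The FTBest reduction in step 9 can be pictured with a minimal Python sketch. This is an illustration only, not Snapper's own code: the function name best_feature and the (type, start, end) tuple layout are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
# FTBest order as listed in step 9; earlier types take priority.
FTBEST = ["CDS", "mRNA", "tRNA", "rRNA", "ncRNA", "misc_RNA", "gene",
          "mobile_element", "LTR", "rep_origin", "telomere", "centromere",
          "misc_feature", "intergenic"]


def best_feature(snp_pos, features, ftbest=FTBEST):
    """Return the overlapping feature whose type ranks highest in ftbest.

    features is a hypothetical list of (type, start, end) tuples
    (1-based, inclusive). A SNP that overlaps no listed feature is
    reported as intergenic.
    """
    rank = {ftype: i for i, ftype in enumerate(ftbest)}
    hits = [f for f in features
            if f[0] in rank and f[1] <= snp_pos <= f[2]]
    if not hits:
        return ("intergenic", snp_pos, snp_pos)
    return min(hits, key=lambda f: rank[f[0]])


features = [("gene", 100, 900), ("mRNA", 100, 900), ("CDS", 200, 800)]
print(best_feature(250, features))  # ('CDS', 200, 800)
print(best_feature(150, features))  # ('mRNA', 100, 900)
print(best_feature(950, features))  # ('intergenic', 950, 950)
```

Changing the order passed as ftbest changes which feature wins, which is the effect the ftbest=LIST option is described as having.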

Version 1.1.0 introduced additional fasta output of the genome regions with zero coverage in the other genome, i.e. the regions in the *.cnv.tdt file with CNV=0. Regions shorter than nocopylen=X [default=100] are removed first; the remaining regions within nocopymerge=X [default=20] nt of each other are then merged for output. This can be switched off with nocopyfas=F.
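
As a rough sketch of how per-nucleotide CNV and the CNV=0 filtering fit together, the Python below counts how many unique hits cover each position, extracts the zero-coverage runs, and then applies the length filter followed by the merge. It is not Snapper's implementation: the function names and the (start, end) interval layout are hypothetical, and "within X nt" is assumed to mean that at most X nucleotides separate two regions.

```python
def zero_coverage_regions(seqlen, hits):
    """Return 1-based inclusive (start, end) runs where CNV is 0.

    hits is a list of (start, end) intervals (1-based, inclusive)
    covered by unique hits on a sequence of length seqlen.
    """
    delta = [0] * (seqlen + 2)          # difference array of coverage changes
    for start, end in hits:
        delta[start] += 1
        delta[end + 1] -= 1
    regions, run_start, cnv = [], None, 0
    for pos in range(1, seqlen + 1):
        cnv += delta[pos]               # cnv = number of hits covering pos
        if cnv == 0 and run_start is None:
            run_start = pos
        elif cnv > 0 and run_start is not None:
            regions.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        regions.append((run_start, seqlen))
    return regions


def nocopy_fragments(regions, nocopylen=100, nocopymerge=20):
    """Drop regions shorter than nocopylen, then merge close survivors."""
    kept = [r for r in regions if r[1] - r[0] + 1 >= nocopylen]
    merged = []
    for start, end in kept:
        # Assumed meaning of "within": gap between regions <= nocopymerge nt.
        if merged and start - merged[-1][1] - 1 <= nocopymerge:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


regions = zero_coverage_regions(1000, [(1, 100), (251, 260), (400, 1000)])
print(regions)                    # [(101, 250), (261, 399)]
print(nocopy_fragments(regions))  # [(101, 399)]
```

Because short regions are removed before merging, two short CNV=0 runs that sit close together are both dropped rather than combined into one longer fragment.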

Version 1.6.0 added filterself=T/F to filter out self-hits prior to the Snapper pipeline. seqin=FILE sequences that are found in the Reference (matched by name) will be renamed with the prefix alt and output to *.alt.fasta. This is designed for identifying unique and best-matching homologous contigs from whole genome assemblies, where seqin=FILE and reference=FILE are the same. In this case, it is recommended to increase the localmin=X cutoff.

Version 1.7.0 added the option to use minimap2 instead of BLAST+ for speed, using mapper=minimap.

Commandline

Input/Output options

seqin=FASFILE : Input genome to identify variants in []
reference=FILE : Fasta (with accession numbers matching Locus IDs) or genbank file of reference genome. []
basefile=FILE : Root of output file names (same as SNP input file by default) [<SNPFILE> or <SEQIN>.vs.<REFERENCE>]
nocopyfas=T/F : Whether to output CNV=0 fragments to *.nocopy.fas fasta file [True]
nocopylen=X : Minimum length for CNV=0 fragments to be output [100]
nocopymerge=X : CNV=0 fragments within X nt of each other will be merged prior to output [20]
makesnp=T/F : Whether or not to generate Query vs Reference SNP tables [True]
localsAM=T/F : Save local (and unique) hits data as SAM files in addition to TDT [False]
filterself=T/F : Filter out self-hits prior to Snapper pipeline (e.g. for assembly all-by-all) [False]
mapper=X : Program to use for mapping files against each other (blast/minimap) [blast]
dochtml=T/F : Generate HTML Snapper documentation (*.docs.html) instead of main run [False]
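
Options are given on the command line as option=value pairs, with T/F for booleans and comma-separated values for LIST options. The Python helper below is a hedged sketch of assembling such a command; the helper name snapper_args, the script name snapper.py and the input file names are assumptions for illustration, so adjust them to the local installation.

```python
def snapper_args(script="snapper.py", **options):
    """Build an argument list in the option=value style used above.

    script is an assumed path to the Snapper program. Booleans become
    T/F and lists or tuples become comma-separated values.
    """
    args = ["python", script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        args.append("{}={}".format(key, value))
    return args


# Hypothetical input files; option names are taken from the table above.
cmd = snapper_args(seqin="alt.fasta", reference="ref.gb",
                   mapper="minimap", nocopylen=200, filterself=False)
print(" ".join(cmd))
# python snapper.py seqin=alt.fasta reference=ref.gb mapper=minimap nocopylen=200 filterself=F
```

The resulting list can be passed to subprocess.run(cmd) once the script path points at a real installation.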

Reference Feature Options

spcode=X : Overwrite species read from file (if any!) with X if generating sequence file from genbank [None]
ftfile=FILE : Input feature file (locus,feature,position,start,end) [*.Feature.tdt]
ftskip=LIST : List of feature types to exclude from analysis [source]
ftbest=LIST : List of features to exclude if earlier feature in list overlaps position [(see above)]

SNP Mapping Options

snpmap=FILE : Input table of SNPs for standalone mapping and output (should have locus and pos info) [None]
snphead=LIST : List of SNP file headers (should include Locus, Pos and ALT fields) []
snpdrop=LIST : List of SNP fields to drop []
altpos=T/F : Whether SNP file is a single mapping (with AltPos) (False=BCF) [True]
altft=T/F : Use AltLocus and AltPos for feature mapping (if altpos=T) [False]
localsort=X : Local hit field used to sort local alignments for localunique reduction [Identity]
localmin=X : Minimum length of local alignment to output to local stats table [10]
localidmin=PERC : Minimum local %identity of local alignment to output to local stats table [0.0]
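
The combined effect of localmin, localidmin and localsort can be sketched in Python as below. This is an assumption-laden illustration rather than the GABLAM/Snapper code: each hit is a hypothetical dictionary in which Length is the alignment length and Identity is the number of identical positions, percentage identity is taken as 100*Identity/Length, and hits are sorted with the largest localsort value first.

```python
def filter_local_hits(hits, localmin=10, localidmin=0.0, localsort="Identity"):
    """Keep hits passing the length and %identity cutoffs, best first.

    Each hit is a dict with at least "Length" and "Identity" keys
    (assumed layout; see the lead-in).
    """
    kept = []
    for hit in hits:
        length = hit["Length"]
        if length < max(localmin, 1):   # also guards against zero length
            continue
        if 100.0 * hit["Identity"] / length < localidmin:
            continue
        kept.append(hit)
    return sorted(kept, key=lambda h: h[localsort], reverse=True)


hits = [
    {"Length": 8, "Identity": 8},        # too short for localmin=10
    {"Length": 500, "Identity": 450},    # 90.0% identity
    {"Length": 1200, "Identity": 1190},  # about 99.2% identity
]
print(filter_local_hits(hits))
# [{'Length': 1200, 'Identity': 1190}, {'Length': 500, 'Identity': 450}]
print(filter_local_hits(hits, localidmin=95.0))
# [{'Length': 1200, 'Identity': 1190}]
```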

Module Version History

    # 0.0.0 - Initial Compilation.
    # 0.1.0 - Tidied up with improved run pickup.
    # 0.2.0 - Added FASTQ and improved CNV output along with all features.
    # 0.2.1 - Fixed local output error. (Query/Qry issue - need to fix this and make consistent!) Fixed snp local table revcomp bug.
    # 0.2.2 - Corrected excess CNV table output (accnum AND shortname).
    # 0.2.3 - Corrected "intron" classification for first position of features. Updated FTBest defaults.
    # 1.0.0 - Working version with completed draft manual. Added to SeqSuite.
    # 1.0.1 - Fixed issues when features missing.
    # 1.1.0 - NoCopy fasta output
    # 1.2.0 - makesnp=T/F : Whether or not to generate Query vs Reference SNP tables [True]
    # 1.3.0 - localsAM=T/F : Save local (and unique) hits data as SAM files in addition to TDT [False] - via GABLAM
    # 1.4.0 - localidmin=PERC : Minimum local %identity of local alignment to output to local stats table [0.0]
    # 1.4.1 - Modified warning for AccNum/Locus mismatch in Reference.
    # 1.5.0 - Added pNS and modified the "Positive" CDS rating to be pNS < 0.05.
    # 1.6.0 - filterself=T/F : Filter out self-hits prior to Snapper pipeline (e.g. for assembly all-by-all) [False]
    # 1.6.0 - Added renaming of alt sequences that are found in the Reference for self-comparisons.
    # 1.6.1 - Fixed bug for reducing to unique-unique pairings that was over-filtering.
    # 1.7.0 - Added mapper=minimap setting, compatible with GABLAM v2.30.0 and rje_paf v0.1.0.
    # 1.8.0 - Added dochtml=T and modified docstring for standalone git repo.
    # 1.8.1 - Bug fixing SNPMap mode.

Snapper REST Output formats

Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format for more user-friendly formatted output. Individual outputs can be identified/parsed using &rest=OUTFMT.
