Module:	SNP_Mapper
Description:	SNP consensus sequence to CDS mapping
Version:	1.2.1
Last Edit:	03/05/21

Imported modules: rje rje_db rje_genbank rje_obj rje_seqlist rje_sequence rje_zen

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This module is in development. The following documentation should be considered aspirational rather than accurate!

This module is for mapping SNPs onto a genbank file (or similar feature annotation) and reporting the possible effect of SNPs in addition to more detailed output for individual genes and proteins. The main SNP output will be restricted to feature reporting and coding changes.

NOTE: At present, this does not handle indels properly, nor multiple SNPs affecting the same amino acid. It cannot deal with introns.
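
As a rough illustration of what a "coding change" means here, the sketch below classifies a single substitution in a forward-strand, intron-free CDS as synonymous or nonsynonymous using the standard genetic code. The function name coding_effect and its arguments are hypothetical and are not part of SNP_Mapper. In line with the limitations noted above, it ignores indels, introns and multiple SNPs in one codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def coding_effect(cds, pos, alt):
    """Classify a substitution at 1-based CDS position pos.

    Hypothetical helper: assumes a forward-strand CDS with no introns.
    """
    cds = cds.upper()
    i = pos - 1                      # 0-based index of the changed base
    start = i - (i % 3)              # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete codon")
    offset = i % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_codon, alt_codon, ref_aa, alt_aa, kind
```

For example, coding_effect("ATGGCTTAA", 4, "A") changes the codon GCT (A) to ACT (T) and is reported as nonsynonymous.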

Primary input for this program is:

1. Reference genome sequence. This should be a DNA sequence file in which the accession numbers match the Locus field for the SNP file (below). If a Genbank file is given, the *.full.fas file will be used.

2. A feature table with the headers: locus, feature, position, start, end, product, gene_synonym, note, db_xref, locus_tag, details. If this is not given, a *.Feature.tdt file based on seqin=FILE will be sought and loaded.

3. The SNP file (snpfile=FILE) should have Locus, Pos, REF and ALT fields, which will be used to map onto the features file and uniquely mark variants in the reference genome. If running in (default) altpos=T mode, this file will represent the mapping of a single pair of genomes: there will be no multiple-allele entries and the file headers should include AltPos and AltLocus for making unique entries. (Reference sequences can map to multiple query sequences.) If running on BCF output, alleles will be split on commas but there will be no compiled sequence output. (Not yet implemented!)
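
The field requirements for the SNP file can be checked before a run with a short script. The sketch below is not taken from SNP_Mapper: the function load_snps, the assumption that the table is tab-delimited, the choice to key rows on all required fields, and the example loci and positions are all illustrative.

```python
import csv
import io

REQUIRED = ["Locus", "Pos", "REF", "ALT"]
ALT_FIELDS = ["AltLocus", "AltPos"]   # additionally expected when altpos=T


def load_snps(handle, altpos=True, delimiter="\t"):
    """Read SNP rows and key each on its required fields (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    wanted = REQUIRED + (ALT_FIELDS if altpos else [])
    missing = [f for f in wanted if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("SNP file lacks fields: " + ", ".join(missing))
    snps = {}
    for row in reader:
        key = tuple(row[f] for f in wanted)
        if key in snps:
            raise ValueError("duplicate SNP entry: %r" % (key,))
        snps[key] = row
    return snps


# Made-up two-row table standing in for a real snpfile=FILE.
example = io.StringIO(
    "Locus\tPos\tREF\tALT\tAltLocus\tAltPos\n"
    "chrI\t101\tA\tG\tctg1\t98\n"
    "chrI\t250\tC\tT\tctg1\t247\n"
)
snps = load_snps(example)
```

Positions are kept as strings here; convert them with int() before doing any coordinate arithmetic.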

The default FTBest hierarchy for *.ftypes.tdt output is: CDS,mRNA,tRNA,rRNA,ncRNA,misc_RNA,gene,mobile_element,LTR,rep_origin,telomere,centromere,misc_feature,intergenic
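
One way to read the hierarchy is that, when several feature types overlap a SNP position, only the type earliest in the list is kept. The sketch below illustrates that reading and is not the module's own code. The function best_ftype is hypothetical, and both the handling of types missing from the hierarchy (ignored) and the fallback to "intergenic" for positions with no listed feature are assumptions.

```python
FTBEST = (
    "CDS,mRNA,tRNA,rRNA,ncRNA,misc_RNA,gene,mobile_element,LTR,"
    "rep_origin,telomere,centromere,misc_feature,intergenic"
).split(",")


def best_ftype(ftypes, hierarchy=FTBEST):
    """Return the overlapping feature type that comes first in the hierarchy."""
    rank = {ft: i for i, ft in enumerate(hierarchy)}
    ranked = [ft for ft in ftypes if ft in rank]   # assumption: unlisted types are ignored
    if not ranked:
        return "intergenic"                        # assumption: no listed feature at this position
    return min(ranked, key=rank.get)
```

With the default list, a position overlapped by gene, mRNA and CDS features would be reported as CDS.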

To be added

Sequence output and recognition of BCF files to be added.

Old Function

This module reads in an alignment of a coding sequence with a consensus sequence and a list of polymorphic sites
relative to consensus. Polymorphisms are mapped onto the coding sequence and the desired output produced.

Commandline

Basic SNP mapping functions

seqin=FILE : Sequence input file with accession numbers matching Locus IDs, or Genbank file. []
spcode=X : Overwrite species read from file (if any!) with X if generating sequence file from genbank [None]
ftfile=FILE : Input feature file (locus,feature,position,start,end) [*.Feature.tdt]
ftskip=LIST : List of feature types to exclude from analysis [source]
ftbest=LIST : List of features to exclude if earlier feature in list overlaps position [(see above)]
snpbyftype=T/F : Whether to output mapped SNPs by feature type (before FTBest filtering) [False]
snpfile=FILE : Input table of SNPs to map and output (should have locus and pos info, see above) []
snphead=LIST : List of SNP file headers []
snpdrop=LIST : List of SNP fields to drop []
altpos=T/F : Whether SNP file is a single mapping (with AltPos) (False=BCF) [True]
altft=T/F : Use AltLocus and AltPos for feature mapping (if altpos=T) [False]
basefile=FILE : Root of output file names (same as SNP input file by default) []
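
The options above are given as key=value tokens, with T/F for booleans and comma-separated values for LIST options. A minimal sketch of assembling such a command from Python follows. The helper build_args, the script path snp_mapper.py and the input file names are assumptions for illustration only. The version history also notes that the module is no longer really designed for standalone running, so treat this as indicative.

```python
def build_args(script="snp_mapper.py", **options):
    """Format keyword options as key=value tokens (hypothetical helper)."""
    args = ["python", script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"              # T/F booleans
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)    # LIST options
        args.append("%s=%s" % (key, value))
    return args


# File names below are placeholders, not files shipped with the module.
cmd = build_args(
    seqin="reference.gbk",
    snpfile="strain.snp.tdt",
    altpos=True,
    ftskip=["source"],
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run once the script path points at a real installation.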

Old Options (need reviving)

batch=LIST : List of alignment files to read X.aln - must have X_polymorphisms.txt too [*.aln]
screenmatch=LIST : List of genotypes for which to screen out matching SNPs []

Obsolete Options (roll back to pre v0.3.0 if required)

genotypes=LIST : List of snpfile headers corresponding to genotypes to map SNPs []
genbase=FILE : Basefile for Genbank output. Will use base of seqin if None []
snpkeys=LIST : Additional headers to use as keys for SNP file (e.g. if mapping done by chromosome) []

See also rje.py generic commandline options.

Module Version History

    # 0.0 - Initial Compilation. Batch mode for mapping SNPs needs updating.
    # 0.1 - SNP mapping against a GenBank file.
    # 0.2 - Fixed complement strand bug.
    # 0.3.0 - Updated to work with RATT(/Mummer?) snp output file. Improved docs.
    # 0.4.0 - Major reworking for easier updates and added functionality. (Convert to 1.0.0 when complete.)
    # 0.5.0 - Added CDS rating.
    # 0.6.0 - Added AltFT mapping mode (map features to AltLocus and AltPos)
    # 0.7.0 - Added additional fields for processing Snapper output. (Hopefully will still work for SAMTools etc.)
    # 0.8.0 - Added parsing of GFF file from Prokka.
    # 0.8.1 - Corrected "intron" classification for first position of features. Updated FTBest defaults.
    # 1.0.0 - Version that works with Snapper V1.0.0. Not really designed for standalone running any more.
    # 1.1.0 - Added pNS and modified the "Positive" CDS rating to be pNS < 0.05.
    # 1.1.1 - Updated pNS calculation to include EXT mutations and substitution frequency.
    # 1.2.0 - Added SNPByFType=T/F option: whether to output mapped SNPs by feature type (before FTBest filtering) [False]
    # 1.2.1 - Fixed GFF parsing bug.

SNP_Mapper REST Output formats

Run with &rest=docs for program documentation and options. A plain text version is accessed with &rest=help.
&rest=OUTFMT can be used to retrieve individual parts of the output, matching the tabs in the default
(&rest=format) output. Individual OUTFMT elements can also be parsed from the full (&rest=full) server output,
which is formatted as follows:

###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
... contents for OUTFMT section ...
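
The layout above can be split back into its parts with a few lines of code. The sketch below assumes that each section starts with a ###~...~### divider line followed by a "# OUTFMT:" header line, as shown. The function split_rest_output and the section names used in the example are hypothetical.

```python
import re

DIVIDER = re.compile(r"^###~+###\s*$")
HEADER = re.compile(r"^# (\S+):\s*$")


def split_rest_output(text):
    """Split full REST output text into {OUTFMT: contents} (hypothetical helper)."""
    sections = {}
    name = None
    expect_header = False
    for line in text.splitlines():
        if DIVIDER.match(line):
            expect_header = True        # the next line should name the section
            continue
        if expect_header:
            expect_header = False
            match = HEADER.match(line)
            if match:
                name = match.group(1)
                sections[name] = []
                continue
        if name is not None:
            sections[name].append(line)
    return {key: "\n".join(lines).strip() for key, lines in sections.items()}


# Made-up output with two sections.
full = (
    "###~~~~###\n# status:\nrun complete\n\n"
    "###~~~~###\n# log:\nline 1\nline 2\n"
)
parts = split_rest_output(full)
```

Any text before the first divider is discarded, and a divider that is not followed by a header line leaves the current section open.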

Available REST Outputs

There is currently no specific help available on REST output for this program.
