Module:	SeqMapper
Description:	Sequence Mapping Program
Version:	2.3.0
Last Edit:	17/10/18

Imported modules: rje rje_menu rje_obj rje_seq rje_seqlist rje_zen rje_blast_V2 rje_sequence

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This module is for mapping one set of protein sequences onto a different sequence database, using Accession Numbers etc where possible and then using GABLAM when no direct match is possible. The program gives the following outputs:
- *.*.mapped.fas = Fasta file of successfully mapped sequences
- *.*.missing.fas = Fasta file of sequences that could not be mapped
- *.*.mapping.tdt = Delimited file giving details of mapping (Seq, MapSeq, Method)

If combine=T then the *.missing.fas file will not be created and unmapped sequences will be output in *.mapped.fas. Note that the possible mappings are all identified through BLAST and so a protein with matching IDs etc. but not hitting with BLAST will NOT be mapped. Currently only mapping of protein or nucleotides onto a protein database is supported.
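As a rough illustration of the output naming described above, the following Python sketch lists the files a run would be expected to produce. The helper name `output_names` is hypothetical, and so are two details it relies on: that the default base is built from the seqin and mapdb file names with their extensions stripped, and that a comma delimiter gives a `csv` extension. Only the `tab` to `tdt` pairing and the effect of combine=T come from the documentation.

```python
from pathlib import Path


def output_names(seqin, mapdb, resfile=None, combine=False, delimit="tab"):
    """Return the expected output file names for a run (hypothetical helper)."""
    # Assumption: the default base is "<seqin>.<mapdb>" with extensions stripped.
    base = resfile or f"{Path(seqin).stem}.{Path(mapdb).stem}"
    # "tab" -> "tdt" is documented; the other extensions are assumptions.
    ext = {"tab": "tdt", ",": "csv"}.get(delimit, "txt")
    names = {
        "mapped": f"{base}.mapped.fas",
        "mapping": f"{base}.mapping.{ext}",
    }
    if not combine:
        # With combine=T, unmapped sequences go into *.mapped.fas instead.
        names["missing"] = f"{base}.missing.fas"
    return names


if __name__ == "__main__":
    for kind, name in output_names("query.fas", "db.fas").items():
        print(kind, name)
```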

Unless the interactivity setting is set to 2 or more (i=2), sequences that are mapped using Name, AccNum, Sequence (100% identical sequences), ID or DescAcc will be mapped onto the first appropriate sequence. If automap > 0, then the best sequence according to the mapstat will be mapped automatically. If two sequences tie, the other two possible stats will also be used to rank the hits. If still tied and mapfocus is not "both" then the sequences will be ranked using both query and hit stats. If still tied, the first sequence will be selected.
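The tie-breaking order in the paragraph above can be sketched in Python as a sort key. This is an illustration only, not SeqMapper's own code: the hit layout (a dict with `query` and `hit` sub-dicts keyed by `ID`, `Sim` and `Len`) is an assumed structure, the sketch only covers mapfocus=query or mapfocus=hit, and "ranked using both query and hit stats" is interpreted here as taking the lower of the two values.

```python
STATS = ("ID", "Sim", "Len")


def rank_key(hit, mapstat="ID", mapfocus="query"):
    """Build a sort key: mapstat first, then the other two stats, then both sides."""
    others = [stat for stat in STATS if stat != mapstat]
    order = [mapstat] + others
    side = hit[mapfocus]  # assumes mapfocus is "query" or "hit"
    key = [side[stat] for stat in order]
    # Final tie-break (assumed interpretation): the weaker of query and hit values.
    key += [min(hit["query"][stat], hit["hit"][stat]) for stat in order]
    return tuple(key)


def best_hit(hits, mapstat="ID", mapfocus="query"):
    """Return the top-ranked hit; max() keeps the first hit if all keys tie."""
    return max(hits, key=lambda hit: rank_key(hit, mapstat, mapfocus))


if __name__ == "__main__":
    hits = [
        {"name": "A", "query": {"ID": 99.0, "Sim": 99.5, "Len": 100.0},
         "hit": {"ID": 90.0, "Sim": 91.0, "Len": 95.0}},
        {"name": "B", "query": {"ID": 99.0, "Sim": 99.5, "Len": 100.0},
         "hit": {"ID": 99.0, "Sim": 99.5, "Len": 100.0}},
    ]
    print(best_hit(hits)["name"])
```

Because `max()` returns the first maximal element, hits that remain tied after every comparison resolve to the first sequence, matching the last rule in the paragraph.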

Any sequences that fall below automap (or i>1) but meet the minmap criteria will be ranked according to their BLAST rankings and then presented for a user decision. Presentation will be in reverse order, so that in the case of many possible mappings, the best options remain clear and on screen. The default choice (selected by hitting ENTER) will be the best ranked according to GABLAM stats, which will have been moved to position 1 if not already there. (BLAST rankings and GABLAM rankings will not always agree.)
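A minimal Python sketch of the two steps described above: deciding whether a candidate is mapped automatically, offered to the user or rejected, and then ordering the options for display. The function names and the idea of passing a `score` callable are hypothetical, and the sketch does not model what happens when no user is available (i<1).

```python
def triage(score, minmap=90.0, automap=99.5, interactivity=1):
    """Classify a mapstat value as 'auto', 'ask' or 'reject'."""
    if score < minmap:
        return "reject"
    if automap > 0 and score >= automap and interactivity <= 1:
        return "auto"
    return "ask"


def presentation_order(candidates, score):
    """Return (option number, candidate) pairs in the order they are printed.

    `candidates` is assumed to arrive in BLAST rank order and `score` is a
    callable giving the GABLAM statistic for a candidate.
    """
    ordered = list(candidates)
    best = max(ordered, key=score)
    # Move the best GABLAM candidate to position 1 (the default choice).
    ordered.remove(best)
    ordered.insert(0, best)
    numbered = list(enumerate(ordered, start=1))
    # Print in reverse so that option 1 ends up last, nearest the prompt.
    return numbered[::-1]


if __name__ == "__main__":
    gablam = {"x": 95.0, "y": 98.0, "z": 91.0}
    for number, name in presentation_order(["x", "y", "z"], gablam.get):
        print(number, name, triage(gablam[name]))
```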

SeqMapper will enter a user menu if i>1 or seqin and/or mapdb are missing. If i=0 and one of these is missing, a simple prompt will ask for the missing files. If i<0 and one of these is missing, the program will exit.
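The start-up behaviour above reduces to a small decision function. The Python sketch below is one reading of that paragraph, with a hypothetical function name and return labels; it is not taken from the SeqMapper source.

```python
def startup_action(i, seqin, mapdb):
    """Return 'exit', 'prompt', 'menu' or 'run' for a given interactivity level."""
    missing = not seqin or not mapdb
    if missing and i < 0:
        return "exit"    # full auto cannot ask for the missing file
    if missing and i == 0:
        return "prompt"  # simple prompt for the missing file(s)
    if i > 1 or missing:
        return "menu"    # user menu
    return "run"         # both inputs given and no menu requested


if __name__ == "__main__":
    for args in [(-1, None, "db.fas"), (0, None, "db.fas"),
                 (1, None, "db.fas"), (2, "q.fas", "db.fas"),
                 (1, "q.fas", "db.fas")]:
        print(args, startup_action(*args))
```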

Commandline

### Input Options ###
seqin=FILE : File of sequences to be mapped [None]
mapdb=FILE : File of sequences to map sequences onto [None]
startfrom=X : Shortname or AccNum of seqin file to start from (will append results) (memsaver=T only) [None]
### Output Options ###
resfile=FILE : Base of output filenames (*.mapped.fas, *.missing.fas & *.mapping.tdt) [seqin.mapdb]
combine=T/F : Combine both fasta files in one (e.g. include unmapped sequences in *.mapped.fas) [False]
gablamout=T/F : Output GABLAM statistics for mapped sequences, including "straight" matches [True]
append=T/F : Append rather than overwrite results files [False]
delimit=X : Delimiter for *.mapping.* file (will set extension) [tab]
basefile=FILE : Set resfile=FILE and log=FILE at the same time []
### Mapping Options ###
i=X : Set interactivity. i=-1 full auto. i=0 no menu. i=1 interactive menu. [1]
mapspec=X : Maps sequences onto given species code. "Self" = same species as query. "None" = any. [None]
mapping=LIST : Possible ways of mapping sequences (in pref order) [Name,AccNum,Sequence,ID,DescAcc,GABLAM,grep]
- Name = First word of sequence name
- Sequence = Identical sequence
- grep = grep-based searching of sequence if no hits
- ID = SwissProt style ID of GENE_SPECIES (note that the species may be changed according to mapspec)
- AccNum = Primary Accession Number
- DescAcc = Accession Number featured in description line in form "\WAccNum\W", where \W is non-
skipgene=LIST : List of "genes" in protein IDs to ignore [ens,nvl,ref,p,hyp,frag]
mapstat=X : GABLAM Stat to use for mapping assessment (if GABLAM in mapping list) (ID/Sim/Len) [ID]
minmap=X : Minimum value of mapstat for any mapping to occur [90.0]
automap=X : Minimum value of mapstat for automatic mapping to occur (if i<1) [99.5]
ordered=T/F : Whether to use GABLAMO rather than GABLAM stat [True]
mapfocus=X : Focus for mapping statistic, i.e. which sequence must meet requirements [query]
- query = Best if query is ultimate focus and maximises closeness of mapped sequence
- hit = Best if lots of sequence fragments are in mapdb and should be allowed as mappings
- either = Best if both above conditions are true
- both = Gets most similar sequences in terms of length but can be quite strict where length errors exist
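
One way to read the four mapfocus settings is as a rule for which side of the comparison must reach the threshold (minmap or automap). The Python sketch below encodes that reading. The function name and arguments are hypothetical, and treating "either" as a logical OR and "both" as a logical AND is an assumption drawn from the option descriptions rather than from the program's code.

```python
def meets_threshold(query_stat, hit_stat, threshold, mapfocus="query"):
    """Check whether a query/hit pair satisfies the threshold for a given focus."""
    checks = {
        "query": query_stat >= threshold,  # coverage of the query sequence
        "hit": hit_stat >= threshold,      # coverage of the mapdb sequence
    }
    if mapfocus in checks:
        return checks[mapfocus]
    if mapfocus == "either":
        return any(checks.values())
    if mapfocus == "both":
        return all(checks.values())
    raise ValueError(f"Unknown mapfocus: {mapfocus}")


if __name__ == "__main__":
    # A full-length query against a short fragment in mapdb: the hit is
    # covered well, the query is not.
    for focus in ("query", "hit", "either", "both"):
        print(focus, meets_threshold(60.0, 95.0, 90.0, focus))
```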

### Advanced BLAST Options ###
blaste=X : E-Value cut-off for BLAST searches (BLAST -e X) [1e-4]
blastv=X : Number of BLAST hits to return per query (BLAST -v X) [20]
blastf=T/F : Complexity Filter (BLAST -F X) [False]

History Module Version History

    # 0.0 - Initial Compilation.
    # 1.0 - Basic working version for protein databases.
    # 1.1 - Modified run() method to be called from other programs
    # 1.2 - Added grep method
    # 2.0 - Reworked with new Object format, new BLAST(+) module and new seqlist module.
    # 2.1 - Added catching of failure to read input sequences. Removed 'Run' from GABLAM table.
    # 2.2.0 - Updated basefile to set resfile.
    # 2.3.0 - Added GABLAM-free method.

SeqMapper REST Output formats

Run with &rest=docs for program documentation and options. A plain text version is accessed with &rest=help.
&rest=OUTFMT can be used to retrieve individual parts of the output, matching the tabs in the default
(&rest=format) output. Individual OUTFMT elements can also be parsed from the full (&rest=full) server output,
which is formatted as follows:

###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
... contents for OUTFMT section ...
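
A minimal Python sketch of splitting full (&rest=full) output into its OUTFMT sections, assuming each section starts with a `###~...~###` divider line followed by a `# OUTFMT:` header line as shown above. The function name is hypothetical, and the section names `log` and `mapping` in the usage example are made up for illustration.

```python
import re

HEADER = re.compile(r"^# (\w+):\s*$")


def split_rest_output(text):
    """Return a dict mapping each OUTFMT name to the text of its section."""
    sections = {}
    current = None
    expect_header = False
    for line in text.splitlines():
        if line.startswith("###~") and line.endswith("~###"):
            # Divider line: the next line should name the section.
            expect_header = True
            continue
        if expect_header:
            expect_header = False
            match = HEADER.match(line)
            if match:
                current = match.group(1)
                sections[current] = []
                continue
        if current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


if __name__ == "__main__":
    example = (
        "###~~~~~~###\n# log:\nline one\nline two\n"
        "###~~~~~~###\n# mapping:\nSeq\tMapSeq\tMethod\n"
    )
    for name, body in split_rest_output(example).items():
        print(name, repr(body))
```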

Available REST Outputs

There is currently no specific help available on REST output for this program.
