Program:	SLiMProb
Description:	Short Linear Motif Probability tool
Version:	2.5.1
Last Edit:	06/09/17
Citation:	Davey, Haslam, Shields & Edwards (2010), Lecture Notes in Bioinformatics 6282: 50-61.
Webserver:	http://www.slimsuite.unsw.edu.au/servers/slimprob.php
Manual:	http://bit.ly/SProbManual

Imported modules: rje rje_db rje_seq rje_sequence rje_scoring rje_slim rje_slimcore rje_slimcalc rje_slimlist rje_zen

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

SLiMProb is a tool for finding pre-defined SLiMs (Short Linear Motifs) in a protein sequence database. SLiMProb can make use of corrections for evolutionary relationships and a variation of the SLiMChance algorithm from SLiMFinder to assess motifs for statistical over- and under-representation. SLiMProb is a replacement for the original SLiMSearch, which itself was a replacement for PRESTO. The basic architecture is the same, but having two different "SLiMSearch" servers was felt to be confusing.

Benefits of SLiMProb that make it more useful than a lot of existing tools include:

searching with mismatches rather than restricting hits to perfect matches.
optional equivalency files for searching with specific allowed mismatches (e.g. charge conservation)
generation or reading of alignment files from which to calculate conservation statistics for motif occurrences.
additional statistics, including protein disorder, surface accessibility and hydrophobicity predictions
recognition of "n of m" motif elements in the form <X:n:m>, where X is one or more amino acids that must occur n+ times across m positions
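
The "n of m" element can be illustrated with a short Python sketch. It assumes that <X:n:m> means at least n of m consecutive positions hold one of the residues in X; the function names and the example sequence are hypothetical and are not taken from the SLiMProb code.

```python
def n_of_m_match(window, residues, n):
    """Return True if at least n positions in window hold one of residues."""
    return sum(1 for aa in window if aa in residues) >= n


def find_n_of_m(sequence, residues, n, m):
    """Yield 0-based starts of m-residue windows satisfying <residues:n:m>."""
    for start in range(len(sequence) - m + 1):
        if n_of_m_match(sequence[start:start + m], residues, n):
            yield start


# Hypothetical example: <KR:2:3> = two or more K/R within three positions.
hits = list(find_n_of_m("AKRAAKAA", "KR", 2, 3))
print(hits)
```

Only the windows starting at positions 0 (AKR) and 1 (KRA) contain two basic residues, so only those are reported.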

The main output for SLiMProb is a delimited file of motif/peptide occurrences, but the motifaln=T and proteinaln=T options also allow output of alignments of motifs and their occurrences. The primary outputs are named *.occ.csv for the occurrence data and *.csv for the summary data for each motif/dataset pair. (This is a change since SLiMSearch.)
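
As a rough guide to locating and reading the two primary outputs, the Python sketch below derives the occurrence file name from the summary file name and loads either table without assuming any column names. The helper names are hypothetical, and the naming rule (inserting ".occ" before the extension) is inferred from the *.csv and *.occ.csv pattern described above.

```python
import csv
import os


def slimprob_outputs(resfile="slimprob.csv"):
    """Return the summary and occurrence file names implied by resfile."""
    base, ext = os.path.splitext(resfile)
    return {"summary": resfile, "occ": base + ".occ" + ext}


def load_table(path):
    """Read a comma-delimited table into a list of dicts keyed by its header."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


names = slimprob_outputs("slimprob.csv")
print(names["summary"], names["occ"])
```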

Commandline

Basic Input/Output Options

motifs=FILE : File of input motifs/peptides (also motif=X) [None]
Single line per motif format = 'Name Sequence #Comments' (Comments are optional and ignored)
Alternative formats include fasta, SLiMDisc output and raw motif lists.
seqin=SEQFILE : Sequence file to search. Over-rules batch=FILE and uniprotid=LIST [None]
batch=FILELIST : List of files to search, wildcards allowed. (Over-ruled by seqin=FILE.) [*.dat,*.fas]
uniprotid=LIST : Extract IDs/AccNums in list from UniProt into BASEFILE.dat and use as seqin=FILE. []
maxseq=X : Maximum number of sequences to process [0]
maxsize=X : Maximum dataset size to process in AA (or NT) [100,000]
maxocc=X : Filter out Motifs with more than maximum number of occurrences [0]
walltime=X : Time in hours before program will abort search and exit [1.0]
resfile=FILE : Main SLiMProb results table (*.csv and *.occ.csv) [slimprob.csv]
resdir=PATH : Redirect individual output files to specified directory (and look for intermediates) [SLiMProb/]
buildpath=PATH : Alternative path to look for existing intermediate files [SLiMProb/]
force=T/F : Force re-running of BLAST, UPC generation and search [False]
dna=T/F : Whether the sequences files are DNA rather than protein [False]
alphabet=LIST : List of characters to include in search (e.g. AAs or NTs) [default AA or NT codes]
megaslim=FILE : Make/use precomputed results for a proteome (FILE) in fasta format [None]
megablam=T/F : Whether to create and use all-by-all GABLAM results for (gablamdis) UPC generation [False]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
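
The single-line motif format given for motifs=FILE ('Name Sequence #Comments') can be read with a few lines of Python. This is a minimal sketch rather than the SLiMProb parser: the function name and example motifs are hypothetical, and treating a one-token line as a raw motif that doubles as its own name is an assumption.

```python
def parse_motif_line(line):
    """Return (name, pattern) for one motif line, or None if it holds no motif."""
    text = line.split("#", 1)[0].strip()  # comments are optional and ignored
    if not text:
        return None
    parts = text.split()
    if len(parts) == 1:
        # Assumption: a raw motif list entry is used as its own name.
        return parts[0], parts[0]
    return parts[0], parts[1]


# Hypothetical input lines.
lines = ["SH3_ligand P..P # proline-rich", "[KR]..[ST]", "# comment only"]
motifs = [m for m in map(parse_motif_line, lines) if m is not None]
print(motifs)
```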

SearchDB Options I - Input Protein Sequence Masking

masking=T/F : Master control switch to turn off all masking if False [True]
dismask=T/F : Whether to mask ordered regions (see rje_disorder for options) [False]
consmask=T/F : Whether to use relative conservation masking [False]
ftmask=LIST : UniProt features to mask out (True=EM,DOMAIN,TRANSMEM) []
imask=LIST : UniProt features to inversely ("inclusively") mask. (Seqs MUST have 1+ features) []
compmask=X,Y : Mask low complexity regions (same AA in X+ of Y consecutive aas) [None]
casemask=X : Mask Upper or Lower case [None]
motifmask=X : List (or file) of motifs to mask from input sequences []
metmask=T/F : Masks the N-terminal M [False]
posmask=LIST : Masks list of position-specific aas, where list = pos1:aas,pos2:aas []
aamask=LIST : Masks list of AAs from all sequences (reduces alphabet) []
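
Two of the simpler masking options can be sketched in Python to show the intended effect. The sketch assumes that posmask positions are 1-based and that masked residues are replaced with X; neither detail is stated above, and the function names are hypothetical.

```python
def apply_posmask(sequence, posmask, mask_char="X"):
    """Mask listed residues at listed positions, e.g. posmask='1:M,2:A'."""
    residues = list(sequence)
    for item in posmask.split(","):
        pos, aas = item.split(":")
        index = int(pos) - 1  # assumption: positions count from 1
        if 0 <= index < len(residues) and residues[index] in aas:
            residues[index] = mask_char
    return "".join(residues)


def apply_metmask(sequence, mask_char="X"):
    """Mask an N-terminal methionine, leaving other sequences unchanged."""
    if sequence.startswith("M"):
        return mask_char + sequence[1:]
    return sequence


print(apply_posmask("MAGK", "2:A"))
print(apply_metmask("MAGK"))
```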

SearchDB Options II - Evolutionary Filtering

efilter=T/F : Whether to use evolutionary filter [True]
blastf=T/F : Use BLAST Complexity filter when determining relationships [True]
blaste=X : BLAST e-value threshold for determining relationships [1e-4]
altdis=FILE : Alternative all by all distance matrix for relationships [None]
gablamdis=FILE : Alternative GABLAM results file [None] (!!!Experimental feature!!!)
occupc=T/F : Whether to output the UPC ID number in the occurrence output file [False]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
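
The idea behind the evolutionary filter, grouping related sequences into clusters (the UPC referred to in these options), can be sketched as single-linkage clustering over pairwise hits that pass the blaste=X threshold. The Python below is an illustration under that assumption only; the function name and input format are hypothetical, and it does not reproduce the BLAST or GABLAM processing that SLiMProb performs.

```python
def cluster_upc(names, hits, blaste=1e-4):
    """Group sequence names linked by any hit with e-value <= blaste."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for query, subject, evalue in hits:
        if evalue <= blaste:
            parent[find(query)] = find(subject)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical hits as (query, subject, e-value) tuples.
hits = [("A", "B", 1e-10), ("B", "C", 1e-5), ("C", "D", 0.5)]
upc = cluster_upc(["A", "B", "C", "D"], hits)
print(upc)
```

Sequence D stays in its own cluster because its only hit (e-value 0.5) fails the threshold.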

SLiMChance Options

maskfreq=T/F : Whether to use masked AA Frequencies (True), or (False) mask after frequency calculations [True]
aafreq=FILE : Use FILE to replace individual sequence AAFreqs (FILE can be sequences or aafreq) [None]
aadimerfreq=FILE : Use empirical dimer frequencies from FILE (fasta or *.aadimer.tdt) [None]
negatives=FILE : Multiply raw probabilities by under-representation in FILE [None]
background=FILE : Use observed support in background file for over-representation calculations [None]
smearfreq=T/F : Whether to "smear" AA frequencies across UPC rather than keep separate AAFreqs [False]
seqocc=X : Restrict to sequences with X+ occurrences (adjust for high frequency SLiMs) [1]
mergesplits=T/F : Whether to merge split SLiMs for recalculating statistics. (Assumes unique RunIDs) [True]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
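
The effect of maskfreq=T/F can be pictured with a small Python sketch that computes amino acid frequencies from either the masked or the unmasked sequence. It assumes masked residues are represented by X and are excluded from the counts; the function name is hypothetical and this is not the SLiMChance calculation itself.

```python
from collections import Counter


def aa_frequencies(sequence, masked_sequence, maskfreq=True, mask_char="X"):
    """Return AA frequencies from the masked (True) or unmasked (False) sequence."""
    source = masked_sequence if maskfreq else sequence
    counts = Counter(aa for aa in source if aa != mask_char)
    total = sum(counts.values())
    if not total:
        return {}
    return {aa: count / total for aa, count in counts.items()}


print(aa_frequencies("AAKK", "AAXX", maskfreq=True))
print(aa_frequencies("AAKK", "AAXX", maskfreq=False))
```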

Output Options

extras=X : Whether to generate additional output files (alignments etc.) [2]
- 0 = No output beyond main results file
- 1 = Saved masked input sequences [*.masked.fas]
- 2 = Generate additional outputs (alignments etc.)
- 3 = Additional distance matrices for input sequences
pickle=T/F : Whether to save/use pickles [True]
targz=T/F : Whether to tar and zip dataset result files (UNIX only) [False]
savespace=X : Delete "unnecessary" files following run (best used with targz) [0]
- 0 = Delete no files
- 1 = Delete all bar *.upc and *.pickle files
- 2 = Delete all dataset-specific files including *.upc and *.pickle (not *.tar.gz)

See also rje_slimcalc options for occurrence-based calculations and filtering.
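
All of the options above share the option=value form, so a run can be assembled programmatically. The Python sketch below builds an argument list in that form; the script name slimprob.py, the helper name and the input file names are assumptions for illustration, and the command is only constructed, not executed.

```python
def build_command(script="slimprob.py", **options):
    """Build an argument list of option=value pairs for a SLiMProb run."""
    args = ["python", script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"  # T/F switches
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)  # LIST options
        args.append("{}={}".format(key, value))
    return args


# Hypothetical input files.
cmd = build_command(motifs="motifs.txt", seqin="proteins.fas",
                    dismask=True, extras=1)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run once the script path has been adjusted to the local installation.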

History

Module Version History

    # 1.0 - SLiMProb 1.0 based on SLiMSearch 1.7. Altered output files to be *.csv and *.occ.csv.
    # 1.1 - Tidied import commands.
    # 1.2 - Increased extras=X levels. Adjusted maxsize=X assessment to be post-masking.
    # 1.3 - Consolidating output file naming for consistency across SLiMSuite. (SLiMBuild = Motif input)
    # 1.4 - Preparation for SLiMProb V2.0 & SLiMCore V2.0 using newer RJE_Object.
    # 2.0 - Converted to use rje_obj.RJE_Object as base. Version 1.4 moved to legacy/.
    # 2.1 - Modified output of N-terminal motifs to correctly start at position 1.
    # 2.2.0 - Added basic REST functionality.
    # 2.2.1 - Updated REST output.
    # 2.2.2 - Modified input to allow motif=X in addition to motifs=X.
    # 2.2.3 - Tweaked basefile setting and citation.
    # 2.2.4 - Improved slimcalc output (s.f.).
    # 2.2.5 - Fixed FTMask=T/F bug.
    # 2.3.0 - Recombining Split Motifs (mergesplits=T). Cannot be combined efficiently with append=T. (Overwrites split table.)
    # 2.4.0 - Added AccNum to occ output for SLiMEnrich compatibility.
    # 2.5.0 - Added map and failed outputs for uniprotid=LIST input.
    # 2.5.1 - Updated resfile to be set by basefile if no resfile=X setting given.

SLiMProb REST Output formats

Run with &rest=help for general options. Run with &rest=full to get full server output.
Individual outputs can be identified/parsed:

###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
...
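
A full server output in this layout can be split into its named parts with a short Python sketch. It assumes each part is introduced by a ###~...~### separator line followed by a "# NAME:" line, as in the pattern shown above; the function name and sample text are hypothetical.

```python
import re

SEPARATOR = re.compile(r"^###~+###$")
HEADER = re.compile(r"^# (\w+):\s*$")


def split_rest_output(text):
    """Return {output name: content} for text in the separator/header layout."""
    sections, current, after_separator = {}, None, False
    for line in text.splitlines():
        if SEPARATOR.match(line):
            after_separator = True
            continue
        # Only a "# NAME:" line directly after a separator starts a new part.
        match = HEADER.match(line) if after_separator else None
        after_separator = False
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


# Hypothetical two-part output.
sample = "\n".join(["###~~~~###", "# main:", "a,b", "1,2",
                    "###~~~~###", "# occ:", "x,y"])
parts = split_rest_output(sample)
print(sorted(parts))
```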

Outputs available:
main = main results file (extras=-1)
motifs = Input motifs for searching (extras=-1)
seqin = Input file (extras=-1)
occ = occurrence file (extras=0)
upc = UPC file (extras=0)
slimdb = Fasta file used for UPC generation etc. (extras=0)
masked = masked.fas (extras=1)
mapping = mapping.fas file (extras=2)
motifaln = motif alignments (extras=2)
maskaln = masked motif alignments (extras=2)
dismatrix = *.dis.tdt file (extras=3)

&rest=OUTFMT can then be used to retrieve individual parts of the output in future.
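
The list of outputs can be turned into a lookup for checking what a given extras=X setting should return. The Python sketch below assumes the value in parentheses is the minimum extras level at which each output is produced; the dictionary and function names are hypothetical.

```python
# Minimum extras level for each REST output, as listed above.
REST_OUTPUT_LEVELS = {
    "main": -1, "motifs": -1, "seqin": -1,
    "occ": 0, "upc": 0, "slimdb": 0,
    "masked": 1,
    "mapping": 2, "motifaln": 2, "maskaln": 2,
    "dismatrix": 3,
}


def outputs_for_extras(extras):
    """Return the output names expected at the given extras level."""
    return [name for name, level in REST_OUTPUT_LEVELS.items()
            if level <= extras]


print(outputs_for_extras(0))
```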
