SLiMSuite REST Server

SLiMProb V2.5.1

Short Linear Motif Probability tool

Program: SLiMProb
Description: Short Linear Motif Probability tool
Version: 2.5.1
Last Edit: 06/09/17
Citation: Davey, Haslam, Shields & Edwards (2010), Lecture Notes in Bioinformatics 6282: 50-61.

Copyright © 2007 Richard J. Edwards - See source code for GNU License Notice

Imported modules: rje rje_db rje_seq rje_sequence rje_scoring rje_slim rje_slimcore rje_slimcalc rje_slimlist rje_zen

See SLiMSuite Blog for further documentation. See rje for general commands.


SLiMProb is a tool for finding pre-defined SLiMs (Short Linear Motifs) in a protein sequence database. SLiMProb can make use of corrections for evolutionary relationships and a variation of the SLiMChance algorithm from SLiMFinder to assess motifs for statistical over- and under-representation. SLiMProb is a replacement for the original SLiMSearch, which itself was a replacement for PRESTO. The basic architecture is the same, but it was felt that having two different "SLiMSearch" servers was confusing.

Benefits of SLiMProb that make it more useful than a lot of existing tools include:

  • searching with mismatches rather than restricting hits to perfect matches.
  • optional equivalency files for searching with specific allowed mismatches (e.g. charge conservation)
  • generation or reading of alignment files from which to calculate conservation statistics for motif occurrences.
  • additional statistics, including protein disorder, surface accessibility and hydrophobicity predictions
  • recognition of "n of m" motif elements in the form <X:n:m>, where X is one or more amino acids that must occur n+ times across m positions. E.g. <IL:3:5> must have 3+ Is and/or Ls in a 5aa stretch.
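
The "n of m" rule can be illustrated with a short Python sketch. This is not SLiMProb's own code: the function names parse_n_of_m and n_of_m_hits are hypothetical, and the sketch assumes that every m-residue window containing at least n of the listed amino acids counts as a match, which may differ from how SLiMProb anchors such elements.

```python
import re


def parse_n_of_m(element):
    """Parse an '<X:n:m>' element into (set of allowed residues, n, m)."""
    match = re.fullmatch(r"<([A-Z]+):(\d+):(\d+)>", element)
    if match is None:
        raise ValueError(f"Not an n-of-m element: {element!r}")
    residues = set(match.group(1))
    n, m = int(match.group(2)), int(match.group(3))
    if n > m:
        raise ValueError("n cannot exceed m")
    return residues, n, m


def n_of_m_hits(sequence, element):
    """Return 0-based starts of m-residue windows holding n+ allowed residues.

    Assumption: any qualifying window is reported, with no extra anchoring.
    """
    residues, n, m = parse_n_of_m(element)
    hits = []
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        if sum(aa in residues for aa in window) >= n:
            hits.append(start)
    return hits


# <IL:3:5> needs 3+ Is and/or Ls in a 5aa stretch.
print(n_of_m_hits("AILKLGG", "<IL:3:5>"))
```

In this example the windows AILKL and ILKLG each contain three I/L residues, while LKLGG contains only two.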

The main output for SLiMProb is a delimited file of motif/peptide occurrences, but the motifaln=T and proteinaln=T options also allow output of alignments of motifs and their occurrences. The primary outputs are named *.occ.csv for the occurrence data and *.csv for the summary data for each motif/dataset pair. (This is a change since SLiMSearch.)


Basic Input/Output Options

motifs=FILE : File of input motifs/peptides (also motif=X) [None]
Single line per motif format = 'Name Sequence #Comments' (Comments are optional and ignored)
Alternative formats include fasta, SLiMDisc output and raw motif lists.
seqin=SEQFILE : Sequence file to search. Over-rules batch=FILE and uniprotid=LIST [None]
batch=FILELIST : List of files to search, wildcards allowed. (Over-ruled by seqin=FILE.) [*.dat,*.fas]
uniprotid=LIST : Extract IDs/AccNums in list from Uniprot into BASEFILE.dat and use as seqin=FILE. []
maxseq=X : Maximum number of sequences to process [0]
maxsize=X : Maximum dataset size to process in AA (or NT) [100,000]
maxocc=X : Filter out Motifs with more than maximum number of occurrences [0]
walltime=X : Time in hours before program will abort search and exit [1.0]
resfile=FILE : Main SLiMProb results table (*.csv and *.occ.csv) [slimprob.csv]
resdir=PATH : Redirect individual output files to specified directory (and look for intermediates) [SLiMProb/]
buildpath=PATH : Alternative path to look for existing intermediate files [SLiMProb/]
force=T/F : Force re-running of BLAST, UPC generation and search [False]
dna=T/F : Whether the sequences files are DNA rather than protein [False]
alphabet=LIST : List of characters to include in search (e.g. AAs or NTs) [default AA or NT codes]
megaslim=FILE : Make/use precomputed results for a proteome (FILE) in fasta format [None]
megablam=T/F : Whether to create and use all-by-all GABLAM results for (gablamdis) UPC generation [False]
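
As an illustration of the single-line motif format described for motifs=FILE ('Name Sequence #Comments'), the following Python sketch reads such lines into a dictionary. It is not the SLiMSuite parser: the function names and the example motif names are hypothetical, and the sketch assumes whitespace-separated fields and that '#' never appears inside a motif. The alternative formats (fasta, SLiMDisc output, raw motif lists) are not handled.

```python
def parse_motif_line(line):
    """Split one 'Name Sequence #Comments' line into (name, pattern).

    Returns None for blank or comment-only lines.
    """
    # Everything after the first '#' is treated as an ignored comment.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    fields = content.split()
    if len(fields) != 2:
        raise ValueError(f"Expected 'Name Sequence', got: {content!r}")
    name, pattern = fields
    return name, pattern


def parse_motifs(text):
    """Parse a block of motif lines into a {name: pattern} dictionary."""
    motifs = {}
    for line in text.splitlines():
        parsed = parse_motif_line(line)
        if parsed is not None:
            name, pattern = parsed
            motifs[name] = pattern
    return motifs


# Hypothetical motif file content, for illustration only.
example = """\
# hypothetical motif file
RGD_example RGD # simple fixed motif
PxxP_example P..P
"""
print(parse_motifs(example))
```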

SearchDB Options I - Input Protein Sequence Masking

masking=T/F : Master control switch to turn off all masking if False [True]
dismask=T/F : Whether to mask ordered regions (see rje_disorder for options) [False]
consmask=T/F : Whether to use relative conservation masking [False]
ftmask=LIST : UniProt features to mask out (True=EM,DOMAIN,TRANSMEM) []
imask=LIST : UniProt features to inversely ("inclusively") mask. (Seqs MUST have 1+ features) []
compmask=X,Y : Mask low complexity regions (same AA in X+ of Y consecutive aas) [None]
casemask=X : Mask Upper or Lower case [None]
motifmask=X : List (or file) of motifs to mask from input sequences []
metmask=T/F : Masks the N-terminal M [False]
posmask=LIST : Masks list of position-specific aas, where list = pos1:aas,pos2:aas []
aamask=LIST : Masks list of AAs from all sequences (reduces alphabet) []
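
The compmask=X,Y rule (same AA in X+ of Y consecutive aas) can be sketched in Python as below. This is only an interpretation of the option description, not the SLiMSuite implementation: the function name mask_low_complexity is hypothetical, and the sketch assumes that only the over-represented residues inside a qualifying window are replaced with the mask character 'X'. SLiMProb may mask a different span.

```python
from collections import Counter


def mask_low_complexity(sequence, x, y, mask_char="X"):
    """Mask residues that occur x+ times within any window of y residues.

    Assumption: only the repeated residues are masked, not the whole window.
    Counts are always taken from the unmasked input sequence.
    """
    if not 1 <= x <= y:
        raise ValueError("Require 1 <= x <= y")
    masked = list(sequence)
    for start in range(len(sequence) - y + 1):
        window = sequence[start:start + y]
        counts = Counter(window)
        for offset, aa in enumerate(window):
            if counts[aa] >= x:
                masked[start + offset] = mask_char
    return "".join(masked)


# Equivalent in spirit to compmask=4,5 under the stated assumption.
print(mask_low_complexity("AAAAAKLMNP", 4, 5))
```

Sequences shorter than Y contain no full window and are returned unchanged.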

SearchDB Options II - Evolutionary Filtering

efilter=T/F : Whether to use evolutionary filter [True]
blastf=T/F : Use BLAST Complexity filter when determining relationships [True]
blaste=X : BLAST e-value threshold for determining relationships [1e-4]
altdis=FILE : Alternative all by all distance matrix for relationships [None]
gablamdis=FILE : Alternative GABLAM results file [None] (!!!Experimental feature!!!)
occupc=T/F : Whether to output the UPC ID number in the occurrence output file [False]

SLiMChance Options

maskfreq=T/F : Whether to use masked AA Frequencies (True), or (False) mask after frequency calculations [True]
aafreq=FILE : Use FILE to replace individual sequence AAFreqs (FILE can be sequences or aafreq) [None]
aadimerfreq=FILE : Use empirical dimer frequencies from FILE (fasta or *.aadimer.tdt) [None]
negatives=FILE : Multiply raw probabilities by under-representation in FILE [None]
background=FILE : Use observed support in background file for over-representation calculations [None]
smearfreq=T/F : Whether to "smear" AA frequencies across UPC rather than keep separate AAFreqs [False]
seqocc=X : Restrict to sequences with X+ occurrences (adjust for high frequency SLiMs) [1]
mergesplits=T/F : Whether to merge split SLiMs for recalculating statistics. (Assumes unique RunIDs) [True]

Output Options

extras=X : Whether to generate additional output files (alignments etc.) [2]
- 0 = No output beyond main results file
- 1 = Saved masked input sequences [*.masked.fas]
- 2 = Generate additional outputs (alignments etc.)
- 3 = Additional distance matrices for input sequences
pickle=T/F : Whether to save/use pickles [True]
targz=T/F : Whether to tar and zip dataset result files (UNIX only) [False]
savespace=0 : Delete "unnecessary" files following run (best used with targz): [0]
- 0 = Delete no files
- 1 = Delete all bar *.upc and *.pickle files
- 2 = Delete all dataset-specific files including *.upc and *.pickle (not *.tar.gz)

See also rje_slimcalc options for occurrence-based calculations and filtering.

Module Version History

    # 1.0 - SLiMProb 1.0 based on SLiMSearch 1.7. Altered output files to be *.csv and *.occ.csv.
    # 1.1 - Tidied import commands.
    # 1.2 - Increased extras=X levels. Adjusted maxsize=X assessment to be post-masking.
    # 1.3 - Consolidating output file naming for consistency across SLiMSuite. (SLiMBuild = Motif input)
    # 1.4 - Preparation for SLiMProb V2.0 & SLiMCore V2.0 using newer RJE_Object.
    # 2.0 - Converted to use rje_obj.RJE_Object as base. Version 1.4 moved to legacy/.
    # 2.1 - Modified output of N-terminal motifs to correctly start at position 1.
    # 2.2.0 - Added basic REST functionality.
    # 2.2.1 - Updated REST output.
    # 2.2.2 - Modified input to allow motif=X in addition to motifs=X.
    # 2.2.3 - Tweaked basefile setting and citation.
    # 2.2.4 - Improved slimcalc output (s.f.).
    # 2.2.5 - Fixed FTMask=T/F bug.
    # 2.3.0 - Recombining Split Motifs (mergesplits=T). Cannot be combined efficiently with append=T. (Overwrites split table.)
    # 2.4.0 - Added AccNum to occ output for SLiMEnrich compatibility.
    # 2.5.0 - Added map and failed outputs for uniprotid=LIST input.
    # 2.5.1 - Updated resfile to be set by basefile if no resfile=X setting given.

SLiMProb REST Output formats

Run with &rest=help for general options. Run with &rest=full to get full server output.
Individual outputs can be identified/parsed:

Outputs available:
main = main results file (extras=-1)
motifs = Input motifs for searching (extras=-1)
seqin = Input file (extras=-1)
occ = occurrence file (extras=0)
upc = UPC file (extras=0)
slimdb = Fasta file used for UPC generation etc. (extras=0)
masked = masked.fas (extras=1)
mapping = mapping.fas file (extras=2)
motifaln = motif alignments (extras=2)
maskaln = masked motif alignments (extras=2)
dismatrix = *.dis.tdt file (extras=3)

&rest=OUTFMT can then be used to retrieve individual parts of the output in future.
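
The list of outputs above can be read as a lookup from output name to the extras=X level noted beside it. The Python sketch below encodes that reading. It assumes the bracketed value is the minimum extras level at which an output is produced, with -1 meaning the output is always available; this is an interpretation of the list, not confirmed server behaviour, and the names REST_OUTPUT_LEVELS and available_outputs are hypothetical.

```python
# Output name -> extras level shown beside it in the list above.
# Assumption: this is the minimum extras=X setting that produces the output.
REST_OUTPUT_LEVELS = {
    "main": -1,
    "motifs": -1,
    "seqin": -1,
    "occ": 0,
    "upc": 0,
    "slimdb": 0,
    "masked": 1,
    "mapping": 2,
    "motifaln": 2,
    "maskaln": 2,
    "dismatrix": 3,
}


def available_outputs(extras):
    """List the &rest=OUTFMT names expected for a given extras=X setting."""
    return [name for name, level in REST_OUTPUT_LEVELS.items() if level <= extras]


# extras=2 is the documented default.
print(available_outputs(2))
```

Under this reading, the default extras=2 run would offer everything except dismatrix, which needs extras=3.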
