SLiMSuite REST Server



SLiMProb V1.4

Short Linear Motif Probability tool

Program: SLiMProb
Description: Short Linear Motif Probability tool
Version: 1.4
Last Edit: 11/07/14
Citation: Davey, Haslam, Shields & Edwards (2010), Lecture Notes in Bioinformatics 6282: 50-61.

Copyright © 2007 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje rje_seq rje_sequence rje_scoring rje_slim rje_slimcalc rje_slimlist rje_zen


See SLiMSuite Blog for further documentation. See rje for general commands.

Function

SLiMProb is a tool for finding pre-defined SLiMs (Short Linear Motifs) in a protein sequence database. SLiMProb can make use of corrections for evolutionary relationships and a variation of the SLiMChance algorithm from SLiMFinder to assess motifs for statistical over- and under-representation. SLiMProb is a replacement for the original SLiMSearch, which itself was a replacement for PRESTO. The basic architecture is the same, but it was felt that having two different "SLiMSearch" servers was confusing.

Features that distinguish SLiMProb from many existing motif search tools include:

  • searching with mismatches rather than restricting hits to perfect matches.
  • optional equivalency files for searching with specific allowed mismatches (e.g. charge conservation)
  • generation or reading of alignment files from which to calculate conservation statistics for motif occurrences.
  • additional statistics, including protein disorder, surface accessibility and hydrophobicity predictions
  • recognition of "n of m" motif elements in the form <X:n:m>, where X is one or more amino acids that must occur n+ times across m positions. E.g. <IL:3:5> must have 3+ Is and/or Ls in a 5aa stretch.
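
The "n of m" element in the list above can be illustrated with a short Python sketch. This is not the SLiMProb implementation: the helper name n_of_m_hits and the sliding-window approach are assumptions made for illustration, and only the <X:n:m> syntax itself comes from the description above.

```python
import re

# Hypothetical pattern for a single "<X:n:m>" element (illustrative only).
ELEMENT = re.compile(r"<([A-Z]+):(\d+):(\d+)>")


def n_of_m_hits(element, sequence):
    """Return 0-based starts of windows of length m holding n+ residues from X."""
    match = ELEMENT.fullmatch(element)
    if match is None:
        raise ValueError(f"not an n-of-m element: {element!r}")
    residues = set(match.group(1))
    n, m = int(match.group(2)), int(match.group(3))
    hits = []
    # Slide a window of m residues along the sequence and count allowed residues.
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        if sum(aa in residues for aa in window) >= n:
            hits.append(start)
    return hits
```

For example, n_of_m_hits("<IL:3:5>", "AILAL") reports a hit at position 0 because the 5aa window contains one I and two Ls, whereas "AILAA" contains only two and gives no hit.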

The main output for SLiMProb is a delimited file of motif/peptide occurrences, but the motifaln=T and proteinaln=T options also allow output of alignments of motifs and their occurrences. The primary outputs are named *.occ.csv for the occurrence data and *.csv for the summary data for each motif/dataset pair. (This is a change since SLiMSearch.)
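
As a rough illustration of the output naming described above, the following Python sketch derives the *.occ.csv name from a results file name and loads either table generically. The helper names result_paths and load_rows are hypothetical, a comma delimiter is assumed from the .csv extension, and no column names are assumed because the field headers are not listed here.

```python
import csv
from pathlib import Path


def result_paths(resfile="slimprob.csv"):
    """Return (summary_path, occurrence_path) for a given results file name."""
    summary = Path(resfile)
    # Assumption: the occurrence table shares the base name, with ".occ.csv".
    occurrence = summary.with_suffix(".occ.csv")
    return summary, occurrence


def load_rows(path):
    """Read a comma-delimited table into a list of dicts keyed by its header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

With the default name, result_paths() gives slimprob.csv for the summary data and slimprob.occ.csv for the occurrence data.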

Commandline

### Basic Input/Output Options ###
motifs=FILE : File of input motifs/peptides [None]
Single line per motif format = 'Name Sequence #Comments' (Comments are optional and ignored)
Alternative formats include fasta, SLiMDisc output and raw motif lists.
seqin=FILE : Sequence file to search [None]
batch=LIST : List of sequence files for batch input (wildcard * permitted) []
maxseq=X : Maximum number of sequences to process [0]
maxsize=X : Maximum dataset size to process in AA (or NT) [100,000]
maxocc=X : Filter out Motifs with more than maximum number of occurrences [0]
walltime=X : Time in hours before program will abort search and exit [1.0]
resfile=FILE : Main SLiMProb results table (*.csv and *.occ.csv) [slimprob.csv]
resdir=PATH : Redirect individual output files to specified directory (and look for intermediates) [SLiMProb/]
buildpath=PATH : Alternative path to look for existing intermediate files [SLiMProb/]
force=T/F : Force re-running of BLAST, UPC generation and search [False]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
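
The options above are given as key=value pairs. The Python sketch below shows one way to assemble such an argument list. It is a sketch under assumptions: the script name slimprob.py, the helper name slimprob_command and the input file names are hypothetical, while the option names and the T/F and comma-separated LIST conventions are taken from the listing above.

```python
import sys


def slimprob_command(script="slimprob.py", **options):
    """Build an argument list of key=value options for a SLiMProb run."""
    # Assumption: the program is started as "python slimprob.py key=value ...".
    args = [sys.executable, script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"  # T/F switches
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)  # LIST options
        args.append(f"{key}={value}")
    return args


# Example: search a sequence file with a motif file, aborting after one hour.
command = slimprob_command(motifs="motifs.txt", seqin="proteins.fas",
                           walltime=1.0, force=False)
```

The resulting list could be passed to subprocess.run(command, check=True) once the script path matches the local installation.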

### SearchDB Options I ###

masking=T/F : Master control switch to turn off all masking if False [False]
dismask=T/F : Whether to mask ordered regions (see rje_disorder for options) [False]
consmask=T/F : Whether to use relative conservation masking [False]
ftmask=LIST : UniProt features to mask out [EM,DOMAIN,TRANSMEM]
imask=LIST : UniProt features to inversely ("inclusively") mask. (Seqs MUST have 1+ features) []
compmask=X,Y : Mask low complexity regions (same AA in X+ of Y consecutive aas) [5,8]
casemask=X : Mask Upper or Lower case [None]
motifmask=X : List (or file) of motifs to mask from input sequences []
metmask=T/F : Masks the N-terminal M [False]
posmask=LIST : Masks list of position-specific aas, where list = pos1:aas,pos2:aas [2:A]
aamask=LIST : Masks list of AAs from all sequences (reduces alphabet) []
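
To make the compmask=X,Y rule concrete, here is a minimal Python sketch of low complexity masking with the default 5,8 setting. It is an interpretation of the one-line description above rather than the SLiMSuite code: the function name, the use of "X" as the mask character and the choice to mask every copy of the repeated residue in a qualifying window are all assumptions.

```python
def mask_low_complexity(sequence, x=5, y=8, mask_char="X"):
    """Mask residues that occur x+ times within any window of y consecutive positions."""
    masked = list(sequence)
    # Windows are read from the unmasked sequence so earlier masking
    # does not change later counts.
    for start in range(max(len(sequence) - y + 1, 1)):
        window = sequence[start:start + y]
        for aa in set(window):
            if aa != mask_char and window.count(aa) >= x:
                for offset, residue in enumerate(window):
                    if residue == aa:
                        masked[start + offset] = mask_char
    return "".join(masked)
```

Under these assumptions, a run such as MKAAAAAAKL has its six alanines masked, while a sequence with no residue repeated 5+ times in any 8aa window is returned unchanged.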

### SearchDB Options II ###

efilter=T/F : Whether to use evolutionary filter [False]
blastf=T/F : Use BLAST Complexity filter when determining relationships [True]
blaste=X : BLAST e-value threshold for determining relationships [1e-4]
altdis=FILE : Alternative all by all distance matrix for relationships [None]
gablamdis=FILE : Alternative GABLAM results file [None] (!!!Experimental feature!!!)
occupc=T/F : Whether to output the UPC ID number in the occurrence output file [False]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
### SLiMChance Options ###
maskfreq=T/F : Whether to use masked AA Frequencies (True), or (False) mask after frequency calculations [True]
aafreq=FILE : Use FILE to replace individual sequence AAFreqs (FILE can be sequences or aafreq) [None]
aadimerfreq=FILE : Use empirical dimer frequencies from FILE (fasta or *.aadimer.tdt) [None]
negatives=FILE : Multiply raw probabilities by under-representation in FILE [None]
background=FILE : Use observed support in background file for over-representation calculations [None]
smearfreq=T/F : Whether to "smear" AA frequencies across UPC rather than keep separate AAFreqs [False]
seqocc=X : Restrict to sequences with X+ occurrences (adjust for high frequency SLiMs) [1]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
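
The idea of scoring over- and under-representation from amino acid frequencies can be sketched in Python as below. This is a deliberately simplified binomial model, not the SLiMChance algorithm itself: it assumes independent positions, a single per-sequence probability shared by all sequences, and no correction for evolutionary relationships. All function names are hypothetical.

```python
from math import comb


def position_probability(motif_positions, aa_freq):
    """Chance that a random site matches: product of summed frequencies per position."""
    p = 1.0
    for allowed in motif_positions:
        p *= sum(aa_freq.get(aa, 0.0) for aa in allowed)
    return p


def sequence_probability(p_position, seq_len, motif_len):
    """Chance of 1+ matches in a sequence, treating each start site as independent."""
    sites = max(seq_len - motif_len + 1, 0)
    return 1.0 - (1.0 - p_position) ** sites


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): over-representation of k+ supporting sequences."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def under_representation(k, n, p):
    """P(X <= k) for the same binomial model."""
    return 1.0 - binomial_tail(k + 1, n, p)
```

In this sketch, motif_positions is a list of allowed-residue strings (e.g. ["KR", "L"] for [KR]L), n is the number of sequences searched and k is the number of sequences containing the motif.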
### Output Options ###
extras=X : Whether to generate additional output files (alignments etc.) [2]
- 0 = No output beyond main results file
- 1 = Saved masked input sequences [*.masked.fas]
- 2 = Generate additional outputs (alignments etc.)
pickle=T/F : Whether to save/use pickles [True]
targz=T/F : Whether to tar and zip dataset result files (UNIX only) [False]
savespace=X : Delete "unnecessary" files following run (best used with targz) [0]
- 0 = Delete no files
- 1 = Delete all bar *.upc and *.pickle files
- 2 = Delete all dataset-specific files including *.upc and *.pickle (not *.tar.gz)

* See also rje_slimcalc options for occurrence-based calculations and filtering *
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

History

Module Version History

    # 1.0 - SLiMProb 1.0 based on SLiMSearch 1.7. Altered output files to be *.csv and *.occ.csv.
    # 1.1 - Tidied import commands.
    # 1.2 - Increased extras=X levels. Adjusted maxsize=X assessment to be post-masking.
    # 1.3 - Consolidating output file naming for consistency across SLiMSuite. (SLiMBuild = Motif input)
    # 1.4 - Preparation for SLiMProb V2.0 & SLiMCore V2.0 using newer RJE_Object.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.