Description: Short Linear Motif Probability tool
Last Edit: 11/07/14
Citation: Davey, Haslam, Shields & Edwards (2010), Lecture Notes in Bioinformatics 6282: 50-61.
Copyright © 2007 Richard J. Edwards - See source code for GNU License Notice
See the SLiMSuite Blog for further documentation. See rje for general commands.
SLiMProb is a tool for finding pre-defined SLiMs (Short Linear Motifs) in a protein sequence database. SLiMProb
can make use of corrections for evolutionary relationships and a variation of the SLiMChance algorithm from
SLiMFinder to assess motifs for statistical over- and under-representation. SLiMProb is a replacement for the original
SLiMSearch, which was itself a replacement for PRESTO. The basic architecture is the same, but it was felt that having
two different "SLiMSearch" servers was confusing.
Features that make SLiMProb more useful than many existing tools include:
- searching with mismatches rather than restricting hits to perfect matches.
- optional equivalency files for searching with specific allowed mismatches (e.g. charge conservation).
- generation or reading of alignment files from which to calculate conservation statistics for motif occurrences.
- additional statistics, including protein disorder, surface accessibility and hydrophobicity predictions.
- recognition of "n of m" motif elements in the form <X:n:m>, where X is one or more amino acids that must occur n or
more times within a stretch of m positions. E.g. <IL:3:5> must have 3+ Is and/or Ls in a 5aa stretch.
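The "n of m" rule can be illustrated with a short Python sketch. This is not SLiMProb's own matching code: it assumes an element matches wherever some window of exactly m consecutive residues contains at least n residues drawn from X, and the function names `parse_n_of_m` and `n_of_m_hits` are hypothetical.

```python
import re


def parse_n_of_m(element):
    """Parse an "n of m" element such as "<IL:3:5>" into (allowed_aas, n, m)."""
    match = re.fullmatch(r"<([A-Z]+):(\d+):(\d+)>", element)
    if match is None:
        raise ValueError(f"not an n-of-m element: {element!r}")
    aas, n, m = match.groups()
    return set(aas), int(n), int(m)


def n_of_m_hits(element, sequence):
    """Return 0-based start positions of m-residue windows holding n+ allowed residues."""
    aas, n, m = parse_n_of_m(element)
    hits = []
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        # Count residues in this window that belong to the allowed set X.
        if sum(aa in aas for aa in window) >= n:
            hits.append(start)
    return hits


print(n_of_m_hits("<IL:3:5>", "AAILAIKKK"))  # → [1, 2]
```

Windows starting at positions 1 (AILAI) and 2 (ILAIK) each contain three I/L residues, so both satisfy <IL:3:5>.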
The main output for SLiMProb is a delimited file of motif/peptide occurrences, but the extras=X option also
allows output of alignments of motifs and their occurrences. The primary outputs are named *.occ.csv for the occurrence
data and *.csv for the summary data for each motif/dataset pair. (This is a change since SLiMSearch.)
### Basic Input/Output Options ###
motifs=FILE : File of input motifs/peptides [
Single line per motif format = 'Name Sequence #Comments' (Comments are optional and ignored)
Alternative formats include fasta, SLiMDisc output and raw motif lists.
seqin=FILE : Sequence file to search [
batch=LIST : List of sequence files for batch input (wildcard * permitted) 
maxseq=X : Maximum number of sequences to process [
maxsize=X : Maximum dataset size to process in AA (or NT) [
maxocc=X : Filter out motifs with more than X occurrences [
walltime=X : Time in hours before program will abort search and exit [
resfile=FILE : Main SLiMProb results table (*.csv and *.occ.csv) [
resdir=PATH : Redirect individual output files to specified directory (and look for intermediates) [
buildpath=PATH : Alternative path to look for existing intermediate files [
force=T/F : Force re-running of BLAST, UPC generation and search [
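As an illustration of the single-line motif format described under motifs=FILE, the sketch below parses 'Name Sequence #Comments' lines. It assumes whitespace-separated fields and that '#' never appears inside a motif pattern; `parse_motif_lines` is a hypothetical helper, not part of SLiMSuite, and the fasta, SLiMDisc and raw-list formats are not handled.

```python
def parse_motif_lines(lines):
    """Parse 'Name Sequence #Comments' lines into (name, sequence) tuples.

    Anything after '#' is treated as an optional comment and ignored;
    blank or comment-only lines are skipped.
    """
    motifs = []
    for line in lines:
        content = line.split("#", 1)[0].strip()
        if not content:
            continue
        fields = content.split()
        if len(fields) < 2:
            raise ValueError(f"expected 'Name Sequence', got: {line!r}")
        motifs.append((fields[0], fields[1]))
    return motifs


example = [
    "PIPbox Q..[ILM]..[FY][FY]  #optional comment",
    "",
    "RxL R.L",
]
print(parse_motif_lines(example))
```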
### SearchDB Options I ###
masking=T/F : Master control switch to turn off all masking if False [
dismask=T/F : Whether to mask ordered regions (see rje_disorder for options) [
consmask=T/F : Whether to use relative conservation masking [
ftmask=LIST : UniProt features to mask out [
imask=LIST : UniProt features to inversely ("inclusively") mask. (Seqs MUST have 1+ features) 
compmask=X,Y : Mask low complexity regions (same AA in X+ of Y consecutive aas) [
casemask=X : Mask Upper or Lower case [
motifmask=X : List (or file) of motifs to mask from input sequences 
metmask=T/F : Masks the N-terminal M [
posmask=LIST : Masks list of position-specific aas, where list = pos1:aas,pos2:aas [
aamask=LIST : Masks list of AAs from all sequences (reduces alphabet) 
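The compmask=X,Y rule ("same AA in X+ of Y consecutive aas") can be sketched as a sliding window. This is an illustration under assumptions, not SLiMProb's masking code: it assumes that only the over-represented residue is masked within each offending window, that detection uses the unmasked sequence, and that masked positions become 'X'. The function name `low_complexity_mask` is hypothetical.

```python
from collections import Counter


def low_complexity_mask(sequence, x, y, mask_char="X"):
    """Mask residues occurring x+ times within any window of y consecutive residues."""
    masked = list(sequence)
    for start in range(len(sequence) - y + 1):
        window = sequence[start:start + y]
        for aa, count in Counter(window).items():
            if count >= x and aa != mask_char:
                # Mask every copy of the repeated residue inside this window.
                for i in range(start, start + y):
                    if sequence[i] == aa:
                        masked[i] = mask_char
    return "".join(masked)


print(low_complexity_mask("MPQQQQAPK", 4, 5))  # → MPXXXXAPK
```

Counting on the original sequence rather than the partly masked copy keeps the result independent of the order in which windows are scanned.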
### SearchDB Options II ###
efilter=T/F : Whether to use evolutionary filter [
blastf=T/F : Use BLAST Complexity filter when determining relationships [
blaste=X : BLAST e-value threshold for determining relationships [
altdis=FILE : Alternative all by all distance matrix for relationships [
gablamdis=FILE : Alternative GABLAM results file [None] (!!!Experimental feature!!!)
occupc=T/F : Whether to output the UPC ID number in the occurrence output file [
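The evolutionary filter groups related sequences into UPCs (Unrelated Protein Clusters, the SLiMFinder term). A minimal sketch, assuming that two sequences are related whenever their BLAST e-value is at or below the blaste=X threshold and that clusters are the connected components of those relationships (single linkage), can use union-find. The pairwise e-value dictionary and the name `upc_clusters` are hypothetical, and SLiMProb's own clustering may differ in detail.

```python
def upc_clusters(seq_ids, pair_evalues, blaste):
    """Group sequences into clusters linked by BLAST hits with e-value <= blaste."""
    parent = {s: s for s in seq_ids}

    def find(s):
        # Follow parent links to the cluster root, halving paths as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), evalue in pair_evalues.items():
        if evalue <= blaste:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for s in seq_ids:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())


evalues = {("A", "B"): 1e-10, ("B", "C"): 1e-8, ("C", "D"): 0.5}
print(upc_clusters(["A", "B", "C", "D"], evalues, blaste=1e-4))
```

A and C are never compared directly here, but single linkage places them together through B.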
### SLiMChance Options ###
maskfreq=T/F : Whether to use masked AA frequencies (True) or to mask after frequency calculations (False) [
aafreq=FILE : Use FILE to replace individual sequence AAFreqs (FILE can be sequences or aafreq) [
aadimerfreq=FILE: Use empirical dimer frequencies from FILE (fasta or *.aadimer.tdt) [
negatives=FILE : Multiply raw probabilities by under-representation in FILE [
background=FILE : Use observed support in background file for over-representation calculations [
smearfreq=T/F : Whether to "smear" AA frequencies across UPC rather than keep separate AAFreqs [
seqocc=X : Restrict to sequences with X+ occurrences (adjust for high frequency SLiMs) [
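The real SLiMChance calculation is more involved than this. As a simplified illustration only, the sketch below assumes independent residues. It estimates the chance of a fixed-length motif matching at one position from AA frequencies, then the chance that a sequence has 1+ occurrences, then a binomial upper tail for observing k or more supporting sequences (or clusters) out of n. All function names are hypothetical and the frequencies are toy values.

```python
from math import comb, prod


def motif_position_prob(motif, aafreq):
    """Chance that a fixed-length motif matches at one position.

    motif: list of allowed-residue strings, one per position (e.g. ["R", "KR", "L"]).
    aafreq: dict of amino acid -> background frequency. Assumes independent residues.
    """
    return prod(sum(aafreq.get(aa, 0.0) for aa in allowed) for allowed in motif)


def seq_support_prob(p_pos, seq_len, motif_len):
    """Chance of 1+ occurrences in a sequence, treating positions as independent."""
    n_pos = max(seq_len - motif_len + 1, 0)
    return 1.0 - (1.0 - p_pos) ** n_pos


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), i.e. an over-representation probability."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy frequencies for illustration only (not real background values).
freqs = {"R": 0.05, "K": 0.06, "L": 0.1}
p_pos = motif_position_prob(["R", "KR", "L"], freqs)  # 0.05 * 0.11 * 0.1
p_seq = seq_support_prob(p_pos, seq_len=300, motif_len=3)
print(binomial_tail(3, 20, p_seq))
```

Under-representation would use the lower tail, P(X <= k), in the same way.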
### Output Options ###
extras=X : Whether to generate additional output files (alignments etc.) [
- 0 = No output beyond main results file
- 1 = Saved masked input sequences [*.masked.fas]
- 2 = Generate additional outputs (alignments etc.)
pickle=T/F : Whether to save/use pickles [
targz=T/F : Whether to tar and zip dataset result files (UNIX only) [
savespace=0 : Delete "unnecessary" files following run (best used with targz): [
- 0 = Delete no files
- 1 = Delete all bar *.upc and *.pickle files
- 2 = Delete all dataset-specific files including *.upc and *.pickle (not *.tar.gz)
See also rje_slimcalc options for occurrence-based calculations and filtering.
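The savespace levels can be read as a simple file-selection rule. The sketch below is a hypothetical helper, not SLiMProb code. It assumes the input list already holds only one dataset's files (so the main results tables are not included) and that *.tar.gz archives are kept at every level.

```python
def files_to_delete(filenames, savespace):
    """Return the dataset-specific files that a given savespace level would remove."""
    if savespace <= 0:
        return []                          # 0 = delete no files
    keep = (".tar.gz",)                    # archives are kept (assumption for level 1)
    if savespace == 1:
        keep += (".upc", ".pickle")        # 1 = keep *.upc and *.pickle
    return [name for name in filenames if not name.endswith(keep)]


files = ["ds.upc", "ds.pickle", "ds.occ.csv", "ds.masked.fas", "ds.tar.gz"]
for level in (0, 1, 2):
    print(level, files_to_delete(files, level))
```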
### Module Version History ###
# 1.0 - SLiMProb 1.0 based on SLiMSearch 1.7. Altered output files to be *.csv and *.occ.csv.
# 1.1 - Tidied import commands.
# 1.2 - Increased extras=X levels. Adjusted maxsize=X assessment to be post-masking.
# 1.3 - Consolidating output file naming for consistency across SLiMSuite. (SLiMBuild = Motif input)
# 1.4 - Preparation for SLiMProb V2.0 & SLiMCore V2.0 using newer RJE_Object.