Program:	SLiMSearch
Description:	Short Linear Motif Search tool
Version:	1.7.1
Last Edit:	03/12/15
Citation:	Davey, Haslam, Shields & Edwards (2010), Lecture Notes in Bioinformatics 6282: 50-61.

Imported modules: rje rje_seq rje_sequence rje_scoring rje_slim rje_slimcore rje_slimcalc rje_slimlist rje_zen

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

SLiMSearch is a tool for finding pre-defined SLiMs (Short Linear Motifs) in a protein sequence database. SLiMSearch can make use of corrections for evolutionary relationships and a variation of the SLiMChance algorithm from SLiMFinder to assess motifs for statistical over- and under-representation. SLiMSearch is a replacement for PRESTO and uses many of the same underlying modules.

SLiMSearch offers several features that many existing motif search tools lack:

searching with mismatches rather than restricting hits to perfect matches.
optional equivalency files for searching with specific allowed mismatches (e.g. charge conservation)
generation or reading of alignment files from which to calculate conservation statistics for motif occurrences.
additional statistics, including protein disorder, surface accessibility and hydrophobicity predictions
recognition of "n of m" motif elements in the form <X:n:m>, where X is one or more amino acids that must occur n+ times across m positions.
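
The "n of m" element in the list above can be illustrated with a minimal Python sketch. It assumes that <X:n:m> means at least n of m consecutive positions hold a residue from X, which is one reading of the notation rather than a statement of how SLiMSearch implements it. The function names parse_n_of_m and n_of_m_positions are hypothetical and not part of SLiMSearch.

```python
import re


def parse_n_of_m(element):
    """Split an element such as '<IL:3:5>' into (residues, n, m)."""
    match = re.fullmatch(r"<([A-Z]+):(\d+):(\d+)>", element)
    if match is None:
        raise ValueError("Not an n-of-m element: %r" % element)
    residues, n, m = match.group(1), int(match.group(2)), int(match.group(3))
    if n > m:
        raise ValueError("n cannot exceed m in %r" % element)
    return residues, n, m


def n_of_m_positions(sequence, element):
    """Return 0-based starts of length-m windows holding n+ residues from X.

    Assumption: the element is satisfied by any window of m consecutive
    positions in which at least n residues belong to the set X.
    """
    residues, n, m = parse_n_of_m(element)
    hits = []
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        if sum(aa in residues for aa in window) >= n:
            hits.append(start)
    return hits
```

For example, n_of_m_positions("AILLKA", "<IL:3:5>") reports both 5-residue windows, since each contains one I and two Ls.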

The main output of SLiMSearch is a delimited file of motif/peptide occurrences, but the motifaln=T and proteinaln=T options also allow output of alignments of motifs and their occurrences. The primary outputs are named *.csv for the occurrence data and *.summary.csv for the summary data for each motif/dataset pair.

NOTE: SLiMSearch has now been largely superseded by SLiMProb for motif statistics.

Commandline

### Basic Input/Output Options ###
motifs=FILE : File of input motifs/peptides [None]
Single line per motif format = 'Name Sequence #Comments' (Comments are optional and ignored)
Alternative formats include fasta, SLiMDisc output and raw motif lists.
seqin=FILE : Sequence file to search [None]
batch=LIST : List of sequence files for batch input (wildcard * permitted) []
maxseq=X : Maximum number of sequences to process [0]
maxsize=X : Maximum dataset size to process in AA (or NT) [100,000]
maxocc=X : Filter out Motifs with more than maximum number of occurrences [0]
walltime=X : Time in hours before program will abort search and exit [1.0]
resfile=FILE : Main SLiMSearch results table [slimsearch.csv]
resdir=PATH : Redirect individual output files to specified directory (and look for intermediates) [SLiMSearch/]
buildpath=PATH : Alternative path to look for existing intermediate files [SLiMSearch/]
force=T/F : Force re-running of BLAST, UPC generation and search [False]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
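
The single-line motif format described for motifs=FILE ('Name Sequence #Comments') can be read with a few lines of Python. This is a minimal sketch of that one format only, not the parser SLiMSearch itself uses; the function name parse_motif_line and the example motif are hypothetical, and the fasta, SLiMDisc and raw-list formats are not handled.

```python
def parse_motif_line(line):
    """Return (name, sequence) from a 'Name Sequence #Comments' line.

    Returns None for blank lines and for lines holding only a comment.
    Assumption: name and sequence are separated by whitespace and
    neither contains a '#' character.
    """
    content = line.split("#", 1)[0].strip()  # drop the optional comment
    if not content:
        return None
    fields = content.split()
    if len(fields) < 2:
        raise ValueError("Expected 'Name Sequence', got: %r" % line)
    return fields[0], fields[1]


def parse_motif_lines(lines):
    """Parse an iterable of lines, skipping blanks and comment-only lines."""
    parsed = (parse_motif_line(line) for line in lines)
    return [motif for motif in parsed if motif is not None]
```

Calling parse_motif_lines on an open file handle yields a list of (name, sequence) pairs that could then be checked before being passed to a search.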

### SearchDB Options I ###

masking=T/F : Master control switch to turn off all masking if False [False]
dismask=T/F : Whether to mask ordered regions (see rje_disorder for options) [False]
consmask=T/F : Whether to use relative conservation masking [False]
ftmask=LIST : UniProt features to mask out [EM,DOMAIN,TRANSMEM]
imask=LIST : UniProt features to inversely ("inclusively") mask. (Seqs MUST have 1+ features) []
compmask=X,Y : Mask low complexity regions (same AA in X+ of Y consecutive aas) [5,8]
casemask=X : Mask Upper or Lower case [None]
motifmask=X : List (or file) of motifs to mask from input sequences []
metmask=T/F : Masks the N-terminal M [False]
posmask=LIST : Masks list of position-specific aas, where list = pos1:aas,pos2:aas [2:A]
aamask=LIST : Masks list of AAs from all sequences (reduces alphabet) []
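
To make the compmask=X,Y rule ("same AA in X+ of Y consecutive aas") concrete, here is a minimal Python sketch. Which residues are actually masked once a window qualifies is not stated in this documentation; the sketch assumes that only the repeated amino acid within the window is replaced, and that "X" is the masking character. The function name mask_low_complexity is hypothetical.

```python
from collections import Counter


def mask_low_complexity(sequence, x=5, y=8, mask_char="X"):
    """Mask residues that occur x+ times within any window of y residues.

    Assumptions (not taken from SLiMSearch itself): windows are read from
    the unmasked input, only the over-represented amino acid is masked,
    and sequences shorter than y are returned unchanged.
    """
    masked = list(sequence)
    for start in range(len(sequence) - y + 1):
        window = sequence[start:start + y]
        for aa, count in Counter(window).items():
            if aa != mask_char and count >= x:
                # Replace each copy of the repeated residue in this window.
                for offset, residue in enumerate(window):
                    if residue == aa:
                        masked[start + offset] = mask_char
    return "".join(masked)
```

With the default 5,8 setting, a run of six serines inside an eight-residue window is masked, while a sequence with no residue repeated five times in any window is left untouched.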

### SearchDB Options II ###

efilter=T/F : Whether to use evolutionary filter [False]
blastf=T/F : Use BLAST Complexity filter when determining relationships [True]
blaste=X : BLAST e-value threshold for determining relationships [1e-4]
altdis=FILE : Alternative all by all distance matrix for relationships [None]
gablamdis=FILE : Alternative GABLAM results file [None] (!!!Experimental feature!!!)
occupc=T/F : Whether to output the UPC ID number in the occurrence output file [False]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
### SLiMChance Options ###
maskfreq=T/F : Whether to use masked AA Frequencies (True), or (False) mask after frequency calculations [True]
aafreq=FILE : Use FILE to replace individual sequence AAFreqs (FILE can be sequences or aafreq) [None]
aadimerfreq=FILE: Use empirical dimer frequencies from FILE (fasta or *.aadimer.tdt) [None]
negatives=FILE : Multiply raw probabilities by under-representation in FILE [None]
background=FILE : Use observed support in background file for over-representation calculations [None]
smearfreq=T/F : Whether to "smear" AA frequencies across UPC rather than keep separate AAFreqs [False]
seqocc=X : Restrict to sequences with X+ occurrences (adjust for high frequency SLiMs) [1]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
### Output Options ###
extras=X : Whether to generate additional output files (alignments etc.) [1]
- 0 = No output beyond main results file
- 1 = Generate additional outputs (alignments etc.)
pickle=T/F : Whether to save/use pickles [True]
targz=T/F : Whether to tar and zip dataset result files (UNIX only) [False]
savespace=X : Delete "unnecessary" files following run (best used with targz) [0]
- 0 = Delete no files
- 1 = Delete all bar *.upc and *.pickle files
- 2 = Delete all dataset-specific files including *.upc and *.pickle (not *.tar.gz)

See also rje_slimcalc options for occurrence-based calculations and filtering.
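
All of the options above are given in key=value form. A minimal sketch of assembling such a command from Python follows. The script name slimsearch.py, the interpreter name and the input file names are assumptions used for illustration; only the option names and the T/F and comma-separated value styles come from the listing above.

```python
import subprocess


def build_command(script, options, interpreter="python"):
    """Build an argument list of key=value options for a SLiMSuite-style tool.

    Booleans are written as T/F and lists or tuples as comma-separated
    values, following the option listing. The script path and interpreter
    are supplied by the caller and are not checked here.
    """
    args = [interpreter, script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        args.append("%s=%s" % (key, value))
    return args


def run_search(script, options, interpreter="python"):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(script, options, interpreter), check=True)


if __name__ == "__main__":
    # Hypothetical inputs: motifs.txt and proteins.fas are placeholder names.
    example = {"motifs": "motifs.txt", "seqin": "proteins.fas",
               "masking": True, "compmask": (5, 8), "resfile": "slimsearch.csv"}
    print(" ".join(build_command("slimsearch.py", example)))
```

Keeping the command construction separate from execution makes it easy to print and inspect the exact arguments before starting a long search.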

Module Version History

    # 0.0 - Initial Compilation.
    # 1.0 - Standardised masking options. Still not fully tested.
    # 1.1 - Added background=FILE option for determining mean(p1+) for SLiMs based on background file.
    # 1.2 - Added maxsize option.
    # 1.3 - Add aamask option (and alphabet)
    # 1.4 - Fixed zero-size UPC bug.
    # 1.5 - Add MaxOcc setting.
    # 1.6 - Minor tweaks to Log output. Add option for UPC number in occ output.
    # 1.7 - Modified to work with GOPHER V3.0.
    # 1.7.1 - Minor modification to docstring. Preparation for update to SLiMSearch 2.0 optimised for proteome searches.
