Program:	QSLiMFinder
Description:	Query Short Linear Motif Finder
Version:	1.9
Last Edit:	11/07/14
Citation:	Edwards, Davey & Shields (2007), PLoS ONE 2(10): e967.

Imported modules: rje rje_seq slimfinder rje_slim rje_slimcalc rje_slimlist

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

QSLiMFinder is a modification of the basic SLiMFinder tool that specifically looks for SLiMs shared by a query sequence and one or more additional sequences. To do this, SLiMBuild first identifies all motifs that are present in the query sequence before removing it (and its UPC) from the dataset. The rest of the search and statistics then use the remainder of the dataset, but only consider motifs found in the query. The final correction for multiple testing is made using a motif space defined by the original query sequence, rather than the full potential motif space used by the original SLiMFinder. This smaller motif space is offset against the increased probability of the observed motif support values, which results from the reduced support once the query sequence is removed, but the approach could potentially still identify SLiMs with increased significance.
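The query-restricted search described above can be illustrated with a toy sketch. This is not the SLiMBuild implementation: it only enumerates two-position motifs, treats every non-query sequence as its own UPC, and performs no masking or statistics. The function names `query_motifs` and `shared_motif_support` and the example sequences are hypothetical.

```python
import re

def query_motifs(query, max_wild=2):
    """Enumerate simple two-position motifs (e.g. 'P.K') present in the query."""
    motifs = set()
    for i, first in enumerate(query):
        for wild in range(max_wild + 1):
            j = i + wild + 1  # position of the second fixed residue
            if j < len(query):
                motifs.add(first + "." * wild + query[j])
    return motifs

def shared_motif_support(query, others, max_wild=2):
    """Count how many non-query sequences contain each query-derived motif.

    Assumption: each sequence in `others` is unrelated (one sequence per UPC).
    """
    support = {}
    for motif in query_motifs(query, max_wild):
        pattern = re.compile(motif)  # '.' acts as a single-residue wildcard
        support[motif] = sum(1 for seq in others if pattern.search(seq))
    return support

# The query is searched first, then left out of the support counts.
support = shared_motif_support("MPAKL", ["GGPAKG", "TTPSKT", "CCCCC"])
```

Only motifs seen in the query are ever counted, which is why the motif space used for the multiple testing correction can be much smaller than in a standard SLiMFinder run.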

Note that minocc and ambocc values *include* the query sequence, e.g. minocc=2 specifies the query and ONE other UPC.

Commandline

### Basic Input/Output Options ###
seqin=FILE : Sequence file to search [None]
batch=LIST : List of files to search, wildcards allowed. (Over-ruled by seqin=FILE.) [*.dat,*.fas]
query=LIST : Return only SLiMs that occur in 1+ Query sequences (Name/AccNum/Seq Number) [1]
addquery=FILE : Adds query sequence(s) to batch jobs from FILE [None]
maxseq=X : Maximum number of sequences to process [500]
maxupc=X : Maximum UPC size of dataset to process [0]
sizesort=X : Sorts batch files by size prior to running (+1 small->big; -1 big->small; 0 none) [0]
walltime=X : Time in hours before program will abort search and exit [1.0]
resfile=FILE : Main QSLiMFinder results table [qslimfinder.csv]
resdir=PATH : Redirect individual output files to specified directory (and look for intermediates) [QSLiMFinder/]
buildpath=PATH : Alternative path to look for existing intermediate files [SLiMFinder/]
force=T/F : Force re-running of BLAST, UPC generation and SLiMBuild [False]
pickup=T/F : Pick-up from aborted batch run by identifying datasets in resfile using RunID [False]
dna=T/F : Whether the sequence files are DNA rather than protein [False]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

SLiMBuild Options I

efilter=T/F : Whether to use evolutionary filter [True]
blastf=T/F : Use BLAST Complexity filter when determining relationships [True]
blaste=X : BLAST e-value threshold for determining relationships [1e-4]
altdis=FILE : Alternative all by all distance matrix for relationships [None]
gablamdis=FILE : Alternative GABLAM results file [None] (!!!Experimental feature!!!)
homcut=X : Max number of homologues to allow (to reduce large multi-domain families) [0]

SLiMBuild Options II

masking=T/F : Master control switch to turn off all masking if False [True]
dismask=T/F : Whether to mask ordered regions (see rje_disorder for options) [False]
consmask=T/F : Whether to use relative conservation masking [False]
ftmask=LIST : UniProt features to mask out [EM]
imask=LIST : UniProt features to inversely ("inclusively") mask. (Seqs MUST have 1+ features) []
compmask=X,Y : Mask low complexity regions (same AA in X+ of Y consecutive aas) [5,8]
casemask=X : Mask Upper or Lower case [None]
motifmask=X : List (or file) of motifs to mask from input sequences []
metmask=T/F : Masks the N-terminal M (can be useful if termini=T) [True]
posmask=LIST : Masks list of position-specific aas, where list = pos1:aas,pos2:aas [2:A]
aamask=LIST : Masks list of AAs from all sequences (reduces alphabet) []
qregion=X,Y : Mask all but the region of the query from (and including) residue X to residue Y [0,-1]
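The low complexity rule behind compmask=X,Y (same AA in X+ of Y consecutive aas) can be sketched as a sliding window. This is a minimal illustration rather than the program's own masking code: the function name `mask_low_complexity`, the choice of "X" as the mask character, and the decision to mask only the repeated residue inside a qualifying window are all assumptions.

```python
def mask_low_complexity(seq, x=5, y=8, mask_char="X"):
    """Mask residues where one amino acid fills x or more of y consecutive positions."""
    masked = list(seq)
    for start in range(len(seq) - y + 1):
        window = seq[start:start + y]  # always read from the unmasked sequence
        for aa in set(window):
            if window.count(aa) >= x:
                # Assumption: only the repeated residue in the window is masked.
                for offset, residue in enumerate(window):
                    if residue == aa:
                        masked[start + offset] = mask_char
    return "".join(masked)

poly_q = mask_low_complexity("AQQQQQGA")      # five Q in a window of eight
unchanged = mask_low_complexity("ACDEFGHIK")  # no residue reaches the threshold
```

Sequences shorter than the window length are returned unmasked in this sketch.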

SLiMBuild Options III

termini=T/F : Whether to add termini characters (^ & $) to search sequences [True]
minwild=X : Minimum number of consecutive wildcard positions to allow [0]
maxwild=X : Maximum number of consecutive wildcard positions to allow [2]
slimlen=X : Maximum length of SLiMs to return (no. non-wildcard positions) [5]
minocc=X : Minimum number of unrelated occurrences for returned SLiMs. (Proportion of UP if < 1) [0.05]
absmin=X : Used if minocc<1 to define absolute min. UP occ [3]
alphahelix=T/F : Special i, i+3/4, i+7 motif discovery [False]
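How minocc and absmin might combine into one support threshold is sketched below. The rounding rule (ceiling) and the function name `effective_minocc` are assumptions, not taken from the program; the option text only states that minocc is a proportion of UPC when below 1 and that absmin then sets an absolute floor. In QSLiMFinder the resulting count includes the query.

```python
import math

def effective_minocc(minocc, absmin, upc_count):
    """Return the absolute minimum UPC support implied by minocc and absmin."""
    if minocc >= 1:
        # Values of 1 or more are already absolute counts.
        return int(minocc)
    # Assumption: proportions are rounded up, then floored at absmin.
    return max(absmin, math.ceil(minocc * upc_count))

small_dataset = effective_minocc(0.05, 3, 30)   # proportion gives 2, absmin wins
large_dataset = effective_minocc(0.05, 3, 110)  # proportion gives 6
explicit = effective_minocc(2, 3, 110)          # query plus ONE other UPC
```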

SLiMBuild Options IV

ambiguity=T/F : (preamb=T/F) Whether to search for ambiguous motifs during motif discovery [True]
ambocc=X : Min. UP occurrence for subvariants of ambiguous motifs (minocc if 0 or > minocc) [0.05]
absminamb=X : Used if ambocc<1 to define absolute min. UP occ [2]
equiv=LIST : List (or file) of TEIRESIAS-style ambiguities to use [AGS,ILMVF,FYW,FYH,KRH,DE,ST]
wildvar=T/F : Whether to allow variable length wildcards [True]
combamb=T/F : Whether to search for combined amino acid degeneracy and variable wildcards [False]
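To illustrate what the equiv groups mean for motif support, the sketch below expands a fixed residue into the degenerate positions allowed by the default equiv list and compares support for a fixed and an ambiguous motif. It is an illustration only: `degenerate_options` and `count_support` are hypothetical helpers, the example sequences are invented, and the real search also applies ambocc and UPC-based support.

```python
import re

# Default equivalence groups as listed for the equiv=LIST option.
EQUIV = ["AGS", "ILMVF", "FYW", "FYH", "KRH", "DE", "ST"]

def degenerate_options(residue, equiv=EQUIV):
    """Return the residue itself plus a bracketed class for every group containing it."""
    options = [residue]
    options += ["[" + group + "]" for group in equiv if residue in group]
    return options

def count_support(pattern, sequences):
    """Count sequences containing at least one match to the motif pattern."""
    regex = re.compile(pattern)
    return sum(1 for seq in sequences if regex.search(seq))

sequences = ["APKA", "GPRG", "TPHT"]
k_options = degenerate_options("K")
fixed_support = count_support("PK", sequences)
ambiguous_support = count_support("P[KRH]", sequences)
```

Here the ambiguous motif gains support from sequences that the fixed motif misses, which is the trade-off the ambiguity options control.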

SLiMBuild Options V

musthave=LIST : Returned motifs must contain one or more of the AAs in LIST (reduces search space) []
focus=FILE : FILE containing focal groups for SLiM return (see Manual for details) [None]
focusocc=X : Motif must appear in X+ focus groups (0 = all) [0]

See also rje_slimcalc options for occurrence-based calculations and filtering.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
### SLiMChance Options ###
cloudfix=T/F : Restrict output to clouds with 1+ fixed motif (recommended) [False]
slimchance=T/F : Execute main QSLiMFinder probability method and outputs [True]
sigprime=T/F : Calculate more precise (but more computationally intensive) statistical model [False]
sigv=T/F : Use the more precise (but more computationally intensive) fix to mean UPC probability [False]
qexact=T/F : Calculate exact Query motif space (True) or over-estimate from dimers (False) (quicker) [True]
probcut=X : Probability cut-off for returned motifs [0.1]
maskfreq=T/F : Whether to use masked AA Frequencies (True), or (False) mask after frequency calculations [False]
aafreq=FILE : Use FILE to replace individual sequence AAFreqs (FILE can be sequences or aafreq) [None]
aadimerfreq=FILE : Use empirical dimer frequencies from FILE (fasta or *.aadimer.tdt) (!!!Experimental!!!) [None]
negatives=FILE : Multiply raw probabilities by under-representation in FILE (!!!Experimental!!!) [None]
smearfreq=T/F : Whether to "smear" AA frequencies across UPC rather than keep separate AAFreqs [False]
seqocc=T/F : Whether to upweight for multiple occurrences in same sequence (heuristic) [False]
probscore=X : Score to be used for probability cut-off and ranking (Prob/Sig) [Sig]
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
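The effect of probcut and probscore, together with the topranks option listed under Advanced Output Options III, can be sketched as a simple filter and sort. This is an assumed reading of the option text, not the program's output code: the function `select_motifs`, the "Pattern" field name, the inclusive cut-off, and the example values are hypothetical.

```python
def select_motifs(results, probcut=0.1, topranks=1000, score="Sig"):
    """Keep motifs whose score meets probcut, ranked best first, at most topranks."""
    # Assumption: the cut-off is inclusive and lower scores rank higher.
    passing = [row for row in results if row[score] <= probcut]
    passing.sort(key=lambda row: row[score])
    return passing[:topranks]

results = [
    {"Pattern": "P.K", "Sig": 0.01},
    {"Pattern": "KL", "Sig": 0.5},
    {"Pattern": "PA", "Sig": 0.001},
]
returned = [row["Pattern"] for row in select_motifs(results)]
best_only = [row["Pattern"] for row in select_motifs(results, topranks=1)]
```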

Advanced Output Options I

clouds=X : Identifies motif "clouds" which overlap at 2+ positions in X+ sequences (0=minocc / -1=off) [2]
runid=X : Run ID for resfile (allows multiple runs on same data) [DATE:TIME]
logmask=T/F : Whether to log the masking of individual sequences [True]
slimcheck=FILE : Motif file/list to add to resfile output []

Advanced Output Options II

teiresias=T/F : Replace TEIRESIAS, making *.out and *.mask.fasta files [False]
slimdisc=T/F : Emulate SLiMDisc output format (*.rank & *.dat.rank + TEIRESIAS *.out & *.fasta) [False]
extras=X : Whether to generate additional output files (alignments etc.) [1]
- -1 = No output beyond main results file
- 0 = Generate occurrence file and cloud file
- 1 = Generate occurrence file, alignments and cloud file
- 2 = Generate all additional QSLiMFinder outputs
- 3 = Generate SLiMDisc emulation too (equiv extras=2 slimdisc=T)
targz=T/F : Whether to tar and zip dataset result files (UNIX only) [False]
savespace=X : Delete "unnecessary" files following run (best used with targz): [0]
- 0 = Delete no files
- 1 = Delete all bar *.upc and *.pickle
- 2 = Delete all bar *.upc (pickle added to tar)
- 3 = Delete all dataset-specific files including *.upc and *.pickle (not *.tar.gz)

Advanced Output Options III

topranks=X : Will only output top X motifs meeting probcut [1000]
minic=X : Minimum information content for returned motifs [2.1]
allsig=T/F : Whether to also output all SLiMChance combinations (Sig/SigV/SigPrime/SigPrimeV) [False]

See also rje_slimcalc options for occurrence-based calculations and filtering.
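Since all options use the same option=value form, a run can be assembled programmatically. The sketch below only builds the argument list and does not execute anything. The script name `qslimfinder.py`, the input file name, the accession, and the helper `build_command` are assumptions for illustration.

```python
def build_command(options, script="qslimfinder.py"):
    """Assemble an argument list of option=value pairs in the documented style."""
    args = ["python", script]  # assumed script name, adjust to the local install
    for name, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"            # T/F switches
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)  # LIST and X,Y options
        args.append(f"{name}={value}")
    return args

command = build_command({
    "seqin": "dataset.fas",   # hypothetical input file
    "query": "P12345",        # hypothetical query accession
    "minocc": 2,              # query plus ONE other UPC
    "masking": True,
    "compmask": (5, 8),
})
```

The resulting list could be passed to `subprocess.run` once the script path has been checked against the local installation.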

Module Version History

    # 0.0 - Initial Compilation based on SLiMFinder 3.5.
    # 1.0 - Tested & modified to include AA masking.
    # 1.1 - Added sizesort.
    # 1.2 - Added the addquery function.
    # 1.3 - Updated the output for Max/Min filtering and the pickup options.
    # 1.4 - Added additional dictionary and list to store Query dimers and SLiMs for motif space calculations.
    # 1.4 - Added qexact=T/F option for calculating Exact Query motif space (True) or estimating from dimers (False).
    # 1.5 - Implemented SigV calculation. Modified extras setting.
    # 1.6 - Removed excess module imports.
    # 1.7 - Fixed "MustHave=LIST" correction of motif space.
    # 1.8 - Added cloudfix=T/F Restrict output to clouds with 1+ fixed motif (recommended) [False]. Consolidating output.
    # 1.9 - Preparation for QSLiMFinder V2.0 & SLiMCore V2.0 using newer RJE_Object.
