SLiMFinder V4.9
Short Linear Motif Finder
Copyright © 2007 Richard J. Edwards - See source code for GNU License Notice
See SLiMSuite Blog for further documentation.

Function

Short linear motifs (SLiMs) in proteins are functional microdomains of fundamental importance in many biological
systems. SLiMs typically consist of a 3 to 10 amino acid stretch of the primary protein sequence, of which as few as
two sites may be important for activity, making identification of novel SLiMs extremely difficult. In particular, it
can be very difficult to distinguish a randomly recurring "motif" from a truly over-represented one. Incorporating
ambiguous amino acid positions and/or variable-length wildcard spacers between defined residues further complicates
the matter.

SLiMFinder is an integrated SLiM discovery program building on the principles of the SLiMDisc software for accounting
for evolutionary relationships [Davey NE, Shields DC & Edwards RJ (2006): Nucleic Acids Res. 34(12):3546-54].
SLiMFinder comprises two algorithms:

SLiMBuild identifies convergently evolved, short motifs in a dataset. Motifs with fixed amino acid positions are
identified and then combined to incorporate amino acid ambiguity and variable-length wildcard spacers. Unlike
programs such as TEIRESIAS, which return all shared patterns, SLiMBuild accelerates the process and reduces returned
motifs by explicitly screening out motifs that do not occur in enough unrelated proteins. For this, SLiMBuild uses
the "Unrelated Proteins" (UP) algorithm of SLiMDisc in which BLAST is used to identify pairwise relationships.
Proteins are then clustered according to these relationships into "Unrelated Protein Clusters" (UPCs), which are
defined such that no protein in a UPC has a BLAST-detectable relationship with a protein in another UPC. If desired,
SLiMBuild can be used as a replacement for TEIRESIAS in other software (…).

SLiMChance estimates the probability of these motifs arising by chance, correcting for the size and composition of
the dataset, and assigns a significance value to each motif. Motif occurrence probabilities are calculated
independently for each UPC, adjusted for the size of a UPC using the Minimum Spanning Tree algorithm from SLiMDisc.
These individual occurrence probabilities are then converted into the total probability of seeing the observed motifs
the observed number of (unrelated) times. These probabilities assume that the motif is known before the search. In
reality, only over-represented motifs from the dataset are looked at, so these probabilities are adjusted for the
size of motif-space searched to give a significance value. This is an estimate of the probability of seeing that
motif, or another one like it. These values are calculated separately for each length of motif. Where pre-known
motifs are also of interest, these can be given with the …

Where significant motifs are returned, SLiMFinder will group them into Motif "Clouds", which consist of physically
overlapping motifs (2+ non-wildcard positions are the same in the same sequence). This provides an easy indication of
which motifs may actually be variants of a larger SLiM and should therefore be considered together.

Additional Motif Occurrence Statistics, such as motif conservation, are handled by the rje_slimlist module. Please
see the documentation for this module for a full list of commandline options. These options are currently under
development for SLiMFinder and are not fully supported. See the SLiMFinder Manual for further details. Note that the
OccFilter *does* affect the motifs returned by SLiMBuild and thus the TEIRESIAS output (as do min. IC and min.
Support) but the overall Motif StatFilter *only* affects SLiMFinder output following SLiMChance calculations.

Secondary Functions

The "MotifSeq" option will output fasta files for a list of X:Y, where X is a motif pattern and Y is the output file.

The "Randomise" function will take a set of input datasets (as in Batch Mode) and regenerate a set of new datasets …

Basic Input/Output Options
SLiMBuild Options
    SLiMBuild Options I
    SLiMBuild Options II
    SLiMBuild Options III
    SLiMBuild Options IV
    SLiMBuild Options V
SLiMChance Options
Advanced Output Options
    Advanced Output Options I
    Advanced Output Options II
    Advanced Output Options III
Additional Functions
    Additional Functions I
    Additional Functions II
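
As a rough illustration of the UPC definition given in the Function section, the sketch below clusters proteins so that no related pair is split across two clusters, then counts how many clusters contain a motif. It is not taken from the SLiMFinder source: the function names `cluster_upcs` and `upc_support`, the protein IDs and the relationship pairs are all hypothetical, and the pairwise relationships that SLiMFinder derives from BLAST are simply supplied by hand here. The MST-based size adjustment used by SLiMChance is not covered.

```python
from collections import defaultdict


def cluster_upcs(proteins, related_pairs):
    """Group proteins so that no related pair spans two clusters (hypothetical helper)."""
    parent = {p: p for p in proteins}

    def find(p):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for p in proteins:
        clusters[find(p)].append(p)
    # Sort members and clusters so the output order is reproducible.
    return sorted(sorted(members) for members in clusters.values())


def upc_support(hit_proteins, upcs):
    """Count the clusters that contain at least one protein with the motif."""
    hits = set(hit_proteins)
    return sum(1 for upc in upcs if hits.intersection(upc))


# Made-up example: P1-P2 and P2-P3 are related, P4 and P5 are unrelated to everything.
proteins = ["P1", "P2", "P3", "P4", "P5"]
related_pairs = [("P1", "P2"), ("P2", "P3")]
upcs = cluster_upcs(proteins, related_pairs)
print(upcs)  # [['P1', 'P2', 'P3'], ['P4'], ['P5']]

# A motif found in P1, P3 and P4 occurs in three proteins but only two unrelated clusters.
print(upc_support(["P1", "P3", "P4"], upcs))  # 2
```

Counting support per cluster rather than per protein is what stops a motif shared by close homologues from looking over-represented; in the example, P1 and P3 contribute only once between them.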
History

Module Version History

# 0.0 - Initial Compilation.
# 1.0 - Preliminary working version with Poisson probabilities
# 1.1 - Binomial probabilities, Bonferroni corrections and complexity masking
# 1.2 - Added musthave=LIST option and denferroni correction.
# 1.3 - Added resfile=FILE output
# 1.4 - Added option for termini
# 1.5 - Reworked slim mechanics to be ai-x-aj strings for future ambiguity (split on '-' to make list)
# 1.6 - Added basic ambiguity and flexible wildcards plus MST weighting for UP clusters
# 1.7 - Added counting of generic dimer frequencies for improved Bonferroni and probability calculation (No blockmask.)
#     - Added topranks=X and query=X
# 1.8 - Added *.upc rather than *.self.blast. Added basic randomiser function.
# 1.9 - Added MotifList object to handle extra calculations and occurrence filtering.
# 2.0 - Tidied up and standardised output. Implemented extra filtering and scoring options.
# 2.1 - Changed defaults. Removed Poisson as option and other obsolete functions.
# 2.2 - Tidied and reorganised code using SLiMBuild/SLiMChance subdivision of labour. Removed rerun=T/F (just Force.)
# 2.3 - Added AAFreq "smear" and "better" p1+ calculation. Added extra cloud summary output.
# 2.4 - Minor bug fixes and tidying. Removed power output. (Rubbish anyway!) Can read UPC from distance matrix.
# 3.0 - Dumped useless stats and calculations. Simplified output. Improved ambiguity & clouds.
# 3.1 - Added minwild and alphahelix options. (Partial aadimerfreq & negatives)
# 3.2 - Tidied up with SLiMCore, replaced old Motif objects with SLiM objects and SLiMCalc.
# 3.3 - Added XGMML output. Added webserver option with additional output.
# 3.4 - Added consmask relative conservation masking.
# 3.5 - Standardised masking options. Add motifmask and motifcull.
# 3.6 - Added aamasking and alphabet.
# 3.7 - Added option to switch off dimfreq and better handling of given aafreq
# 3.8 - Added SLiMDisc & SLiMPickings scores and options to rank on them.
# 3.9 - Added clouding consensus information. [Aborted due to technical challenges.]
# 3.10 - Added differentiation of methods for pickling and tarring.
# 4.0 - Added SigPrime and SigV calculation from Norman. Added graded extras output.
# 4.1 - Added SizeSort, AltUPC and NewUPC options. Added #END output for webserver.
# 4.2 - Added fixlen option and improved Alphahelix option
# 4.3 - Updated the output for Max/Min filtering and the pickup options. Removed TempMaxSetting.
# 4.4 - Modified to work with GOPHER V3.0.
# 4.5 - Minor modifications to fix sigV and sigPrime bugs. Modified extras setting. Added palindrome setting for DNA motifs.
# 4.6 - Minor modification to seqocc=T function. !Experimental! Added main occurrence output and modified savespace.
# 4.7 - Added SLiMMaker generation to motif clouds. Added Q and Occ to Chance column.
# 4.8 - Modified cloud generation to avoid issues with flexible-length wildcards.
# 4.9 - Preparation for SLiMFinder V5.0 & SLiMCore V2.0 using newer RJE_Object.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.