Module:	SLiMMaker
Description:	SLiM generator from aligned peptide sequences
Version:	1.7.0
Last Edit:	01/02/17
Citation:	Palopoli N, Lythgow KT & Edwards RJ. Bioinformatics 2015; doi: 10.1093/bioinformatics/btv155
Webserver:	http://www.slimsuite.unsw.edu.au/servers/slimmaker.php

Imported modules: rje rje_obj rje_slim rje_zen peptcluster

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This program has a fairly simple function: it reads in a set of sequences and generates a regular expression motif from them. It is designed with protein sequences in mind but should also work for DNA sequences (dna=T). Input sequences can be in fasta format or plain text (with no sequence headers) and should already be aligned. If varlength=F, gapped positions are ignored (treated as Xs) and no variable-length wildcards are returned. If varlength=T, each gapped position is assessed using the ungapped peptides at that position and a variable-length position is inserted. This variable-length position may be a wildcard, or it may be a defined position if the peptides with amino acids at that position give sufficient signal.

SLiMMaker considers each column of the input in turn and compresses it into a regular expression element according to some simple rules, screening out rare amino acids and converting particularly degenerate positions into wildcards. Each amino acid in the column that occurs at least X times (as defined by minseq=X) is considered for the regular expression definition for that position. The full set of amino acids meeting this criterion is then assessed for whether to keep it as a defined position, or convert into a wildcard. First, if the number of different amino acids meeting this criterion is zero or above a second threshold (maxaa=X), the position is defined as a wildcard. Second, the proportion of input sequences matching the amino acid set is compared to a minimum frequency criterion (minfreq=X). Failing to meet this minimum frequency will again result in a wildcard. Otherwise, the amino acid set is added to the SLiM definition as either a fixed position (if only one amino acid met the minseq criterion) or as a degenerate position. Finally, leading and trailing wildcards are removed.
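The column rules above can be sketched in Python as follows. This is a minimal illustration, not SLiMMaker's own code: it assumes equal-length, already-aligned peptides (the varlength=F case), uses '.' as the wildcard symbol, and takes the total number of input peptides as the frequency denominator. The function names `compress_column` and `make_slim` are hypothetical.

```python
from collections import Counter


def compress_column(column, minseq=3, minfreq=0.75, maxaa=5, ignore="X-"):
    """Convert one alignment column into a regex element ('.' = wildcard)."""
    counts = Counter(aa for aa in column if aa not in ignore)
    # Amino acids seen in at least minseq sequences are candidates.
    kept = sorted(aa for aa, n in counts.items() if n >= minseq)
    # Rule 1: no candidates, or more than maxaa of them -> wildcard.
    if not kept or len(kept) > maxaa:
        return "."
    # Rule 2: the candidate set must cover at least minfreq of the input.
    freq = sum(counts[aa] for aa in kept) / len(column)
    if freq < minfreq:
        return "."
    # One candidate -> fixed position; several -> degenerate [..] position.
    return kept[0] if len(kept) == 1 else "[" + "".join(kept) + "]"


def make_slim(peptides, **options):
    """Build a motif from aligned peptides of equal length."""
    peptides = [p.upper() for p in peptides]
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("peptides must all be aligned to the same length")
    elements = [compress_column(col, **options) for col in zip(*peptides)]
    # Finally, strip leading and trailing wildcards.
    while elements and elements[0] == ".":
        elements.pop(0)
    while elements and elements[-1] == ".":
        elements.pop()
    return "".join(elements)
```

Applied with default settings to the 18 LIG_PCNA_PIPBox_1 example peptides listed further down this page, this sketch returns `Q.[RST][IL].[DRS]FF`: columns 2, 5, 9 and 10 fail either minseq or minfreq and become wildcards, and the last two are trimmed.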

By default, each defined position in a motif will contain amino acids that (a) each occur in at least three sequences, (b) have a combined frequency of >=75%, and (c) number 5 or fewer different amino acids (among those occurring in 3+ sequences). The same minseq=X threshold also determines whether flexible-length *defined* positions are generated (if varlength=T): to have a flexible-length non-wildcard position, at least minseq sequences must have a gap at that position. This does not apply to flexible-length wildcards.
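The varlength=T gap rule might be sketched as below. This is one reading of the text, with labelled assumptions: a gapped column with fewer than minseq gaps becomes a flexible-length wildcard, the frequency is computed over the ungapped peptides only, and the optional position is written as `{0,1}`. The function name is hypothetical.

```python
from collections import Counter


def gapped_column_element(column, minseq=3, minfreq=0.75, maxaa=5, gap="-"):
    """Regex element for an aligned column that contains at least one gap."""
    ngaps = sum(1 for c in column if c == gap)
    if ngaps == 0:
        raise ValueError("column contains no gaps")
    # Assess the position using only the ungapped peptides.
    residues = [c for c in column if c != gap]
    counts = Counter(residues)
    kept = sorted(aa for aa, n in counts.items() if n >= minseq)
    is_defined = (
        ngaps >= minseq                 # enough peptides skip this position
        and 0 < len(kept) <= maxaa      # same minseq/maxaa rules as before
        and sum(counts[aa] for aa in kept) / len(residues) >= minfreq
    )
    if not is_defined:
        core = "."                      # flexible-length wildcard
    elif len(kept) == 1:
        core = kept[0]
    else:
        core = "[" + "".join(kept) + "]"
    # Assumed notation: zero or one occurrence marks the variable length.
    return core + "{0,1}"
```

With the defaults, a column reading `SSSS---T` gives `S{0,1}`, while `SSSSS-T` has only one gap and so gives the flexible wildcard `.{0,1}`.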

Note. Unless the "iterate" function is used, each defined position in the final motif only has to match a given frequency of the input (minfreq, 75% by default). Because positions are considered independently, however, the final motif as a whole might occur in fewer than 75% of the input sequences. SLiMSearch can be used to check the occurrence stats.
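The effect described in this note is easy to reproduce with a toy example. In the hypothetical peptides below, each of the two positions is shared by 75% of the input, yet only half of the peptides contain the combined motif. The check uses Python's `re` module as a rough stand-in for SLiMSearch and counts a peptide as matched if the motif occurs anywhere in it.

```python
import re


def motif_coverage(motif, peptides):
    """Return the fraction of peptides containing at least one motif match."""
    pattern = re.compile(motif)
    matched = [p for p in peptides if pattern.search(p)]
    return len(matched) / len(peptides)


# Toy peptides: 'A' at position 1 in 3/4 and 'B' at position 2 in 3/4.
toy = ["AB", "AC", "DB", "AB"]
print(motif_coverage("AB", toy))  # 0.5
```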

Version 1.5.0 incorporates a new peptide alignment mode to deal with unaligned peptides. This is controlled by the peptalign=T/F/X option, which is set to True by default. If given a regular expression, this will be used to guide the alignment. Otherwise, the longest peptides will be used as a guide and the minimum number of gaps added to shorter peptides. PeptCluster peptide distance measures are used to assess different variants, starting with simple sequence identity, then amino acid properties (if ties) and finally PAM distances. One of the latter can be set as the priority using peptdis=X. Peptide alignment assumes that peptides have termini (^ & $) or flanking wildcards added. If not, set termini=F.
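The gap-minimising idea can be illustrated with a brute-force sketch. It assumes a single longest peptide as the guide and scores variants by simple identity only; PeptCluster's amino acid property and PAM tie-breakers, regex guides and termini handling are not modelled, and all names are hypothetical. Ties go to the earliest gap placement in the enumeration order of `itertools.combinations`.

```python
from itertools import combinations


def identity(a, b, gap="-"):
    """Count aligned positions where two peptides share the same residue."""
    return sum(1 for x, y in zip(a, b) if x == y and x != gap)


def pad_with_gaps(peptide, length, gap="-"):
    """Yield every way of inserting the minimum number of gaps to reach length."""
    ngaps = length - len(peptide)
    if ngaps < 0:
        raise ValueError("peptide is longer than the guide")
    for gap_pos in combinations(range(length), ngaps):
        residues = iter(peptide)
        yield "".join(gap if i in gap_pos else next(residues) for i in range(length))


def align_to_guide(peptides, gap="-"):
    """Align each peptide to the longest one, maximising sequence identity."""
    guide = max(peptides, key=len)
    return [
        max(pad_with_gaps(pep, len(guide), gap), key=lambda v: identity(v, guide, gap))
        for pep in peptides
    ]


print(align_to_guide(["RSAPL", "RAPL"]))  # ['RSAPL', 'R-APL']
```

Enumerating every gap placement is only practical for short peptides with small length differences; it is shown here for clarity rather than speed.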

Version 1.6.0 added the option to incorporate amino acid equivalencies to extend motif sites beyond the top X% of amino acids. This works by identifying a degenerate set of amino acids as normal using minseq=X and then checking whether these form a subset of an equivalence group prior to the minfreq=X filter. If so, it will try extending the degenerate position to incorporate additional members of the equivalence group. For example, IL could incorporate additional MVF amino acids of an FILMV group. Only amino acids represented in the peptides will be added. Single amino acids will also be extended, e.g. S could be extended to ST. This mode is switched on with extendaa=T. The equiv=LIST option sets the equivalence groups.

If two or more equivalence groups could be extended, the one with the most members will be chosen. If tied, the one with fewest possible amino acids (from equiv=LIST) will be chosen. If still tied, the first group in the list will take precedence.
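The extension and tie-breaking rules might be sketched as follows, assuming that "most members" means the largest extended amino acid set and that the minfreq=X filter is applied afterwards to whatever set is returned. The function name and argument layout are hypothetical; the default groups are the equiv=LIST defaults from the options list.

```python
DEFAULT_EQUIV = ("AGS", "ILMVF", "FYW", "KRH", "DE", "ST")


def extend_with_equivalents(kept, column, equiv=DEFAULT_EQUIV):
    """Extend an amino acid set using the best-matching equivalence group."""
    kept = set(kept)
    present = set(column)  # only residues seen in the peptides can be added
    candidates = []
    for order, group in enumerate(equiv):
        if kept and kept <= set(group):
            extended = kept | (set(group) & present)
            # Sort key: most members, then fewest group members, then list order.
            candidates.append((-len(extended), len(group), order, extended))
    if not candidates:
        return kept
    return min(candidates)[-1]
```

For a column containing S, T and A with S as the only minseq-passing residue, both AGS and ST extend S to two members; the tie is broken in favour of the smaller ST group, giving ST as in the example above.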

Commandline

SLiMMaker Options

peptides=LIST : These can be entered as a list or a file. If a file, lines following '#' or '>' are ignored
maxlen=INT : Maximum length for peptide [50]
peptalign=T/F/X : Align peptides. If X is a regular expression, it is used as a guide; otherwise T/True gives regex-free alignment and F/False gives no alignment. [True]
minseq=X : Min. no. of sequences for an aa to be included at a position [3]
minfreq=X : Min. combined freq of accepted aa to avoid wildcard [0.75]
maxaa=X : Max. no. different amino acids for one position [5]
ignore=X : Amino acid(s) to ignore. (If nucleotide, would be N-) ['X-']
dna=T/F : Whether "peptides" are actually DNA fragments [False]
iterate=T/F : Whether to perform iterative SLiMMaker, re-running on matched peptides with each iteration [False]
varlength=T/F : Whether to identify gaps in aligned peptides and generate a variable-length motif [True]
extendaa=T/F : Whether to extend ambiguous aa using equivalence list [False]
equiv=LIST : List (or file) of TEIRESIAS-style ambiguities to use [AGS,ILMVF,FYW,KRH,DE,ST]
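The options above are given as key=value settings. As a rough illustration of how the documented defaults and value types fit together, here is a hypothetical parser; it is not the rje option handling, and it only covers the options listed here (peptides is left out).

```python
DEFAULTS = {
    "maxlen": 50, "minseq": 3, "minfreq": 0.75, "maxaa": 5, "ignore": "X-",
    "dna": False, "iterate": False, "varlength": True, "extendaa": False,
    "peptalign": True, "equiv": ["AGS", "ILMVF", "FYW", "KRH", "DE", "ST"],
}
TRUE, FALSE = {"T", "TRUE"}, {"F", "FALSE"}


def parse_options(args):
    """Turn ['minseq=2', 'varlength=F', ...] into a settings dict."""
    opts = {k: (list(v) if isinstance(v, list) else v) for k, v in DEFAULTS.items()}
    for arg in args:
        key, _, value = arg.partition("=")
        key = key.lower()
        if key not in DEFAULTS:
            raise KeyError(f"unknown option: {key}")
        default = DEFAULTS[key]
        if isinstance(default, bool):  # test bool before int: bool subclasses int
            if value.upper() in TRUE:
                opts[key] = True
            elif value.upper() in FALSE:
                opts[key] = False
            elif key == "peptalign":
                opts[key] = value  # any other value is treated as a guide regex
            else:
                raise ValueError(f"{key} expects T/F, got {value!r}")
        elif isinstance(default, list):
            opts[key] = [item for item in value.split(",") if item]
        else:
            opts[key] = type(default)(value)  # int, float or str
    return opts
```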

See also rje.py generic commandline options.

Module Version History

    # 0.0 - Initial Compilation.
    # 1.0 - Initial Working Version. Some minor modifications for SLiMBench including iterative SLiMMaker.
    # 1.1 - Modified to work with end of line characters.
    # 1.2.0 - Modified to work with REST servers.
    # 1.3.0 - Added varlength option to identify gaps in aligned peptides and generate variable length motif.
    # 1.3.1 - Fixed varlength option to work with end of peptide gaps. (Gaps ignored completely - should not be there!)
    # 1.4.0 - Add iteration REST output.
    # 1.4.1 - Add unmatched peptides REST output.
    # 1.4.2 - Fixed bug with variable length wildcards at start of sequence.
    # 1.5.0 - Added peptalign=X functionality, using PeptCluster peptide alignment.
    # 1.6.0 - Added equiv=LIST : List (or file) of TEIRESIAS-style ambiguities to use [AGS,ILMVF,FYW,FYH,KRH,DE,ST]
    # 1.6.1 - Fixed peptide case bug.
    # 1.7.0 - Added maxlen parameter.

SLiMMaker REST Output formats

Running SLiMMaker

Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format
for more user-friendly formatted output. Individual outputs can be identified/parsed using &rest=OUTFMT for:

Available REST Outputs

slim = Short Linear Motif pattern returned
match = Number of input peptides matched by the SLiM
peptides = Original input peptides
aligned = Aligned peptides
matches = Peptides matching the SLiM
unmatched = Peptides not matching the SLiM
iterate = SLiM/Peptide iterations. [&iterate=T]
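A REST request is the program name followed by &key=value settings and an &rest=OUTFMT choice. The helper below assembles such a URL. The base address is a placeholder assumption (substitute the real SLiMSuite REST server), and passing peptides as a comma-separated list is also assumed; `rest_url` is a hypothetical name.

```python
from urllib.parse import quote

# Placeholder: substitute the actual SLiMSuite REST server address.
REST_BASE = "https://rest.example.org/slimmaker"
OUTPUTS = {"slim", "match", "peptides", "aligned", "matches", "unmatched",
           "iterate", "full", "format", "help"}


def rest_url(peptides, outfmt="slim", **options):
    """Build a SLiMMaker-style REST URL requesting one output format."""
    if outfmt not in OUTPUTS:
        raise ValueError(f"unknown REST output: {outfmt}")
    parts = [REST_BASE, "peptides=" + quote(",".join(peptides), safe=",")]
    for key, value in options.items():
        parts.append(f"{key}={quote(str(value), safe='')}")
    parts.append("rest=" + outfmt)
    return "&".join(parts)


print(rest_url(["QGTLESFFKR", "QKRINEFFPR"], outfmt="match", minseq=2))
# https://rest.example.org/slimmaker&peptides=QGTLESFFKR,QKRINEFFPR&minseq=2&rest=match
```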

Enter sequences and click "Make SLiM". Sequences can be raw sequences or fasta format.

(Example sequences are LIG_PCNA_PIPBox_1 ELM occurrences.)

QGTLESFFKR QKRINEFFPR QGRLDDFFKV QSSLLSFFSK QPTISRFFKK QGRLDGFFQV QRSIMSFFHP QTTITSHFAK QKTLYSFFSP QKSIMSFFGK QATLARFFTS QQTLSSFFMG QTTIEDFFGT QVSITGFFQR QTSMTDFYHS QLRIDSFFRL QSTLYSFFPK QWKLLRDFDI
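The input box accepts raw or fasta sequences. A hypothetical reader for both forms is sketched below; it assumes raw peptides are separated by whitespace, skips lines starting with '#', and upper-cases everything. It does not handle ELM class aliases.

```python
def read_peptides(text):
    """Parse raw (whitespace-separated) or fasta-formatted peptide input."""
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]
    if not any(line.startswith(">") for line in lines):
        # Raw input: every whitespace-separated token is a peptide.
        return [token.upper() for line in lines for token in line.split()]
    peptides, current = [], []
    for line in lines:
        if line.startswith(">"):
            # A header closes the previous sequence, if any.
            if current:
                peptides.append("".join(current).upper())
            current = []
        else:
            current.append(line)
    if current:
        peptides.append("".join(current).upper())
    return peptides
```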

In place of peptides, an ELM Class can also be entered into the box. See the REST aliases page for more details.
