Program:	CompariMotif
Description:	Motif vs Motif Comparison Software
Version:	3.14.1
Last Edit:	12/11/19
Citation:	Edwards, Davey & Shields (2008), Bioinformatics 24(10):1307-9.
Webserver:	http://www.slimsuite.unsw.edu.au/servers/comparimotif.php
Manual:	http://bit.ly/CompariMotifManual

Imported modules: rje rje_menu rje_seq rje_slim rje_slimlist rje_xgmml rje_zen rje_forker

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

CompariMotif has a single objective: to take two lists of regular expression protein motifs (typically SLiMs), compare them to each other, identify which motifs have some degree of overlap, and identify the relationships between those motifs. It can be used, for example, to compare a list of motifs with itself, with its reversed self, or with a list of previously published motifs (e.g. ELM (http://elm.eu.org/)). CompariMotif outputs a table of all pairs of matching motifs, along with their degree of similarity (information content) and their relationship to each other.

For each pair of motifs, the best match is used to define their relationship, which is described using the following keywords:

Match type keywords identify the type of relationship seen:

Exact = all the matches in the two motifs are precise
Variant = the focal motif contains only exact matches and subvariants of degenerate positions in the other motif
Degenerate = the focal motif contains only exact matches and degenerate versions of positions in the other motif
Complex = some positions in the focal motif are degenerate versions of positions in the compared motif, while others are subvariants of degenerate positions.
Ugly = the two motifs match at partially (but not wholly) overlapping ambiguous positions (e.g. [AGS] vs [ST]). Such matches can be excluded using overlaps=F. (Version 3.8 onwards only.)
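The match-type rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CompariMotif's own code: each aligned position is modelled as a set of allowed residues, the helper names position_relation and match_type are hypothetical, wildcards and termini are ignored, and giving Ugly precedence over the other keywords is an assumption.

```python
# Hypothetical sketch of the match-type keywords; not CompariMotif's implementation.
# Each aligned position is a set of allowed residues, e.g. set("KR") for [KR].

def position_relation(focal, other):
    """Relationship of one focal position to the aligned position in the other motif."""
    if focal == other:
        return "exact"
    if focal > other:        # focal allows more residues: a degenerate version
        return "degenerate"
    if focal < other:        # focal allows fewer residues: a subvariant
        return "subvariant"
    if focal & other:        # partial overlap only, e.g. [AGS] vs [ST]
        return "ugly"
    return "mismatch"


def match_type(focal_positions, other_positions):
    """Combine per-position relationships into one match-type keyword."""
    relations = {position_relation(a, b) for a, b in zip(focal_positions, other_positions)}
    if "mismatch" in relations:
        raise ValueError("aligned positions do not match")
    if "ugly" in relations:  # assumed precedence for partially overlapping ambiguities
        return "Ugly"
    if relations == {"exact"}:
        return "Exact"
    if "degenerate" in relations and "subvariant" in relations:
        return "Complex"
    if "degenerate" in relations:
        return "Degenerate"
    return "Variant"         # only exact matches and subvariants remain


focal = [set("R"), set("KR"), set("S")]   # R[KR]S
other = [set("R"), set("K"), set("S")]    # RKS
print(match_type(focal, other))           # Degenerate
print(match_type(other, focal))           # Variant
```

Note that swapping the focal and compared motifs turns Degenerate into Variant, which is why the output reports the relationship separately for each motif (Sim1 and Sim2).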

Match length keywords identify the length relationships of the two motifs:

Match = both motifs are the same length and match across their entire length
Parent = the focal motif is longer and entirely contains the compared motif
Subsequence = the focal motif is shorter and entirely contained within the compared motif
Overlap = neither motif is entirely contained within the other
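The length keywords can likewise be sketched, assuming (hypothetically) that an ungapped alignment is summarised by the two motif lengths and the offset at which the compared motif starts relative to the focal motif. Real motif alignment, including variable-length wildcards, is more involved than this.

```python
# Hypothetical sketch of the match-length keywords for an ungapped alignment.

def length_relation(len_focal, len_other, offset):
    """Return the length keyword for the focal motif.

    offset is the position of the compared motif's first position,
    measured in focal-motif coordinates (may be negative).
    """
    other_in_focal = offset >= 0 and offset + len_other <= len_focal
    focal_in_other = offset <= 0 and offset + len_other >= len_focal
    if other_in_focal and focal_in_other:
        return "Match"        # same length, aligned end to end
    if other_in_focal:
        return "Parent"       # focal is longer and contains the compared motif
    if focal_in_other:
        return "Subsequence"  # focal is shorter and contained in the compared motif
    return "Overlap"          # neither motif contains the other


print(length_relation(6, 4, 1))   # Parent
print(length_relation(5, 5, 2))   # Overlap
```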

This gives twenty possible classifications for each motif's relationship to the compared motif.
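The count of twenty is the five match-type keywords combined with the four match-length keywords. The "Type Length" label format used below (e.g. "Exact Match") is an assumption for illustration.

```python
from itertools import product

MATCH_TYPES = ["Exact", "Variant", "Degenerate", "Complex", "Ugly"]
LENGTH_TYPES = ["Match", "Parent", "Subsequence", "Overlap"]

# Each relationship pairs one match-type keyword with one match-length keyword.
classifications = [f"{t} {l}" for t, l in product(MATCH_TYPES, LENGTH_TYPES)]
print(len(classifications))   # 20
```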

Input

CompariMotif can take input in a number of formats. The preferred format is SLiMSearch format, a single-line
motif format: 'Name Sequence #Comments' (comments are optional and ignored). Alternative inputs include SLiMFinder and
SLiMDisc output, ELM downloads, raw lists of motifs, and fasta format. Any delimited file with 'Name' and 'Pattern'
fields should be recognised.
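A minimal sketch of reading one line of SLiMSearch format, assuming whitespace-delimited fields and motif names without spaces; parse_slimsearch_line is a hypothetical helper, not part of rje_slimlist.

```python
def parse_slimsearch_line(line):
    """Split a 'Name Sequence #Comments' line into (name, pattern).

    Returns None for blank lines and comment-only lines.
    """
    body = line.split("#", 1)[0].strip()   # comments are optional and ignored
    if not body:
        return None
    fields = body.split()
    if len(fields) < 2:
        raise ValueError(f"expected 'Name Sequence', got: {line!r}")
    return fields[0], fields[1]


print(parse_slimsearch_line("Motif1  [KR].[ST]  # example comment"))
# ('Motif1', '[KR].[ST]')
```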

Complex motifs containing either/or (REGEX1|REGEX2) portions will be split into multiple motifs (marked a, b etc.).
Similarly, variable numbers of non-wildcard positions will be split, e.g. RK{0,1}R would become RR and RKR. "3of5"
motif patterns, formatted <R:m:n> where at least m of a stretch of n residues must match R, are also split prior to a
search being performed. Currently, wildcard spacers are limited to a maximum length of 9.
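The splitting rules above can be sketched as follows. This is an illustrative expansion, not the rje_slimlist implementation: it assumes alternations are not nested, treats '.', 'x' and 'X' as wildcards whose variable spacers are kept intact, and expands <R:m:n> into every arrangement of exactly m copies of R with wildcards elsewhere (which covers the "at least m" cases, since a wildcard also matches R). The names TOKEN, WILDCARDS and split_motif are hypothetical.

```python
import itertools
import re

# One motif element ([class], (alt|alt), <R:m:n> or a single character),
# optionally followed by a {m} or {m,n} repeat.
TOKEN = re.compile(r"(\[[^\]]+\]|\([^)]+\)|<[^>]+>|[^\[\(<{])(\{\d+(?:,\d+)?\})?")
WILDCARDS = {".", "x", "X"}


def _element_variants(elem, repeat):
    """Return the concrete strings that one element can expand to."""
    if elem.startswith("("):                    # either/or: (REGEX1|REGEX2)
        base = elem[1:-1].split("|")
    elif elem.startswith("<"):                  # "3of5": <R:m:n>
        residue, m, n = elem[1:-1].split(":")
        m, n = int(m), int(n)
        base = ["".join(residue if i in chosen else "." for i in range(n))
                for chosen in itertools.combinations(range(n), m)]
    else:
        base = [elem]
    if repeat is None:
        return base
    if elem in WILDCARDS:                       # wildcard spacers are not split
        return [elem + repeat]
    bounds = repeat[1:-1].split(",")
    lo, hi = int(bounds[0]), int(bounds[-1])
    return ["".join(p) for count in range(lo, hi + 1)
            for p in itertools.product(base, repeat=count)]


def split_motif(motif):
    """Expand a motif into the list of simpler motifs it represents."""
    pieces, pos = [], 0
    while pos < len(motif):
        match = TOKEN.match(motif, pos)
        if match is None:
            raise ValueError(f"cannot parse {motif!r} at position {pos}")
        pieces.append(_element_variants(match.group(1), match.group(2)))
        pos = match.end()
    return ["".join(combo) for combo in itertools.product(*pieces)]
```

For example, split_motif("RK{0,1}R") returns ['RR', 'RKR'], matching the example in the text, while split_motif("R.{0,2}S") is left unchanged because its variable positions are wildcards.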

Output

The main output for CompariMotif is a delimited text file containing the following fields:

File1 = Name of motifs file (if outstyle=multi)
File2 = Name of searchdb file (if outstyle=multi)
Name1 = Name of motif from motif file 1
Name2 = Name of motif from motif file 2
Motif1 = Motif (pattern) from motif file 1
Motif2 = Motif (pattern) from motif file 2
Sim1 = Description of motif1's relationship to motif2
Sim2 = Description of motif2's relationship to motif1
Match = Text summary of matched region
MatchPos = Number of matched positions between motif1 and motif2 (>= minshare=X)
MatchIC = Information content of matched positions
NormIC = MatchIC as a proportion of the maximum possible MatchIC (i.e. the IC of the lower-IC motif)
CoreIC = MatchIC as a proportion of the maximum possible IC in the matched region only.
Score = Heuristic score (MatchPos x NormIC) for ranking motif matches
Info1 = Ambiguity score of motif1
Info2 = Ambiguity score of motif2
Desc1 = Description of motif1 (if motdesc = 1 or 3)
Desc2 = Description of motif2 (if motdesc = 2 or 3)

With the exception of the file names, which are only output if outstyle=multi, the fields above are output for the
default "normal" output style. If outstyle=single, only statistics for motif2 (the searchdb motif) are output, as this
style is designed for searches of a single motif against a motif database. If outstyle=normalsplit or
outstyle=multisplit, motif1 information is grouped together, followed by motif2 information, followed by the
match statistics. More information can be found in the CompariMotif manual.
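The relationship between MatchIC, NormIC and Score can be sketched as below, together with the minshare and normcut filters listed under the command-line options. The per-position IC formula is an assumption (uniform residue frequencies, scaled so that one fixed position scores 1.0 as stated for minic=X); CompariMotif's own IC calculation, and its handling of aafreq=FILE, may differ. Treating both cutoffs as inclusive is also assumed.

```python
import math


def position_ic(n_allowed, alphabet_size=20):
    """Assumed IC of one position: 1.0 for a fixed residue, 0.0 for a full wildcard."""
    return math.log(alphabet_size / n_allowed) / math.log(alphabet_size)


def score_match(match_ic, ic1, ic2, match_pos, minshare=2, normcut=0.5):
    """Return (NormIC, Score), or None if the pair fails the minshare/normcut filters."""
    norm_ic = match_ic / min(ic1, ic2)      # proportion of the maximum possible MatchIC
    if match_pos < minshare or norm_ic < normcut:
        return None
    return norm_ic, match_pos * norm_ic     # Score = MatchPos x NormIC


print(round(position_ic(2), 3))             # IC of a two-residue ambiguity such as [KR]
print(score_match(2.0, 3.0, 2.0, 2))        # (1.0, 2.0)
```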

Webserver

CompariMotif can be run online at http://www.slimsuite.unsw.edu.au/servers/comparimotif.php (the original webserver was hosted at http://bioware.ucd.ie).

Commandline

Basic Input Parameters

motifs=FILE : File of input motifs/peptides [None]
searchdb=FILE : (Optional) second motif file to compare. Will compare to self if none given. [None]
dna=T/F : Whether motifs should be considered as DNA motifs [False]

Basic Output Parameters

resfile=FILE : Name of results file, FILE.compare.tdt. [motifsFILE-searchdbFILE.compare.tdt]
motinfo=FILE : Filename for output of motif summary table (if desired) [None]
motific=T/F : Output Information Content for motifs [False]
coreic=T/F : Whether to output normalised Core IC [True]
unmatched=T/F : Whether to output lists of unmatched motifs (not from searchdb) into *.unmatched.txt [False]

Motif Comparison Parameters

minshare=X : Min. number of non-wildcard positions for motifs to share [2]
normcut=X : Min. normalised MatchIC for motif match [0.5]
matchfix=X : If >0 must exactly match *all* fixed positions in the motifs from: [0]
- 1: input (motifs=FILE) motifs
- 2: searchdb motifs
- 3: both sets of motifs
ambcut=X : Max number of choices in ambiguous position before replaced with wildcard (0=use all) [10]
overlaps=T/F : Whether to include overlapping ambiguities (e.g. [KR] vs [HK]) as match [True]
memsaver=T/F : Run in more efficient memory saver mode. XGMML output not available. [False]

Advanced Motif Input Parameters

minic=X : Min information content for a motif (1 fixed position = 1.0) [2.0]
minfix=X : Min number of fixed positions for a motif to contain [0]
minpep=X : Min number of defined positions in a motif [2]
trimx=T/F : Trims Xs from the ends of a motif [False]
nrmotif=T/F : Whether to remove redundancy in input motifs [False]
reverse=T/F : Reverse the input motifs. [False]
- If no searchdb given, these will be searched against the "forward" motifs.
mismatches=X : <= X mismatches of positions can be tolerated [0]
aafreq=FILE : Use FILE to replace uniform AAFreqs (FILE can be sequences or aafreq) [None]

Advanced Motif Output Parameters

xgmml=T/F : Whether to output XGMML format results [True]
xgformat=T/F : Whether to use default CompariMotif formatting or leave blank for e.g. Cytoscape [True]
pickle=T/F : Whether to load/save pickle following motif loading/filtering [False]
motdesc=X : Sets which motifs have description outputs (0-3 as matchfix option) [3]
outstyle=X : Sets the output style for the resfile [normal]
- normal = all standard stats are output
- multi = designed for multiple appended runs. File names are also output
- single = designed for searches of a single motif vs a database. Only motif2 stats are output
- reduced = motifs do not have names or descriptions
- normalsplit/multisplit = as normal/multi but stats are grouped by motif rather than by type
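Options are given as key=value pairs, with T/F for Boolean switches. The sketch below shows one way such arguments could be read into typed values; it is not the rje option parser, parse_options is a hypothetical helper, and the DEFAULTS dictionary holds only a few of the documented defaults for illustration.

```python
# Hypothetical sketch of key=value option parsing; not the rje implementation.
DEFAULTS = {
    "motifs": None, "searchdb": None, "minshare": 2, "normcut": 0.5,
    "matchfix": 0, "overlaps": True, "outstyle": "normal",
}


def parse_options(argv, defaults=DEFAULTS):
    """Return a copy of defaults updated from key=value arguments."""
    opts = dict(defaults)
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        current = opts.get(key)
        if isinstance(current, bool):          # check bool before int: bool subclasses int
            if value.upper() not in ("T", "F"):
                raise ValueError(f"{key} expects T/F, got {value!r}")
            opts[key] = value.upper() == "T"
        elif isinstance(current, int):
            opts[key] = int(value)
        elif isinstance(current, float):
            opts[key] = float(value)
        else:
            opts[key] = value                  # file names and other text options
    return opts


print(parse_options(["motifs=my.motifs", "overlaps=F", "normcut=0.7"]))
```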

Module Version History

    # 0.0 - Initial Compilation.
    # 1.0 - Full working version with menu
    # 1.1 - Added extra output options
    # 2.0 - Reworked for functionality with MotifList instead of PRESTO and using own methods.
    # 2.1 - Minor bug fixing and tidying. Removed matchic=F option. Added score and normcut=X.
    # 3.0 - Replaced rje_motif* modules with rje_slim* modules and improved handling of termini.
    # 3.1 - Added XGMML output.
    # 3.2 - Added mismatches=X option. (NB. mismatch=X is used in SLiMSearch.)
    # 3.3 - Added "Match" column, summarising matches
    # 3.4 - Added a DNA option and AA frequencies.
    # 3.5 - Miscellaneous modifications lost in the mists of time!
    # 3.6 - Slightly re-worked SLiM splitting with rje_slimlist V0.8 and capability to use ELM download.
    # 3.7 - Added coreIC and output of unmatched motifs.
    # 3.8 - Added overlaps=T/F  : Whether to include overlapping ambiguities (e.g. [KR] vs [HK]) as match [True]
    # 3.8 - Changed scoring of overlapping ambiguities - uses IC of all possible ambiguities. Added "Ugly" match type.
    # 3.9 - Added xgformat=T/F : Whether to use default CompariMotif formatting or leave blank for e.g. Cytoscape [True]
    # 3.10- Added MemSaver option, which will read and process input motifs (not searchdb) one motif at a time.
    # 3.10- Added forking.
    # 3.11- Added additional overlap/matchfix checks during basic comparison to try and speed up.
    # 3.12- Replaced deprecated sets.Set() with set().
    # 3.13.0 - Added REST server function.
    # 3.14.0 - Modified memsaver mode to take different input formats.
    # 3.14.1 - Fixed forking memsaver mode to take (Q)SLiMFinder input format.

CompariMotif REST Output formats

Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format
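A sketch of building REST request URLs in the "&option=value" style shown above. BASE_URL is a hypothetical placeholder rather than the real server address, rest_url is a hypothetical helper, and appending options directly after the program path with "&" is an assumption based on the examples in this section.

```python
from urllib.parse import quote

# Hypothetical placeholder: substitute the real SLiMSuite REST server address.
BASE_URL = "https://rest.example.org/comparimotif"


def rest_url(rest="format", **options):
    """Append options as URL-encoded &key=value pairs, ending with &rest=..."""
    parts = [f"&{key}={quote(str(value), safe='')}" for key, value in options.items()]
    parts.append(f"&rest={quote(rest, safe='')}")
    return BASE_URL + "".join(parts)


print(rest_url("help"))
print(rest_url("full", minshare=3))
```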
for more user-friendly formatted output. Individual outputs can be identified/parsed using &rest=OUTFMT.
