Function
CompariMotif is a piece of software with a single objective: to take two lists of regular expression protein motifs
(typically SLiMs) and compare them to each other, identifying which motifs have some degree of overlap, and
identifying the relationships between those motifs. It can be used to compare a list of motifs with itself, with its
reversed self, or with a list of previously published motifs such as ELM (http://elm.eu.org/). CompariMotif
outputs a table of all pairs of matching motifs, along with their degree of similarity (information content) and
their relationship to each other.
The best match is used to define the relationship between the two motifs. Each relationship is described by combining the
following keywords:
Match type keywords identify the type of relationship seen:
Exact
= all matched positions in the two motifs are identical
Variant
= the focal motif contains only exact matches and subvariants of degenerate positions in the compared motif
Degenerate
= the focal motif contains only exact matches and degenerate versions of positions in the compared motif
Complex
= some positions in the focal motif are degenerate versions of positions in the compared motif, while others are subvariants of degenerate positions.
Ugly
= the two motifs match at partially (but not wholly) overlapping ambiguous positions (e.g. [AGS] vs [ST]). Such matches can be excluded using overlaps=F. (Version 3.8 onwards only.)
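The match type keywords can be illustrated by comparing aligned positions as sets of allowed residues. The following is a minimal Python sketch, not CompariMotif's own code. It assumes that the two motifs are already aligned, that each non-wildcard position is a frozenset of residues, and that any partially overlapping position makes the whole match "Ugly". That precedence rule and the function names are our own assumptions.

```python
from typing import FrozenSet, List, Optional

Position = FrozenSet[str]


def position_relation(focal: Position, other: Position) -> str:
    """Relationship of one focal-motif position to the aligned compared-motif position."""
    if focal == other:
        return "exact"
    if focal < other:
        return "subvariant"   # focal allows a subset of the other's residues
    if focal > other:
        return "degenerate"   # focal allows a superset of the other's residues
    if focal & other:
        return "ugly"         # partial overlap, e.g. [AGS] vs [ST]
    return "mismatch"         # no shared residues


def match_type(focal: List[Position], other: List[Position]) -> Optional[str]:
    """Hypothetical classifier: a match type keyword, or None if any position fails."""
    relations = {position_relation(f, o) for f, o in zip(focal, other)}
    if not relations or "mismatch" in relations:
        return None
    if "ugly" in relations:
        return "Ugly"
    if relations == {"exact"}:
        return "Exact"
    if relations <= {"exact", "subvariant"}:
        return "Variant"
    if relations <= {"exact", "degenerate"}:
        return "Degenerate"
    return "Complex"  # a mix of degenerate and subvariant positions


def pos(residues: str) -> Position:
    return frozenset(residues)


# [KR] in the focal motif is a subvariant of [KRH] in the compared motif.
print(match_type([pos("R"), pos("KR")], [pos("R"), pos("KRH")]))  # Variant
print(match_type([pos("AGS")], [pos("ST")]))                        # Ugly
```

This sketch treats each position independently. Alignment search and the mismatches=X tolerance are left out.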
Match length keywords identify the length relationships of the two motifs:
Match
= both motifs are the same length and match across their entire length
Parent
= the focal motif is longer and entirely contains the compared motif
Subsequence
= the focal motif is shorter and entirely contained within the compared motif
Overlap
= neither motif is entirely contained within the other
Combining one match type keyword with one match length keyword gives twenty (5 x 4) possible classifications for each motif's relationship to the compared motif.
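The length keywords depend only on the two motif lengths and on how they are aligned. The following Python sketch assumes a hypothetical alignment convention: `offset` is where position 0 of the compared motif sits relative to position 0 of the focal motif. It also enumerates the 5 x 4 combinations. Joining the two keywords with a space is for illustration only.

```python
from itertools import product

MATCH_TYPES = ["Exact", "Variant", "Degenerate", "Complex", "Ugly"]
LENGTH_TYPES = ["Match", "Parent", "Subsequence", "Overlap"]


def length_relation(focal_len: int, other_len: int, offset: int) -> str:
    """Length keyword for the focal motif under an assumed alignment offset."""
    other_start, other_end = offset, offset + other_len
    if focal_len == other_len and offset == 0:
        return "Match"        # same length, aligned end to end
    if focal_len > other_len and other_start >= 0 and other_end <= focal_len:
        return "Parent"       # focal motif entirely contains the compared motif
    if focal_len < other_len and other_start <= 0 and other_end >= focal_len:
        return "Subsequence"  # focal motif lies entirely within the compared motif
    return "Overlap"          # neither motif contains the other


CLASSIFICATIONS = [f"{m} {l}" for m, l in product(MATCH_TYPES, LENGTH_TYPES)]
print(len(CLASSIFICATIONS))      # 20
print(length_relation(6, 4, 1))  # Parent
```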
Output
The main output for CompariMotif is a delimited text file containing the following fields:
File1
= Name of motifs file (if outstyle=multi)
File2
= Name of searchdb file (if outstyle=multi)
Name1
= Name of motif from motif file 1
Name2
= Name of motif from motif file 2
Motif1
= Motif (pattern) from motif file 1
Motif2
= Motif (pattern) from motif file 2
Sim1
= Description of motif1's relationship to motif2
Sim2
= Description of motif2's relationship to motif1
Match
= Text summary of matched region
MatchPos
= Number of matched positions between motif1 and motif2 (>= minshare=X)
MatchIC
= Information content of matched positions
NormIC
= MatchIC as a proportion of the maximum possible MatchIC (i.e. the IC of the lower-IC motif)
CoreIC
= MatchIC as a proportion of the maximum possible IC in the matched region only.
Score
= Heuristic score (MatchPos x NormIC) for ranking motif matches
Info1
= Ambiguity score of motif1
Info2
= Ambiguity score of motif2
Desc1
= Description of motif1 (if motdesc = 1 or 3)
Desc2
= Description of motif2 (if motdesc = 2 or 3)
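The derived statistics follow from the definitions above. NormIC divides MatchIC by the IC of the less informative motif, and Score multiplies MatchPos by NormIC. The following is a minimal Python sketch. It assumes that CoreIC uses the smaller of the two motifs' IC summed over the matched positions only, which is our reading of "maximum possible IC in the matched region". The helper name `match_scores` is hypothetical.

```python
def match_scores(match_pos: int, match_ic: float, ic1: float, ic2: float,
                 core_ic1: float, core_ic2: float) -> dict:
    """Derive NormIC, CoreIC and Score for one motif pair (hypothetical helper).

    ic1, ic2: total IC of motif1 and motif2.
    core_ic1, core_ic2: IC of each motif over the matched positions only (assumed inputs).
    """
    norm_ic = match_ic / min(ic1, ic2)            # proportion of the max possible MatchIC
    core_ic = match_ic / min(core_ic1, core_ic2)  # same, restricted to the matched region
    score = match_pos * norm_ic                   # heuristic ranking score
    return {"NormIC": norm_ic, "CoreIC": core_ic, "Score": score}


stats = match_scores(match_pos=3, match_ic=2.0, ic1=4.0, ic2=2.5, core_ic1=2.5, core_ic2=2.0)
print({k: round(v, 3) for k, v in stats.items()})  # {'NormIC': 0.8, 'CoreIC': 1.0, 'Score': 2.4}
```

A pair with a NormIC below normcut=X (default 0.5) would not be reported as a match.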
With the exception of the file names, which are only output if outstyle=multi, the above is the output for the
default "normal" output style. If outstyle=single, only statistics for motif2 (the searchdb motif) are output,
as this style is designed for searches using a single motif against a motif database. If outstyle=normalsplit or
outstyle=multisplit, motif1 information is grouped together, followed by motif2 information, followed by the
match statistics. More information can be found in the CompariMotif manual.
Webserver
CompariMotif can be run online at http://bioware.ucd.ie.
Commandline
Basic Input Parameters
motifs=FILE : File of input motifs/peptides [None]
searchdb=FILE : (Optional) second motif file to compare. Will compare to self if none given. [None]
dna=T/F : Whether motifs should be considered as DNA motifs [False]
Basic Output Parameters
resfile=FILE : Name of results file, FILE.compare.tdt. [motifsFILE-searchdbFILE.compare.tdt]
motinfo=FILE : Filename for output of motif summary table (if desired) [None]
motific=T/F : Output Information Content for motifs [False]
coreic=T/F : Whether to output normalised Core IC [True]
unmatched=T/F : Whether to output lists of unmatched motifs (not from searchdb) into *.unmatched.txt [False]
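Because the results file (resfile) is plain delimited text, it can be post-processed with standard tools. The following Python sketch makes two assumptions: the file is tab-delimited, as the .tdt extension suggests, and it has a header row naming the fields listed under Output. The function name `top_matches` and the example rows are made up for illustration.

```python
import csv
import io
from typing import Dict, List


def top_matches(handle, min_score: float = 0.0) -> List[Dict[str, str]]:
    """Return result rows with Score >= min_score, best first (hypothetical helper).

    Assumes a tab-delimited file whose header includes the Score column.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [row for row in reader if float(row["Score"]) >= min_score]
    return sorted(rows, key=lambda row: float(row["Score"]), reverse=True)


# Made-up rows with a reduced set of columns, not real CompariMotif output.
example = io.StringIO(
    "Name1\tName2\tScore\n"
    "motA\tmotB\t1.5\n"
    "motA\tmotC\t3.0\n"
)
for row in top_matches(example, min_score=1.0):
    print(row["Name1"], row["Name2"], row["Score"])
```

For a real run, pass a handle from open(path, newline="") instead, where path is your resfile name.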
Motif Comparison Parameters
minshare=X : Min. number of non-wildcard positions for motifs to share [2]
normcut=X : Min. normalised MatchIC for motif match [0.5]
matchfix=X : If >0, must exactly match *all* fixed positions in the motifs from: [0]
- 1: input (motifs=FILE) motifs
- 2: searchdb motifs
- 3: *both* input and searchdb motifs
ambcut=X : Max number of choices in an ambiguous position before it is replaced with a wildcard (0=use all) [10]
overlaps=T/F : Whether to include overlapping ambiguities (e.g. [KR] vs [HK]) as match [True]
memsaver=T/F : Run in more efficient memory saver mode. XGMML output not available. [False]
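The ambcut, minshare, overlaps and matchfix options can be pictured as simple operations on aligned positions. The following Python sketch is illustrative only, not CompariMotif's implementation. It assumes that the motifs are already aligned and of equal length, that a wildcard is represented by None, and that any other position is a frozenset of residues. All function names are hypothetical.

```python
from typing import FrozenSet, List, Optional

Position = Optional[FrozenSet[str]]  # None represents a wildcard (x)


def apply_ambcut(motif: List[Position], ambcut: int) -> List[Position]:
    """Replace positions with more than `ambcut` choices by wildcards (0 = keep all)."""
    if ambcut <= 0:
        return list(motif)
    return [None if p is not None and len(p) > ambcut else p for p in motif]


def shared_positions(m1: List[Position], m2: List[Position], overlaps: bool = True) -> int:
    """Count aligned non-wildcard positions that match (compare against minshare).

    Nested positions (one a subset of the other) always count; partially
    overlapping ones such as [KR] vs [HK] count only if overlaps is True.
    """
    count = 0
    for a, b in zip(m1, m2):
        if a is None or b is None:
            continue
        nested = a <= b or b <= a
        if nested or (overlaps and a & b):
            count += 1
    return count


def fixed_positions_matched(source: List[Position], target: List[Position]) -> bool:
    """matchfix-style check: every fixed position in `source` has the identical residue in `target`."""
    return all(b == a for a, b in zip(source, target) if a is not None and len(a) == 1)


m1 = apply_ambcut([frozenset("R"), None, frozenset("ST"), frozenset("ACDEFGHIKLMNP")], ambcut=10)
m2 = [frozenset("R"), frozenset("K"), frozenset("S"), None]
print(shared_positions(m1, m2))         # 2
print(fixed_positions_matched(m2, m1))  # False: m2's fixed K aligns with a wildcard
```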
Advanced Motif Input Parameters
minic=X : Min information content for a motif (1 fixed position = 1.0) [2.0]
minfix=X : Min number of fixed positions for a motif to contain [0]
minpep=X : Min number of defined positions in a motif [2]
trimx=T/F : Trims Xs from the ends of a motif [False]
nrmotif=T/F : Whether to remove redundancy in input motifs [False]
reverse=T/F : Reverse the input motifs. [False]
- If no searchdb given, these will be searched against the "forward" motifs.
mismatches=X : <= X mismatched positions can be tolerated [0]
aafreq=FILE : Use FILE to replace uniform AAFreqs (FILE can be sequences or aafreq) [None]
Advanced Motif Output Parameters
xgmml=T/F : Whether to output XGMML format results [True]
xgformat=T/F : Whether to use default CompariMotif formatting or leave blank for e.g. Cytoscape [True]
pickle=T/F : Whether to load/save pickle following motif loading/filtering [False]
motdesc=X : Sets which motifs have description outputs (0-3 as matchfix option) [3]
outstyle=X : Sets the output style for the resfile [normal]
- normal = all standard stats are output
- multi = designed for multiple appended runs. File names are also output
- single = designed for searches of a single motif vs a database. Only motif2 stats are output
- reduced = motifs do not have names or descriptions
- normalsplit/multisplit = as normal/multi but stats are grouped by motif rather than by type
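The difference between grouping "by type" and "by motif" can be shown by listing column headers for four of the styles. The following Python sketch illustrates only the grouping described above and may not reproduce CompariMotif's actual column order. The normal order follows the order in which the fields are listed under Output. For multisplit, each file name is placed with its own motif's fields, which is an assumption. The single and reduced styles, and the motdesc setting, are not modelled. The function name is hypothetical.

```python
from typing import List

# Field groups taken from the Output section.
NORMAL_ORDER = ["Name1", "Name2", "Motif1", "Motif2", "Sim1", "Sim2",
                "Match", "MatchPos", "MatchIC", "NormIC", "CoreIC", "Score",
                "Info1", "Info2", "Desc1", "Desc2"]
MOTIF1_FIELDS = ["Name1", "Motif1", "Sim1", "Info1", "Desc1"]
MOTIF2_FIELDS = ["Name2", "Motif2", "Sim2", "Info2", "Desc2"]
MATCH_STATS = ["Match", "MatchPos", "MatchIC", "NormIC", "CoreIC", "Score"]


def sketch_columns(outstyle: str) -> List[str]:
    """Illustrative column grouping for a subset of outstyle values (hypothetical)."""
    if outstyle == "normal":
        return list(NORMAL_ORDER)  # grouped by type
    if outstyle == "multi":
        return ["File1", "File2"] + NORMAL_ORDER
    if outstyle == "normalsplit":
        return MOTIF1_FIELDS + MOTIF2_FIELDS + MATCH_STATS  # grouped by motif
    if outstyle == "multisplit":
        # Assumption: each file name sits with its own motif's fields.
        return ["File1"] + MOTIF1_FIELDS + ["File2"] + MOTIF2_FIELDS + MATCH_STATS
    raise ValueError(f"outstyle not covered by this sketch: {outstyle}")


print(sketch_columns("normalsplit")[:5])  # ['Name1', 'Motif1', 'Sim1', 'Info1', 'Desc1']
```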
History
Module Version History
# 0.0 - Initial Compilation.
# 1.0 - Full working version with menu
# 1.1 - Added extra output options
# 2.0 - Reworked for functionality with MotifList instead of PRESTO and using own methods.
# 2.1 - Minor bug fixing and tidying. Removed matchic=F option. Added score and normcut=X.
# 3.0 - Replaced rje_motif* modules with rje_slim* modules and improved handling of termini.
# 3.1 - Added XGMML output.
# 3.2 - Added mismatches=X option. (NB. mismatch=X is used in SLiMSearch.)
# 3.3 - Added "Match" column, summarising matches
# 3.4 - Added a DNA option and AA frequencies.
# 3.5 - Miscellaneous modifications lost in the mists of time!
# 3.6 - Slightly re-worked SLiM splitting with rje_slimlist V0.8 and capability to use ELM download.
# 3.7 - Added coreIC and output of unmatched motifs.
# 3.8 - Added overlaps=T/F : Whether to include overlapping ambiguities (e.g. [KR] vs [HK]) as match [True]
# 3.8 - Changed scoring of overlapping ambiguities - uses IC of all possible ambiguities. Added "Ugly" match type.
# 3.9 - Added xgformat=T/F : Whether to use default CompariMotif formatting or leave blank for e.g. Cytoscape [True]
# 3.10- Added MemSaver option, which will read and process input motifs (not searchdb) one motif at a time.
# 3.10- Added forking.
# 3.11- Added additional overlap/matchfix checks during basic comparison to try and speed up.
# 3.12- Replaced deprecated sets.Set() with set().
# 3.13.0 - Added REST server function.
# 3.14.0 - Modified memsaver mode to take different input formats.
# 3.14.1 - Fixed forking memsaver mode to take (Q)SLiMFinder input format.
CompariMotif REST Output formats
Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format
for more user-friendly formatted output. Individual outputs can be identified/parsed using &rest=OUTFMT.