SLiMBench has two primary functions:
1. Generating SLiM prediction benchmarking datasets from ELM (or other data in a similar format). This includes
options for generating random and/or simulated datasets for ROC analysis etc.
2. Assessing the results of SLiM predictions against a Benchmark. This program is designed to work with SLiMFinder
and QSLiMFinder output, so some prior results parsing may be needed for other methods.
If run with generate=F benchmark=F, SLiMBench will check and optionally download the input files but perform no additional
processing or analysis.
Please see the SLiMBench manual for more details.
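As a rough illustration of how the two functions map onto options, the sketch below assembles option=value arguments of the kind listed in the following sections and reports which mode they select. The helper names (build_args, run_mode) are hypothetical and not part of SLiMBench, and the comma-separated form for LIST options is an assumption. The download-only behaviour for generate=F benchmark=F is taken from the version 2.9.1 history note.

```python
def build_args(options):
    """Format a dict of options as option=value strings (hypothetical helper)."""
    args = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"              # T/F switches
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)    # assumption: LIST options are comma-separated
        args.append("{}={}".format(key, value))
    return args


def run_mode(generate, benchmark):
    """Describe what a run does for a given generate/benchmark combination."""
    if generate and benchmark:
        return "generate+benchmark"
    if generate:
        return "generate"
    if benchmark:
        return "benchmark"
    return "download-only"   # check (and optionally download) input files only


if __name__ == "__main__":
    opts = {"generate": True, "benchmark": False, "genspec": ["HUMAN", "MOUSE"]}
    print(" ".join(build_args(opts)))
    print(run_mode(opts["generate"], opts["benchmark"]))
```

The resulting strings could then be appended to whatever command is used to launch the program; the exact launch command is deliberately left out here.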
SOURCE DATA OPTIONS
sourcepath=PATH/ : Will look in this directory for input files if not found [
sourcedate=DATE : Source file date (YYYY-MM-DD) to preferentially use [
elmclass=FILE : File of ELM classes downloaded from the ELM website [
elminstance=FILE : File of ELM instances downloaded from the ELM website [
elminteractors=FILE : File of ELM interactors downloaded from the ELM website [
elmdomains=FILE : File of ELM Pfam domain interactors downloaded from the ELM website [
elmdat=FILE : File of downloaded UniProt entries (See rje_uniprot for more details) [
ppisource=X : Source of PPI data. (See documentation for details.) (HINT/FILE) [
ppispec=LIST : List of PPI files/species/databases to generate PPI datasets from [
ppid=X : PPI source protein identifier type (gene/uni/none; will work out from headers if None) [
randsource=FILE : Source for random/simulated dataset sequences. If species, will extract from UniProt [
randat=T/F : Whether to use DAT file for random source [False]
occsource=X/FILE : Source for OccBench datasets (ELM, RandSource, or file). (ELM/RAND/FILE) [
occspec=LIST : Restrict OccBench analysis to given species; blank for all (useful for
download=T/F : Whether to download files directly from websites where possible if missing [
integrity=T/F : Whether to quit by default if source data integrity is breached [
unipath=PATH : Path to UniProt download. Will query website if "URL" [
GENERAL/ELM BENCHMARK GENERATION OPTIONS
genpath=PATH : Output path for datasets generated with SLiMBench file generator [
generate=T/F : Whether to generate SLiMBench datasets from ELM input. [
genspec=LIST : Restrict ELM/OccBench datasets to listed species (restricts ELM instances) 
slimmaker=T/F : Whether to use SLiMMaker to "reduce" ELMs to more findable SLiMs [
minupc=X : Minimum number of UPC for benchmark dataset [
maxseq=X : Maximum number of sequences for benchmark datasets [
minic=X : Min information content for a motif (1 fixed position = 1.0) [2.0; 1.1 for OccBench]
filterdir=X : Directory suffix for filtered benchmarking datasets [
queries=T/F : Whether to generate datasets with specific Query proteins [
flankmask=LIST : List of flanking mask options (used with queries and simbench) [
elmbench=T/F : Whether to generate ELM datasets [
ppibench=T/F : Whether to generate ELM PPI datasets [
domlink=T/F : Link ELMs to PPI via Pfam domains (True) or (False) just use direct protein links [
itype=X : Interaction identifier for PPI datasets [first element of ppisource]
dombench=T/F : Whether to generate Pfam domain ELM PPI datasets [
occbench=T/F : Whether to generate ELM OccBench datasets [
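The minic option filters motifs by information content, where one fixed position scores 1.0. A minimal sketch of such a score follows. It assumes (the source does not state this) that a position matching n of the 20 amino acids scores 1 - log20(n), so that a wildcard scores 0.0. The function name motif_ic and the restricted pattern syntax are hypothetical and are not SLiMBench's own calculation.

```python
import math
import re


def motif_ic(pattern):
    """Sum per-position information content for a simple motif pattern.

    Handles fixed residues ('P'), bracketed ambiguity ('[KR]') and
    wildcards ('.') only; other regular expression syntax is ignored.
    """
    total = 0.0
    for token in re.findall(r"\[[A-Z]+\]|[A-Z]|\.", pattern):
        if token == ".":
            n = 20                        # wildcard: any of 20 amino acids
        elif token.startswith("["):
            n = len(set(token[1:-1]))     # ambiguous position
        else:
            n = 1                         # fixed position scores 1.0
        total += 1.0 - math.log(n, 20)    # assumed scoring: 1 - log20(n)
    return total


if __name__ == "__main__":
    for motif in ("P..P", "[KR]P", "[KR].L"):
        ic = motif_ic(motif)
        print(motif, round(ic, 3), "passes minic=2.0" if ic >= 2.0 else "filtered")
```

Under this assumed scoring, two fixed positions reach the general 2.0 threshold exactly, while any ambiguity in a two-position motif drops it below that threshold.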
RANDOM/SIMULATION BENCHMARK GENERATION OPTIONS
simbench=T/F : Whether to generate simulated datasets using reduced ELMs (if found) [
ranbench=T/F : Whether to generate randomised datasets (part of simulation if
randreps=X : Number of replicates for each random (or simulated) datasets [
simcount=LIST : Number of "TPs" to have in dataset [
simratios=LIST : List of simulated ELM:Random ratios [
randir=PATH : Output path for creation of randomised datasets [
randbase=X : Base for random dataset name if
masking=T/F : Whether to use SLiMCore masking for query selection [
searchini=FILE : INI file containing SLiMProb search options that restrict returned positives 
maxseq=X : Maximum number of randsource sequences for SLiM to hit (also maxaa and maxupc limits) [
BENCHMARK ASSESSMENT OPTIONS
benchmark=T/F : Whether to perform SLiMBench benchmarking assessment against motif file [
datatype=X : Type of data to be generated and/or benchmarked (occ/elm/ppi/dom/sim/simonly) [
queries=T/F : Whether datasets have specific Query proteins [
resfiles=LIST : List of (Q)SLiMFinder results files to use for benchmarking [
balanced=T/F : Whether to reduce benchmarking to datasets found for all RunIDs [True]
compdb=FILE : Motif file to be used for benchmarking [elmclass file] (reduced unless occ/ppi)
occbenchpos=FILE : File of all positive occurrences for OccBench [
bymotif=T/F : Whether to weight output by motif for OccBench (others always weighted) [
benchbase=X : Basefile for SLiMBench benchmarking output [
runid=LIST : List of factors to split RunID column into (on '.') [
dataset=LIST : List of headers to split dataset into (on '.'). If blank, will use datatype defaults. 
bycloud=X : Whether to compress results into clouds prior to assessment (True/False/Both) [
sigcut=LIST : Significance thresholds to use for assessment [
iccut=LIST : Minimum IC for (Q)SLiMFinder results for elm/sim/ppi benchmark assessment [
slimlencut=LIST : List of individual SLiM lengths to return results for (
noamb=T/F : Filter out ambiguous patterns [
assfilter=LIST : List of motifs to filter out from assessment datasets post-rating (still count as OT) 
minocctp=INT : Min number of occurrence TP for OccBench motifs [
force=T/F : Whether to force regeneration of outputs (True) or assume existing outputs are right [
backups=T/F : Whether to (prompt if interactive and) generate backups before overwriting files [
See also rje.py generic commandline options.
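Two of the assessment options above lend themselves to a short illustration: runid=LIST splits the RunID column into named factors on '.', and balanced=T/F reduces benchmarking to datasets found for all RunIDs. The sketch below shows both ideas with hypothetical factor names and result rows. It is not SLiMBench's internal code.

```python
def split_runid(runid, factors):
    """Map the '.'-separated elements of a RunID onto factor names."""
    return dict(zip(factors, runid.split(".")))


def balanced_datasets(results):
    """Return the datasets that have results for every RunID.

    results: iterable of (runid, dataset) pairs.
    """
    by_run = {}
    for runid, dataset in results:
        by_run.setdefault(runid, set()).add(dataset)
    if not by_run:
        return set()
    return set.intersection(*by_run.values())


if __name__ == "__main__":
    # Hypothetical RunID and factor names, for illustration only
    print(split_runid("slimfinder.masked.run1", ["Program", "Masking", "Rep"]))
    rows = [("runA", "d1"), ("runA", "d2"), ("runB", "d1")]
    print(sorted(balanced_datasets(rows)))
```

Taking the intersection means a dataset missing from any one run is dropped from every run, which keeps comparisons between RunIDs on the same set of datasets.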
Module Version History
# 0.0 - Initial Compilation.
# 0.1 - Functional version with benchmarking dataset generation.
# 1.0 - Consolidation of "working" version with additional basic benchmarking analysis.
# 1.1 - Added simulated dataset construction and benchmarking.
# 1.2 - Added MinIC filtering to benchmark assessment. Sorted beginning/end of line for reduced ELMs.
# 1.3 - Made SimCount a list rather than Integer. Sorted CompariMotif assessment issue.
# 1.4 - Added ICCut and SLiMLenCut as lists and output columns.
# 1.5 - Added Summary Results output table. Removed PropRes.
# 1.6 - Added "simonly" to datatype - calculates both SN and FPR from "sim" data (ignores "ran") to check query bias.
# 1.7 - Added Benchmarking of ELM datasets without queries.
# 1.8 - Partially added Benchmarking dataset generation from PPI data and 3DID.
# 1.9 - Added memsaver option. Replaced SLiMSearch with SLiMProb. Altered default IO paths.
# 1.9 - Removed 3DID again: new ELM interaction_domains file has position-specific PPI details.
# 2.0 - Major overhaul of input options to standardise/clarify. Implemented auto-downloads and PPI datasets.
# 2.1 - Fixed memsaver=T unless in development mode (dev=T). Removed old Assessment. Tested with simbench analysis.
# 2.2 - Replaced searchini=LIST with searchini=FILE and moved to SimBench commands.
# 2.2 - Modified the FN/TN and ResNum calculations. No longer rate TP in random data as OT.
# 2.3 - Changed the default to queries=F. SearchINI bug fix. Added occbench generation.
# 2.4 - Improved error messages.
# 2.5 - Basic OccBench assessment benchmarking. Added ELM Uniprot acclist output. (Download issues?)
# 2.6 - Added ELM domain interactions table: http://www.elm.eu.org/infos/browse_elm_interactiondomains.tsv.
# 2.6 - Fixed issues introduced with new SLiMCore V2.0 SLiMSuite code.
# 2.7 - Reinstate filtering. (Not sure why disabled.) Add genspec=LIST to filter by species. Added domlink=T/F.
# 2.8.0 - Implemented PPIBench benchmarking for datasets without Motifs in name.
# 2.8.1 - Removed use of Protein name for ELM Uniprot entries due to problems mapping old IDs.
# 2.9.0 - Added SLiMMaker ELM reduction table and output.
# 2.9.1 - Enabled download only with generate=F benchmark=F.
# 2.10.0 - Add generation of table mapping PPIBench dataset generation.
# 2.10.1 - Updated ELM Source URLs.
# 2.10.2 - Updated HINT Source URLs.
# 2.11.0 - Fixed issue with ELM motifs file names (*.motifs, not *.motifs.txt). Updated some warning/error messages.
# 2.11.1 - Switched mergesplits=F for SLiMProb run. (Not expecting it.)
# 2.11.2 - Trying to complete implementation of PPIBench.
# 2.11.3 - Fixed ppdb bug when making simbench without ppibench.
# 2.12.0 - Added randat=T/F : Whether to use DAT file for random source [False]
# 2.13.0 - Added balanced=T/F : Whether to reduce benchmarking to datasets found for all RunIDs [True]
# 2.13.1 - Set tuplekeys=T for benchmark assessment runs.
# 2.13.2 - Fixed tuplekeys=T bug for benchmark assessment runs.
# 2.14.0 - Added wPPV = SN/(SN+FPR) for OccBench.
# 2.14.1 - Fixed up PPIBench results loading.
# 2.14.2 - Fixed ByCloud bug.
# 2.15.0 - Updated assessSearchMemSaver() to handle different data types properly. dombench not yet supported.
# 2.16.0 - Added ppi hub/slim summary and motif filter for assessment datasets post-rating (still count as OT)
# 2.16.1 - Bug-fixing PPI generation from pairwise PPI files.
# 2.16.2 - Fixed benchmarking setup bug.
# 2.16.3 - Fixed bug when Hub-PPI links fail during PPI Benchmarking.
# 2.17.0 - Added output of missing datasets when balanced=T.
# 2.18.0 - Added dev OccBench with improved ratings and more efficient results handling. (dev only)
# 2.18.1 - Added additional OccBench options (bymotif, occsource, occspec)
# 2.18.2 - Fixed problem with source file selection ignoring i=-1.
# 2.18.3 - Added better handling of motifs without TP occurrences for OccBench. Added minocctp=INT.
# 2.18.4 - Fixed ELMBench rating bug.
# 2.18.5 - Fixed Balanced=F bug.
# 2.19.0 - Implemented dataset=LIST: List of headers to split dataset into. If blank, will use datatype defaults. 
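The 2.14.0 entry above gives the OccBench weighted PPV as wPPV = SN/(SN+FPR). A small worked sketch of that formula follows. The function name and the handling of a zero denominator are assumptions for illustration.

```python
def wppv(sn, fpr):
    """Weighted PPV from sensitivity (SN) and false positive rate (FPR)."""
    if sn + fpr == 0:
        return 0.0   # assumption: report 0.0 when SN and FPR are both zero
    return sn / (sn + fpr)


if __name__ == "__main__":
    print(wppv(0.5, 0.5))    # equal SN and FPR
    print(wppv(0.6, 0.0))    # no false positives
```

Because both terms are rates rather than counts, the score does not depend on the relative numbers of positive and negative occurrences in the data.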
SLiMBench REST Output formats
The REST service provides program documentation and options, with a plain text version also available.
Individual parts of the output, matching the tabs in the default output, can be retrieved separately.
Individual elements can also be parsed from the full server output, which is formatted as follows:
... contents for OUTFMT section ...
Available REST Outputs
There is currently no specific help available on REST output for this program.