SLiMBench has two primary functions:
1. Generating SLiM prediction benchmarking datasets from ELM (or other data in a similar format). This includes
options for generating random and/or simulated datasets for ROC analysis etc.
2. Assessing the results of SLiM predictions against a Benchmark. This program is designed to work with SLiMFinder
and QSLiMFinder output, so some prior results parsing may be needed for other methods.
If generate=F and benchmark=F, SLiMBench will check and optionally download the input files but perform no additional
processing or analysis.
Please see the SLiMBench manual for more details.
SOURCE DATA OPTIONS
sourcepath=PATH/ : Will look in this directory for input files if not found [
sourcedate=DATE : Source file date (YYYY-MM-DD) to preferentially use [
elmclass=FILE : File of ELM classes (downloaded from the ELM website) [
elminstance=FILE : File of ELM instances (downloaded from the ELM website) [
elminteractors=FILE : File of ELM interactors (downloaded from the ELM website) [
elmdomains=FILE : File of ELM Pfam domain interactors (downloaded from the ELM website) [
elmdat=FILE : File of downloaded UniProt entries (See rje_uniprot for more details) [
ppisource=X : Source of PPI data. (See documentation for details.) (HINT/FILE) [
ppispec=LIST : List of PPI files/species/databases to generate PPI datasets from [
ppid=X : PPI source protein identifier type (gene/uni/none; will work out from headers if None) [
randsource=FILE : Source for random/simulated dataset sequences. If species, will extract from UniProt [
download=T/F : Whether to download files directly from websites where possible if missing [
integrity=T/F : Whether to quit by default if source data integrity is breached [
unipath=PATH : Path to UniProt download. Will query website if "URL" [
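The source data options describe a check-then-download behaviour: look for an input file as given, fall back to sourcepath, and download or quit if it is still missing. The sketch below illustrates that decision flow only; it is not SLiMBench's implementation. The function name resolve_source is hypothetical, no actual download is performed, sourcedate and unipath handling are not modelled, and treating a missing file as an integrity breach (so that integrity=T raises an error) is an assumption.

```python
import os


def resolve_source(filename, sourcepath="", download=True, integrity=True):
    # Hypothetical sketch: returns (path, needs_download).
    # 1. Use the file as given if it exists; 2. otherwise look in sourcepath.
    candidates = [filename]
    if sourcepath:
        candidates.append(os.path.join(sourcepath, os.path.basename(filename)))
    for path in candidates:
        if os.path.exists(path):
            return path, False
    if download:
        # Flag for download into sourcepath; no network access happens here.
        return os.path.join(sourcepath, os.path.basename(filename)), True
    if integrity:
        # Assumption: a missing source file counts as an integrity breach.
        raise FileNotFoundError(f"Missing source file and download=F: {filename}")
    # integrity=F: carry on without the file.
    return None, False
```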
GENERAL/ELM BENCHMARK GENERATION OPTIONS
genpath=PATH : Output path for datasets generated with SLiMBench file generator [
generate=T/F : Whether to generate SLiMBench datasets from ELM input. [
genspec=LIST : Restrict ELM/OccBench datasets to listed species (restricts ELM instances) 
slimmaker=T/F : Whether to use SLiMMaker to "reduce" ELMs to more findable SLiMs [
minupc=X : Minimum number of UPC (Unrelated Protein Clusters) for benchmark datasets [
maxseq=X : Maximum number of sequences for benchmark datasets [
minic=X : Min information content for a motif (1 fixed position = 1.0) [2.0; 1.1 for OccBench]
filterdir=X : Directory suffix for filtered benchmarking datasets [
queries=T/F : Whether to generate datasets with specific Query proteins [
flankmask=LIST : List of flanking mask options (used with queries and simbench) [
elmbench=T/F : Whether to generate ELM datasets [
ppibench=T/F : Whether to generate ELM PPI datasets [
domlink=T/F : Link ELMs to PPI via Pfam domains (True) or use only direct protein links (False) [
itype=X : Interaction identifier for PPI datasets [first element of ppisource]
dombench=T/F : Whether to generate Pfam domain ELM PPI datasets [
occbench=T/F : Whether to generate ELM OccBench datasets [
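The minic option scores motifs so that one fixed position contributes 1.0. As a rough illustration, the sketch below assumes a uniform background over 20 amino acids, so a position allowing n residues contributes log(20/n)/log(20): a fixed residue scores 1.0 and a "." wildcard scores 0. This normalisation is an assumption for illustration and may differ from the IC calculation actually used by SLiMFinder/SLiMBench. The names motif_ic and passes_minic are hypothetical, and variable-length wildcards such as .{0,2} are not supported.

```python
import math
import re

# Hypothetical tokeniser: a [..] ambiguity, a fixed residue, or a "." wildcard.
TOKEN = re.compile(r"\[([A-Z]+)\]|([A-Z])|(\.)")


def motif_ic(pattern, alphabet_size=20):
    # Sum of per-position IC, scaled so one fixed position = 1.0.
    # Assumes a uniform background: n allowed residues -> log(A/n)/log(A).
    pattern = pattern.strip("^$")  # line-boundary anchors add no IC here
    ic = 0.0
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"Unsupported motif syntax at {pattern[pos:]!r}")
        if m.group(1):
            n = len(set(m.group(1)))  # ambiguous position, e.g. [KR]
        elif m.group(2):
            n = 1  # fixed residue
        else:
            n = alphabet_size  # wildcard
        ic += math.log(alphabet_size / n) / math.log(alphabet_size)
        pos = m.end()
    return ic


def passes_minic(pattern, minic=2.0):
    return motif_ic(pattern) >= minic
```

Under this scheme "[KR].L" scores about 1.77, so it would fail minic=2.0 but pass the lower OccBench threshold of 1.1.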
RANDOM/SIMULATION BENCHMARK GENERATION OPTIONS
simbench=T/F : Whether to generate simulated datasets using reduced ELMs (if found) [
ranbench=T/F : Whether to generate randomised datasets (part of simulation if
randreps=X : Number of replicates for each random (or simulated) dataset [
simcount=LIST : Number of "TPs" (true-positive sequences) to include in each dataset [
simratios=LIST : List of simulated ELM:Random ratios [
randir=PATH : Output path for creation of randomised datasets [
randbase=X : Base for random dataset name if
masking=T/F : Whether to use SLiMCore masking for query selection [
searchini=FILE : INI file containing SLiMProb search options that restrict returned positives 
maxseq=X : Maximum number of randsource sequences for SLiM to hit (also maxaa and maxupc limits) [
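The simcount, simratios and randreps options together define a grid of simulated datasets. A minimal sketch of that grid is shown below, assuming each simratios entry is written as an "ELM:Random" string such as "1:4" and that sequences are sampled without replacement. The function names and the dataset naming scheme are hypothetical, and query selection, masking and maxseq limits are not modelled.

```python
import random


def parse_ratio(ratio):
    # Hypothetical "ELM:Random" format, e.g. "1:4" -> (1, 4).
    elm, ran = (int(x) for x in ratio.split(":"))
    return elm, ran


def simulated_datasets(tp_seqs, random_pool, simcount, simratios, randreps, seed=0):
    # Yield (name, sequences) for each simcount x ratio x replicate combination.
    # tp_seqs carry the (reduced) ELM; random_pool stands in for randsource.
    rng = random.Random(seed)  # fixed seed so replicates are reproducible
    for count in simcount:
        for ratio in simratios:
            elm, ran = parse_ratio(ratio)
            n_random = count * ran // elm
            for rep in range(1, randreps + 1):
                tps = rng.sample(tp_seqs, count)
                background = rng.sample(random_pool, n_random)
                name = f"sim{count}_{elm}-{ran}_r{rep}"
                yield name, tps + background
```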
BENCHMARK ASSESSMENT OPTIONS
benchmark=T/F : Whether to perform SLiMBench benchmarking assessment against motif file [
datatype=X : Type of data to be generated and/or benchmarked (occ/elm/ppi/sim/simonly) [
queries=T/F : Whether datasets have specific Query proteins [
resfiles=LIST : List of (Q)SLiMFinder results files to use for benchmarking [
compdb=FILE : Motif file to be used for benchmarking [elmclass file] (reduced unless occ/ppi)
occbenchpos=FILE : File of all positive occurrences for OccBench [
benchbase=X : Basefile for SLiMBench benchmarking output [
runid=LIST : List of factors to split RunID column into (on '.') [
bycloud=X : Whether to compress results into clouds prior to assessment (True/False/Both) [
sigcut=LIST : Significance thresholds to use for assessment [
iccut=LIST : Minimum IC for (Q)SLiMFinder results for elm/sim/ppi benchmark assessment [
slimlencut=LIST : List of individual SLiM lengths to return results for (
noamb=T/F : Filter out ambiguous patterns [
force=T/F : Whether to force regeneration of outputs (True) or assume existing outputs are right [
backups=T/F : Whether to (prompt if interactive and) generate backups before overwriting files [
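For ROC analysis, assessment comes down to sensitivity (SN) and false positive rate (FPR) at each sigcut threshold. The sketch below uses a simplified dataset-level scoring that is an assumption, not SLiMBench's own FN/TN accounting: each (Q)SLiMFinder hit is assumed to have already been labelled as matching the planted motif or not (for example via a CompariMotif comparison), "sim" datasets score a true positive if any passing hit matches, and "ran" datasets score a false positive if they return any passing hit. Only sigcut and iccut filtering are modelled, and roc_points is a hypothetical name.

```python
from collections import defaultdict


def roc_points(datasets, hits, sigcuts, iccut=0.0):
    # datasets: {name: "sim" or "ran"}
    # hits: (dataset, sig, ic, matches_true_motif) tuples from results files
    # Returns {sigcut: (SN, FPR)}.
    by_dataset = defaultdict(list)
    for name, sig, ic, match in hits:
        by_dataset[name].append((sig, ic, match))
    n_sim = sum(1 for t in datasets.values() if t == "sim")
    n_ran = sum(1 for t in datasets.values() if t == "ran")
    points = {}
    for cut in sigcuts:
        tp = fp = 0
        for name, dtype in datasets.items():
            # Keep the match flags of hits passing both cut-offs.
            kept = [m for sig, ic, m in by_dataset[name] if sig <= cut and ic >= iccut]
            if dtype == "sim" and any(kept):
                tp += 1
            elif dtype == "ran" and kept:
                fp += 1
        sn = tp / n_sim if n_sim else 0.0
        fpr = fp / n_ran if n_ran else 0.0
        points[cut] = (sn, fpr)
    return points
```

Plotting SN against FPR across the sigcut values gives the ROC curve; slimlencut filtering would follow the same pattern as iccut.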
See also rje.py generic commandline options.
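All of the options above use a key=value form with T/F switches and comma-separated LIST values. The parser below is a hypothetical sketch of how such arguments might be interpreted, not the rje.py implementation: it converts T/F values to booleans, splits the options documented above as LIST on commas, and leaves other values as strings. Lower-casing keys is an assumption of this sketch.

```python
# Options documented as LIST in this help text.
LIST_OPTIONS = {"ppispec", "genspec", "flankmask", "simcount", "simratios",
                "resfiles", "runid", "sigcut", "iccut", "slimlencut"}


def parse_options(argv):
    # Hypothetical key=value parser (sketch only, not rje.py).
    opts = {}
    for arg in argv:
        if "=" not in arg:
            continue  # this sketch ignores bare arguments
        key, value = arg.split("=", 1)
        key = key.lower()
        if key in LIST_OPTIONS:
            opts[key] = [v for v in value.split(",") if v]
        elif value.upper() in ("T", "TRUE"):
            opts[key] = True
        elif value.upper() in ("F", "FALSE"):
            opts[key] = False
        else:
            opts[key] = value
    return opts
```

Typical use would be parse_options(sys.argv[1:]); numeric values such as minic are left as strings for the caller to convert.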
Module Version History
# 0.0 - Initial Compilation.
# 0.1 - Functional version with benchmarking dataset generation.
# 1.0 - Consolidation of "working" version with additional basic benchmarking analysis.
# 1.1 - Added simulated dataset construction and benchmarking.
# 1.2 - Added MinIC filtering to benchmark assessment. Sorted beginning/end of line for reduced ELMs.
# 1.3 - Made SimCount a list rather than Integer. Sorted CompariMotif assessment issue.
# 1.4 - Added ICCut and SLiMLenCut as lists and output columns.
# 1.5 - Added Summary Results output table. Removed PropRes.
# 1.6 - Added "simonly" to datatype - calculates both SN and FPR from "sim" data (ignores "ran") to check query bias.
# 1.7 - Added Benchmarking of ELM datasets without queries.
# 1.8 - Partially added Benchmarking dataset generation from PPI data and 3DID.
# 1.9 - Added memsaver option. Replaced SLiMSearch with SLiMProb. Altered default IO paths.
# 1.9 - Removed 3DID again: new ELM interaction_domains file has position-specific PPI details.
# 2.0 - Major overhaul of input options to standardise/clarify. Implemented auto-downloads and PPI datasets.
# 2.1 - Fixed memsaver=T unless in development mode (dev=T). Removed old Assessment. Tested with simbench analysis.
# 2.2 - Replaced searchini=LIST with searchini=FILE and moved to SimBench commands.
# 2.2 - Modified the FN/TN and ResNum calculations. No longer rate TP in random data as OT.
# 2.3 - Changed the default to queries=F. SearchINI bug fix. Added occbench generation.
# 2.4 - Improved error messages.
# 2.5 - Basic OccBench assessment benchmarking. Added ELM Uniprot acclist output. (Download issues?)
# 2.6 - Added ELM domain interactions table: http://www.elm.eu.org/infos/browse_elm_interactiondomains.tsv.
# 2.6 - Fixed issues introduced with new SLiMCore V2.0 SLiMSuite code.
# 2.7 - Reinstate filtering. (Not sure why disabled.) Add genspec=LIST to filter by species. Added domlink=T/F.
# 2.8.0 - Implemented PPIBench benchmarking for datasets without Motifs in name.
# 2.8.1 - Removed use of Protein name for ELM Uniprot entries due to problems mapping old IDs.
# 2.9.0 - Added SLiMMaker ELM reduction table and output.
# 2.9.1 - Enabled download only with generate=F benchmark=F.
# 2.10.0 - Add generation of table mapping PPIBench dataset generation.
# 2.10.1 - Updated ELM Source URLs.