Module:	SLiMBench
Description:	Short Linear Motif prediction Benchmarking
Version:	2.19.0
Last Edit:	31/01/19
Citation:	Palopoli N, Lythgow KT & Edwards RJ. Bioinformatics 2015; doi: 10.1093/bioinformatics/btv155

Imported modules: rje rje_db rje_obj rje_ppi rje_seq rje_seqlist rje_slim rje_slimcore rje_slimlist rje_uniprot comparimotif_V3 slimmaker slimprob slimsearch

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

SLiMBench has two primary functions:

1. Generating SLiM prediction benchmarking datasets from ELM (or other data in a similar format). This includes options for generating random and/or simulated datasets for ROC analysis etc.

2. Assessing the results of SLiM predictions against a Benchmark. This program is designed to work with SLiMFinder and QSLiMFinder output, so some prior results parsing may be needed for other methods.

If generate=F benchmark=F, SLiMBench will check and optionally download the input files but perform no additional processing or analysis.

Please see the SLiMBench manual for more details.
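
As an illustration of how the option=value settings in the Commandline section combine into the modes described above, the following Python sketch builds three argument lists: a download-only check, a dataset generation run and a benchmark assessment. The script name slimbench.py, the python launcher and the helper build_command() are assumptions made for this example; only the option names and T/F values are taken from this documentation.

```python
def build_command(script="slimbench.py", **options):
    """Return an argument list of the form [python, script, key=value, ...].

    The script name and this helper are hypothetical; option names come from
    the Commandline section of the documentation.
    """
    args = ["python", script]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"            # T/F switches
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)  # LIST options are comma-separated
        args.append(f"{key}={value}")
    return args


# Check (and optionally download) source files only: no generation or assessment.
download_only = build_command(generate=False, benchmark=False, download=True)

# Generate ELM benchmark datasets restricted to two species.
generate_elm = build_command(generate=True, genspec=["HUMAN", "MOUSE"], minupc=3)

# Assess (Q)SLiMFinder results against the benchmark.
assess = build_command(benchmark=True, datatype="elm", resfiles="*.csv")

print(" ".join(download_only))
```

The resulting lists could be passed to subprocess.run() without a shell, which avoids the shell expanding a pattern such as resfiles=*.csv before SLiMBench sees it.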

Commandline

SOURCE DATA OPTIONS

sourcepath=PATH/ : Will look in this directory for input files if not found [SourceData/]
sourcedate=DATE : Source file date (YYYY-MM-DD) to preferentially use [None]
elmclass=FILE : Download from ELM website of ELM classes [elm_classes.tsv]
elminstance=FILE : Download from ELM website of ELM instances [elm_instances.tsv]
elminteractors=FILE : Download from ELM website of ELM interactors [elm_interactions.tsv]
elmdomains=FILE : Download from ELM website of ELM Pfam domain interactors [elm_interaction_domains.tsv]
elmdat=FILE : File of downloaded UniProt entries (See rje_uniprot for more details) ['ELM.dat']
ppisource=X : Source of PPI data. (See documentation for details.) (HINT/FILE) ['HINT']
ppispec=LIST : List of PPI files/species/databases to generate PPI datasets from [HUMAN,MOUSE,DROME,YEAST]
ppid=X : PPI source protein identifier type (gene/uni/none; will work out from headers if None) [None]
randsource=FILE : Source for random/simulated dataset sequences. If species, will extract from UniProt [HUMAN]
randat=T/F : Whether to use DAT file for random source [False]
occsource=X/FILE : Source for OccBench datasets (ELM, RandSource, or file). (ELM/RAND/FILE) [ELM]
occspec=LIST : Restrict OccBench analysis to given species; blank for all (useful for occsource=ELM/FILE) []
download=T/F : Whether to download files directly from websites where possible if missing [True]
integrity=T/F : Whether to quit by default if source data integrity is breached [False]
unipath=PATH : Path to UniProt download. Will query website if "URL" [URL]

GENERAL/ELM BENCHMARK GENERATION OPTIONS

genpath=PATH : Output path for datasets generated with SLiMBench file generator [./SLiMBenchDatasets/]
generate=T/F : Whether to generate SLiMBench datasets from ELM input. [False]
genspec=LIST : Restrict ELM/OccBench datasets to listed species (restricts ELM instances) []
slimmaker=T/F : Whether to use SLiMMaker to "reduce" ELMs to more findable SLiMs [True]
minupc=X : Minimum number of UPC for benchmark dataset [3]
maxseq=X : Maximum number of sequences for benchmark datasets [0]
minic=X : Min information content for a motif (1 fixed position = 1.0) [2.0; 1.1 for OccBench]
filterdir=X : Directory suffix for filtered benchmarking datasets [_Filtered/]
queries=T/F : Whether to generate datasets with specific Query proteins [False]
flankmask=LIST : List of flanking mask options (used with queries and simbench) [none,win100,flank5,site]
elmbench=T/F : Whether to generate ELM datasets [True]
ppibench=T/F : Whether to generate ELM PPI datasets [True]
domlink=T/F : Link ELMs to PPI via Pfam domains (True) or (False) just use direct protein links [True]
itype=X : Interaction identifier for PPI datasets [first element of ppisource]
dombench=T/F : Whether to generate Pfam domain ELM PPI datasets [True]
occbench=T/F : Whether to generate ELM OccBench datasets [True]

RANDOM/SIMULATION BENCHMARK GENERATION OPTIONS

simbench=T/F : Whether to generate simulated datasets using reduced ELMs (if found) [False]
ranbench=T/F : Whether to generate randomised datasets (part of simulation if simbench=T) [False]
randreps=X : Number of replicates for each random (or simulated) datasets [8]
simcount=LIST : Number of "TPs" to have in dataset [4,8,16]
simratios=LIST : List of simulated ELM:Random ratios [0,1,3,7,15,31]
randir=PATH : Output path for creation of randomised datasets [./SLiMBenchDatasets/Random/]
randbase=X : Base for random dataset name if simbench=F [ran]
masking=T/F : Whether to use SLiMCore masking for query selection [True]
searchini=FILE : INI file containing SLiMProb search options that restrict returned positives []
maxseq=X : Maximum number of randsource sequences for SLiM to hit (also maxaa and maxupc limits) [1000]
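
One plausible reading of simcount and simratios is that each simulated dataset holds N "TP" sequences plus N x R random sequences for an ELM:Random ratio of 1:R. This reading is an assumption and is not confirmed by this documentation. Under it, the default settings give the dataset sizes enumerated by the sketch below; the function name dataset_sizes is hypothetical.

```python
from itertools import product

# Defaults quoted from the option list above.
SIMCOUNT = [4, 8, 16]
SIMRATIOS = [0, 1, 3, 7, 15, 31]


def dataset_sizes(simcount, simratios):
    """Map (TP count, ratio) to total sequences, ASSUMING total = TP + TP * ratio."""
    sizes = {}
    for tp, ratio in product(simcount, simratios):
        sizes[(tp, ratio)] = tp + tp * ratio
    return sizes


sizes = dataset_sizes(SIMCOUNT, SIMRATIOS)
for (tp, ratio), total in sorted(sizes.items()):
    print(f"simcount={tp} ratio=1:{ratio} -> {total} sequences")
```

With these defaults the assumed totals are all powers of two, which may be why the ratios are listed as 0, 1, 3, 7, 15 and 31.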

BENCHMARK ASSESSMENT OPTIONS

benchmark=T/F : Whether to perform SLiMBench benchmarking assessment against a motif file [False]
datatype=X : Type of data to be generated and/or benchmarked (occ/elm/ppi/dom/sim/simonly) [elm]
queries=T/F : Whether datasets have specific Query proteins [False]
resfiles=LIST : List of (Q)SLiMFinder results files to use for benchmarking [*.csv]
balanced=T/F : Whether to reduce benchmarking to datasets found for all RunIDs [True]
compdb=FILE : Motif file to be used for benchmarking [elmclass file] (reduced unless occ/ppi)
occbenchpos=FILE : File of all positive occurrences for OccBench [genpath/ELM_OccBench/ELM.full.ratings.csv]
bymotif=T/F : Whether to weight output by motif for OccBench (others always weighted) [True]
benchbase=X : Basefile for SLiMBench benchmarking output [slimbench]
runid=LIST : List of factors to split RunID column into (on '.') [Program,Analysis]
dataset=LIST : List of headers to split dataset into (on '.'). If blank, will use datatype defaults. []
bycloud=X : Whether to compress results into clouds prior to assessment (True/False/Both) [Both]
sigcut=LIST : Significance thresholds to use for assessment [0.1,0.05,0.01,0.001,0.0001]
iccut=LIST : Minimum IC for (Q)SLiMFinder results for elm/sim/ppi benchmark assessment [2.0,2.1,3.0]
slimlencut=LIST : List of individual SLiM lengths to return results for (0=All) [0,3,4,5]
noamb=T/F : Filter out ambiguous patterns [False]
assfilter=LIST : List of motifs to filter out from assessment datasets post-rating (still count as OT) []
minocctp=INT : Min number of occurrence TP for OccBench motifs [1]
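
The runid=LIST and dataset=LIST options describe splitting a '.'-delimited name into named factors. A minimal sketch of that idea follows, assuming a simple positional split. The function split_factors and the example RunID value are hypothetical, and SLiMBench's own handling of names with too few or too many parts may differ.

```python
def split_factors(name, headers, sep="."):
    """Map the sep-delimited parts of name onto headers by position.

    Any extra separators are left inside the final factor; this is a choice
    made for the sketch, not documented SLiMBench behaviour.
    """
    parts = name.split(sep, len(headers) - 1)
    if len(parts) != len(headers):
        raise ValueError(f"{name!r} has {len(parts)} parts; expected {len(headers)}")
    return dict(zip(headers, parts))


# Default runid headers are Program and Analysis; the RunID itself is made up.
factors = split_factors("SLiMFinder.nomask", ["Program", "Analysis"])
print(factors)
```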

GENERAL OPTIONS

force=T/F : Whether to force regeneration of outputs (True) or assume existing outputs are right [False]
backups=T/F : Whether to (prompt if interactive and) generate backups before overwriting files [True]

See also rje.py generic commandline options.

History

Module Version History

    # 0.0 - Initial Compilation.
    # 0.1 - Functional version with benchmarking dataset generation.
    # 1.0 - Consolidation of "working" version with additional basic benchmarking analysis.
    # 1.1 - Added simulated dataset construction and benchmarking.
    # 1.2 - Added MinIC filtering to benchmark assessment. Sorted beginning/end of line for reduced ELMs.
    # 1.3 - Made SimCount a list rather than Integer. Sorted CompariMotif assessment issue.
    # 1.4 - Added ICCut and SLiMLenCut as lists and output columns.
    # 1.5 - Added Summary Results output table. Removed PropRes.
    # 1.6 - Added "simonly" to datatype - calculates both SN and FPR from "sim" data (ignores "ran") to check query bias.
    # 1.7 - Added Benchmarking of ELM datasets without queries.
    # 1.8 - Partially added Benchmarking dataset generation from PPI data and 3DID.
    # 1.9 - Added memsaver option. Replaced SLiMSearch with SLiMProb. Altered default IO paths.
    # 1.9 - Removed 3DID again: new ELM interaction_domains file has position-specific PPI details.
    # 2.0 - Major overhaul of input options to standardise/clarify. Implemented auto-downloads and PPI datasets.
    # 2.1 - Fixed memsaver=T unless in development mode (dev=T). Removed old Assessment. Tested with simbench analysis.
    # 2.2 - Replaced searchini=LIST with searchini=FILE and moved to SimBench commands.
    # 2.2 - Modified the FN/TN and ResNum calculations. No longer rate TP in random data as OT.
    # 2.3 - Changed the default to queries=F. SearchINI bug fix. Added occbench generation.
    # 2.4 - Improved error messages.
    # 2.5 - Basic OccBench assessment benchmarking. Added ELM Uniprot acclist output. (Download issues?)
    # 2.6 - Added ELM domain interactions table: http://www.elm.eu.org/infos/browse_elm_interactiondomains.tsv.
    # 2.6 - Fixed issues introduced with new SLiMCore V2.0 SLiMSuite code.
    # 2.7 - Reinstate filtering. (Not sure why disabled.) Add genspec=LIST to filter by species. Added domlink=T/F.
    # 2.8.0 - Implemented PPIBench benchmarking for datasets without Motifs in name.
    # 2.8.1 - Removed use of Protein name for ELM Uniprot entries due to problems mapping old IDs.
    # 2.9.0 - Added SLiMMaker ELM reduction table and output.
    # 2.9.1 - Enabled download only with generate=F benchmark=F.
    # 2.10.0 - Add generation of table mapping PPIBench dataset generation.
    # 2.10.1 - Updated ELM Source URLs.
    # 2.10.2 - Updated HINT Source URLs.
    # 2.11.0 - Fixed issue with ELM motifs file names (*.motifs, not *.motifs.txt). Updated some warning/error messages.
    # 2.11.1 - Switched mergesplits=F for SLiMProb run. (Not expecting it.)
    # 2.11.2 - Trying to complete implementation of PPIBench.
    # 2.11.3 - Fixed ppdb bug when making simbench without ppibench.
    # 2.12.0 - Added randat=T/F : Whether to use DAT file for random source [False]
    # 2.13.0 - Added balanced=T/F : Whether to reduce benchmarking to datasets found for all RunIDs [True]
    # 2.13.1 - Set tuplekeys=T for benchmark assessment runs.
    # 2.13.2 - Fixed tuplekeys=T bug for benchmark assessment runs.
    # 2.14.0 - Added wPPV = SN/(SN+FPR) for OccBench.
    # 2.14.1 - Fixed up PPIBench results loading.
    # 2.14.2 - Fixed ByCloud bug.
    # 2.15.0 - Updated assessSearchMemSaver() to handle different data types properly. dombench not yet supported.
    # 2.16.0 - Added ppi hub/slim summary and motif filter for assessment datasets post-rating (still count as OT)
    # 2.16.1 - Bug-fixing PPI generation from pairwise PPI files.
    # 2.16.2 - Fixed benchmarking setup bug.
    # 2.16.3 - Fixed bug when Hub-PPI links fail during PPI Benchmarking.
    # 2.17.0 - Added output of missing datasets when balanced=T.
    # 2.18.0 - Added dev OccBench with improved ratings and more efficient results handling. (dev only)
    # 2.18.1 - Added additional OccBench options (bymotif, occsource, occspec)
    # 2.18.2 - Fixed problem with source file selection ignoring i=-1.
    # 2.18.3 - Added better handling of motifs without TP occurrences for OccBench. Added minocctp=INT.
    # 2.18.4 - Fixed ELMBench rating bug.
    # 2.18.5 - Fixed Balanced=F bug.
    # 2.19.0 - Implemented dataset=LIST: List of headers to split dataset into. If blank, will use datatype defaults. []
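
The wPPV statistic added in version 2.14.0 is given above as SN/(SN+FPR). A small sketch of that calculation follows, assuming SN and FPR are supplied as proportions between 0 and 1. The function name and the value returned when both inputs are zero are choices made for this example, not documented behaviour.

```python
def wppv(sn, fpr):
    """Weighted PPV as stated in the 2.14.0 history entry: SN / (SN + FPR)."""
    total = sn + fpr
    if total == 0:
        return 0.0  # assumption: report 0 when nothing is returned at all
    return sn / total


print(wppv(0.6, 0.2))
```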

SLiMBench REST Output formats

Run with &rest=docs for program documentation and options. A plain text version is accessed with &rest=help.
&rest=OUTFMT can be used to retrieve individual parts of the output, matching the tabs in the default
(&rest=format) output. Individual OUTFMT elements can also be parsed from the full (&rest=full) server output,
which is formatted as follows:

###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
... contents for OUTFMT section ...
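
The layout above (a ###~...~### divider, a "# OUTFMT:" line, then the section contents) can be split into named parts with a few lines of Python. This is a sketch under stated assumptions: the function parse_rest_full is hypothetical, the section names in the example text (status, log) are invented for illustration, and real server output may contain details this parser does not handle.

```python
import re

DIVIDER = re.compile(r"^###~+###$")    # e.g. ###~~~~~~###
HEADER = re.compile(r"^# (\w+):\s*$")  # e.g. "# OUTFMT:"


def parse_rest_full(text):
    """Split full REST output into {OUTFMT name: section text}."""
    sections = {}
    current = None
    expect_header = False
    for line in text.splitlines():
        if DIVIDER.match(line):
            expect_header = True  # the next line should name the section
            continue
        if expect_header:
            expect_header = False
            match = HEADER.match(line)
            if match:
                current = match.group(1)
                sections[current] = []
                continue
        if current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


# Invented example text that follows the documented layout.
example = "\n".join([
    "###" + "~" * 70 + "###",
    "# status:",
    "finished",
    "###" + "~" * 70 + "###",
    "# log:",
    "line one",
    "line two",
])
sections = parse_rest_full(example)
print(sections["log"])
```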

Available REST Outputs

There is currently no specific help available on REST output for this program.
