Function
SLiMBench has two primary functions:
1. Generating SLiM prediction benchmarking datasets from ELM (or other data in a similar format). This includes options for generating random and/or simulated datasets for ROC analysis etc.
2. Assessing the results of SLiM predictions against a benchmark. This program is designed to work with SLiMFinder and QSLiMFinder output, so results from other methods may first need to be parsed into the same format.
Documentation for SLiMBench is currently under development. Please contact the author for more details.
Commandline
INPUT OPTIONS
sourcepath=PATH/ : Will look in this directory for input files if not found ['SourceData/']
elmclass=FILE : ELM classes file downloaded from the ELM website ['elm_classes.tsv']
elminstance=FILE : ELM instances file downloaded from the ELM website ['elm_instances.tsv']
elmpfam=FILE : ELM Pfam domain interactors file downloaded from the ELM website ['elm_interaction_domains.tsv']
uniprot=FILE : File of downloaded UniProt entries (see rje_uniprot for more details) ['ELM.dat']
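The sourcepath fallback described above can be sketched in Python as follows. This is a minimal illustration that assumes a simple two-step lookup: first the file as given, then the same name inside sourcepath. The helper name resolve_input is hypothetical, and this is not SLiMBench's own code.

```python
import os
from typing import Optional


def resolve_input(filename: str, sourcepath: str = "SourceData/") -> Optional[str]:
    """Return a readable path for an input file, or None if it cannot be found.

    Hypothetical helper illustrating the documented sourcepath fallback;
    it is not taken from SLiMBench.
    """
    # 1. Use the file as given (absolute, or relative to the working directory).
    if os.path.isfile(filename):
        return filename
    # 2. Otherwise look for the same relative name inside sourcepath.
    candidate = os.path.join(sourcepath, filename)
    if os.path.isfile(candidate):
        return candidate
    # 3. Not found in either location.
    return None
```

For example, resolve_input("elm_classes.tsv") would return "SourceData/elm_classes.tsv" when the file is absent from the working directory but present in the default sourcepath.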
ELM BENCHMARK GENERATION OPTIONS
genpath=PATH : Output path for datasets generated with SLiMBench file generator [./SLiMBenchDatasets/]
integrity=T/F : Whether to quit by default if input integrity is breached [True]
generate=T/F : Whether to generate SLiMBench datasets from ELM input [False]
slimmaker=T/F : Whether to use SLiMMaker to "reduce" ELMs to more findable SLiMs [True]
minupc=X : Minimum number of unrelated protein clusters (UPC) for ELM dataset [True]
minic=X : Minimum information content for a motif (1 fixed position = 1.0) [2.0]
queries=T/F : Whether to generate datasets with specific Query proteins [True]
flankmask=LIST : List of flanking mask options [none,win300,win100,flank5,site]
searchini=LIST : List of INI files containing search options (should have runid setting) []
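The minic threshold is expressed in units where one fixed position contributes 1.0. The sketch below computes such a score under the assumption of equal amino acid frequencies, so that a position allowing n of the 20 residues contributes log(20/n)/log(20). SLiMFinder's own IC calculation may differ (for example, by using background amino acid frequencies), and the function names are hypothetical. Only a simplified pattern syntax is handled: single residues, [..] ambiguity sets, . wildcards, and ^/$ termini.

```python
import math


def position_ic(n_allowed: int, alphabet_size: int = 20) -> float:
    """IC of one position that allows n_allowed residues.

    Assumes equal amino acid frequencies: a fixed residue scores 1.0 and
    a full wildcard (all 20 residues) scores 0.0.
    """
    return math.log(alphabet_size / n_allowed) / math.log(alphabet_size)


def motif_ic(pattern: str) -> float:
    """Sum positional IC over a simplified SLiM pattern such as 'R..[ST].P'."""
    total = 0.0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "^$.":
            # Termini anchors and single wildcards add no information here.
            i += 1
        elif ch == "[":
            end = pattern.index("]", i)  # raises ValueError if the set is unclosed
            residues = set(pattern[i + 1:end])
            if not residues or "^" in residues:
                raise ValueError("Empty or negated sets are not handled in this sketch")
            total += position_ic(len(residues))
            i = end + 1
        elif ch.isalpha():
            # A single fixed residue.
            total += position_ic(1)
            i += 1
        else:
            raise ValueError(f"Unsupported pattern syntax {ch!r} at position {i}")
    return total


def passes_minic(pattern: str, minic: float = 2.0) -> bool:
    """True if the pattern meets the minic threshold."""
    return motif_ic(pattern) >= minic
```

Under these assumptions, R..S scores 2.0 and passes the default minic=2.0, whereas [KR].P scores about 1.77 and would be excluded.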
ELM PPI/3DID BENCHMARK GENERATION OPTIONS
elmpfam=FILE : ELM Pfam domain interactors file downloaded from the ELM website ['elm_interaction_domains.tsv']
pfamdata=FILE : File mapping Pfam domains onto genes/proteins (BioMart or HMM search) []
xrefdata=FILE : File of gene identifier cross-reference data from rje_genemap []
3didsql=PATH : Path to 3DID SQL data. Use rje_mysql sqldump to extract 3DID DMI data. []
dmidata=FILE : File of 3DID DMI data ['3did.DMI.csv']
pdbdata=FILE : File mapping PDB identifiers onto genes/proteins []
RANDOM/SIMULATION BENCHMARK GENERATION OPTIONS
simulate=T/F : Whether to generate simulated datasets using reduced ELMs (if found) [False]
randomise=T/F : Whether to generate randomised datasets (part of simulation if simulate=T) [False]
randreps=X : Number of replicates for each random (or simulated) dataset [10]
simratios=LIST : List of simulated ELM:Random ratios [1,4,9,19]
simcount=LIST : List of the numbers of "TPs" to have in each dataset [5,10]
randir=PATH : Output path for creation of randomised datasets [./SLiMBenchDatasets/Random/]
randbase=X : Base for random dataset name if simulate=F [ran]
randsource=FILE : Source for new sequences for random datasets [None]
masking=T/F : Whether to use SLiMCore masking for query selection [True]
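The simulation options combine multiplicatively. The sketch below enumerates one design for each simcount value, simratios value and replicate. It assumes that an ELM:Random ratio of r means r random sequences for every "TP" sequence. That interpretation, the function name and the dictionary keys are all assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Sequence


def simulated_designs(
    simcounts: Sequence[int] = (5, 10),
    simratios: Sequence[int] = (1, 4, 9, 19),
    randreps: int = 10,
) -> List[Dict[str, float]]:
    """Enumerate simulated dataset designs (hypothetical illustration).

    Assumes an ELM:Random ratio r means r random sequences per "TP" sequence.
    """
    designs = []
    for count, ratio, rep in product(simcounts, simratios, range(1, randreps + 1)):
        n_random = count * ratio
        total = count + n_random
        designs.append({
            "tp": count,          # sequences carrying the simulated ELM
            "random": n_random,   # background sequences
            "total": total,
            "tp_fraction": count / total,
            "rep": rep,           # replicate number, 1..randreps
        })
    return designs
```

With the defaults, this gives 2 × 4 × 10 = 80 designs. Under the stated assumption, the ratios 1, 4, 9 and 19 correspond to TP fractions of 50%, 20%, 10% and 5%.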
BENCHMARK ASSESSMENT OPTIONS
benchmark=T/F : Whether to perform SLiMBench benchmarking assessment against motif file [False]
datatype=X : Type of data to be generated and/or benchmarked (elm/sim/simonly) [elm]
resfiles=LIST : List of (Q)SLiMFinder results files to use for benchmarking [*.csv]
compdb=FILE : Motif file to be used for benchmarking (default = reduced elmclass file) []
benchbase=X : Basefile for SLiMBench benchmarking output [slimbench]
runid=LIST : List of factors to split RunID column into (on '.') ['Program','Analysis']
bycloud=X : Whether to compress results into clouds prior to assessment (True/False/Both) [Both]
sigcut=LIST : Significance thresholds to use for assessment [0.05,0.01,0.001,0.0001]
iccut=LIST : Minimum IC thresholds for (Q)SLiMFinder results in benchmark assessment [2.0,2.1,3.0]
slimlencut=LIST : List of individual SLiM lengths to return results for (0=All) [0,3,4,5]
noamb=T/F : Filter out ambiguous patterns [False]
# Add CompariMotif settings here for OT/TP etc.
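The runid, sigcut, iccut and slimlencut options define a grid of assessment settings. The sketch below illustrates that logic under several assumptions: each result row carries significance, IC and SLiM length values under the hypothetical keys Sig, IC and SLiMLen, and a result passes a cut when Sig <= sigcut and IC >= iccut. This is not SLiMBench's assessment code, and it does not attempt the CompariMotif matching step.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def split_runid(runid: str, factors: Sequence[str] = ("Program", "Analysis")) -> Dict[str, str]:
    """Split a RunID on '.' into named factors, as the runid option describes."""
    parts = runid.split(".")
    if len(parts) != len(factors):
        raise ValueError(f"RunID {runid!r} has {len(parts)} parts, expected {len(factors)}")
    return dict(zip(factors, parts))


def assessment_grid(
    results: List[dict],
    sigcuts: Sequence[float],
    iccuts: Sequence[float],
    slimlencuts: Sequence[int],
) -> Dict[Tuple[float, float, int], int]:
    """Count results passing each (sigcut, iccut, slimlencut) combination.

    Row keys Sig, IC and SLiMLen are hypothetical; slimlencut 0 means all lengths.
    """
    grid = {}
    for sig, ic, slen in product(sigcuts, iccuts, slimlencuts):
        grid[(sig, ic, slen)] = sum(
            1
            for r in results
            if r["Sig"] <= sig and r["IC"] >= ic and (slen == 0 or r["SLiMLen"] == slen)
        )
    return grid
```

For example, split_runid("SLiMFinder.masked") gives {"Program": "SLiMFinder", "Analysis": "masked"}, which allows the grid counts to be reported separately for each program and analysis.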
GENERAL OPTIONS
force=T/F : Whether to force regeneration of outputs (True) or assume existing outputs are right [False]
backups=T/F : Whether to (prompt if interactive and) generate backups before overwriting files [True]
See also rje.py generic commandline options.
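The options above follow a key=value convention with typed values (T/F, X, LIST, FILE, PATH). The hypothetical parser below sketches one way to coerce such arguments to the type of each default value. It is not the rje.py implementation, whose generic options and parsing rules are documented separately.

```python
from typing import Any, Dict, List


def parse_options(argv: List[str], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Parse key=value arguments, coercing each value to its default's type.

    Hypothetical sketch; not the rje.py parser.
    """
    opts = dict(defaults)
    for arg in argv:
        if "=" not in arg:
            raise ValueError(f"Expected key=value, got {arg!r}")
        key, value = arg.split("=", 1)
        key = key.lower()  # treat option names case-insensitively (an assumption)
        if key not in defaults:
            raise KeyError(f"Unknown option: {key}")
        default = defaults[key]
        if isinstance(default, bool):  # must precede int checks: bool is a subclass of int
            flag = value.strip().lower()
            if flag not in ("t", "true", "f", "false"):
                raise ValueError(f"{key} expects T/F, got {value!r}")
            opts[key] = flag in ("t", "true")
        elif isinstance(default, list):
            # LIST: comma-separated; elements take the type of the first default item.
            cast = type(default[0]) if default else str
            opts[key] = [cast(v) for v in value.split(",") if v]
        elif default is None:
            opts[key] = value  # no default type to follow (e.g. randsource)
        else:
            opts[key] = type(default)(value)  # X, FILE, PATH: int, float or str
    return opts
```

For instance, given defaults containing sigcut=[0.05, 0.01], the argument sigcut=0.01,0.001 would become the float list [0.01, 0.001].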
History
Module Version History
# 0.0 - Initial Compilation.
# 0.1 - Functional version with benchmarking dataset generation.
# 1.0 - Consolidation of "working" version with additional basic benchmarking analysis.
# 1.1 - Added simulated dataset construction and benchmarking.
# 1.2 - Added MinIC filtering to benchmark assessment. Sorted beginning/end of line for reduced ELMs.
# 1.3 - Made SimCount a list rather than Integer. Sorted CompariMotif assessment issue.
# 1.4 - Added ICCut and SLiMLenCut as lists and output columns.
# 1.5 - Added Summary Results output table. Removed PropRes.
# 1.6 - Added "simonly" to datatype - calculates both SN and FPR from "sim" data (ignores "ran") to check query bias.
# 1.7 - Added Benchmarking of ELM datasets without queries.
# 1.8 - Partially added Benchmarking dataset generation from PPI data and 3DID.
# 1.9 - Added memsaver option. Replaced SLiMSearch with SLiMProb. Altered default IO paths.
SLiMBench REST Output formats
Run with &rest=docs for program documentation and options. A plain text version is accessed with &rest=help.
&rest=OUTFMT can be used to retrieve individual parts of the output, matching the tabs in the default (&rest=format) output. Individual OUTFMT elements can also be parsed from the full (&rest=full) server output, which is formatted as follows:
###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
... contents for OUTFMT section ...
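The following is a minimal sketch of splitting &rest=full output into its OUTFMT sections. It assumes each section starts with a ###~...~### separator line followed by a "# OUTFMT:" header line, as in the layout above. Section names are taken from whatever follows "#" in each header, and no particular names are assumed. The function name parse_full_output is hypothetical.

```python
import re
from typing import Dict, List, Optional

SEPARATOR = re.compile(r"^###~+###$")
HEADER = re.compile(r"^# (\S+):$")


def parse_full_output(text: str) -> Dict[str, str]:
    """Split &rest=full output into {OUTFMT: contents} (sketch)."""
    sections: Dict[str, str] = {}
    current: Optional[str] = None
    buffer: List[str] = []
    expect_header = False
    for line in text.splitlines():
        stripped = line.strip()
        if SEPARATOR.match(stripped):
            # A separator closes the previous section and announces a new header.
            if current is not None:
                sections[current] = "\n".join(buffer).strip("\n")
            current, buffer, expect_header = None, [], True
            continue
        if expect_header:
            expect_header = False
            match = HEADER.match(stripped)
            if match:
                current = match.group(1)
                continue
        if current is not None:
            buffer.append(line)
    # Close the final section, which has no trailing separator.
    if current is not None:
        sections[current] = "\n".join(buffer).strip("\n")
    return sections
```

Each value in the returned dictionary should match what the corresponding &rest=OUTFMT request returns on its own, assuming the server uses the layout shown above.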
Available REST Outputs
There is currently no specific help available on REST output for this program.