Description: SLiMSuite HPC job farming control program
Last Edit: 05/10/16
Copyright © 2014 Richard J. Edwards - See source code for GNU License Notice
See SLiMSuite Blog for further documentation.
This module is designed to control and execute parallel processing jobs on an HPC cluster using PBS and QSUB. If
qsub=T it will generate a job file and use qsub to place that job in the queue using the appropriate parameter
settings and hpc commands. If farm=X gives a recognised program (below) or hpcmode is not fork, then the qsub
job will call SLiMFarmer with the same commandline options, plus qsub=F i=-1 v=-1. If seqbyseq=T, this will be
run in a special way. (See SeqBySeq mode.) Otherwise, slimsuite=T indicates that farm=X is a SLiMSuite program,
for which the python call and pypath will be added. If this program uses forking then it should parallelise over a
single multi-processor node. If farm=X contains a / path separator, this will be added to pypath; otherwise it
will be assumed that farm is in tools/. If slimsuite=F then farm should be a program call to be queued in the
PBS job file instead.
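For example, a typical qsub farming call might look like the following (the install path and option values are
purely illustrative):

    python /path/to/slimsuite/tools/slimfarmer.py farm=slimfinder qsub=T nodes=1 ppn=12 walltime=24 runid=Test1

SLiMFarmer would generate the PBS job file, submit it with qsub, and the queued job would then re-invoke
SLiMFarmer (with qsub=F) to perform the actual farming.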
Currently recognised SLiMSuite programs for farming: SLiMFinder, QSLiMFinder, SLiMProb, SLiMCore.
Currently recognised SLiMSuite programs for qsub farming in rsh mode only: GOPHER, SLiMSearch, UniFake.
NOTE: Any commandline options that need bracketing quotes will need to be placed into an ini file. This can either
be the ini file used by SLiMFarmer, or a
jobini=FILE that will only be used by the farmed programs. Note that
slimfarmer.ini will not be passed on to other SLiMSuite programs unless
ini=slimfarmer.ini is given
as a commandline argument.
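For example, an option value requiring bracketing quotes could be placed in a hypothetical jobini file
(sh3run.ini, contents illustrative only) and passed to the farmed jobs with jobini=sh3run.ini:

    runid=SH3Run
    motifs="LIG_SH3 P.P.P.[KR]"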
The runid=X setting is important for SLiMSuite job farming as this is what separates different parameter setting
combinations run on the same data and is also used for identifying which datasets have already been run. Running
several jobs on the same data using the same SLiMSuite program but with different parameter settings will therefore
cause problems. If runid is not set, it will default to the job=X setting.
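For example, two runs of the same program on the same data with different settings should be given distinct runid
values. A hypothetical pair of calls (batch files and probcut settings illustrative only) might be:

    python slimfarmer.py farm=slimfinder batch=*.fas runid=Strict probcut=0.01
    python slimfarmer.py farm=slimfinder batch=*.fas runid=Relaxed probcut=0.1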
The hpcmode=X setting determines the method used for farming out jobs across the nodes.
hpcmode=rsh uses rsh to spawn
the additional processes out to other nodes, based on a script written for the IRIDIS HPC by Ivan Wolton.
hpcmode=fork will restrict analysis to a single node and use Python forking to distribute jobs. This can be used even
on a single multi-processor machine to fork out SLiMSuite jobs.
basefile=X will set the log, RunID, ResFile, ResDir
and Job: RunID and Job will have path stripped; ResFile will have .csv appended.
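To illustrate, a hypothetical basefile=hpc/myrun setting would therefore give (assuming the standard .log
extension for the log file):

    log     : hpc/myrun.log
    RunID   : myrun          (path stripped)
    Job     : myrun          (path stripped)
    ResFile : hpc/myrun.csv  (.csv appended)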
Initially, it will call other programs but, in time, it is envisaged that other programs will make use of SLiMFarmer
and have parallelisation built-in.
In SeqBySeq mode, the program assumes that seqin=FILE and basefile=X are given and farm=X states the Python program
to be run, which should be a SLiMSuite program. (The SLiMSuite subdirectory will also need to be given unless
slimsuite=F, in which case the whole path to the program should be given. pypath=PATH can set an alternative path.)
Seqin will then be worked through in turn and each sequence farmed out to the farm program. Outputs given by OutList
are then compiled, as is the Log, into files named using the basefile=X given. In the case of *.csv and *.tdt
files, the header row is copied for the first file and then excluded for all subsequent files. For all other file
extensions, the
whole output is copied.
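A hypothetical SeqBySeq call (file names and outputs illustrative only) might be:

    python slimfarmer.py seqbyseq=T farm=slimprob seqin=proteome.fas basefile=sbsrun outlist=occ.csv

Here each sequence from proteome.fas would be run through SLiMProb in turn, with the *.occ.csv outputs compiled
into sbsrun.occ.csv (header row from the first file only) and the individual logs compiled into sbsrun.log.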
Basic QSub Options
qsub=T/F : Whether to execute QSub PBS job creation and queuing [False]
jobini=FILE : Ini file to pass to the farmed HPC jobs with SLiMFarmer options. Overrides commandline. [
slimsuite=T/F : Whether program is an RJE *.py script (adds log processing) [True]
nodes=X : Number of nodes to run on [
ppn=X : Processors per node [
walltime=X : Walltime for qsub job (hours) [
vmem=X : Virtual Memory limit for run (GB) [126]
job=X : Name of job file (.job added) [
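When qsub=T, the options above set the resource requests for the generated job file. A simplified sketch of the
kind of PBS job file produced (exact directives, paths and values will vary with the settings and HPC system):

    #!/bin/bash
    #PBS -l nodes=1:ppn=12
    #PBS -l walltime=60:00:00
    #PBS -l vmem=126gb
    cd $PBS_O_WORKDIR
    python /path/to/slimsuite/tools/slimfarmer.py farm=slimfinder qsub=F i=-1 v=-1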
Advanced QSub Options
hpc=X : Name of HPC system [IRIDIS4]
pypath=PATH : Path to python modules [slimsuite home directory]
qpath=PATH : Path to change directory to [
pause=X : Wait X seconds before attempting showstart [
email=X : Email address to email job stats to at end [
depend=LIST : List of job ids to wait for before starting job (dependhpc=X added)
dependhpc=X : Name of HPC system for depend [
report=T/F : Pull out running job IDs and run showstart [False]
modules=LIST : List of modules to add in job file e.g. blast+/2.2.31,clustalw 
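For example, a job that waits for two queued jobs, loads extra modules and emails a report on completion might be
submitted with (job IDs, modules and address illustrative only):

    python slimfarmer.py farm=batch subjobs=jobs.txt qsub=T depend=123456,123457 modules=blast+/2.2.31,clustalw email=user@example.com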
Main SLiMFarmer Options
farm=X : Execute a special SLiMFarm analysis on HPC [batch]
- batch will farm out a batch list of commands read in from subjobs=LIST
- gopher/slimfinder/qslimfinder/slimprob/slimcore/slimsearch/unifake = special SLiMSuite HPC.
- if slimsuite=T, farm=X will specify the SLiMSuite program to be run (see docs)
- if slimsuite=F, farm=X will be executed as a system call in place of SLiMFarmer
hpcmode=X : Mode to be used for farming jobs between nodes (rsh/fork) [
forks=X : Number of forks to be used when hpcmode=fork
jobini=FILE : Ini file to pass to the farmed SLiMSuite run. (Also used for SLiMFarmer options if qsub=T.)
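For example, fork mode can also be used without the queuing system to spread a recognised SLiMSuite program over
the processors of a single machine (option values illustrative only):

    python slimfarmer.py farm=slimfinder hpcmode=fork forks=8 qsub=F batch=*.fas runid=LocalRun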
Standard HPC Options
subsleep=X : Sleep time (seconds) between cycles of subbing out jobs to hosts [
subjobs=LIST : List of subjobs to farm out to HPC cluster 
iolimit=X : Limit of number of IOErrors before termination [
memfree=X : Min. proportion of node memory to be free before spawning job [
test=T/F : Whether to produce extra output in "test" mode [False]
keepfree=X : Number of processors to keep free on head node [
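In farm=batch mode, the subjobs=LIST setting supplies the actual commands to farm out. As with other SLiMSuite
LIST options, it can point to a file with one entry per line; a hypothetical subjobs file might contain:

    python /path/to/slimsuite/tools/gopher.py seqin=dataset1.fas orthdb=orthologues.fas i=-1
    python /path/to/slimsuite/tools/gopher.py seqin=dataset2.fas orthdb=orthologues.fas i=-1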
seqbyseq=T/F : Activate seqbyseq mode - assumes basefile=X option used for output [False]
seqin=FILE : Input sequence file to farm out [
basefile=X : Base for output files - compiled from individual run results [None]
outlist=LIST : List of extensions of outputs to add to basefile for output (basefile.*) 
pickhead=X : Header to extract from OutList file, used to populate the list of AccNum to skip
SLiMSuite Farming Options
runid=X : Text identifier for SLiMSuite job farming [
resfile=FILE : Main output file for SLiMSuite run [
pickup=T/F : Whether to pickup previous run based on existing results and RunID [True]
sortrun=T/F : Whether to sort input files by size and run big -> small to avoid hang at end [True]
loadbalance=T/F : Whether to split SortRun jobs equally between large & small to avoid memory issues [True]
basefile=X : Set the log, RunID, ResFile, ResDir and Job to X [None].
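Putting these together, a hypothetical farming run that could be resubmitted safely after a crash (settings
illustrative only) might be:

    python slimfarmer.py farm=qslimfinder qsub=T batch=*.fas runid=QRun1 resfile=qrun1.csv pickup=T

On resubmission, datasets already present in the results file under RunID QRun1 would be picked up and skipped.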
See also rje.py generic commandline options.
Module Version History
# 0.0 - Initial Compilation.
# 1.0 - Functional version using rje_qsub and rje_iridis to fork out SLiMSuite runs.
# 1.1 - Updated to use rje_hpc.JobFarmer and incorporate main SLiMSuite farming within SLiMFarmer class.
# 1.2 - Implemented the slimsuite=T/F option and got SLiMFarmer qsub to work with GOPHER forking.
# 1.3 - Modified default vmem request to 127GB from 64GB.
# 1.4 - Added modules=LIST : List of modules to add in job file [clustalo,mafft]
# 1.4.1 - Fixed farm=batch mode for qsub=T.
# 1.4.2 - Fixed log transfer issues due to new #VIO line. Better handling of crashed runs.
# 1.4.3 - Added recognition of missing slimsuite programs and switching to slimsuite=F.
# 1.4.4 - Modified default vmem request to 126GB from 127GB.
# 1.4.5 - Updated BLAST loading default to 2.2.31.
SLiMFarmer REST Output formats
Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format
for more user-friendly formatted output. Individual outputs can be identified/parsed using