Function
This module is designed to control and execute parallel processing jobs on the IRIDIS cluster, based on the script
written by Ivan Wolton. Initially, it will call other programs but, in time, it is envisaged that other programs will
make use of this module and have parallelisation built in.
In SeqBySeq mode, the program assumes that seqin=FILE and basefile=X are given and that irun states the program to be
run. The sequences in seqin are then worked through in turn and each one is farmed out to the irun program. The outputs
listed in OutList are then compiled, as is the Log, into the files for the basefile=X given. In the case of *.csv and
*.tdt files, the header row is copied from the first file and then excluded for all subsequent files. For all other
file extensions, the whole output is copied.
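The header handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's own code: the function name `compile_outputs` and its arguments are hypothetical, and the per-sequence output files are assumed to exist on disk already.

```python
import os

# Extensions treated as delimited tables with a header row (per the text above).
HEADED_EXTENSIONS = {"csv", "tdt"}

def compile_outputs(part_files, compiled_file):
    """Concatenate per-sequence output files into compiled_file (hypothetical helper).

    For *.csv and *.tdt outputs, the header row is kept from the first file only.
    For any other extension, the whole content of every file is copied.
    """
    ext = os.path.splitext(compiled_file)[1].lstrip(".").lower()
    skip_header = ext in HEADED_EXTENSIONS
    with open(compiled_file, "w") as out:
        for i, part in enumerate(part_files):
            with open(part) as handle:
                lines = handle.readlines()
            if skip_header and i > 0:
                lines = lines[1:]  # drop the repeated header row
            out.writelines(lines)
```

The choice between the two behaviours is made from the extension of the compiled file, which mirrors how OutList names outputs by extension.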
Commandline
STANDARD RUN OPTIONS
irun=X : Execute a special iRun analysis on Iridis (gopher/slimfinder/qslimfinder/slimsearch/unifake) []
iini=FILE : Ini file to pass to the called program [None]
pypath=PATH : Path to python modules ['/home/re1u06/Serpentry/']
rjepy=T/F : Whether program is an RJE *.py script (adds log processing) [True]
subsleep=X : Sleep time (seconds) between cycles of subbing out jobs to hosts [1]
subjobs=LIST : List of subjobs to farm out to IRIDIS cluster []
iolimit=X : Limit of number of IOErrors before termination [50]
memfree=X : Min. proportion of node memory to be free before spawning job [0.0]
test=T/F : Whether to produce extra output in "test" mode [False]
keepfree=X : Number of processors to keep free on head node [1]
rsh=T/F : Whether to use rsh to run jobs on other nodes [True]
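
As a rough illustration of how the keepfree and memfree options might gate job spawning, the sketch below checks processor availability and the free-memory proportion. The function names and the way memory figures are obtained are assumptions made for this example; the module's actual checks may differ.

```python
def usable_processors(n_procs, keepfree=1, is_head_node=True):
    """Number of processors that may run jobs on a node (illustrative only).

    keepfree processors are held back, and only on the head node.
    """
    if is_head_node:
        return max(0, n_procs - keepfree)
    return n_procs

def memory_ok(free_mem, total_mem, memfree=0.0):
    """True if at least the memfree proportion of node memory is free."""
    if total_mem <= 0:
        return False  # no usable memory figure, so do not spawn
    return (free_mem / total_mem) >= memfree
```

With the default memfree of 0.0 the memory check always passes, so only the processor count limits spawning.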
SEQBYSEQ OPTIONS
seqbyseq=T/F : Activate seqbyseq mode - assumes basefile=X option used for output [False]
seqin=FILE : Input sequence file to farm out [None]
basefile=X : Base for output files - compiled from individual run results [None]
outlist=LIST : List of extensions of outputs to add to basefile for output (basefile.*) []
pickup=X : Header to extract from OutList file and use to populate AccNum to skip []
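
One way to read the pickup option is that a column of an existing output file is used to find sequences that have already been processed. The sketch below follows that reading. It is an assumption-laden illustration: the function name `accnum_to_skip`, the comma delimiter, and the handling of a missing file are all hypothetical.

```python
import csv

def accnum_to_skip(results_file, pickup, delimiter=","):
    """Collect the values found under the `pickup` header of an existing output file.

    Returns an empty set if the file does not exist yet (nothing to pick up).
    """
    done = set()
    try:
        with open(results_file, newline="") as handle:
            for row in csv.DictReader(handle, delimiter=delimiter):
                value = row.get(pickup)
                if value:
                    done.add(value)
    except FileNotFoundError:
        pass  # first run: no previous results to skip
    return done
```

Sequences whose accession numbers appear in the returned set could then be skipped when farming out seqin.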
SPECIAL iRUN OPTIONS
runid=X : Text identifier for iX run [None]
resfile=FILE : Main output file for iX run [islimfinder.csv]
sortrun=T/F : Whether to sort input files by size and run big -> small to avoid hang at end [True]
loadbalance=T/F : Whether to split SortRun jobs equally between large & small to avoid memory issues [True]
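
The sortrun and loadbalance options describe a job ordering that can be sketched as below. The text states only that jobs run big to small and that load balancing splits them between large and small inputs; the strict alternation used here (largest, smallest, next largest, and so on, finishing with the mid-sized inputs) is an assumption, and `order_jobs` is a hypothetical name.

```python
def order_jobs(sizes, sortrun=True, loadbalance=True):
    """Return job names in run order (illustrative only).

    sizes: dict mapping job name -> input size.
    """
    names = list(sizes)
    if not sortrun:
        return names  # keep the given order
    # Big -> small, so the longest jobs are not left until the end.
    ranked = sorted(names, key=lambda name: sizes[name], reverse=True)
    if not loadbalance:
        return ranked
    # Alternate between the large and small ends, finishing in the middle.
    ordered = []
    lo, hi = 0, len(ranked) - 1
    while lo <= hi:
        ordered.append(ranked[lo])
        lo += 1
        if lo <= hi:
            ordered.append(ranked[hi])
            hi -= 1
    return ordered
```

Mixing large and small inputs in this way means that jobs running at the same time are less likely to all be memory-heavy.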
History
Module Version History
# 0.0 - Initial Compilation.
# 1.0 - Added additional functions to call other programs.
# 1.1 - Added UniFake.
# 1.2 - Added generic seqbyseq option.
# 1.3 - Modified for IRIDIS3.
# 1.4 - Added catching of IOErrors.
# 1.5 - Added QSLiMFinder iRun.
# 1.6 - Modified iSLiMFinder job processing to try to catch errors better. (Not sure what is happening.)
# 1.7 - Added memory checking before a run is spawned.
# 1.8 - Added load balance option for SortRun: splits jobs equally between large and small input (& ends in middle).
# 1.9 - Added scanning of legacy folder - moving GOPHER_V2!
# 1.10 - Modified freemem setting to run on Katana. Made rsh optional. Removed defunct IRIDIS3 option.
# 1.10.1 - Attempted to fix SLiMFarmer batch run problem. (Should not be setting irun=batch!)
# 1.10.2 - Trying to clean up unknown 30s pause. Might be freemem issue?
# 1.10.3 - Fix issues with batch farming of subjobs splitting on commas.