This module is for taking one or two sequence datasets, peforming an intensive All by All BLAST and then tabulating
the results as a series of pairwise comparisons (
Qry = Query Short Name (or AccNum)
Hit = Hit Short Name
Rank = Rank of that Hit vs Query (based on Score)
Score = BLAST Score (one-line)
E-Value = BLAST E-value
QryLen = Length of Query Sequence
HitLen = Length of Hit Sequence
Qry_AlnLen = Total length of local BLAST alignment fragments in Query (Unordered)
Qry_AlnID = Number of Identical residues of Query aligned against Hit in local BLAST alignments (Unordered)
Qry_AlnSim = Number of Similar residues of Query aligned against Hit in local BLAST alignments (Unordered)
Qry_OrderedAlnLen = Total length of local BLAST alignment fragments in Query (Ordered)
Qry_OrderedAlnID = Number of Identical residues of Query aligned against Hit in local BLAST alignments (Ordered)
Qry_OrderedAlnSim = Number of Similar residues of Query aligned against Hit in local BLAST alignments (Ordered)
Hit_AlnLen = Total length of local BLAST alignment fragments in Hit (Unordered)
Hit_AlnID = Number of Identical residues of Hit aligned against Query in local BLAST alignments (Unordered)
Hit_AlnSim = Number of Similar residues of Hit aligned against Query in local BLAST alignments (Unordered)
Hit_OrderedAlnLen = Total length of local BLAST alignment fragments in Hit (Ordered)
Hit_OrderedAlnID = Number of Identical residues of Hit aligned against Query in local BLAST alignments (Ordered)
Hit_OrderedAlnSim = Number of Similar residues of Hit aligned against Query in local BLAST alignments (Unordered)
ALIGN_ID = Number of Identical residues as determined by pairwise ALIGN
ALIGN_Len = Length of pairwise ALIGN
searchdb=FILE is DNA (
blastprog=blastn/tblastn/tblastx), there will be additional
Dirn fields that indicate
whether the hits are on the forward strand (
Fwd), reverse strand (
Bwd) or a combination (
By default, all BLAST hits will return alignments. (
N is the size of
searchdb.) This can
be over-ridden by the
blastb=X options to limit results to the top
GABLAM will also produce a single table of summary statistics for all non-self hits (
*.hitsum.*) (self hits included
Qry = Query Short Name (or AccNum)
Hits = Number of Hits
MaxScore = Max non-self BLAST Score (one-line)
E-Value = BLAST E-value for max score
keepblast=T then the blast results files themselves will not be deleted and will also be available as output -
or for re-running GABLAM faster on the same input. This can be performed even when the output basefile has changed
by giving the BLAST results file with
V2.8: All-by-all search output
Version 2.8 onwards features explicit extra functionality for all-by-all searches, where the QueryDB (
searchdb=FILE) are the same. (Failing to give a searchdb will run in this mode.)
V2.16: FullBLAST mode
Version 2.16.x introduces a new "fullblast" mode, which performs a full BLAST search (using
forks=X to set the number
of processors for the BLAST search) followed by the
blastres=FILE multiGABLAM processing. This should be faster for
large datasets but precludes any appending of results files. This is incompatible with the
update option. (
missing=LIST should only be required for aborted
fullblast=F runs.) By default, the blast file will
be named the same as the other output but the
blastres=FILE command can be used to read in results from a file with
a different name. If this file was not created with sequences in QueryDB and SearchDB, it will create errors.
V2.20: SNP Table & LocalUnique output
Version 2.20.x added a
snptable=T/F output to generate a SNP table (similar to MUMmer NUCmer output) with the
localunique=T/F controls whether hit
regions can be covered multiple times (
False) or (default:
True) reduced to unique "best" hits. Local hits are
sorted according to
Identity). NOTE: Output is restricted to regions of overlap between
query and hit sequences. Regions of each that are not covered will be output in
*.unique.tdt output for the regions covered. Normally, the reference genome will be
seqin=FILE and the genome
to compare against the reference will be
localidmin are applied to unique
regions as well as the original local hits table.
V2.26: Fragment fasta output
Version 2.26.x tidied up the
combinedfas=T output, which are now only available in combination when
gablamfrag=X parameter controls fragmentation within a given Query-Hit pair, i.e. local hits
gablamfrag=X positions of each other in the Hit will be merged.
fragmerge=X applies to different hit regions
when they are combined during
combinedfas=T output, and will merge regions with
irrespective of whether they are hits from the same query or different ones. Fragments are strand-specific and hits
to the reverse strand will be reverse-complemented by default. If
fragrevcomp=F then all local hits will be mapped
onto the forward strand, regardless of their original direction.
Default values are now
. This means that hits from the same query will merge if directly
adjacent and on the same strand, whereas hits from different queries will only merge if overlapping and on the same
strand. To restrict merging to overlaps of a specific minimum length, set the relevant parameter to a negative
fragmerge=-10 will require a 10 bp/aa overlap before merging.
gablamfrag can only be set to values
below 1 when used in combination with
The defaults for when GABLAM is run from the commandline have been set to
keepblast=T. When GABLAM is
called from within another program, the
keepblast=T/F default status is set by that program.
seqin=FILE : Query dataset file [
searchdb=FILE : Database to search. [
By default, same as seqin]
blastres=FILE : BLAST results file for input (over-rides seqin and searchdb) [
fullblast=T/F : Whether to perform full BLAST followed by blastres analysis [
blastp=X : Type of BLAST search to perform (blastx for DNA vs prot; tblastn for Prot vs DNA) [
gablamcut=X : Min. percentage value for a GABLAM stat to report hit (GABLAM from FASTA only) [
cutstat=X : Stat for gablamcut (eg. AlnLen or OrderedAlnSim. See Function docs for full list) [
cutfocus=X : Focus for gablamcut. Can be Query/Hit/Either/Both. [
localcut=X : Cut-off length for local alignments contributing to global GABLAM stats [
localidcut=PERC : Cut-off local %identity for local alignments contributing to global GABLAM stats [
General Output Options
append=T/F : Whether to append to output file or not. (Not available for
fullres=T/F : Whether to output full GABLAM results table [
hitsum=T/F : Whether to output the BLAST Hit Summary table [
local=T/F : Whether to output local alignment summary stats table [
localsAM=T/F : Save local (and unique) hits data as SAM files in addition to TDT [
reftype=X : Whether to map SAM/GFF3 hits onto the Qry, Hit, Both or Combined [
qassemble=T/F : Whether to fully assemble query stats from all hits in HitSum [
localmin=X : Minimum length of local alignment to output to local stats table [
localidmin=PERC : Minimum local %identity of local alignment to output to local stats table [
localunique=T/F : Reduce local hits to unique non-overlapping regions (*.unique.tdt) [
localsort=X : Local hit field used to sort local alignments for localunique reduction [
snptable=T/F : Generate a SNP table (similar to MUMmer NUCmer output) for query/hit overlap (
selfhit=T/F : Whether to include self hits in the fullres output [True] * See also
selfsum=T/F : Whether to also include self hits in hitsum output [False] * selfhit must also be T *
qryacc=T/F : Whether to use the Accession Number rather than the short name for the Query [
keepblast=T/F : Whether to keep the blast results files rather than delete them [
blastdir=PATH : Path for blast results file (best used with
percres=T/F : Whether output is a percentage figures (2d.p.) or absolute numbers [
- Note that enough data is output to convert one into the other in other packages (for short sequences)
reduced=LIST : List of terms that must be included in reduced output headers (e.g. Hit or Qry_Ordered) 
All-by-all Output Options
maxall=X : Maximum number of sequences for all-by-all outputs [
dismat=T/F : Whether to output compiled distance matrix [
diskey=X : GABLAM Output Key to be used for distance matrix [
distrees=T/F : Whether to generate UPGMA tree summaries of all-by-all distances [
treeformats=LIST: List of output formats for generated trees (see rje_tree.py) [
disgraph=T/F : Whether to output a graph representation of the distance matrix (edges = homology) [
graphtypes=LIST : Formats for graph outputs (svg/xgmml/png/html) [
clusters=T/F : Whether to output a list of clusters based on shared BLAST homology [
bycluster=X : Generate separate trees and distance matrix for clusters of X+ sequences [
clustersplit=X : Threshold at which clusters will be split (e.g. must be < distance to cluster) [
singletons=T/F : Whether to include singleton in main tree and distance matrix [
saveupc=T/F : Whether to output a UPC file for SLiMSuite compatibility [
Sequence output options
localalnfas=T/F : Whether to output local alignments to *.local.fas fasta file (if
fasout=T/F : Output a fasta file per input sequence "ACCNUM.DBASE.fas" [False] (GABLAM from FASTA only)
fasdir=PATH : Directory in which to save fasta files [
fragfas=T/F : Whether to output fragmented Hits based on local alignments [
fragrevcomp=T/F : Whether to reverse-complement DNA fragments that are on reverse strand to query [
gablamfrag=X : Length of gaps between mapped residues for fragmenting local hits [
fragmerge=X : Max Length of gaps between fragmented local hits to merge [
addflanks=X : Add flanking regions of length X to fragmented hits [
combinedfas=T/F : Whether to generate a combined fasta file [
Advanced/Obselete Search/Output Options
dotplots=T/F : Whether to use gablam.r to output dotplots. (Needs R installed and setup) [
dotlocalmin=X : Minimum length of local alignment to output to local hit dot plots [
mysql=T/F : Whether to output column headers for mysql table build [
missing=LIST : This will go through and add missing results for AccNums in FILE (or list of AccNums X,Y,..) [
startfrom=X : Accession number to start from [
alnstats=T/F : Whether to output GABLAM stats or limit to one-line stats (
posinfo=T/F : Output the Start/End limits of the BLAST Hits [
outstats=X : Whether to output just GABLAM, GABLAMO or All [
GABLAM Non-redundancy options. NOTE: These are different to rje_seq NR options.
nrseq=T/F : Make sequences Non-Redundant following all-by-all. [
nrcut=X : Cut-off for non-redundancy filter, uses
nrstat=X for either query or hit [
nrstat=X : Stat for nrcut (eg. AlnLen or OrderedAlnSim. See above for full list) [
nrchoice=LIST : Order of decisions for choosing NR sequence to keep. Otherwise keeps first sequence. (swiss/nonx/length/spec/name/acc/manual) [
nrsamespec=T/F : Non-Redundancy within same species only. [
nrspec=LIST : List of species codes in order of preference (good to bad) 
blastpath+=PATH : path for blast+ files [c:/bioware/blast+/] *Use fwd slashes
blastpath=PATH : path for blast files [c:/bioware/blast/] *Use fwd slashes
blaste=X : E-Value cut-off for BLAST searches (BLAST -e X) [
blastv=X : Number of one-line hits per query (BLAST -v X) [
blastb=X : Number of hit alignments per query (BLAST -b X) [
blastf=T/F : Complexity Filter (BLAST -F X) [
checktype=T/F : Whether to check sequence types and BLAST program selection [
Additional ALIGN Global Identity
globid=T/F : Whether to output Global %ID using ALIGN [
rankaln=X : Perform ALIGN pairwise global alignment for top X hits [
evalaln=X : Perform ALIGN pairwise global alignment for all hits with e <= X [
alncut=X : Perform ALIGN pairwise global alignment until < X %ID reached [
noforks=T/F : Whether to avoid forks [
forks=X : Number of parallel sequences to process at once [
killforks=X : Number of seconds of no activity before killing all remaining forks. [
History Module Version History
# 0.0 - Initial Compilation.
# 1.0 - First working version based on BAM 1.4
# 1.1 - Added blastres=FILE option
# 1.2 - Added percres=T/F option
# 1.3 - Added option to use a GABLAM stat as a cut-off
# 1.4 - Added more output options for webserver
# 2.0 - Major tidy up of module.
# 2.1 - Added DNA GABLAM
# 2.2 - Added reduced=LIST : List of terms that must be included in reduced output headers (e.g. Hit or Qry_Ordered)
# 2.3 - Added local alignment stat output.
# 2.4 - Added distance matrix output and visualisation.
# 2.5 - Miscellaneous cleanup and bug fixes. Updated output file names to use basefile.
# 2.6 - Added full name output for NSF tree.
# 2.7 - Added cluster output.
# 2.8 - Added graph output and replaced dispng and disnsf with distrees. Replaced dismatrix PNG with just tree.
# 2.9 - Added LocalMin and LocalCut for controlling how local alignments are output and/or contribute to GABLAM.
# 2.10- Added dot plot PNG output using gablam.r. Added sequence checking with rje_seqlist -> correct BLAST type.
# 2.11- Altered to use BLAST+ and rje_blast_V2.
# 2.12- Consolidated use of BLAST V2.
# 2.13- Fixed Protein vs DNA GABLAM. Modified sequence extraction to handle larger sequences. Add blastdir=PATH/.
# 2.14- Added checktype=T/F option to check sequence/BLAST type.
# 2.15.0 - Added seqnr function. Add run() method.
# 2.16.0 - Added fullblast=T/F : Whether to perform full BLAST followed by blastres analysis [False]
# 2.16.1 - Fixed a bug where the fullblast option was failing to return scores and evalues.
# 2.17.0 - Added localalnfas=T/F : Whether to output local alignments to *.local.fas fasta file (if local=T) [False]
# 2.17.1 - Fixed bug where query and hit lengths were not being output for fullblast.
# 2.18.0 - Added blaste filtering to be applied to existing BLAST results.
# 2.19.0 - Added maxall=X limits to all-by-all analyses. Added qassemble=T.
# 2.19.1 - Fixed handling of basefile and results generation for blastres=FILE.
# 2.19.2 - Modified output to be in rank order.
# 2.20.0 - Added SNP Table output.
# 2.21.0 - Added nocoverage Table output of regions missing from pairwise SNP Table.
# 2.21.1 - Added fragrevcomp=T/F : Whether to reverse-complement DNA fragments that are on reverse strand to query [True]
# 2.22.0 - Added description to HitSum table.
# 2.22.1 - Added localaln=T/F to keep local alignment sequences in the BLAST local Table.
# 2.22.2 - Fixed local output error. (Query/Qry issue - need to fix this and make consistent!)
# 2.22.3 - Fixed blastv and blastb error: limit also applies to individual pairwise hits!
# 2.23.0 - Divided GablamFrag and FragMerge.
# 2.23.1 - Added tuplekeys=T to cmd_list as default. (Can still be over-ridden if it breaks things!)
# 2.24.0 - Added localidmin and and localidcut as %identity versions of localmin and localcut. (Use for PAGSAT.)
# 2.25.0 - Added localsAM=T/F : Save local (and unique) hits data as SAM files in addition to TDT [False]
# 2.26.0 - Fixed fragfas output and clarified fullblast=T/F, localmin=X and localcut=X. Set fullblast=T keepblast=T.
# 2.26.1 - Fixed keepblast error.
# 2.26.2 - Fixed gablamcut fragfas filtering bug.
# 2.26.3 - Fixed nrseq=T to use Query OR Hit stat for NR filtering.
# 2.26.4 - Minor bug fix to nrchoice command parsing.
# 2.27.0 - Fragmerge no longer removes flanks and can be negative for enforced overlap!
# 2.28.0 - Added localidmin=PERC to localUnique (and thus Snapper).
# 2.28.1 - Fixed missing combinedfas when using existing blastres.
# 2.28.2 - Minor bug fix for NRSeq manual choice when i=-1.
# 2.28.3 - Fixed NRSeq query sorting bug.
GABLAM REST Output formats
for program documentation and options. A plain text version is accessed with
can be used to retrieve individual parts of the output, matching the tabs in the default
) output. Individual
elements can also be parsed from the full (
) server output,
which is formatted as follows:
... contents for OUTFMT section ...
Available REST Outputs
= Non-redundant sequence output [fas]