SLiMSuite REST Server



GABLAM V2.25.0

Global Analysis of BLAST Local AlignMents

Program: GABLAM
Description: Global Analysis of BLAST Local AlignMents
Version: 2.25.0
Last Edit: 11/12/16
Citation: Davey, Shields & Edwards (2006), Nucleic Acids Res. 34(12):3546-54.

Copyright © 2006 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje rje_db rje_ppi rje_seq rje_seqlist rje_tree rje_blast_V2 rje_dismatrix_V2


See SLiMSuite Blog for further documentation.

Function

This module takes one or two sequence datasets, performs an intensive All by All BLAST and then tabulates the results as a series of pairwise comparisons (*.gablam.*):

  • Qry = Query Short Name (or AccNum)
  • Hit = Hit Short Name
  • Rank = Rank of that Hit vs Query (based on Score)
  • Score = BLAST Score (one-line)
  • E-Value = BLAST E-value
  • QryLen = Length of Query Sequence
  • HitLen = Length of Hit Sequence
  • Qry_AlnLen = Total length of local BLAST alignment fragments in Query (Unordered)
  • Qry_AlnID = Number of Identical residues of Query aligned against Hit in local BLAST alignments (Unordered)
  • Qry_AlnSim = Number of Similar residues of Query aligned against Hit in local BLAST alignments (Unordered)
  • Qry_OrderedAlnLen = Total length of local BLAST alignment fragments in Query (Ordered)
  • Qry_OrderedAlnID = Number of Identical residues of Query aligned against Hit in local BLAST alignments (Ordered)
  • Qry_OrderedAlnSim = Number of Similar residues of Query aligned against Hit in local BLAST alignments (Ordered)
  • Hit_AlnLen = Total length of local BLAST alignment fragments in Hit (Unordered)
  • Hit_AlnID = Number of Identical residues of Hit aligned against Query in local BLAST alignments (Unordered)
  • Hit_AlnSim = Number of Similar residues of Hit aligned against Query in local BLAST alignments (Unordered)
  • Hit_OrderedAlnLen = Total length of local BLAST alignment fragments in Hit (Ordered)
  • Hit_OrderedAlnID = Number of Identical residues of Hit aligned against Query in local BLAST alignments (Ordered)
  • Hit_OrderedAlnSim = Number of Similar residues of Hit aligned against Query in local BLAST alignments (Ordered)
  • ALIGN_ID = Number of Identical residues as determined by pairwise ALIGN
  • ALIGN_Len = Length of pairwise ALIGN
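
As a rough illustration of how these fields might be used downstream, the sketch below reads a results table with Python's csv module and keeps hits above an ordered-identity threshold. It assumes the table is tab-delimited, carries the headers listed above, and was produced with percres=T so that values are percentages. The sample rows and the function name strong_hits are invented for the example.

```python
import csv
import io

# Invented sample rows: assumes a tab-delimited table with the headers
# listed above and percentage values (percres=T).
SAMPLE = (
    "Qry\tHit\tRank\tScore\tE-Value\tQryLen\tHitLen\tQry_OrderedAlnID\tHit_OrderedAlnID\n"
    "seqA\tseqB\t1\t250\t1e-60\t300\t320\t92.50\t86.72\n"
    "seqA\tseqC\t2\t80\t1e-10\t300\t150\t21.00\t42.00\n"
)

def strong_hits(handle, stat="Qry_OrderedAlnID", min_perc=50.0):
    """Return (Qry, Hit, value) tuples where the chosen stat is >= min_perc."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["Qry"], row["Hit"], float(row[stat]))
        for row in reader
        if float(row[stat]) >= min_perc
    ]

hits = strong_hits(io.StringIO(SAMPLE))
print(hits)
```

For a real run, the io.StringIO object would be replaced by an open file handle on the *.gablam.* output.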

By default, all BLAST hits will return alignments (blastv=N blastb=N, where N is the size of searchdb). This can be overridden with the blastv=X and blastb=X options to limit results to the top X hits.

GABLAM will also produce a single table of summary statistics for all non-self hits (*.hitsum.*) (self hits included if selfhit=T selfsum=T):

  • Qry = Query Short Name (or AccNum)
  • Hits = Number of Hits
  • MaxScore = Max non-self BLAST Score (one-line)
  • E-Value = BLAST E-value for max score

Version 2.8 onwards features explicit extra functionality for all-by-all searches, where the QueryDB (seqin=FILE) and SearchDB (searchdb=FILE) are the same. (Failing to give a searchdb will run in this mode.)

Version 2.16.x introduces a new "fullblast" mode, which performs a full BLAST search (using forks=X to set the number of processors for the BLAST search) followed by the blastres=FILE multiGABLAM processing. This should be faster for large datasets but precludes any appending of results files. This is incompatible with the missing=LIST advanced update option. (missing=LIST should only be required for aborted fullblast=F runs.)
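
The option=value style used throughout this page can be assembled programmatically. The following is a minimal sketch that only builds the argument list; it does not run anything. The helper name gablam_args and the file name proteins.fas are hypothetical. Omitting searchdb corresponds to the all-by-all mode described above.

```python
def gablam_args(seqin, searchdb=None, **options):
    """Build a list of option=value arguments in the style documented here.

    Booleans are written as T/F. Leaving searchdb unset omits it, which the
    text above describes as running in all-by-all mode.
    """
    args = [f"seqin={seqin}"]
    if searchdb is not None:
        args.append(f"searchdb={searchdb}")
    for key in sorted(options):
        value = options[key]
        if isinstance(value, bool):
            value = "T" if value else "F"
        args.append(f"{key}={value}")
    return args

# All-by-all search in fullblast mode using four processors for BLAST.
args = gablam_args("proteins.fas", fullblast=True, forks=4)
print(" ".join(args))
```

How the arguments are passed to the program (script path, Python interpreter, or REST call) depends on the local installation and is deliberately left out.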

Version 2.20.x added a snptable=T/F output to generate a SNP table (similar to MUMmer NUCmer output) with the following fields: Locus, Pos, REF, ALT, AltLocus, AltPos. The localunique=T/F option controls whether hit regions can be covered multiple times (False) or are reduced to unique "best" hits (default: True). Local hits are sorted according to localsort=X (default: Identity). NOTE: Output is restricted to regions of overlap between query and hit sequences. Regions of either sequence that are not covered will be output in *.nocoverage.tdt. See the *.local.tdt or *.unique.tdt output for the regions covered. Normally, the reference genome will be seqin=FILE and the genome to compare against the reference will be searchdb=FILE.
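
To give a feel for what reducing local hits to unique "best" regions involves, here is a simplified greedy stand-in. It is not GABLAM's actual procedure, which may trim overlapping hits rather than discard them. The field names QryStart and QryEnd, the function name and the sample hits are assumptions made for the example; only the idea of sorting by a localsort field such as Identity comes from the text above.

```python
def unique_best_hits(hits, sort_key="Identity"):
    """Greedy reduction: visit hits from best to worst by sort_key and keep a
    hit only if its query region does not overlap a hit already kept."""
    kept = []
    for hit in sorted(hits, key=lambda h: h[sort_key], reverse=True):
        overlaps = any(
            hit["QryStart"] <= k["QryEnd"] and k["QryStart"] <= hit["QryEnd"]
            for k in kept
        )
        if not overlaps:
            kept.append(hit)
    # Report the surviving hits in query coordinate order.
    return sorted(kept, key=lambda h: h["QryStart"])

local_hits = [
    {"QryStart": 1, "QryEnd": 100, "Identity": 95},
    {"QryStart": 50, "QryEnd": 150, "Identity": 90},   # overlaps the first hit
    {"QryStart": 160, "QryEnd": 200, "Identity": 40},
]
unique = unique_best_hits(local_hits)
print([(h["QryStart"], h["QryEnd"]) for h in unique])
```

Discarding whole hits loses the non-overlapping part of the second hit (positions 101-150), which is why a trimming approach covers more of the sequence.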

Commandline

Input/Search Options

seqin=FILE : Query dataset file [infile.fas]
searchdb=FILE : Database to search. [By default, same as seqin]
blastres=FILE : BLAST results file for input (over-rides seqin and searchdb) [None]
fullblast=T/F : Whether to perform full BLAST followed by blastres analysis [False]
blastp=X : Type of BLAST search to perform (blastx for DNA vs prot; tblastn for Prot vs DNA) [blastp]
gablamcut=X : Min. percentage value for a GABLAM stat to report hit [0.0] (GABLAM from FASTA only)
cutstat=X : Stat for gablamcut (eg. AlnLen or OrderedAlnSim. See above for full list) [OrderedAlnID]
cutfocus=X : Focus for gablamcut. Can be Query/Hit/Either/Both. [Either]
localcut=X : Cut-off length for local alignments contributing to global GABLAM stats [0]
localidcut=PERC : Cut-off local %identity for local alignments contributing to global GABLAM stats [0.0]
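
The interaction of gablamcut, cutstat and cutfocus can be pictured with the sketch below. It assumes that the cut is applied as "value >= gablamcut", that Either means at least one of the Query/Hit values passes, and that Both means both pass; these readings are inferred from the option names, not confirmed by the source. The function name passes_cut is hypothetical.

```python
def passes_cut(row, gablamcut=0.0, cutstat="OrderedAlnID", cutfocus="Either"):
    """Decide whether a pairwise result row passes the GABLAM cut-off.

    row maps column names (e.g. "Qry_OrderedAlnID") to percentage values.
    The >= comparison and the Either/Both semantics are assumptions.
    """
    qry = float(row["Qry_" + cutstat])
    hit = float(row["Hit_" + cutstat])
    focus = cutfocus.lower()
    if focus == "query":
        return qry >= gablamcut
    if focus == "hit":
        return hit >= gablamcut
    if focus == "both":
        return qry >= gablamcut and hit >= gablamcut
    if focus == "either":
        return qry >= gablamcut or hit >= gablamcut
    raise ValueError(f"Unrecognised cutfocus: {cutfocus}")

row = {"Qry_OrderedAlnID": "92.5", "Hit_OrderedAlnID": "40.0"}
print(passes_cut(row, gablamcut=50.0))                    # Either
print(passes_cut(row, gablamcut=50.0, cutfocus="Both"))
```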

General Output Options

append=T/F : Whether to append to output file or not. (Not available for blastres=FILE or fullblast=T) [False]
fullres=T/F : Whether to output full GABLAM results table [True]
hitsum=T/F : Whether to output the BLAST Hit Summary table [True]
local=T/F : Whether to output local alignment summary stats table [True]
localsAM=T/F : Save local (and unique) hits data as SAM files in addition to TDT [False]
reftype=X : Whether to map SAM/GFF3 hits onto the Qry, Hit, Both or Combined [Hit]
qassemble=T/F : Whether to fully assemble query stats from all hits in HitSum [False]
localmin=X : Minimum length of local alignment to output to local stats table [0]
localidmin=PERC : Minimum local %identity of local alignment to output to local stats table [0.0]
localunique=T/F : Reduce local hits to unique non-overlapping regions (*.unique.tdt) [snptable=T/F]
localsort=X : Local hit field used to sort local alignments for localunique reduction [Identity]
snptable=T/F : Generate a SNP table (similar to MUMmer NUCmer output) for query/hit overlap (fullblast=T) [False]
selfhit=T/F : Whether to include self hits in the fullres output [True] * See also selfsum=T/F *
selfsum=T/F : Whether to also include self hits in hitsum output [False] * selfhit must also be T *
qryacc=T/F : Whether to use the Accession Number rather than the short name for the Query [True]
keepblast=T/F : Whether to keep the blast results files rather than delete them [False]
blastdir=PATH : Path for blast results file (best used with keepblast=T) [./]
percres=T/F : Whether output is percentage figures (2 d.p.) or absolute numbers [True]
- Note that enough data is output to convert one into the other in other packages (for short sequences)
reduced=LIST : List of terms that must be included in reduced output headers (e.g. Hit or Qry_Ordered) []
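
One plausible reading of reduced=LIST is a substring filter on the output headers, sketched below. Whether GABLAM also retains key fields such as Qry and Hit regardless of the terms is not stated here, so the sketch keeps only the matching headers. The function name reduced_headers is hypothetical.

```python
def reduced_headers(headers, terms):
    """Keep headers containing at least one of the given terms.

    An empty terms list leaves the headers unchanged, mirroring the empty
    default of reduced=LIST.
    """
    if not terms:
        return list(headers)
    return [h for h in headers if any(term in h for term in terms)]

headers = ["Qry", "Hit", "Rank", "Qry_AlnLen", "Qry_OrderedAlnID", "Hit_OrderedAlnID"]
print(reduced_headers(headers, ["Qry_Ordered"]))
print(reduced_headers(headers, ["Hit"]))
```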

All-by-all Output Options

maxall=X : Maximum number of sequences for all-by-all outputs [100]
dismat=T/F : Whether to output compiled distance matrix [True]
diskey=X : GABLAM Output Key to be used for distance matrix ['Qry_AlnID']
distrees=T/F : Whether to generate UPGMA tree summaries of all-by-all distances [True]
treeformats=LIST : List of output formats for generated trees (see rje_tree.py) [nwk,text,png]
disgraph=T/F : Whether to output a graph representation of the distance matrix (edges = homology) [False]
graphtypes=LIST : Formats for graph outputs (svg/xgmml/png/html) [xgmml,png]
clusters=T/F : Whether to output a list of clusters based on shared BLAST homology [True]
bycluster=X : Generate separate trees and distance matrix for clusters of X+ sequences [0]
clustersplit=X : Threshold at which clusters will be split (e.g. must be < distance to cluster) [1.0]
singletons=T/F : Whether to include singletons in main tree and distance matrix [False]
saveupc=T/F : Whether to output a UPC file for SLiMSuite compatibility [False]
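
Clustering by shared BLAST homology can be thought of as finding connected components in a graph whose edges are query-hit pairs. The sketch below shows that idea only; treating any hit as an edge (single linkage) is an assumption, and it ignores clustersplit and the distance matrix entirely. The function name homology_clusters and the sequence names are invented.

```python
from collections import defaultdict

def homology_clusters(pairs):
    """Group sequences into clusters connected by at least one query-hit pair."""
    graph = defaultdict(set)
    for qry, hit in pairs:
        graph[qry].add(hit)
        graph[hit].add(qry)
    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this sequence.
        members = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters

clusters = homology_clusters([("A", "B"), ("B", "C"), ("D", "E")])
print(clusters)
```

Sequences with no non-self hits never appear in the pair list, so singletons would need to be added separately if they are wanted in the output.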

Sequence output options

localalnfas=T/F : Whether to output local alignments to *.local.fas fasta file (if local=T) [False]
fasout=T/F : Output a fasta file per input sequence "ACCNUM.DBASE.fas" [False] (GABLAM from FASTA only)
fasdir=PATH : Directory in which to save fasta files [BLASTFAS/]
fragfas=T/F : Whether to output fragmented Hits based on local alignments [False]
fragrevcomp=T/F : Whether to reverse-complement DNA fragments that are on reverse strand to query [True]
gablamfrag=X : Length of gaps between mapped residues for fragmenting local hits [100]
fragmerge=X : Max Length of gaps between fragmented local hits to merge [0]
addflanks=X : Add flanking regions of length X to fragmented hits [0]
combinedfas=T/F : Whether to generate a combined fasta file [False]
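
The fragmenting options can be illustrated with a small sketch that splits a set of mapped hit positions wherever the gap between them reaches gablamfrag, then pads each fragment with addflanks. The exact gap rule (here: split when the number of unmapped residues between two mapped positions is >= gablamfrag) is an assumption, fragmerge is not modelled, and the function name fragment_positions is hypothetical.

```python
def fragment_positions(positions, gablamfrag=100, addflanks=0, seqlen=None):
    """Split mapped positions (1-based) into (start, end) fragments.

    A new fragment starts when the unmapped gap since the previous mapped
    position is >= gablamfrag (assumed rule). Flanks are clipped to the
    sequence ends when seqlen is given.
    """
    frags = []
    for pos in sorted(set(positions)):
        if frags and pos - frags[-1][1] - 1 < gablamfrag:
            frags[-1][1] = pos          # extend the current fragment
        else:
            frags.append([pos, pos])    # start a new fragment
    padded = []
    for start, end in frags:
        start = max(1, start - addflanks)
        end = end + addflanks
        if seqlen is not None:
            end = min(seqlen, end)
        padded.append((start, end))
    return padded

mapped = list(range(1, 51)) + list(range(200, 251))
print(fragment_positions(mapped))                              # two fragments
print(fragment_positions(mapped, gablamfrag=200))              # one fragment
print(fragment_positions(mapped, addflanks=10, seqlen=255))    # with flanks
```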

Advanced/Obsolete Search/Output Options

dotplots=T/F : Whether to use gablam.r to output dotplots. (Needs R installed and setup) [False]
dotlocalmin=X : Minimum length of local alignment to output to local hit dot plots [1000]
mysql=T/F : Whether to output column headers for mysql table build [False]
missing=LIST : This will go through and add missing results for AccNums in FILE (or list of AccNums X,Y,..) [None]
startfrom=X : Accession number to start from [None]
alnstats=T/F : Whether to output GABLAM stats or limit to one-line stats (blastb=0) [True]
posinfo=T/F : Output the Start/End limits of the BLAST Hits [True]
outstats=X : Whether to output just GABLAM, GABLAMO or All [All]

GABLAM Non-redundancy options. NOTE: These are different to rje_seq NR options.

nrseq=T/F : Make sequences Non-Redundant following all-by-all. [False]
nrcut=X : Cut-off for non-redundancy filter, uses nrstat=X for either query or hit [100.0]
nrstat=X : Stat for nrcut (eg. AlnLen or OrderedAlnSim. See above for full list) [OrderedAlnID]
nrchoice=LIST : Order of decisions for choosing NR sequence to keep. Otherwise keeps first sequence. (swiss/nonx/length/spec/name/acc/manual) [swiss,nonx,length]
nrsamespec=T/F : Non-Redundancy within same species only. [False]
nrspec=LIST : List of species codes in order of preference (good to bad) []

BLAST Options

blastpath+=PATH : path for blast+ files [c:/bioware/blast+/] *Use fwd slashes
blastpath=PATH : path for blast files [c:/bioware/blast/] *Use fwd slashes
blaste=X : E-Value cut-off for BLAST searches (BLAST -e X) [1e-4]
blastv=X : Number of one-line hits per query (BLAST -v X) [500]
blastb=X : Number of hit alignments per query (BLAST -b X) [500]
blastf=T/F : Complexity Filter (BLAST -F X) [False]
checktype=T/F : Whether to check sequence types and BLAST program selection [True]

Additional ALIGN Global Identity

globid=T/F : Whether to output Global %ID using ALIGN [False]
rankaln=X : Perform ALIGN pairwise global alignment for top X hits [0]
evalaln=X : Perform ALIGN pairwise global alignment for all hits with e <= X [1000]
alncut=X : Perform ALIGN pairwise global alignment until < X %ID reached [0]

Forking Options

noforks=T/F : Whether to avoid forks [False]
forks=X : Number of parallel sequences to process at once [0]
killforks=X : Number of seconds of no activity before killing all remaining forks. [36000]

History

Module Version History

    # 0.0 - Initial Compilation.
    # 1.0 - First working version based on BAM 1.4
    # 1.1 - Added blastres=FILE option
    # 1.2 - Added percres=T/F option
    # 1.3 - Added option to use a GABLAM stat as a cut-off
    # 1.4 - Added more output options for webserver
    # 2.0 - Major tidy up of module.
    # 2.1 - Added DNA GABLAM
    # 2.2 - Added reduced=LIST : List of terms that must be included in reduced output headers (e.g. Hit or Qry_Ordered)
    # 2.3 - Added local alignment stat output.
    # 2.4 - Added distance matrix output and visualisation.
    # 2.5 - Miscellaneous cleanup and bug fixes. Updated output file names to use basefile.
    # 2.6 - Added full name output for NSF tree.
    # 2.7 - Added cluster output.
    # 2.8 - Added graph output and replaced dispng and disnsf with distrees. Replaced dismatrix PNG with just tree.
    # 2.9 - Added LocalMin and LocalCut for controlling how local alignments are output and/or contribute to GABLAM.
    # 2.10 - Added dot plot PNG output using gablam.r. Added sequence checking with rje_seqlist -> correct BLAST type.
    # 2.11 - Altered to use BLAST+ and rje_blast_V2.
    # 2.12 - Consolidated use of BLAST V2.
    # 2.13 - Fixed Protein vs DNA GABLAM. Modified sequence extraction to handle larger sequences. Add blastdir=PATH/.
    # 2.14 - Added checktype=T/F option to check sequence/BLAST type.
    # 2.15.0 - Added seqnr function. Add run() method.
    # 2.16.0 - Added fullblast=T/F : Whether to perform full BLAST followed by blastres analysis [False]
    # 2.16.1 - Fixed a bug where the fullblast option was failing to return scores and evalues.
    # 2.17.0 - Added localalnfas=T/F : Whether to output local alignments to *.local.fas fasta file (if local=T) [False]
    # 2.17.1 - Fixed bug where query and hit lengths were not being output for fullblast.
    # 2.18.0 - Added blaste filtering to be applied to existing BLAST results.
    # 2.19.0 - Added maxall=X limits to all-by-all analyses. Added qassemble=T.
    # 2.19.1 - Fixed handling of basefile and results generation for blastres=FILE.
    # 2.19.2 - Modified output to be in rank order.
    # 2.20.0 - Added SNP Table output.
    # 2.21.0 - Added nocoverage Table output of regions missing from pairwise SNP Table.
    # 2.21.1 - Added fragrevcomp=T/F : Whether to reverse-complement DNA fragments that are on reverse strand to query [True]
    # 2.22.0 - Added description to HitSum table.
    # 2.22.1 - Added localaln=T/F to keep local alignment sequences in the BLAST local Table.
    # 2.22.2 - Fixed local output error. (Query/Qry issue - need to fix this and make consistent!)
    # 2.22.3 - Fixed blastv and blastb error: limit also applies to individual pairwise hits!
    # 2.23.0 - Divided GablamFrag and FragMerge.
    # 2.23.1 - Added tuplekeys=T to cmd_list as default. (Can still be over-ridden if it breaks things!)
    # 2.24.0 - Added localidmin and localidcut as %identity versions of localmin and localcut. (Use for PAGSAT.)
    # 2.25.0 - Added localsAM=T/F : Save local (and unique) hits data as SAM files in addition to TDT [False]

GABLAM REST Output formats

There is currently no specific help available on REST output for this program. Run with &rest=help for general
options. Run with &rest=full to get full server output. Individual outputs can be identified/parsed:

###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
...


&rest=OUTFMT can then be used to retrieve individual parts of the output in future.
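
The full server output could be split into its individual parts along the lines sketched below. The sketch assumes that every part is introduced by a "###~~~...~~~###" separator line followed by a "# OUTFMT:" line, as in the pattern shown above, and that the content runs until the next separator. The part names status and hitsum in the sample are invented, as is the function name split_rest_output.

```python
import re

SEPARATOR = re.compile(r"^###~+###\s*$")
HEADER = re.compile(r"^#\s*(\w+):\s*(.*)$")

def split_rest_output(text):
    """Return a dict mapping each OUTFMT name to its block of output text."""
    sections = {}
    name = None
    expect_header = False
    for line in text.splitlines():
        if SEPARATOR.match(line):
            expect_header = True
            continue
        if expect_header:
            expect_header = False
            match = HEADER.match(line)
            if match:
                name = match.group(1)
                # Keep any text that follows the colon on the header line.
                sections[name] = [match.group(2)] if match.group(2) else []
                continue
        if name is not None:
            sections[name].append(line)
    return {key: "\n".join(lines) for key, lines in sections.items()}

# Invented sample in the assumed layout.
sample = "\n".join([
    "###" + "~" * 70 + "###",
    "# status:",
    "run complete",
    "###" + "~" * 70 + "###",
    "# hitsum:",
    "Qry\tHits",
    "seqA\t2",
])
parts = split_rest_output(sample)
print(sorted(parts))
```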

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.