Function
FIESTA has three primary functions: <1> Discovery, assembly and evolutionary analysis of candidate genes in an EST
library; <2> Assembly of an EST library for proteomics analysis; <3> Translation/Annotation of an EST library for
proteomics analysis. These functions are outlined below.
Candidate Gene Discovery
After optional pre-assembly of the EST library using the DNA FIESTA pipeline (see below), the candidate protein
dataset (the QueryDB) is used for translation and annotation (see below) of those ESTs with BLAST homology to one or
more candidates. These translations are then assembled into consensus sequences where appropriate, and alignments and
trees are made for the hits to each candidate protein. Finally, if desired, HAQESAC is run for each candidate protein.
EST Assembly
All EST assembly involves two trade-offs: one between speed and accuracy, and a second between redundancy and
accuracy. In particular, distinguishing sequencing errors from sequence variants (alleles) and from different gene
family members is not trivial.
FIESTA is designed to provide straightforward assembly and BLAST-based annotation of EST sequences in Fasta format.
The rationale behind its design was to optimise quality and redundancy versus comprehensive coverage for
proteomics identifications from mass spectrometry data. FIESTA is designed to function in a relatively standalone
capacity, with BLAST being the only other tool necessary. Due to this simplicity, FIESTA has some limitations, the
main one being its inability to identify and deal with frameshift (indel) sequencing errors.
FIESTA has two assembly and annotation pipelines: a protein pipeline based loosely on BUDAPEST and a DNA pipeline for
"true" EST assembly. Details can be found in the Manual.
EST library assembly/annotation
In addition to the main functions, parts of the main FIESTA assembly/annotation pipeline can be run as standalone
functions.
ESTs can be converted to Reading Frames (RF) with est2rf=T:
1. Identify orientation using 5' poly-T or 3' poly-A.
- 1a. Where poly-AT tail exists, remove, translate in 3 forward RF and truncate at terminal stop codon.
- 1b. Where no poly-AT tail exists, translate in all six RF.
2. BLAST translations vs. search database with complexity filter on.
- 2a. If EST has BLAST hits, retain RFs with desired e-value or better.
- 2b. If no BLAST hits, retain all RFs.
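Step 1 above can be sketched in a few lines of Python. This is a minimal illustration, not FIESTA's own code: the function names are hypothetical, the standard genetic code is assumed, "terminal stop codon" is interpreted as the last stop in each frame, and the BLAST filtering of step 2 is not shown.

```python
# Illustrative sketch only: helper names are hypothetical, not taken from FIESTA.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from the first base; codons not in the table become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def truncate_at_terminal_stop(protein):
    """Drop the last stop codon and anything after it (assumed reading of step 1a)."""
    last_stop = protein.rfind("*")
    return protein if last_stop < 0 else protein[:last_stop]


def est_to_rfs(est, minpolyat=10):
    """Return translated reading frames for one EST (step 1 only)."""
    est = est.upper()
    if est.endswith("A" * minpolyat):        # 3' poly-A: EST is the coding strand
        coding = est.rstrip("A")
    elif est.startswith("T" * minpolyat):    # 5' poly-T: EST is the reverse strand
        coding = revcomp(est.lstrip("T"))
    else:                                    # 1b: no tail, translate all six frames
        return [translate(strand[f:])
                for strand in (est, revcomp(est)) for f in range(3)]
    # 1a: tail removed, three forward frames, truncated at the terminal stop
    return [truncate_at_terminal_stop(translate(coding[f:])) for f in range(3)]


print(est_to_rfs("ATGGCCTAAGG" + "A" * 12))
```

Stripping every trailing A (or leading T) may remove a few genuine bases next to the tail; a real implementation would need to handle that boundary more carefully.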
Alternatively, translated RFs or other unannotated protein sequences can be given crude BLAST-based annotations using
searchdb=FILE sequences with blastann=T. Note that these are simply the top BLAST hit and better annotation would be
achieved using HAQESAC (or MultiHAQ for many sequences).
Commandline
GENERAL INPUT
seqin=FILE : EST file to be processed [None]
fwdonly=T/F : Whether to treat EST/cDNA sequences as coding strands (False = search all 6RF) [False]
minpolyat=X : Min length of a poly-A/poly-T run to be considered a poly-AT tail [10]
minorf=X : Min length of ORFs to be considered [20]
blastopt=FILE : File containing additional BLAST options for run, e.g. -B F [None]
ntrim=X : Trims off regions >= X proportion N bases [0.5]
SEQUENCE FORMATTING
gnspacc=T/F : Convert sequences into gene_SPECIES__AccNum format wherever possible. [False]
spcode=X : Species code for EST sequences [None]
species=X : Species for EST sequences [None]
newacc=X : New base for sequence accession numbers ['' or spcode]
EST ASSEMBLY
minaln=X : Min length of shared region for consensus assembly [40]
minid=X : Min identity of shared region for consensus assembly [95.0]
bestorf=T/F : Whether to use the "Best" ORF only for ESTs without BLAST Hits [True]
pickup=T/F : Whether to read in partial results and skip those sequences [True]
annotate=T/F : Annotate consensus sequences using BLAST-based approach [False]
dna=T/F : Implement DNA-based GABLAM assembly [True]
resave=X : Number of ESTs to remove before each resave of GABLAM searchdb [200]
gapblast=T/F : Whether to allow gaps during BLAST identification of GABLAM homologues [False]
assmode=X : Mode to use for EST assembly (nogab,gablam,oneqry) [oneqry]
gabrev=T/F : Whether to use GABLAM-based reverse complementation [True]
ANNOTATION
est2rf=T/F : Execute BLAST-based EST to RF translation/annotation only, on seqin [False]
est2haq=T/F : Execute BLAST-based EST to RF translation/annotation on seqin followed by HAQESAC analysis [False]
blastann=T/F : Execute BLAST-based annotation of consensus translations only, on seqin [False]
truncnt=T/F : Whether to truncate N-terminal to Met in final BLAST annotation (if hit) [False]
searchdb=FILE : Fasta file for GABLAM search of EST translations [None]
QUERY SEARCH
batch=LIST : List of EST libraries to search (will use seqin if none given) []
querydb=FILE : File of query sequences to search for in EST library [None]
qtype=X : Sequence "Type" to be used with NewAcc for annotation of translations [hit]
assembly=T/F : Assemble EST sequences prior to search [False]
consensi=T/F : Assemble hit ORF into consensus sequences [False]
HAQESAC OPTIONS
haqesac=T/F : HAQESAC analysis of identified EST translations [True]
multihaq=T/F : Whether to run HAQESAC in two phases [True]
blastcut=X : Reduce the number of sequences in HAQESAC runs to X (0 = no reduction) [50]
cleanhaq=T/F : Delete excessive HAQESAC results files [True]
haqdb=FILELIST : Optional extra databases to search for HAQESAC analysis []
haqbatch=T/F : Whether to only generate HAQESAC batch file (True) or perform whole run (False) [False]
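All options above take the form name=value, with T/F for switches. As a minimal sketch of how a run might be assembled from Python, the helper below formats a dictionary of options into that style. The helper name, the script name fiesta.py and the input file names are assumptions for illustration; only the option names come from the list above.

```python
# Illustrative sketch: build_args, fiesta.py and the file names are assumptions.
def build_args(options):
    """Format a dict of options as name=value arguments, using T/F for booleans."""
    args = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"
        args.append(f"{name}={value}")
    return args


# BLAST-based EST to RF translation/annotation only (est2rf=T) against a search database.
cmd = ["python", "fiesta.py"] + build_args({
    "seqin": "ests.fas",          # hypothetical EST library
    "searchdb": "proteins.fas",   # hypothetical protein database
    "est2rf": True,
})
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run, assuming the script is on the working path.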
History
Module Version History
# 0.0 - Initial Compilation.
# 0.1 - Added FIESTA pipeline for protein-based clustering in addition to TIGR based (partial) method
# 0.2 - Removed TIGR pipeline and replaced with DNA version of FIESTA.
# 0.3 - Added annotation method and extra mapping.
# 0.4 - Added oneqry method for GABLAM consensus generation
# 1.0 - Added querydb search option
# 1.1 - Added assmode option = Mode to use for EST assembly (nogab,gablam,oneqry) [oneqry]
# 1.2 - Added FwdOnly option for EST annotation.
# 1.3 - Add HAQESAC run following annotation.
# 1.4 - Tidied and modified QueryESTs analysis.
# 1.5 - Bug removal and additional tidying for MultiHAQ and annotateEST methods.
# 1.6 - Removed HAQESAC import (uses MultiHAQ).
# 1.7 - Updated to use rje_blast_V2. Needs work to make function with BLAST+.
# 1.8 - Minor crash fixes. Updated more functions to work with BLAST+.
# 1.8.1 - Replaced type with stype throughout to try and avoid TypeError crashes.
# 1.9.0 - Altered HAQDB to be a list of files rather than just one.
FIESTA REST Output formats
Run with &rest=docs for program documentation and options. A plain text version is accessed with &rest=help.
&rest=OUTFMT can be used to retrieve individual parts of the output, matching the tabs in the default (&rest=format)
output. Individual OUTFMT elements can also be parsed from the full (&rest=full) server output, which is formatted as
follows:
###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
... contents for OUTFMT section ...
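Parsing the full output into its OUTFMT sections might look like the sketch below. It assumes only the layout shown above: a ###~...~### separator line, then a "# OUTFMT:" header line, then the section contents. The function name and the section names used in the example ("log", "seqin") are hypothetical.

```python
# Illustrative sketch: parse_rest_full and the example section names are assumptions.
def parse_rest_full(text):
    """Split &rest=full style output into a {OUTFMT: contents} dictionary."""
    sections = {}
    name = None
    expect_header = False
    for line in text.splitlines():
        if line.startswith("###~") and line.endswith("~###"):
            expect_header = True          # separator: next line should name the section
            continue
        if expect_header:
            expect_header = False
            stripped = line.rstrip()
            if stripped.startswith("# ") and stripped.endswith(":"):
                name = stripped[2:-1]     # text between "# " and the trailing ":"
                sections[name] = []
                continue
        if name is not None:
            sections[name].append(line)   # ordinary content line of the current section
    return {key: "\n".join(lines) for key, lines in sections.items()}


example = (
    "###~~~~~~~~~~###\n# log:\nrun started\nrun finished\n"
    "###~~~~~~~~~~###\n# seqin:\n>est1\nACGT\n"
)
print(parse_rest_full(example)["seqin"])
```

Lines appearing before the first separator are ignored in this sketch.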
Available REST Outputs
There is currently no specific help available on REST output for this program.