SLiMSuite REST Server



FIESTA V1.9.0

Fasta Input EST Analysis

Program: FIESTA
Description: Fasta Input EST Analysis
Version: 1.9.0
Last Edit: 26/11/14
Citation: Jones, Edwards et al. (2011), Marine Biotechnology 13(3): 496-504.

Copyright © 2008 Richard J. Edwards - See source code for GNU License Notice


Imported modules: multihaq rje rje_seq rje_sequence rje_tree rje_zen rje_blast_V2


See SLiMSuite Blog for further documentation.

Function

FIESTA has three primary functions: <1> Discovery, assembly and evolutionary analysis of candidate genes in an EST library; <2> Assembly of an EST library for proteomics analysis; <3> Translation/Annotation of an EST library for proteomics analysis. These functions are outlined below.

Candidate Gene Discovery

After optional pre-assembly of the EST library using the DNA FIESTA pipeline (see below), the candidate protein
dataset (the QueryDB) is used for translation and annotation (see below) of those ESTs with BLAST homology to one or
more candidates. These translations are then assembled into consensus sequences where appropriate, and alignments and
trees are made for the hits to each candidate protein. Finally, if desired, HAQESAC is run for each candidate protein.

EST Assembly

All EST assembly involves two trade-offs: one between speed and accuracy, and a second between redundancy and
accuracy. In particular, distinguishing sequencing errors from sequence variants (alleles) and from different gene
family members is not trivial.

FIESTA is designed to provide straightforward assembly and BLAST-based annotation of EST sequences in Fasta format.
The rationale behind its design was to try to optimise quality and redundancy versus comprehensive coverage for
proteomics identifications from mass spectrometry data. FIESTA is designed to function in a relatively standalone
capacity, with BLAST being the only other tool necessary. Due to this simplicity, FIESTA has some limitations, the
main one being its inability to identify and deal with frameshift (indel) sequencing errors.

FIESTA has two assembly and annotation pipelines: a protein pipeline based loosely on BUDAPEST and a DNA pipeline for
"true" EST assembly. Details can be found in the Manual.

EST library assembly/annotation

In addition to the main functions, parts of the main FIESTA assembly/annotation pipeline can be run as standalone
functions.

ESTs can be converted to Reading Frames (RF) with est2rf=T:
1. Identify orientation using 5' poly-T or 3' poly-A.
- 1a. Where a poly-AT tail exists, remove it, translate in the 3 forward RFs and truncate at the terminal stop codon.
- 1b. Where no poly-AT tail exists, translate in all six RFs.
2. BLAST translations vs. search database with complexity filter on.
- 2a. If EST has BLAST hits, retain RFs with desired e-value or better.
- 2b. If no BLAST hits, retain all RFs.
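
Step 1 of the procedure above can be sketched in Python as follows. This is an illustrative sketch only, not the FIESTA
source: the function names (orient, est_to_rfs) are hypothetical, the standard genetic code is assumed, and "truncate
at terminal stop codon" is interpreted here as discarding everything from the last stop codon onwards in each frame.
The BLAST filtering of step 2 is not modelled.

```python
import re
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order (an assumption).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become X."""
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def orient(est: str, minpolyat: int = 10) -> Tuple[str, bool]:
    """Step 1: use a 3' poly-A or 5' poly-T run to orient the EST and remove the tail."""
    est = est.upper()
    tail = re.search(r"A{%d,}$" % minpolyat, est)
    if tail:  # 3' poly-A: already the coding strand
        return est[:tail.start()], True
    head = re.match(r"T{%d,}" % minpolyat, est)
    if head:  # 5' poly-T: the coding strand is the reverse complement
        return revcomp(est[head.end():]), True
    return est, False


def est_to_rfs(est: str, minpolyat: int = 10) -> List[Tuple[int, str]]:
    """Return (frame, translation) pairs following steps 1a and 1b."""
    seq, oriented = orient(est, minpolyat)
    if oriented:
        # 1a: three forward frames, cut at the last stop codon (assumed meaning).
        rfs = []
        for frame in range(3):
            protein = translate(seq[frame:])
            stop = protein.rfind("*")
            if stop >= 0:
                protein = protein[:stop]
            rfs.append((frame + 1, protein))
        return rfs
    # 1b: no tail found, so translate all six frames without truncation.
    rc = revcomp(seq)
    forward = [(f + 1, translate(seq[f:])) for f in range(3)]
    reverse = [(-(f + 1), translate(rc[f:])) for f in range(3)]
    return forward + reverse
```

For example, est_to_rfs("ATGGCCAAATAG" + "A" * 12) yields three forward frames, whereas the same sequence without the
poly-A tail yields six. The minpolyat default mirrors the documented minpolyat=X default of 10.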

Alternatively, translated RFs or other unannotated protein sequences can be given crude BLAST-based annotations using
searchdb=FILE sequences with blastann=T. Note that each annotation is simply taken from the top BLAST hit and better
annotation would be achieved using HAQESAC (or MultiHAQ for many sequences).

Commandline

GENERAL INPUT

seqin=FILE : EST file to be processed [None]
fwdonly=T/F : Whether to treat EST/cDNA sequences as coding strands (False = search all 6RF) [False]
minpolyat=X : Min length of a poly-A/poly-T run to be considered a poly-AT tail [10]
minorf=X : Min length of ORFs to be considered [20]
blastopt=FILE : File containing additional BLAST options for run, e.g. -B F [None]
ntrim=X : Trims off regions of >= X proportion N bases [0.5]

SEQUENCE FORMATTING

gnspacc=T/F : Convert sequences into gene_SPECIES__AccNum format wherever possible. [False]
spcode=X : Species code for EST sequences [None]
species=X : Species for EST sequences [None]
newacc=X : New base for sequence accession numbers ['' or spcode]

EST ASSEMBLY

minaln=X : Min length of shared region for consensus assembly [40]
minid=X : Min identity of shared region for consensus assembly [95.0]
bestorf=T/F : Whether to use the "Best" ORF only for ESTs without BLAST Hits [True]
pickup=T/F : Whether to read in partial results and skip those sequences [True]
annotate=T/F : Annotate consensus sequences using BLAST-based approach [False]
dna=T/F : Implement DNA-based GABLAM assembly [True]
resave=X : Number of ESTs to remove before each resave of GABLAM searchdb [200]
gapblast=T/F : Whether to allow gaps during BLAST identification of GABLAM homologues [False]
assmode=X : Mode to use for EST assembly (nogab,gablam,oneqry) [oneqry]
gabrev=T/F : Whether to use GABLAM-based reverse complementation [True]

ANNOTATION

est2rf=T/F : Execute BLAST-based EST to RF translation/annotation only, on seqin [False]
est2haq=T/F : Execute BLAST-based EST to RF translation/annotation on seqin followed by HAQESAC analysis [False]
blastann=T/F : Execute BLAST-based annotation of consensus translations only, on seqin [False]
truncnt=T/F : Whether to truncate N-terminal to Met in final BLAST annotation (if hit) [False]
searchdb=FILE : Fasta file for GABLAM search of EST translations [None]

QUERY SEARCH

batch=LIST : List of EST libraries to search (will use seqin if none given) []
querydb=FILE : File of query sequences to search for in EST library [None]
qtype=X : Sequence "Type" to be used with NewAcc for annotation of translations [hit]
assembly=T/F : Assemble EST sequences prior to search [False]
consensi=T/F : Assemble hit ORF into consensus sequences [False]

HAQESAC OPTIONS

haqesac=T/F : HAQESAC analysis of identified EST translations [True]
multihaq=T/F : Whether to run HAQESAC in two phases [True]
blastcut=X : Reduce the number of sequences in HAQESAC runs to X (0 = no reduction) [50]
cleanhaq=T/F : Delete excessive HAQESAC results files [True]
haqdb=FILELIST : Optional extra databases to search for HAQESAC analysis []
haqbatch=T/F : Whether to only generate HAQESAC batch file (True) or perform whole run (False) [False]
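
The options above all take the form name=value, with T/F for booleans. As a rough illustration, the Python sketch
below assembles such an argument list from a dictionary. The helper names, the script name fiesta.py, the input file
names and the use of commas to join LIST/FILELIST values are assumptions made for this example, not taken from the
documentation above.

```python
from typing import Dict, List, Sequence, Union

OptionValue = Union[bool, int, float, str, Sequence[str]]


def format_option(name: str, value: OptionValue) -> str:
    """Render one option as name=value, using T/F for booleans."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        text = "T" if value else "F"
    elif isinstance(value, (list, tuple)):
        text = ",".join(str(v) for v in value)  # comma-joined list (assumed)
    else:
        text = str(value)
    return f"{name}={text}"


def build_fiesta_args(options: Dict[str, OptionValue],
                      script: str = "fiesta.py") -> List[str]:
    """Build an argument list suitable for subprocess.run (script path is hypothetical)."""
    return ["python", script] + [format_option(k, v) for k, v in options.items()]


# Example: EST to RF translation/annotation only, against a protein search database.
args = build_fiesta_args({
    "seqin": "ests.fas",         # hypothetical input file
    "searchdb": "proteins.fas",  # hypothetical search database
    "est2rf": True,
    "minorf": 20,
})
print(" ".join(args))
```

Keeping the options in a dictionary makes it easy to record the exact settings used for a run alongside its results.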

History

Module Version History

    # 0.0 - Initial Compilation.
    # 0.1 - Added FIESTA pipeline for protein-based clustering in addition to TIGR based (partial) method
    # 0.2 - Removed TIGR pipeline and replaced with DNA version of FIESTA.
    # 0.3 - Added annotation method and extra mapping.
    # 0.4 - Added oneqry method for GABLAM consensus generation
    # 1.0 - Added querydb search option
    # 1.1 - Added assmode option = Mode to use for EST assembly (nogab,gablam,oneqry) [oneqry]
    # 1.2 - Added FwdOnly option for EST annotation.
    # 1.3 - Add HAQESAC run following annotation.
    # 1.4 - Tidied and modified QueryESTs analysis.
    # 1.5 - Bug removal and additional tidying for MultiHAQ and annotateEST methods.
    # 1.6 - Removed HAQESAC import (uses MultiHAQ).
    # 1.7 - Updated to use rje_blast_V2. Needs work to make function with BLAST+.
    # 1.8 - Minor crash fixes. Updated more functions to work with BLAST+.
    # 1.8.1 - Replaced type with stype throughout to try and avoid TypeError crashes.
    # 1.9.0 - Altered HAQDB to be a list of files rather than just one.

FIESTA REST Output formats

Run with &rest=docs for program documentation and options. A plain text version is accessed with &rest=help.
&rest=OUTFMT can be used to retrieve individual parts of the output, matching the tabs in the default
(&rest=format) output. Individual OUTFMT elements can also be parsed from the full (&rest=full) server output,
which is formatted as follows:
###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
... contents for OUTFMT section ...
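
A minimal Python sketch of parsing the full output into its OUTFMT sections is given below. It assumes that each
section starts with a separator line of the form shown above, followed immediately by a "# OUTFMT:" header line. The
function name parse_rest_full and the section names in the example ("log", "seqin") are hypothetical.

```python
import re
from typing import Dict, List, Optional


def parse_rest_full(text: str) -> Dict[str, str]:
    """Split &rest=full style output into {OUTFMT: contents}."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    expect_header = False
    for line in text.splitlines():
        # Separator line: ###~~~...~~~###
        if line.startswith("###~") and line.endswith("~###"):
            expect_header = True
            continue
        if expect_header:
            expect_header = False
            header = re.match(r"^# (\S+):\s*(.*)$", line)
            if header:
                current = header.group(1)
                rest = header.group(2)
                sections[current] = [rest] if rest else []
                continue
        # Any other line belongs to the section currently being read.
        if current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


# Example with made-up section names and contents.
SEP = "###" + "~" * 70 + "###"
example = "\n".join([SEP, "# log:", "line one", "line two",
                     SEP, "# seqin:", ">seq1", "ACGT"])
parsed = parse_rest_full(example)
print(sorted(parsed))
```

Text that appears before the first separator is ignored by this sketch.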

Available REST Outputs

There is currently no specific help available on REST output for this program.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.