Program:	GFESSA
Description:	Genome-Free EST SuperSAGE Analysis
Version:	1.4
Last Edit:	20/08/13
Citation:	Johansson SA et al (2020). Algal Research 48:101917

Imported modules: budapest rje_seq rje_seqlist rje_sequence rje_blast_V2 rje rje_db rje_zen

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This program automates the processing, mapping and identification-by-homology of SuperSAGE tag data for organisms without genome sequences, relying predominantly on EST libraries and similar transcript collections. Although designed for genome-free analysis, there is no reason why transcriptome data from genome projects cannot be used in the pipeline.
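As a rough illustration of the tag-to-sequence mapping step, the sketch below indexes candidate tags in EST/cDNA sequences by exact matching. It is not GFESSA's own code: the function name index_tags is hypothetical, it scans the forward strand only, and it assumes that the tag length includes the tag start motif (the defaults mirror the documented tagstart 'CATG' and taglen 26).

```python
def index_tags(sequences, tagstart="CATG", taglen=26):
    """Map each candidate tag to the set of sequence names containing it.

    Assumptions of this sketch: forward strand only, exact matches only,
    and taglen counts the tagstart motif as part of the tag.
    """
    index = {}
    for name, seq in sequences.items():
        seq = seq.upper()
        pos = seq.find(tagstart)
        while pos != -1:
            tag = seq[pos:pos + taglen]
            # Skip truncated tags that run off the end of the sequence.
            if len(tag) == taglen:
                index.setdefault(tag, set()).add(name)
            # Step by one so overlapping motif occurrences are also found.
            pos = seq.find(tagstart, pos + 1)
    return index


# Two toy ESTs (made-up data) that carry the same 26 nt tag.
toy_ests = {
    "est1": "AACATG" + "A" * 22 + "TT",
    "est2": "catg" + "A" * 22,
}
toy_index = index_tags(toy_ests)
```

A tag that maps to more than one name in the resulting index is the situation that the redundancy handling described in this section has to deal with.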

GFESSA aims to take care of the following main issues:

1. Removal of unreliable tag identification/quantification based on limited count numbers.
2. Conversion of raw count values into enrichment in one condition versus another.
3. Calculation of mean quantification for genes based on all the tags mapping to the same sequence.
4. Handling the redundancy of EST libraries by mapping tags to multiple sequences where necessary and clustering sequences on shared tags.
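The count-to-enrichment and per-sequence averaging steps can be pictured with the minimal sketch below. It is an illustration under stated assumptions rather than GFESSA's actual calculation: the function names are hypothetical, "ppm" is taken to mean counts scaled to parts per million within a replicate, and the pseudocount used to avoid division by zero is an addition of this sketch.

```python
from statistics import mean


def to_ppm(counts):
    """Scale one replicate's raw tag counts to parts per million."""
    total = sum(counts.values())
    if total == 0:
        return {tag: 0.0 for tag in counts}
    return {tag: 1e6 * n / total for tag, n in counts.items()}


def fold_enrichment(cond_a, cond_b, pseudo=1.0):
    """Ratio of mean normalised counts in condition A versus condition B.

    The pseudocount is an assumption of this sketch, not a GFESSA option.
    """
    return (mean(cond_a) + pseudo) / (mean(cond_b) + pseudo)


def mean_by_sequence(tag_enrichment, tag_to_seqs):
    """Average tag enrichment over all tags mapped to each sequence."""
    per_seq = {}
    for tag, value in tag_enrichment.items():
        for seq in tag_to_seqs.get(tag, ()):
            per_seq.setdefault(seq, []).append(value)
    return {seq: mean(values) for seq, values in per_seq.items()}


# Toy data (made up): two tags, one of which hits two sequences.
ppm = to_ppm({"T1": 30, "T2": 10})
per_sequence = mean_by_sequence(
    {"T1": 2.0, "T2": 4.0},
    {"T1": ["s1"], "T2": ["s1", "s2"]},
)
```

Averaging over every tag that maps to a sequence means a single noisy tag has less influence on the value reported for that sequence.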

The final output is a list of the sequences identified by the SAGE experiment along with enrichment data and clustering based on shared tags.
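Clustering on shared tags can be sketched as finding connected components, where two sequences are linked whenever at least one tag maps to both. The union-find approach below is one common way to do this and is offered as an assumption; the source does not state which clustering method GFESSA uses, and cluster_by_shared_tags is a hypothetical name.

```python
def cluster_by_shared_tags(tag_to_seqs):
    """Group sequences into clusters linked by at least one shared tag."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for seqs in tag_to_seqs.values():
        seqs = list(seqs)
        for seq in seqs:
            find(seq)  # make sure singletons are registered too
        for other in seqs[1:]:
            union(seqs[0], other)

    clusters = {}
    for seq in list(parent):
        clusters.setdefault(find(seq), set()).add(seq)
    # Sorted output makes the result deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Toy mapping (made up): T1 and T2 chain a, b and c together; d stands alone.
toy_clusters = cluster_by_shared_tags(
    {"T1": ["a", "b"], "T2": ["b", "c"], "T3": ["d"]}
)
```

Note that sequences a and c end up in the same cluster even though they share no tag directly, because both share a tag with b.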

Commandline

INPUT OPTIONS

tagfile=FILE     : File containing SuperSAGE tags and counts [None]
tagfield=X       : Field in tagfile containing tag sequence. (All others should be counts) ['Tag sequence']
expconvert=FILE  : File containing 'Header', 'Experiment' conversion data [None]
experiments=LIST : List of (converted) experiment names to use []
seqin=FILE       : File containing EST/cDNA data to search for tags within [None]
tagindex=FILE    : File containing possible tags and sequence names from seqin file [*.tag.index]
tagmap=FILE      : Tag to sequence mapping file to over-ride auto-generated file based on Seqin and Mismatch [None]
tagstart=X       : Sequence starting tags ['CATG']
taglen=X         : Length of sequence tags [26]

PROCESS OPTIONS

mintag=X         : Minimum total number of counts for a tag to be included (summed replicates) [0]
minabstag=X      : Minimum individual number of counts for a tag to be included (ANY one replicate) [5]
minexptag=X      : Minimum number of experiments for a tag to be included (no. replicates) [3]
allreptag=X      : Filter out any Tags that are not returned by ALL replicates of X experiments [0]
minenrtag=X      : Minimum number of counts for a tag to be retained for enrichment etc. (summed replicates) [15]
enrcut=X         : Minimum mean fold change between experiments [2.5]
pwenr=X          : Minimum fold change between pairwise experiment comparisons [1.0]
expand=T/F       : Whether to expand from enriched TAGs to unenriched TAGs through shared sequence hits [True]
mismatch=X       : No. mismatches to allow. -1 = Exact matching w/o BLAST [-1]
bestmatch=T/F    : Whether to stop looking for more mismatches once hits of a given stringency found [True]
normalise=X      : Method for normalising tag counts within replicate (None/ppm) [ppm]

OUTPUT OPTIONS

basefile=X       : "Base" name for all results files, e.g. X.gfessa.tdt [TAG file basename]
longtdt=T/F      : Whether to output "Long" format file needed for R analysis [True]
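To make the low-abundance filters more concrete, the sketch below applies one possible reading of mintag, minabstag and minexptag to the replicate counts of a single tag. The interpretation is an assumption: in particular, minexptag is read here as the minimum number of replicates with a non-zero count, which the option description does not spell out. The function keep_tag is hypothetical and is not part of GFESSA.

```python
def keep_tag(counts, mintag=0, minabstag=5, minexptag=3):
    """Decide whether a tag passes the low-abundance filters.

    counts is a list of raw counts for one tag, one entry per replicate.
    Defaults mirror the documented defaults; the logic is an assumed reading.
    """
    # mintag: total counts summed over all replicates.
    total_ok = sum(counts) >= mintag
    # minabstag: at least one replicate reaches this count on its own.
    abs_ok = max(counts, default=0) >= minabstag
    # minexptag: number of replicates in which the tag was seen at all.
    exp_ok = sum(1 for n in counts if n > 0) >= minexptag
    return total_ok and abs_ok and exp_ok


# Toy replicate counts (made up) for three tags.
passes = keep_tag([6, 1, 1, 0])        # seen in 3 replicates, peak count 6
fails_abs = keep_tag([4, 4, 4, 4])     # never reaches 5 in any one replicate
fails_exp = keep_tag([10, 0, 0, 0])    # seen in only one replicate
```

Combining a per-replicate threshold with a replicate-count threshold guards against tags that are abundant in one library only, as well as tags that are seen everywhere but always at very low counts.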

See also rje.py generic commandline options.

History

Module Version History

    # 0.0 - Initial Compilation using exact matches only.
    # 0.1 - BLAST-based inexact search method.
    # 0.2 - Removed sequence annotation and clustering. Added extra enrichment clustering.
    # 1.0 - Updated to fix basefile issue and improve documentation, including manual. Added mean cluster enrichment.
    # 1.1 - Added minabstag and minexptag to give more control over low abundance tag filtering.
    # 1.2 - Added longtdt to output "Long" format file needed for R analysis.
    # 1.3 - Tidied module imports.
    # 1.4 - Switched to rje_blast_V2. More work needed for BLAST+.
