Genome-Free EST SuperSAGE Analysis
Copyright © 2011 Richard J. Edwards - See source code for GNU License Notice
This program automates the processing, mapping and identification-by-homology of SuperSAGE tag data for organisms without genome sequences, relying predominantly on EST libraries and similar transcript data. Although it is designed for genome-free analysis, transcriptome data from genome projects can equally be used in the pipeline.
GFESSA aims to take care of the following main issues:
1. Removal of unreliable tag identification/quantification based on limited count numbers.
2. Conversion of raw count values into enrichment in one condition versus another.
3. Calculation of mean quantification for genes based on all the tags mapping to the same sequence.
4. Handling of the redundancy of EST libraries, by mapping tags to multiple sequences where necessary and clustering sequences on shared tags.
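The first three issues above can be illustrated with a minimal sketch. This is not GFESSA's own code: the function names, the `min_total` threshold, the pseudocount and the use of a log2 ratio as the enrichment measure are all assumptions chosen for illustration, and no library-size normalisation is attempted.

```python
import math
from collections import defaultdict


def filter_tags(counts, min_total=5):
    """Keep only tags whose summed count over all conditions is >= min_total.

    `counts` maps tag -> {condition: raw count}. The threshold is a
    hypothetical stand-in for a low-abundance filter.
    """
    return {tag: c for tag, c in counts.items() if sum(c.values()) >= min_total}


def log2_enrichment(count_a, count_b, pseudocount=1.0):
    """Log2 ratio of condition A over condition B.

    The pseudocount (an assumption) avoids division by zero for absent tags.
    """
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


def mean_enrichment_per_sequence(counts, tag_to_seqs, cond_a, cond_b):
    """Average the enrichment of every tag that maps to each sequence.

    `tag_to_seqs` maps tag -> list of sequence IDs; a tag may hit several
    sequences, in which case it contributes to each of them.
    """
    per_seq = defaultdict(list)
    for tag, c in counts.items():
        enrichment = log2_enrichment(c.get(cond_a, 0), c.get(cond_b, 0))
        for seq in tag_to_seqs.get(tag, ()):
            per_seq[seq].append(enrichment)
    return {seq: sum(vals) / len(vals) for seq, vals in per_seq.items()}


# Hypothetical tag counts and tag-to-EST mappings.
raw_counts = {
    "TAGA": {"ctrl": 3, "treat": 15},
    "TAGB": {"ctrl": 1, "treat": 1},   # too few counts; filtered out
    "TAGC": {"ctrl": 7, "treat": 7},
}
tag_hits = {"TAGA": ["EST1"], "TAGB": ["EST2"], "TAGC": ["EST1", "EST2"]}

reliable = filter_tags(raw_counts, min_total=5)
seq_enrichment = mean_enrichment_per_sequence(reliable, tag_hits, "treat", "ctrl")
```

Filtering before averaging means that a single low-count tag cannot dominate the mean enrichment reported for a sequence.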
The final output is a list of the sequences identified by the SAGE experiment along with enrichment data and clustering based on shared tags.
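Clustering sequences on shared tags can be sketched as a connected-components problem. The example below is an illustration under assumptions, not the method GFESSA necessarily uses: it applies single-linkage grouping with a small union-find structure, so any two sequences linked by a chain of shared tags end up in the same cluster. The function name and input layout are hypothetical.

```python
def cluster_by_shared_tags(tag_to_seqs):
    """Group sequence IDs that are connected through shared tags.

    `tag_to_seqs` maps tag -> iterable of sequence IDs hit by that tag.
    Returns a sorted list of clusters, each a sorted list of sequence IDs.
    """
    parent = {}

    def find(x):
        # Locate the cluster representative, halving paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for seqs in tag_to_seqs.values():
        seqs = list(seqs)
        if not seqs:
            continue
        find(seqs[0])  # register singletons too
        for other in seqs[1:]:
            union(seqs[0], other)

    clusters = {}
    for seq in list(parent):
        clusters.setdefault(find(seq), set()).add(seq)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical mapping: T1 and T2 chain EST1-EST2-EST3 together.
example_hits = {
    "T1": ["EST1", "EST2"],
    "T2": ["EST2", "EST3"],
    "T3": ["EST4"],
}
example_clusters = cluster_by_shared_tags(example_hits)
```

Because the grouping is transitive, EST1 and EST3 fall into one cluster even though no single tag maps to both; a stricter rule would be needed if that behaviour is not wanted.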
See also rje.py for generic commandline options.
Module Version History
# 0.0 - Initial Compilation using exact matches only.
# 0.1 - BLAST-based inexact search method.
# 0.2 - Removed sequence annotation and clustering. Added extra enrichment clustering.
# 1.0 - Updated to fix basefile issue and improve documentation, including manual. Add mean cluster enrichment.
# 1.1 - Added minabstag and minexptag to give more control over low abundance tag filtering.
# 1.2 - Added longtdt to output "Long" format file needed for R analysis.
# 1.3 - Tidied module imports.
# 1.4 - Switched to rje_blast_V2. More work needed for BLAST+.
© 2015 RJ Edwards. Contact: firstname.lastname@example.org.