Bioinformatics Utility for Data Analysis of Proteomics on ESTs
Copyright © 2008 Richard J. Edwards - See source code for GNU License Notice
Proteomic analysis of EST data presents a bioinformatics challenge that is absent from standard protein-sequence-based identification. EST sequences are translated in all six reading frames (RFs), most of which will not be biologically relevant. In addition to the increased search space for the MS search engines, there are the added challenges of removing redundancy from results (due to the inherent redundancy of the EST database), removing spurious identifications (due to the translation of incorrect reading frames), and identifying the true protein hits through homology to known proteins.
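To illustrate why six-frame translation inflates the search space, the following is a minimal Python sketch of translating one nucleotide sequence in all six reading frames. It is not taken from the BUDAPEST source: the function names (`reverse_complement`, `translate`, `six_frame_translations`) and the frame numbering (+1 to +3 forward, -1 to -3 reverse) are assumptions made for this example, and it uses only the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1 layout).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3 forward, -1..-3 reverse) to translations."""
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCTGA").items()):
        print(frame, protein)
```

Every EST therefore contributes six candidate protein sequences to the search database, at most one of which is expected to correspond to the real gene product.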
BUDAPEST (Bioinformatics Utility for Data Analysis of Proteomics on ESTs) aims to overcome some of these problems by post-processing results to remove redundancy and assign putative homology-based identifications to translated RFs that have been "hit" during a MASCOT search of MS data against an EST database. Peptides assigned to "incorrect" RFs are eliminated, and EST translations are combined into consensus sequences using FIESTA (Fasta Input EST Analysis). These consensus hits are optionally filtered on the number of MASCOT peptides they contain before being re-annotated using BLAST searches against a reference database. Finally, HAQESAC can be used for automated or semi-automated phylogenetic analysis for improved sequence annotation.
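The optional peptide-count filter described above can be sketched as follows. This is an illustration only, not BUDAPEST's implementation: the function name `filter_by_min_peptides`, the `min_pep` parameter, the dictionary layout, and the choice to count distinct peptide sequences are all assumptions made for this example.

```python
def filter_by_min_peptides(consensus_peptides, min_pep=2):
    """Keep consensus sequences supported by at least min_pep distinct peptides.

    consensus_peptides: dict mapping a consensus name (hypothetical IDs here)
    to an iterable of peptide sequences assigned to it.
    """
    kept = {}
    for name, peptides in consensus_peptides.items():
        distinct = set(peptides)  # assumption: repeated peptides count once
        if len(distinct) >= min_pep:
            kept[name] = sorted(distinct)
    return kept


if __name__ == "__main__":
    hits = {
        "consensus_1": ["AVLTEK", "GFDLLR", "AVLTEK"],
        "consensus_2": ["MSTNPK"],
    }
    print(filter_by_min_peptides(hits, min_pep=2))
```

Filtering before the BLAST step, as the description orders it, would keep weakly supported consensus sequences out of the more expensive annotation stage.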
BUDAPEST takes three main files as input:
BUDAPEST produces the following main output files, where X is set by
Additional information can also be obtained from the additional sequence files:
Lastly, reformatted MASCOT files are produced, named after the original input file (Y):
Module Version History
# 0.0 - Initial Compilation.
# 0.1 - Reworked the pipeline in the light of discoveries made from version 0.0 runs.
# 1.0 - Working version for basic analysis.
# 1.1 - Modified to work with new MASCOT column headers.
# 1.2 - Added tracking of MASCOT data, results tables and division of EST-RFs.
# 1.3 - Split clustering into two levels: peptide and sequence clustering.
# 1.4 - Added FIESTA auto-construction of consensi from BUDAPEST RF translations [True]
# 1.5 - Added MinPep filtering.
# 1.6 - Improved tracking of peptides to final consensus sequences and output details.
# 1.7 - Added menu and extra control of interactivity. Removed rfhits=F option.
# 1.8 - Added preliminary iTRAQ handling.
# 1.9 - Bug fixed for new MASCOT output.
# 2.0 - Revised version using rje_mascot object for loading.
# 2.1 - Improved handling of iTRAQ data using rje_mascot V1.2.
# 2.2 - Removed unrequired rje_dismatrix import.
# 2.3 - Updated to use rje_blast_V2. Needs further updates for BLAST+. Deleted obsolete OLDreadMascot() method.
© 2015 RJ Edwards. Contact: firstname.lastname@example.org.