Module:	rje_genbank
Description:	RJE GenBank Module
Version:	1.5.5
Last Edit:	06/06/19

Imported modules: rje rje_db rje_obj rje_sequence rje_taxonomy rje_zen

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This module is for parsing information out of GenBank files and converting them to other formats.

Input Options

seqin=FILE : Input GenBank file []
fetchuid=LIST : GenBank retrieval of a list of nucleotide entries to generate seqin=FILE []
spcode=X : Override the species read from the file (if any) with X [None]
taxdir=PATH : Path to taxonomy files for species code extraction. (Will not use if blank or None) [./SourceData/]
addtags=T/F : Add locus_tag identifiers if missing - needed for gene/cds/prot fasta output [False]
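
The fetchuid=LIST option downloads GenBank entries to build a seqin=FILE. As a rough illustration of an equivalent manual retrieval, the Python sketch below builds an NCBI E-utilities efetch request for GenBank flat-file text. It is an assumption that rje_genbank uses this endpoint or these parameters, and the helper names efetch_genbank_url and fetch_to_file are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities efetch endpoint. Whether rje_genbank's fetchuid=LIST
# uses this exact endpoint and these parameters is an assumption.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_genbank_url(uids):
    """Build an efetch URL that returns GenBank flat-file text (hypothetical helper)."""
    params = {
        "db": "nucleotide",     # nucleotide entries, as fetchuid=LIST describes
        "id": ",".join(uids),   # efetch takes a comma-separated ID list
        "rettype": "gb",        # GenBank flat-file format
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_to_file(uids, path):
    """Download the entries into path, ready for seqin=FILE (needs network)."""
    with urlopen(efetch_genbank_url(uids)) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

For example, fetch_to_file(["NC_000913.3"], "NC_000913.gb") would write a file that could then be given as seqin=NC_000913.gb. NCBI asks regular users of E-utilities to also supply email and api_key parameters.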

Output Options

basefile=FILE : Root of output file names (same as input file by default) []
tabout=T/F : Delimited table output of features [False]
features=LIST : Subset of features to extract from GenBank file (blank for all) []
details=LIST : List of feature details to extract into own columns []
detailskip=LIST : Subset of feature details to exclude from extraction [translation]
fasout=LIST : Types of sequences to output into files (full/gene/cds/prot) as *.*.fas []
geneacc=X : Feature detail to use for gene sequence accession number (added to details) [locus_tag]
protacc=X : Feature detail to use for protein sequence accession number (added to details) [protein_id]
locusout=T/F : Whether to generate output by locus (True, locus as basefile) or combined (False) [False]
locusdir=PATH : Directory in which to generate output by locus [./]
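
The fasout=LIST description gives output names only as *.*.fas. Reading that as {basefile}.{type}.fas, with each locus name standing in for basefile (inside locusdir) when locusout=T, the hypothetical helper below predicts which FASTA files a run would write. This naming scheme is inferred from the option descriptions, not confirmed against the rje_genbank code.

```python
import os

FASOUT_TYPES = ("full", "gene", "cds", "prot")


def default_basefile(seqin):
    # Assumption: "same as input file" means the input path minus its
    # extension (e.g. .gb, .gbk or .gbff).
    return os.path.splitext(seqin)[0]


def fasout_paths(seqin, fasout, basefile="", locusout=False, loci=(), locusdir="./"):
    """Predict *.*.fas output paths (hypothetical helper, not part of rje_genbank)."""
    for ftype in fasout:
        if ftype not in FASOUT_TYPES:
            raise ValueError("unknown fasout type: %s" % ftype)
    if locusout:
        # One file set per locus, with the locus name used as the basefile.
        roots = [os.path.join(locusdir, locus) for locus in loci]
    else:
        roots = [basefile or default_basefile(seqin)]
    return [f"{root}.{ftype}.fas" for root in roots for ftype in fasout]
```

For instance, seqin=NC_000913.gbff fasout=gene,prot would be expected to give NC_000913.gene.fas and NC_000913.prot.fas.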

See also rje.py generic commandline options.
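
To show how features=, details=, detailskip=, geneacc= and protacc= might interact when building the tabout=T table, the sketch below works on an assumed in-memory format: one dict per feature with type, position and details keys. The column names, this data format and the catch-all "other" column are illustrative assumptions; the real rje_genbank table layout may differ.

```python
import csv
import io


def feature_table(features, keep=(), details=(), detailskip=("translation",),
                  geneacc="locus_tag", protacc="protein_id"):
    """Tabulate parsed features (hypothetical dict format) as tab-delimited text."""
    # geneacc/protacc are added to details, as the option descriptions state;
    # anything listed in detailskip never gets its own column.
    columns = [d for d in list(details) + [geneacc, protacc] if d not in detailskip]
    columns = list(dict.fromkeys(columns))  # de-duplicate, keep order
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature", "position"] + columns + ["other"])
    for feat in features:
        # keep plays the role of features=LIST: empty means keep all features.
        if keep and feat["type"] not in keep:
            continue
        info = feat["details"]
        row = [feat["type"], feat["position"]]
        row += [info.get(col, "") for col in columns]
        # Assumption: leftover, non-skipped details are folded into one column.
        other = [f"{k}={v}" for k, v in info.items()
                 if k not in columns and k not in detailskip]
        row.append("; ".join(other))
        writer.writerow(row)
    return out.getvalue()
```

With the default detailskip=translation, protein sequences stay out of the table while locus_tag and protein_id always get their own columns, matching the geneacc/protacc defaults listed above.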

Module Version History

    # 0.0 - Initial Compilation.
    # 0.1 - Modified and tidied output a little.
    # 0.2 - Added details to skip and option to use different detail for protein accession number.
    # 0.3 - Added reloading of features.
    # 1.0 - Basic functioning version. Added fetchuid=LIST Genbank retrieval to generate seqin=FILE.
    # 1.1 - Added use of rje_taxonomy for getting Species Code from TaxID.
    # 1.2 - Modified to deal with genbank protein entries.
    # 1.2.1 - Fixed feature bug that was breaking parser and removing trailing '*' from protein sequences.
    # 1.2.2 - Fixed more features that were breaking parser.
    # 1.3.0 - Added split viral output.
    # 1.3.1 - Fixed bug in split viral output.
    # 1.3.2 - Fixed bug in reverse complement sequences with introns.
    # 1.4.0 - Added addtags=T/F : Add locus_tag identifiers if missing - needed for gene/cds/prot fasta output [False]
    # 1.4.1 - Fixed genetic code warning.
    # 1.5.0 - Added setupRefGenome() method based on PAGSAT code.
    # 1.5.1 - Fixed logskip append locus sequence file bug.
    # 1.5.2 - Fixed addtag(s) bug.
    # 1.5.3 - Fixed https genbank download issue.
    # 1.5.4 - Added recognition of *.gbff for genbank files.
    # 1.5.5 - Fixed bug to use Locus for *.full.fas and thus link to Feature table properly.

rje_genbank REST Output formats

Run with &rest=docs for program documentation and options. A plain text version is accessed with &rest=help.
&rest=OUTFMT can be used to retrieve individual parts of the output, matching the tabs in the default
(&rest=format) output. Individual OUTFMT elements can also be parsed from the full (&rest=full) server output,
which is formatted as follows:

###~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~###
# OUTFMT:
... contents for OUTFMT section ...
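
Because every section of the &rest=full output starts with the ###~~~...~~~### separator line followed by a "# OUTFMT:" line, a client can split the full output back into its individual OUTFMT parts. A minimal Python sketch (parse_rest_full is a hypothetical name, not part of SLiMSuite):

```python
import re

# Separator and section-name lines as shown in the &rest=full layout.
SEPARATOR = re.compile(r"^###~+###$")
SECTION = re.compile(r"^# (\S+):\s*$")


def parse_rest_full(text):
    """Split &rest=full server output into {OUTFMT: contents} (a sketch)."""
    sections = {}
    current = None
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        # A new section is a separator line immediately followed by "# OUTFMT:".
        if SEPARATOR.match(line) and i + 1 < len(lines):
            m = SECTION.match(lines[i + 1])
            if m:
                current = m.group(1)
                sections[current] = []
                i += 2
                continue
        if current is not None:
            sections[current].append(line)
        i += 1
    # Join each section back into text, trimming leading/trailing blank lines.
    return {name: "\n".join(body).strip("\n") for name, body in sections.items()}
```

Each key of the returned dict corresponds to an OUTFMT value that could also be requested on its own with &rest=OUTFMT.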

Available REST Outputs

There is currently no specific help available on REST output for this program.

SLiMSuite REST Server