SLiMSuite REST Server



rje_genbank V1.5.5

RJE GenBank Module

Module: rje_genbank
Description: RJE GenBank Module
Version: 1.5.5
Last Edit: 06/06/19

Copyright © 2011 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje rje_db rje_obj rje_sequence rje_taxonomy rje_zen


See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This module is for parsing information out of GenBank files and converting them to other formats.

Input Options

seqin=FILE : Input Genbank file []
fetchuid=LIST : Genbank retrieval of a list of nucleotide entries to generate seqin=FILE []
spcode=X : Overwrite species read from file (if any!) with X [None]
taxdir=PATH : Path to taxonomy files for species code extraction. (Not used if blank or None) [./SourceData/]
addtags=T/F : Add locus_tag identifiers if missing - needed for gene/cds/prot fasta output [False]
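
The input options above all follow an option=value pattern, with T/F for switches. As a minimal sketch of how such arguments might be assembled from a script, the helper below formats a dictionary into that pattern. The function name build_args and the example values (example.gb, HUMAN) are hypothetical and not part of rje_genbank, and joining LIST values with commas is an assumption about the list format rather than something stated on this page.

```python
def build_args(options):
    """Format a dict of options as option=value strings (hypothetical helper)."""
    args = []
    for key, value in options.items():
        if isinstance(value, bool):
            # T/F switches, as in addtags=T/F
            value = "T" if value else "F"
        elif isinstance(value, (list, tuple)):
            # LIST options: comma-separated values (assumed format)
            value = ",".join(str(item) for item in value)
        args.append(f"{key}={value}")
    return args


# Example values are illustrative only.
example = build_args({"seqin": "example.gb", "spcode": "HUMAN", "addtags": True})
print(" ".join(example))
```

The resulting strings would be appended to whatever command launches rje_genbank in a given installation; the interpreter and script path depend on the local setup and are left out here.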

Output Options

basefile=FILE : Root of output file names (same as input file by default) []
tabout=T/F : Delimited table output of features [False]
features=LIST : Subset of features to extract from Genbank file (blank for all) []
details=LIST : List of feature details to extract into own columns []
detailskip=LIST : Subset of feature details to exclude from extraction [translation]
fasout=LIST : Types of sequences to output into files (full/gene/cds/prot) as *.*.fas []
geneacc=X : Feature detail to use for gene sequence accession number (added to details) [locus_tag]
protacc=X : Feature detail to use for protein sequence accession number (added to details) [protein_id]
locusout=T/F : Whether to generate output by locus (True, locus as basefile) or combined (False) [False]
locusdir=PATH : Directory in which to generate output by locus [./]
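
To illustrate how basefile and fasout combine into the *.*.fas pattern, the sketch below lists the FASTA file names one might expect for a given run. It assumes the middle field is the sequence type (the version history names *.full.fas) and that the default basefile is the input file name with its extension removed; the function expected_fasta_files is hypothetical, and the module's actual naming should be checked against its output.

```python
import os

# Sequence types listed for fasout=LIST
FASOUT_TYPES = ("full", "gene", "cds", "prot")


def expected_fasta_files(seqin, fasout, basefile=None):
    """Return the FASTA file names expected for the given options (hypothetical helper)."""
    if basefile is None:
        # Assumption: the default basefile is the input name without its extension
        basefile = os.path.splitext(seqin)[0]
    names = []
    for seqtype in fasout:
        if seqtype not in FASOUT_TYPES:
            raise ValueError(f"Unknown fasout type: {seqtype}")
        names.append(f"{basefile}.{seqtype}.fas")
    return names


# Example file name is illustrative only.
print(expected_fasta_files("example.gbff", ["full", "prot"]))
```

With locusout=T the locus is used as the basefile, so the same pattern would apply per locus, with files written under locusdir.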

See also rje.py generic commandline options.

Module Version History

    # 0.0 - Initial Compilation.
    # 0.1 - Modified and Tidied output a little.
    # 0.2 - Added details to skip and option to use different detail for protein accession number.
    # 0.3 - Added reloading of features.
    # 1.0 - Basic functioning version. Added fetchuid=LIST Genbank retrieval to generate seqin=FILE.
    # 1.1 - Added use of rje_taxonomy for getting Species Code from TaxID.
    # 1.2 - Modified to deal with genbank protein entries.
    # 1.2.1 - Fixed feature bug that was breaking parser and removing trailing '*' from protein sequences.
    # 1.2.2 - Fixed more features that were breaking parser.
    # 1.3.0 - Added split viral output.
    # 1.3.1 - Fixed bug in split viral output.
    # 1.3.2 - Fixed bug in reverse complement sequences with introns.
    # 1.4.0 - Added addtags=T/F : Add locus_tag identifiers if missing - needed for gene/cds/prot fasta output [False]
    # 1.4.1 - Fixed genetic code warning.
    # 1.5.0 - Added setupRefGenome() method based on PAGSAT code.
    # 1.5.1 - Fixed logskip append locus sequence file bug.
    # 1.5.2 - Fixed addtag(s) bug.
    # 1.5.3 - Fixed https genbank download issue.
    # 1.5.4 - Added recognition of *.gbff for genbank files.
    # 1.5.5 - Fixed bug to use Locus for *.full.fas and thus link to Feature table properly.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.