SLiMSuite REST Server



rje_gff V0.2.1

GFF File Parser and Manipulator

Module: rje_gff
Description: GFF File Parser and Manipulator
Version: 0.2.1
Last Edit: 20/11/20
Webserver: http://www.slimsuite.unsw.edu.au/servers/gff.php

Copyright © 2018 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje rje_obj rje_db rje_seqlist rje_sequence


See SLiMSuite Blog for further documentation. See rje for general commands.

Function

The GFF file given by gffin=FILE will be parsed, and its components can optionally be output as delimited tables, as a text file of comment lines (those starting with #) and, if present, as fasta-format sequences. The GFF filename sets the output prefix, which can be overridden with basefile=FILE.
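
The parsing step can be sketched in Python as below. This is a minimal illustration, not the rje_gff code: the function names parse_gff and output_prefix are hypothetical. It assumes that everything after a ##FASTA directive is sequence, and that the default prefix is simply the gffin path with its extension removed.

```python
import io
import os

# Default GFF3 columns, named as in the rje_gff field list.
GFF_FIELDS = ["locus", "source", "feature", "start", "end",
              "score", "strand", "phase", "attributes"]


def output_prefix(gffin, basefile=None):
    """Hypothetical helper: basefile wins, else the GFF path minus its extension."""
    if basefile:
        return basefile
    return os.path.splitext(gffin)[0]


def parse_gff(text):
    """Split GFF3 text into feature rows, comment lines and embedded FASTA lines.

    Comment lines start with '#'; everything after a '##FASTA' directive is
    kept as FASTA-formatted sequence lines (an assumption of this sketch).
    """
    features, comments, fasta = [], [], []
    in_fasta = False
    for line in io.StringIO(text):
        line = line.rstrip("\n")
        if in_fasta:
            fasta.append(line)
        elif line.startswith("##FASTA"):
            in_fasta = True
        elif line.startswith("#"):
            comments.append(line)
        elif line.strip():
            cols = line.split("\t")
            if len(cols) != 9:
                raise ValueError("expected 9 tab-delimited columns: %r" % line)
            row = dict(zip(GFF_FIELDS, cols))
            row["start"], row["end"] = int(row["start"]), int(row["end"])
            features.append(row)
    return features, comments, fasta
```

The sketch rejects lines that do not have exactly nine tab-delimited columns. The real module may be more forgiving, since its version history mentions fixing tab-delimit errors.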

The default fields parsed from the GFF are: locus, source, feature, start, end, score, strand, phase and attributes. Additional fields can be extracted from the attributes field using attributes=LIST. Setting attributes="*" or attributes=all will extract all attributes into additional fields. The full attributes field itself is only kept if attfield=T is set (the default is attfield=F).
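
The following is a sketch of how attribute extraction could work, assuming each feature row is a dict that holds the raw attributes column. The functions split_attributes and extract_attributes are hypothetical and are not rje_gff functions. Filling missing attributes with an empty string is also an assumption.

```python
from urllib.parse import unquote


def split_attributes(attfield):
    """Parse a GFF3 attributes column ('X=Y;X2=Y2') into a dict, preserving order."""
    atts = {}
    for pair in attfield.strip().split(";"):
        if not pair.strip():
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters such as ';' and '='.
        atts[key.strip()] = unquote(value.strip())
    return atts


def extract_attributes(rows, attributes=("*",), attfield=False):
    """Return copies of feature rows with one extra column per attribute.

    attributes mirrors attributes=LIST ('*' or 'all' means every key seen);
    attfield mirrors attfield=T/F (keep or drop the raw 'attributes' column).
    Features lacking an attribute get an empty string (an assumption).
    """
    parsed = [split_attributes(row["attributes"]) for row in rows]
    if any(a in ("*", "all") for a in attributes):
        keys = []
        for atts in parsed:
            for key in atts:
                if key not in keys:
                    keys.append(key)
    else:
        keys = list(attributes)
    out = []
    for row, atts in zip(rows, parsed):
        new = dict(row)
        for key in keys:
            new[key] = atts.get(key, "")
        if not attfield:
            del new["attributes"]
        out.append(new)
    return out
```

Collecting keys across all rows gives every output row the same columns. This matters when the result is written as a delimited table with a single header.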

integrity=T will check that no feature extends beyond the range of its sequence, as defined by the parsed sequence-region comments and/or fasta sequences.
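
Here is a minimal sketch of such an integrity check. It assumes features are dicts with integer start/end values, sequence-region comments of the form "##sequence-region seqid start end", and fasta lengths supplied as a dict. The helper names and the message format are hypothetical.

```python
def parse_sequence_regions(comments):
    """Read '##sequence-region seqid start end' lines into {seqid: (start, end)}."""
    regions = {}
    for line in comments:
        parts = line.split()
        if len(parts) == 4 and parts[0] == "##sequence-region":
            regions[parts[1]] = (int(parts[2]), int(parts[3]))
    return regions


def integrity_errors(features, regions=None, seqlens=None):
    """List features that fall outside their locus bounds.

    regions: {seqid: (start, end)} from sequence-region comments.
    seqlens: {seqid: length} from a reference fasta (valid positions 1..length).
    Features on loci missing from both sources are not checked here.
    """
    checks = [("sequence-region", regions or {}),
              ("fasta", {sid: (1, n) for sid, n in (seqlens or {}).items()})]
    errors = []
    for feat in features:
        for source, limits in checks:
            if feat["locus"] not in limits:
                continue
            lo, hi = limits[feat["locus"]]
            if feat["start"] < lo or feat["end"] > hi:
                errors.append("%s %s:%d-%d outside %s range %d-%d" % (
                    feat["feature"], feat["locus"], feat["start"], feat["end"],
                    source, lo, hi))
    return errors
```

The two bound sources are checked independently. A feature can therefore be reported twice if it overruns both the sequence-region and the fasta sequence.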

indelwarn=T and stopwarn=T will identify adjacent CDS features that may have sequencing and/or translation errors. indelwarn=T looks for adjacent CDS features that share the same "product" annotation (set by warnfield=X), or that carry a hypothetical protein annotation (hyplist=LIST), and lie within 3 nt of each other (generally overlapping). Such pairs might represent a single ORF fragmented by a frameshift error. stopwarn=T identifies similar features with exactly one codon between them, which could indicate an atypical genetic code in which a sense codon has been mistranslated as a stop codon.
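
The following sketch shows one possible reading of these checks. It is not the rje_gff implementation, and several details are assumptions:

- "Adjacent" means consecutive features on the same locus and strand.
- The gap is the number of nucleotides strictly between two features, so overlaps give negative values.
- A gap of exactly 3 triggers a stop warning, and a gap below 3 triggers an indel warning.
- Two features "match" if their warnfield values are equal or at least one is in hyplist.
- hypindel caps the number of hypothetical proteins in an indel pair.

The function name close_cds_warnings is hypothetical.

```python
def close_cds_warnings(features, cdsfeatures=("CDS",), warnfield="product",
                       hyplist=("hypothetical protein",), hypindel=1):
    """Flag adjacent CDS pairs that may reflect frameshift or stop-codon errors.

    Returns a list of (warning_type, previous_feature, next_feature) tuples.
    """
    cds = [f for f in features if f["feature"] in cdsfeatures]
    cds.sort(key=lambda f: (f["locus"], f["strand"], f["start"], f["end"]))
    warnings = []
    for prev, nxt in zip(cds, cds[1:]):
        if (prev["locus"], prev["strand"]) != (nxt["locus"], nxt["strand"]):
            continue  # only compare neighbours on the same locus and strand
        a, b = prev.get(warnfield, ""), nxt.get(warnfield, "")
        nhyp = (a in hyplist) + (b in hyplist)
        if nhyp == 0 and a != b:
            continue  # different, non-hypothetical annotations
        gap = nxt["start"] - prev["end"] - 1  # nt strictly between; negative = overlap
        if gap == 3:
            warnings.append(("stopwarn", prev, nxt))
        elif gap < 3 and nhyp <= hypindel:
            warnings.append(("indelwarn", prev, nxt))
    return warnings
```

With the default hypindel=1, a pair of overlapping "hypothetical protein" CDS features is not flagged. Setting hypindel=2 would allow it.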

joinseq=T will output joined sequences to *.joined.gff and, if sequences are parsed, to *.joined.aa.fas and *.joined.nt.fas. For protein sequence translations, stopwarn sequences are joined with a *, whereas indelwarn sequences are joined with flanking and internal xx pairs that delineate the overlapping parts of each annotated protein sequence.
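
The stopwarn case of joinseq can be sketched as below. It assumes that the joined feature spans the first feature's start to the second feature's end, and that the joined nucleotide sequence is that genomic span, reverse-complemented on the minus strand. It also assumes that the two protein translations are concatenated in translation order around a single *. The function join_stopwarn_pair is hypothetical, and the indelwarn xx-pair joining is not attempted here.

```python
# Complement table for reverse-complementing minus-strand spans.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def join_stopwarn_pair(prev, nxt, locus_seq, prev_aa, next_aa):
    """Merge two stopwarn-flagged CDS features (in coordinate order) into one record.

    Returns (joined_feature, joined_nt, joined_aa).
    """
    joined = dict(prev)
    joined["start"], joined["end"] = prev["start"], nxt["end"]
    nt = locus_seq[prev["start"] - 1:nxt["end"]]  # GFF coordinates: 1-based, inclusive
    if prev["strand"] == "-":
        nt = nt.translate(COMPLEMENT)[::-1]
        first_aa, second_aa = next_aa, prev_aa  # minus strand is translated right to left
    else:
        first_aa, second_aa = prev_aa, next_aa
    # Drop any terminal stop symbols, then mark the suspect codon with '*'.
    aa = first_aa.rstrip("*") + "*" + second_aa.rstrip("*")
    return joined, nt, aa
```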

NOTE: Only GFF3 is currently supported.

Commandline

Input/Output Options

gffin=FILE : Input GFF file to parse [None]
seqin=FILE : Optional fasta file of reference sequences [None]
gfftab=T/F : Whether to output parsed GFF file as a delimited table with headers [True]
gffloci=T/F : Whether to parse sequence-region GFF comments to *.loci.tdt [True]
gffcomment=T/F : Whether to output parsed GFF comments to *.comments.txt [False]
gfffasta=T/F : Whether to output parsed GFF sequences to *.fasta [False]
attributes=LIST : List of attributes (X=Y;) to pull out into own fields ("*" or "all" for all) [*]
attfield=T/F : Whether to keep the full attribute field as parsed from the GFF file [False]
gffout=FILE : Save updated GFF format to FILE [None]
gffseq=T/F : Whether to include sequences in updated GFF file [False]

GFF Processing Options

integrity=T/F : Perform GFF integrity check based on parsed sequence-region comments and/or fasta [True]
indelwarn=T/F : Perform check for possible indels based on overlapping/close common features [True]
hypindel=INT : Number of hypothetical proteins that can be involved in a possible indel (0-2) [1]
stopwarn=T/F : Perform check for possible codon table stop codon errors based on close common features [True]
warnfield=X : Attribute field to use for generating indel or stop codon warnings [product]
idfield=X : Attribute field to use for CDS gene ID [ID]
hyplist=LIST : List of warnfield values to identify as hypothetical protein ['hypothetical protein']
cdsfeatures=LIST : List of feature types to count as CDS for warning checks [CDS]
joinseq=T/F : Whether to join sequences possibly affected by stop codons or frameshifts [False]


Module Version History

    # 0.0.0 - Initial Compilation.
    # 0.1.0 - Basic functional version.
    # 0.1.1 - Modified for splice isoform handling
    # 0.1.2 - Fixed parsing of GFFs with sequence-region information interspersed with features.
    # 0.1.3 - Added option to parseGFF to switch off the attribute parsing.
    # 0.2.0 - Added GFF output with the ability to fix tab-delimitation errors in the GFF.
    # 0.2.1 - Added restricted feature parsing from GFF.

This server is still in development. Please report any odd/unwanted behaviour.



© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.