The GFF file given by gffin=FILE will be parsed and the components optionally output to tables, a text file of
comment lines (starting with #) and, if given, fasta format sequences. The GFF filename sets the output prefix,
which can be over-ridden with
The default fields parsed from the GFF are:
locus, source, feature, start, end, score, strand, phase, attributes.
Additional fields can be extracted from the attributes field, using
attributes=all will extract all attributes into additional fields. Note that the
attributes field itself will be kept unless attfield=F is used to remove it.
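The field parsing described above can be sketched in a few lines of Python. This is a minimal illustration and not the module's own code: the function name parse_gff3_line and its keyword arguments are hypothetical, chosen only to mirror the attributes=LIST and attfield=T/F options.

```python
# The nine default fields, in GFF3 column order.
GFF_FIELDS = ["locus", "source", "feature", "start", "end",
              "score", "strand", "phase", "attributes"]


def parse_gff3_line(line, attributes=(), keep_attfield=True):
    """Split one tab-delimited GFF3 feature line into a dict of named fields.

    Hypothetical helper: `attributes` lists attribute keys to pull into their
    own fields ("all" or "*" for every key); `keep_attfield=False` drops the
    raw attributes column afterwards.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(GFF_FIELDS):
        raise ValueError("expected 9 tab-delimited columns, got %d" % len(values))
    entry = dict(zip(GFF_FIELDS, values))
    entry["start"] = int(entry["start"])
    entry["end"] = int(entry["end"])

    # Attributes are semicolon-separated X=Y pairs.
    parsed = {}
    for pair in entry["attributes"].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            parsed[key.strip()] = value.strip()

    extract_all = "all" in attributes or "*" in attributes
    for key, value in parsed.items():
        if extract_all or key in attributes:
            # Assumption: never overwrite one of the nine default fields.
            entry.setdefault(key, value)

    if not keep_attfield:
        del entry["attributes"]
    return entry
```

For example, calling parse_gff3_line(line, attributes=["all"], keep_attfield=False) on a CDS line would return the eight remaining default fields plus one field per attribute key.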
integrity=T will perform checks that the features do not go outside the range of the parsed sequence-region comments and/or fasta sequences.
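As a rough illustration of this kind of range check, the sketch below reads the standard GFF3 "##sequence-region seqid start end" comment lines and reports features that fall outside them. The function names are hypothetical, features are assumed to be dicts with locus, start and end keys, and the module's real check may differ.

```python
def parse_sequence_regions(comment_lines):
    """Map each locus to its (start, end) range from ##sequence-region comments."""
    regions = {}
    for line in comment_lines:
        parts = line.split()
        if len(parts) == 4 and parts[0] == "##sequence-region":
            regions[parts[1]] = (int(parts[2]), int(parts[3]))
    return regions


def out_of_range(features, regions):
    """Return the features whose coordinates extend beyond their locus range.

    Features on a locus without a known range are skipped, not flagged.
    """
    bad = []
    for feature in features:
        if feature["locus"] not in regions:
            continue
        region_start, region_end = regions[feature["locus"]]
        if feature["start"] < region_start or feature["end"] > region_end:
            bad.append(feature)
    return bad
```

When a fasta file is supplied instead, the same check could use (1, sequence length) as the range for each locus.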
stopwarn=T will identify adjacent CDS features that may have sequencing and/or translation errors.
indelwarn=T looks for adjacent CDS features with the same (or hyplist=LIST) "product" annotation that are within
3 nt of each other (generally overlapping) and might thus represent a fragmented ORF due to a frameshift error.
stopwarn=T identifies similar features that have exactly one codon between them, which
could represent an atypical genetic code being mis-translated as a stop codon.
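The two checks can be sketched as a single classifier over a pair of neighbouring CDS features. Everything specific here is an assumption made for illustration: the function name, the default "hypothetical protein" value for the hypothetical list, and the exact distance thresholds. The gap is taken as the number of nucleotides between the two features (negative when they overlap); "exactly one codon between them" is read as a gap of 3, and "within 3 nt" as a gap from -3 up to 2.

```python
def classify_adjacent(prev, nxt, warnfield="product",
                      hyplist=("hypothetical protein",)):
    """Classify two position-sorted CDS features on the same locus and strand.

    Returns "stopwarn", "indelwarn" or None. Thresholds are an assumed
    reading of the description, not the module's actual rule.
    """
    a, b = prev.get(warnfield), nxt.get(warnfield)
    same = a is not None and a == b
    hypothetical = a in hyplist or b in hyplist
    if not (same or hypothetical):
        return None

    # Nucleotides between the features; negative values mean overlap.
    gap = nxt["start"] - prev["end"] - 1
    if gap == 3:
        return "stopwarn"    # one codon apart: possible mis-translated stop
    if -3 <= gap < 3:
        return "indelwarn"   # touching or overlapping: possible frameshift
    return None
```

A real implementation would also need to honour hypindel=INT, which limits how many hypothetical proteins may take part in a possible indel; that is left out of this sketch.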
joinseq=T will output joined sequences to
*.joined.gff and, if sequences are parsed,
*.joined.nt.fas. For protein sequence translations,
stopwarn sequences are joined with a
sequences are joined with flanking and internal
xx pairs that delineate the overlapping parts of each
annotated protein sequence.
NOTE: Only GFF3 is currently supported.
gffin=FILE : Input GFF file to parse [
seqin=FILE : Optional fasta file of reference sequences [
gfftab=T/F : Whether to output parsed GFF file as a delimited table with headers [
gffloci=T/F : Whether to parse sequence-region GFF comments to
gffcomment=T/F : Whether to output parsed GFF comments to
gfffasta=T/F : Whether to output parsed GFF sequences to
attributes=LIST : List of attributes (X=Y;) to pull out into own fields ("*" or "all" for all) [
attfield=T/F : Whether to keep the full attribute field as parsed from the GFF file [
gffout=FILE : Save updated GFF format to FILE [
gffseq=T/F : Whether to include sequences in updated GFF file [
GFF Processing Options
integrity=T/F : Perform GFF integrity check based on parsed sequence-region comments and/or fasta [
indelwarn=T/F : Perform check for possible indels based on overlapping/close common features [
hypindel=INT : Number of hypothetical proteins that can be involved in a possible indel (0-2) [
stopwarn=T/F : Perform check for possible codon table stop codon errors based on close common features [
warnfield=X : Attribute field to use for generating indel or stop codon warnings [
idfield=X : Attribute field to use for CDS gene ID [
hyplist=LIST : List of warnfield values to identify as hypothetical protein [
cdsfeatures=LIST : List of feature types to count as CDS for warning checks [
joinseq=T/F : Whether to join sequences possibly affected by stop codons or frameshifts [
Module Version History
# 0.0.0 - Initial Compilation.
# 0.1.0 - Basic functional version.
# 0.1.1 - Modified for splice isoform handling.
# 0.1.2 - Fixed parsing of GFFs with sequence-region information interspersed with features.
# 0.1.3 - Added option to parseGFF to switch off the attribute parsing.
# 0.2.0 - Added GFF output with the ability to fix tab-delimiter errors in the GFF.
# 0.2.1 - Added restricted feature parsing from GFF.