Module:	rje_genomics
Description:	Genomics data reformatting module
Version:	0.9.0
Last Edit:	01/03/22

Imported modules: rje rje_db rje_obj rje_samtools rje_seqlist rje_blast_V2

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

The function of this module will be added here.

reformat: convert TDT file into GFF or SAM
ncbi: combine NCBI accession numbers and annotation with local data
gffmap: convert GFF files from one ID set to another
samfilt: filter read alignments from SAM file
mapbam: map a set of IDs from a BAM file and output an updated SAM file

Commandline

General Options

runmode=X : Run mode (reformat/ftgff/ncbi/makemap/gffmap/diphap/fqreads/fas2bed/ncbinr/gapgff/locgff/samfilt/mapbam) [reformat]
basefile=X : Base for output files, including log

Reformat Input/Output Options

tdtfile=FILE : Input delimited text file with data to convert [None]
tdtkeys=LIST : Input fields that define unique entries [Qry,Hit,AlnNum]
queryfield=X : Field defining the Query ("Read") name [Qry]
targetfield=X : Field defining the Target (Genome contig) name [Hit]
begfield=X : Field for beginning position [QryStart]
endfield=X : Field for end position [HitStart]
reformat=X : Output format (GFF3/SAM) [GFF3]
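The reformat options above map table columns onto the fields of an output feature. A minimal sketch of that conversion for GFF3 output, assuming a tab-delimited file with a header row and 1-based inclusive coordinates. The function name `tdt_to_gff3`, the `match` feature type, the `rje_genomics` source label and the `|`-joined ID built from the tdtkeys fields are illustrative choices, not the module's actual code. The begin/end column names are passed explicitly rather than relying on the defaults listed above.

```python
import csv
import io


def tdt_to_gff3(tdt_text, begfield, endfield, queryfield="Qry", targetfield="Hit",
                tdtkeys=("Qry", "Hit", "AlnNum"), source="rje_genomics",
                feature_type="match"):
    """Convert a tab-delimited hit table into GFF3 lines (hypothetical sketch).

    Each row becomes one feature on the target (contig), named after the query
    (read). The tdtkeys values are joined to form a unique GFF3 ID. Positions
    are assumed to be 1-based and inclusive already.
    """
    lines = ["##gff-version 3"]
    for row in csv.DictReader(io.StringIO(tdt_text), delimiter="\t"):
        beg, end = int(row[begfield]), int(row[endfield])
        # GFF3 needs start <= end; a reversed pair is treated as a minus-strand hit
        strand = "+" if beg <= end else "-"
        start, stop = min(beg, end), max(beg, end)
        feature_id = "|".join(row[key] for key in tdtkeys)
        attributes = f"ID={feature_id};Name={row[queryfield]}"
        lines.append("\t".join([row[targetfield], source, feature_type,
                                str(start), str(stop), ".", strand, ".",
                                attributes]))
    return lines
```

Treating a reversed begin/end pair as a minus-strand hit is one reasonable convention for alignment tables; a real table may carry strand in its own column.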

NCBI Annotation Options

seqin=FASFILE : Input assembly fasta file []
seqstyle=X : Sequence naming format for seqin=FASFILE sequences [dipnr]
ncbifas=FASFILE : NCBI assembly fasta file []
ncbigff=GFFFILE : NCBI annotation GFF file []

NCBINR Protein filtering Options

seqin=FASFILE : Input protein fasta file - accnum should match CDS feature Name []
ncbigff=GFFFILE : NCBI annotation GFF file (locus naming format not important) []

GFF Map Options

gffs=FILELIST : List of GFF files to convert - will be renamed BASEFILE.*.gff3 [*.gff,*.gff3]
mapping=FILE : File of old -> new ID mapping (e.g. from ncbi formatting) [mapping.csv]
seqin=FASFILE : Input assembly fasta file (makemap mode) []
mapfas=FASFILE : Alternative fasta file for GFF ID mapping (makemap mode) []
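The gffmap step amounts to a lookup from old to new sequence IDs applied to each GFF feature line. A minimal sketch, assuming the mapping file is a simple two-column `old,new` CSV (the real mapping.csv layout may differ) and that only column 1 (seqid) needs renaming. The names `load_mapping` and `remap_gff` are hypothetical.

```python
import csv
import io


def load_mapping(csv_text):
    """Parse 'old,new' rows into a dict (two-column layout is an assumption)."""
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def remap_gff(gff_text, mapping):
    """Rename column 1 (seqid) of each GFF feature line using mapping.

    Comment/directive lines pass through unchanged, and IDs missing from the
    mapping are kept as they are. IDs inside column 9 are not touched.
    """
    out = []
    for line in gff_text.splitlines():
        if line.startswith("#") or not line.strip():
            out.append(line)
            continue
        cols = line.split("\t")
        cols[0] = mapping.get(cols[0], cols[0])
        out.append("\t".join(cols))
    return "\n".join(out) + "\n"
```

Keeping unmapped IDs rather than dropping their features makes gaps in the mapping visible in the output instead of silently losing annotation.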

FASTQ Options

fqfiles=FILELIST : List of fastq files (may be gzipped) to process [*.fq,*.fq.gz,*.fastq,*.fastq.gz]
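Counting reads (the fqreads mode) relies on FASTQ storing each read as a record of four lines. A minimal sketch, assuming unwrapped four-line records, that expands the default wildcard list and counts reads in plain or gzipped files. The names `fastq_files` and `count_fastq_reads` are hypothetical.

```python
import glob
import gzip
import os

# Same wildcards as the fqfiles default above
DEFAULT_PATTERNS = ("*.fq", "*.fq.gz", "*.fastq", "*.fastq.gz")


def fastq_files(directory=".", patterns=DEFAULT_PATTERNS):
    """Expand the fqfiles-style wildcard list into matching paths."""
    files = []
    for pattern in patterns:
        files.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return files


def count_fastq_reads(path):
    """Count reads, assuming plain four-line FASTQ records (no wrapped lines)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Raising on a line count that is not a multiple of four flags truncated downloads or wrapped records rather than reporting a misleading count.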

GAP GFF Options

seqin=FASFILE : Input assembly fasta file []
mingap=INT : Minimum gap length to annotate [10]
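Assembly gaps are runs of N in the contig sequences, so a gap GFF can be built by scanning each sequence for runs of at least mingap Ns. A minimal sketch, assuming gaps are written as N or n and using the Sequence Ontology term `gap` as the feature type; `read_fasta` and `gap_gff` are hypothetical helpers, not the module's own functions.

```python
import re


def read_fasta(text):
    """Parse FASTA text into {name: sequence}, keeping the first word of each header."""
    seqs, name, chunks = {}, None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def gap_gff(seqs, mingap=10, source="rje_genomics"):
    """Return GFF3 lines for runs of N/n at least mingap long."""
    pattern = re.compile("[Nn]{%d,}" % mingap)
    lines = ["##gff-version 3"]  # header kept on its own line, separate from features
    for name, seq in seqs.items():
        for i, match in enumerate(pattern.finditer(seq), start=1):
            # regex offsets are 0-based, half-open; GFF3 is 1-based, inclusive
            lines.append("\t".join([name, source, "gap", str(match.start() + 1),
                                    str(match.end()), ".", ".", ".",
                                    f"ID={name}.gap{i}"]))
    return lines
```

Joining wrapped sequence lines before scanning matters: a gap split across two FASTA lines must still be reported as a single feature.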

SAM Filter Options

sam=FILE : Input SAM file to filter []
minmaplen=INT : Minimum number of matching template positions to keep SAM hit [0]
outsam=FILE : Output SAM file [$BASEFILE.filtered.sam]
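The minmaplen filter needs a per-alignment length taken from the CIGAR string. A minimal sketch, assuming "matching template positions" means the sum of CIGAR M, = and X operations (aligned read/reference pairs, including mismatches); the module may count differently. `aligned_length` and `filter_sam` are hypothetical names.

```python
import re

# CIGAR operations as defined in the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_length(cigar):
    """Sum CIGAR M/=/X operations; '*' (no alignment) counts as zero."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")


def filter_sam(lines, minmaplen=0):
    """Yield header lines, plus alignments whose aligned length >= minmaplen."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 (index 5) is CIGAR; malformed lines with < 11 columns are dropped
        if len(fields) >= 11 and aligned_length(fields[5]) >= minmaplen:
            yield line
```

With the default minmaplen of 0 every well-formed line is kept, matching the "keep everything" behaviour the [0] default implies.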

BAM Mapping Options

bam=FILE : Input BAM file to map []
mapping=FILE : File of seqname, newseqname, start, end, shift
outsam=FILE : Output SAM file [$BASEFILE.mapped.sam]
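One plausible reading of the five mapping columns is: an alignment on seqname whose position lies between start and end is moved to newseqname with shift added to its position. A minimal sketch under that assumption, working on SAM text such as the output of `samtools view -h` (reading BAM directly would need samtools or a library such as pysam). `load_regions` and `remap_sam_line` are hypothetical; mate fields (RNEXT/PNEXT) and @SQ header lines are deliberately not updated here.

```python
import csv
import io


def load_regions(csv_text):
    """Parse seqname,newseqname,start,end,shift rows (CSV layout assumed)."""
    regions = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 5 or not row[2].strip().isdigit():
            continue  # skip blank or header rows
        seq, newseq, start, end, shift = (x.strip() for x in row[:5])
        regions.append((seq, newseq, int(start), int(end), int(shift)))
    return regions


def remap_sam_line(line, regions):
    """Move one alignment to its new sequence/position; headers pass through."""
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    rname, pos = fields[2], int(fields[3])  # columns 3 (RNAME) and 4 (POS)
    for seq, newseq, start, end, shift in regions:
        if rname == seq and start <= pos <= end:
            fields[2], fields[3] = newseq, str(pos + shift)
            break  # first matching region wins
    return "\t".join(fields) + "\n"
```

A full implementation would also rewrite the @SQ headers and mate coordinates so the output stays a consistent SAM file.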

Module Version History

    # 0.0.0 - Initial Compilation.
    # 0.1.0 - Added ncbi annotation reformatting.
    # 0.2.0 - Added diphap renaming of pseudodiploid assembly sequences.
    # 0.3.0 - Added fqreads counting of reads from fastq.gz.
    # 0.4.0 - Added GFFMap function for mapping IDs onto others in GFF files, and mapfas mode to make a mapping table.
    # 0.5.0 - Added Fas2Bed function for making a BED file of all contigs for bedtools coverage etc.
    # 0.6.0 - Added GapGFF option to generate GFF of assembly gaps.
    # 0.6.1 - Fixed TDTKeys bug.
    # 0.7.0 - Added loc2gff mode for converting local hits table to GFF3 output.
    # 0.8.0 - Added samfilt: filter read alignments from SAM file.
    # 0.8.1 - Fixed gapgff bug that had first gap in header.
    # 0.9.0 - Added mapbam: map a set of IDs from a BAM file and output an updated SAM file.

rje_genomics REST Output formats

Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format
for more user-friendly formatted output. Individual outputs can be identified/parsed using &rest=OUTFMT.
