SLiMSuite REST Server



rje_genecards V0.4

RJE Genecards Parsing Module

Module: rje_genecards
Description: RJE Genecards Parsing Module
Version: 0.4
Last Edit: 28/03/08

Copyright © 2007 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje rje_zen


See SLiMSuite Blog for further documentation. See rje for general commands.

NOTE

This module has now been somewhat superseded by the rje_genemap module, but it is still used with rje_hprd to compile
links from HPRD. It may also still be useful for smaller sets of genes that need to be mapped to HGNC, e.g.
manually compiled lists from experiments.

Function

This is a prototype module that aims to take a list of gene symbols and/or aliases, find the relevant GeneCards entry for each, download it and extract the relevant gene/protein links into a table.
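The core idea of mapping each input name to a gene symbol and writing the result as a delimited table can be sketched as follows. This is a minimal illustration, not the module's code: the GeneCards download is replaced by a hypothetical in-memory lookup (`alias_to_symbol`), matching is assumed to be case-insensitive, and only the Alias and Symbol columns (the two named by the purify option) are written.

```python
import csv
import io


def build_card_table(genes, alias_to_symbol):
    """Map each input gene (symbol or alias) to a gene symbol.

    alias_to_symbol is a hypothetical stand-in for the GeneCards lookup,
    keyed by upper-cased names (case-insensitive matching is an assumption).
    Genes with no match are kept with an empty Symbol.
    """
    rows = []
    for gene in genes:
        symbol = alias_to_symbol.get(gene.upper(), "")
        rows.append({"Alias": gene, "Symbol": symbol})
    return rows


def write_tdt(rows, handle, fields=("Alias", "Symbol")):
    """Write rows as a tab-delimited table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=list(fields),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Toy lookup for illustration only; UNKNOWN1 deliberately has no match.
lookup = {"P53": "TP53", "TP53": "TP53", "ERBB1": "EGFR"}
out = io.StringIO()
write_tdt(build_card_table(["p53", "ERBB1", "UNKNOWN1"], lookup), out)
print(out.getvalue())
```

Keeping unmatched genes as rows with an empty Symbol makes gaps visible in the output table instead of silently dropping them.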

The ultimate goal is to generate a table pulling in identifiers from EnsEMBL, GeneCards and HPRD to allow easy cross-referencing across datasets and compilation of data from different sources.

When the altsource=LIST option is used, data from later files in the list overwrite data read from earlier files. If update=T and the cardout file exists, it is appended to the end of the altsource list.
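The read order and overwrite rule for altsource can be sketched as below. This is a hedged illustration under stated assumptions, not the module's implementation. The sources are taken to be tab-delimited files with a header row keyed on a `Symbol` column. Empty cells are skipped, so a blank value in a later file does not erase earlier data; the description above does not say how blanks are treated. The function names and the toy identifiers are hypothetical.

```python
import csv
import io
import os


def source_order(altsource, cardout, update, exists=os.path.exists):
    """Return sources in read order; with update=T an existing cardout goes last."""
    sources = list(altsource)
    if update and exists(cardout):
        sources.append(cardout)  # read last, so its values take precedence
    return sources


def merge_sources(handles, key="Symbol"):
    """Merge tab-delimited tables in order; later sources overwrite earlier ones.

    Assumption: empty cells are skipped, so a blank value in a later
    file does not erase data read from an earlier file.
    """
    merged = {}
    for handle in handles:
        for row in csv.DictReader(handle, delimiter="\t"):
            record = merged.setdefault(row[key], {})
            for field, value in row.items():
                if value:
                    record[field] = value
    return merged


# Toy tables: the later file changes Entrez and leaves OMIM blank.
early = io.StringIO("Symbol\tEntrez\tOMIM\nTP53\tid_old\tomim_1\n")
late = io.StringIO("Symbol\tEntrez\tOMIM\nTP53\tid_new\t\n")
print(merge_sources([early, late])["TP53"])
print(source_order(["a.tdt", "b.tdt"], "genecards.tdt", True, exists=lambda p: True))
```

Because an existing cardout file is appended last, its values win over every altsource file when update=T.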

To save time, the full set of HGNC symbols can be downloaded from HGNC (http://www.genenames.org/index.html) and imported using the hgncdata=FILE option. This file should be delimited and contain the following fields (other fields are allowed): HGNC ID, Approved Symbol, Approved Name, Previous Symbols, Aliases, Entrez Gene ID, RefSeq IDs, Entrez Gene ID (mapped data), OMIM, UniProt ID (mapped data), Ensembl ID (mapped data).
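A sketch of indexing such an HGNC file by Approved Symbol, and of separating unambiguous aliases (as output by fullhgnc=T) from ambiguous ones, is shown below. It is an illustration only and rests on several assumptions. The file is taken to be tab-delimited with the column names listed above. Previous Symbols and Aliases are assumed to be comma-separated. An alias is treated as unambiguous when it maps to exactly one approved symbol. The function names and toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_hgnc(handle, delimiter="\t"):
    """Index an HGNC download by approved symbol and collect alias links.

    Returns (records, alias_map): records maps each Approved Symbol to its
    row, and alias_map maps every upper-cased symbol, previous symbol or
    alias to the set of approved symbols it can refer to.
    """
    records = {}
    alias_map = defaultdict(set)
    for row in csv.DictReader(handle, delimiter=delimiter):
        symbol = row["Approved Symbol"].strip()
        records[symbol] = row
        alias_map[symbol.upper()].add(symbol)
        # Assumption: multi-valued fields are comma-separated.
        for field in ("Previous Symbols", "Aliases"):
            for name in (row.get(field) or "").split(","):
                name = name.strip()
                if name:
                    alias_map[name.upper()].add(symbol)
    return records, alias_map


def unambiguous(alias_map):
    """Keep only names that point at exactly one approved symbol."""
    return {name: next(iter(symbols))
            for name, symbols in alias_map.items() if len(symbols) == 1}


# Toy data with made-up symbols; SHARED is an alias of both genes.
HGNC_TOY = (
    "HGNC ID\tApproved Symbol\tPrevious Symbols\tAliases\n"
    "H1\tGENEA\tOLDA\tSHARED, AA1\n"
    "H2\tGENEB\t\tSHARED\n"
)
records, alias_map = read_hgnc(io.StringIO(HGNC_TOY))
print(sorted(unambiguous(alias_map)))
```

Storing a set of symbols per alias, rather than a single value, keeps ambiguous aliases such as SHARED visible so that they can be excluded explicitly instead of being overwritten by whichever row is read last.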

Commandline

### Input Options ###
genes=LIST : List of gene symbols/aliases to download []
update=T/F : Whether to read in any data from cardout file (if present) and add to it [True]
skiplist=LIST : Skip genes matching LIST (e.g. XP_*) []
useweb=T/F : Whether to try and extract missing data from GeneCards website [True]
altsource=LIST : List of alternative sources of data (Delimited files with appropriate headers) []
hgncdata=FILE : HGNC download file containing data []
### Output Options ###
species=X : Species to output in table [Human]
cardout=FILE : File for output of genecard data [genecards.tdt]
ensloci=FILE : File of EnsLoci genome to incorporate [/home/richard/Databases/EnsEMBL/ens_HUMAN.loci.fas]
restrict=T/F : Whether to only output lines for genes in the original genes=LIST [False]
purify=T/F : Only output lines where the Alias and the Symbol are the same [False]
### Special execution options ###
fullens=T/F : Incorporate all EnsLoci EnsEMBL genes into cardout file (long run!) [False]
fullhgnc=T/F : Output all HGNC codes and unambiguous aliases into file [False]
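The key=value option style, together with the skiplist, restrict and purify filters, might look like the sketch below. This is a hedged sketch rather than the rje parser (see rje for the real general command handling). It assumes that LIST values are comma-separated, that T/F values become booleans, and that skiplist patterns are shell-style wildcards such as XP_*, matched case-sensitively with Python's `fnmatch.fnmatchcase`. All function names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_options(argv):
    """Parse key=value arguments in the style listed above.

    Assumptions: LIST values are comma-separated, and a bare T or F
    is read as a boolean for any key that is not a LIST option.
    """
    list_keys = {"genes", "skiplist", "altsource"}
    opts = {}
    for arg in argv:
        key, _, value = arg.partition("=")
        if key in list_keys:
            opts[key] = [item for item in value.split(",") if item]
        elif value in ("T", "F"):
            opts[key] = value == "T"
        else:
            opts[key] = value
    return opts


def skip_genes(genes, skiplist):
    """Drop input genes matching any skiplist wildcard pattern, e.g. XP_*."""
    return [g for g in genes if not any(fnmatchcase(g, pat) for pat in skiplist)]


def keep_row(row, genes, restrict=False, purify=False):
    """Apply restrict=T/F and purify=T/F to one output row with Alias and Symbol columns."""
    if restrict and row["Alias"] not in genes:
        return False  # restrict=T: only genes from the original genes=LIST
    if purify and row["Alias"] != row["Symbol"]:
        return False  # purify=T: only rows where Alias equals Symbol
    return True


opts = parse_options(["genes=TP53,p53,XP_12345", "skiplist=XP_*", "purify=T"])
genes = skip_genes(opts["genes"], opts["skiplist"])
rows = [{"Alias": "TP53", "Symbol": "TP53"}, {"Alias": "p53", "Symbol": "TP53"}]
print(genes)
print([r for r in rows if keep_row(r, genes, purify=opts["purify"])])
```

LIST keys are checked before the T/F test so that a gene list containing a single-letter name is not mistaken for a boolean.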

Module Version History

    # 0.0 - Initial Compilation.
    # 0.1 - Added EnsLoci processing.
    # 0.2 - Added altsource and generally improved function and commenting.
    # 0.3 - Added more interactivity and options.
    # 0.4 - Added reading of HGNC download.

© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.