SLiMSuite REST Server


rje_hprd V1.2.1

HPRD Database processing module

Module: rje_hprd
Description: HPRD Database processing module
Version: 1.2.1
Last Edit: 16/05/16

Copyright © 2007 Richard J. Edwards - See source code for GNU License Notice

Imported modules: rje rje_genecards rje_seq rje_dismatrix_V2

See SLiMSuite Blog for further documentation. See rje for general commands.


This module is designed for specific PPI Database manipulations:

1. Parsing HPRD Flat Files. [This is the default and is run if no other option is selected.]
Once the HPRD FLAT_FILES have been downloaded, this module can be run to parse out binary protein interactions and make sequence-specific interaction datasets. It is currently unclear how (if at all) HPRD uses the "isoform" data to distinguish interactions, so only isoform_1 sequences are used for Fasta datasets by default. Sequences will be reformatted into:

>Gene_HUMAN__AccNum Description [Gene:WWWW HPRD:XXXX; gb:YYYY; sp:ZZZZ]
The Gene will be the HUGO gene name (as parsed from HPRD) where available, else it will be the HPRD ID. This will be unique for each protein and will correspond to a dataset of the same name: HPRD_Datasets/Gene_hprd.fas. All proteins will also be saved in a file hprd.fas. The AccNum will be UniProt if possible, else GenBank. If the option alliso=T is used, then all isoforms will be included and the AccNum will be X-Y where X is the HPRD ID and Y is the isoform.
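
The naming rules above can be sketched in Python. This is an illustration only, not the module's own code: `hprd_fasta_name` is a hypothetical helper, the identifiers in the usage line are made-up placeholders, and the sketch assumes that the `Gene:` field repeats the name used at the start of the header.

```python
def hprd_fasta_name(hprd_id, description, gene=None, gb="", sp="", isoform=None):
    """Build a sequence header in the layout described above (hypothetical helper)."""
    # Use the HUGO gene name where available, else fall back to the HPRD ID.
    name = gene if gene else hprd_id
    # Prefer the UniProt accession, else use GenBank.
    acc = sp if sp else gb
    # With alliso=T the accession becomes X-Y (HPRD ID and isoform).
    if isoform is not None:
        acc = f"{hprd_id}-{isoform}"
    # Assumption: the Gene: field carries the same name as the header prefix.
    xref = f"[Gene:{name} HPRD:{hprd_id}; gb:{gb}; sp:{sp}]"
    return f">{name}_HUMAN__{acc} {description} {xref}"


# Placeholder identifiers, for illustration only.
print(hprd_fasta_name("00001", "Example protein", gene="AANAT",
                      gb="NP_000001", sp="P00001"))
```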

2. Converting a table of interactions into a distance matrix. This table should be a plain text file in which the first column is the name of the interacting (spoke) protein and the subsequent columns are for the proteins (hubs) to be clustered. The first row contains the hub names. In the row for each spoke protein, a hub column should be empty (or have value 0) if there is no interaction and have a non-zero value if there is an interaction:

Gene Beta Epsilon Eta Gamma Sigma Theta Zeta
AANAT 1

A distance matrix is then produced (outfile=FILE => FILE.ppi_dis.txt) consisting of the number of unique interactors for each pairwise comparison. (The format is set by outmatrix=X : text / mysql / phylip)
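
A minimal sketch of this conversion follows. It rests on stated assumptions rather than on the module's source: the table is taken to be tab-delimited, "unique interactors" is read as spokes that interact with only one of the two hubs being compared, and scaling is read as dividing by the number of spokes that interact with either hub. The function names and the example table are hypothetical.

```python
import csv
import io


def read_ppi_table(handle):
    """Return {hub: set of spoke names} from a tab-delimited interaction table."""
    reader = csv.reader(handle, delimiter="\t")
    hubs = next(reader)[1:]          # first row: column label, then hub names
    interactors = {hub: set() for hub in hubs}
    for row in reader:
        if not row or not row[0]:
            continue                 # skip blank lines
        spoke = row[0]
        for hub, value in zip(hubs, row[1:]):
            # An empty cell or "0" means no interaction; anything else counts.
            if value.strip() not in ("", "0"):
                interactors[hub].add(spoke)
    return interactors


def ppi_distance(interactors, scaled=False):
    """Pairwise distances, assuming 'unique' means not shared by both hubs."""
    dist = {}
    for a in interactors:
        for b in interactors:
            unique = interactors[a] ^ interactors[b]   # in one hub only
            total = interactors[a] | interactors[b]    # in either hub
            if scaled:
                dist[(a, b)] = len(unique) / len(total) if total else 0.0
            else:
                dist[(a, b)] = len(unique)
    return dist


# Made-up example table with two hubs and three spokes.
TABLE = "Gene\tBeta\tZeta\nAANAT\t\t1\nP1\t1\t1\nP2\t1\t0\n"
hubs = read_ppi_table(io.StringIO(TABLE))
print(ppi_distance(hubs)[("Beta", "Zeta")])
```

In the example, Beta and Zeta share P1, so only AANAT and P2 count towards their distance.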

3. Incorporation of data from the GeneCards website (and Human EnsLoci) using rje_genecards. This will create a file called HPRD.genecards.tdt by default, but this can be overridden using cardout=FILE. EnsLoci data will also be looked for in /home/richard/Databases/EnsEMBL/ens_HUMAN.loci.fas, but this can be overridden with ensloci=FILE.


### HPRD Options ###
hprdpath=PATH : Path to HPRD Flat Files [./]
genecards=T/F : Make the HPRD.genecards.tdt file using rje_genecards (and its options) [False]
hprdfas=T/F : Whether to generate HPRD fasta files [False]
alliso=T/F : Whether to include all isoforms in the output [False]
ppitype=LIST : List of acceptable interaction types to parse out [in vitro;in vivo;yeast 2-hybrid]
badtype=LIST : List of bad interaction types, to exclude []
domainfas=T/F : Whether to output Domain fasta files [False]
complexfas=T/F : Whether to output Protein Complex fasta files [False]
outdir=PATH : The output directory for the files produced [./]
### Distance Matrix Options ###
ppitab=FILE : File containing PPI data (see 2 above)
scaled=T/F : Whether the distance matrix is to be scaled by the total number of interactors in each pairwise comparison [False]
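
As a usage illustration, the options above are given as `option=value` pairs, with `T`/`F` for the True/False switches. The sketch below only assembles and prints such a command line; the helper `slimsuite_args`, the script name `rje_hprd.py`, and the example paths are assumptions made for illustration.

```python
def slimsuite_args(options):
    """Format {option: value} pairs as 'option=value' strings (hypothetical helper)."""
    args = []
    for name, value in options.items():
        # True/False switches are written as T/F, as in the option list.
        if isinstance(value, bool):
            value = "T" if value else "F"
        args.append(f"{name}={value}")
    return args


# Assumed script name and made-up paths, for illustration only.
command = ["python", "rje_hprd.py"] + slimsuite_args({
    "hprdpath": "./FLAT_FILES/",
    "hprdfas": True,
    "alliso": False,
    "outdir": "./hprd_out/",
})
print(" ".join(command))
```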

Module Version History

    # 1.0 - Working version based on rje_ppi.
    # 1.1 - Added protein complexes and PPI Types.
    # 1.2 - Added tracking of evidence.
    # 1.2.1 - Fixed "PROTEIN_ARCHITECTURE" bug.
