Module:	rje_biogrid
Description:	BioGRID Database processing module
Version:	1.6
Last Edit:	07/05/10

Imported modules: rje rje_seq rje_uniprot rje_zen

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This module is designed primarily for parsing the plain text ORGANISM downloads from the BioGRID database. These have names in the form: BIOGRID-ORGANISM-Saccharomyces_cerevisiae-2.0.27.tab.txt.
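As a minimal sketch of how such a download name could be split into species and release, the following helper may be useful. The function name and the regular expression are assumptions inferred from the single example name above; they are not taken from rje_biogrid itself.

```python
import os
import re

# Hypothetical pattern, inferred only from the example file name in the text.
BIOGRID_NAME = re.compile(
    r"^BIOGRID-ORGANISM-(?P<species>.+)-(?P<version>\d+(?:\.\d+)*)\.tab\.txt$"
)

def parse_biogrid_filename(path):
    """Return (species, version) for a BioGRID ORGANISM download, or None."""
    match = BIOGRID_NAME.match(os.path.basename(path))
    if match is None:
        return None
    # Underscores in the species part stand in for spaces.
    species = match.group("species").replace("_", " ")
    return species, match.group("version")
```

A name that does not follow the pattern returns None rather than raising an error, so a caller can fall back to a species given explicitly.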

BioGRID tables contain information that is useful for cross-referencing to other sources, namely the protein names and gene symbols/aliases. The latter are added to the dict['Mapping'] links dictionary of the BioGRID object, linking each symbol to the primary protein ID. These protein IDs are used for storing the PPI data (in dict['PPI']) and for extracting gene data from external sequence databases, which need to be provided separately. The sequence data are read in and added to dict['Protein'], which also stores gene symbol data etc.
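The relationship between the alias mapping and the PPI data can be sketched roughly as below. This is an illustration only: the function name, the record layout and the IDs are hypothetical, and the real module stores these structures on the BioGRID object rather than returning them.

```python
from collections import defaultdict

def build_links(records):
    """Build alias and PPI dictionaries from (primary_id, aliases, partner_id) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    mapping = {}             # gene symbol/alias -> primary protein ID
    ppi = defaultdict(set)   # primary protein ID -> set of interactor IDs
    for primary_id, aliases, partner_id in records:
        for alias in aliases:
            # Keep the first primary ID seen for an alias (assumed policy).
            mapping.setdefault(alias, primary_id)
        ppi[primary_id].add(partner_id)
    return mapping, dict(ppi)

# Placeholder IDs, not real database entries.
mapping, ppi = build_links([
    ("P1", ["GENE1", "ALIAS1"], "P2"),
    ("P1", ["GENE1"], "P3"),
])
```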

The selection of sequence files can be tricky, as different species use very different protein identifiers. I will add a list of recommended sequence sources as I find them:

Yeast = EnsLoci treatment of the EnsEMBL yeast genome

BioGRID contains data for a number of experimental types. Those of interest can be specified with the ppitype=LIST option. Choices include: Affinity Capture-MS; Affinity Capture-Western; Biochemical Activity; Co-crystal Structure; Co-fractionation; Co-purification; Dosage Lethality; Dosage Rescue; Far Western; FRET; Phenotypic Enhancement; Phenotypic Suppression; Protein-peptide; Reconstituted Complex; Synthetic Growth Defect; Synthetic Lethality; Synthetic Rescue; Two-hybrid.
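A filter on experimental type might look like the sketch below. It assumes that an empty ppitype list accepts every type, that the badtype list (see Commandline) always excludes, and that names are compared exactly; none of these behaviours is confirmed by this documentation, and the function name is hypothetical.

```python
def keep_interaction(evidence_type, ppitype=(), badtype=()):
    """Return True if an interaction of this type should be kept.

    Assumptions: badtype is checked first; an empty ppitype means "accept all".
    """
    if evidence_type in badtype:
        return False
    return not ppitype or evidence_type in ppitype

wanted = ["Two-hybrid", "Affinity Capture-MS"]
rows = ["Two-hybrid", "Dosage Rescue", "Affinity Capture-MS"]
kept = [row for row in rows if keep_interaction(row, ppitype=wanted)]
```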

Reactome interactions are restricted to those of the "reaction" type. There are also "neighbouring_reaction", "direct_complex" and "indirect_complex" types.

DIP interactions are restricted to those with two uniprotkb IDs. DIP has similar annotation to MINT, with MI nos.

Domino interactions are restricted to those with two uniprotkb IDs. Domino has similar annotation to MINT, with MI nos.
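The "two uniprotkb IDs" restriction can be illustrated on a MITAB-style line as follows. The sketch assumes the PSI-MI TAB 2.5 layout (interactor IDs in columns 1 and 2, taxa in columns 10 and 11) and assumes that both interactors must belong to a listed taxon; the function names are hypothetical and the module's own parsing may differ.

```python
import re

def _uniprot_id(field):
    """Return the first uniprotkb accession in a pipe-separated ID field, or None."""
    for entry in field.split("|"):
        if entry.startswith("uniprotkb:"):
            return entry[len("uniprotkb:"):]
    return None

def uniprot_pair(mitab_line, taxids=("9606",)):
    """Return (acc_a, acc_b) if both interactors have uniprotkb IDs and listed taxa."""
    fields = mitab_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None
    acc_a, acc_b = _uniprot_id(fields[0]), _uniprot_id(fields[1])
    if acc_a is None or acc_b is None:
        return None
    # Taxon fields look like "taxid:9606(Homo sapiens)".
    for tax_field in (fields[9], fields[10]):
        found = re.findall(r"taxid:(\d+)", tax_field)
        if not any(tax in taxids for tax in found):
            return None
    return acc_a, acc_b
```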

Commandline

BioGRID parsing and PPI Dataset Generation Options

ppifile=FILE : PPI database flat file [None]
seqin=FILE : Sequence file containing protein sequences with appropriate Accession Numbers/IDs [None]
genecards=FILE : File of links between IDs. For human, should have HGNC and EnsLoci columns. [None]
ppitype=LIST : List of acceptable interaction types to parse out []
badtype=LIST : List of bad interaction types, to exclude [indirect_complex,neighbouring_reaction]
symmetry=T/F : Enforce symmetry in interaction datasets [True]
dbsource=X : Source database (biogrid/dip/intact/mint/reactome) [biogrid]
mitab=T/F : Whether source file is in MITAB flat file format [True]
species=X : Name of species to use data for (will be read from file if BioGRID) [human]
taxid=LIST : List of NCBI Taxa IDs to use (for DIP and Domino) [9606]
unipath=PATH : Path to UniProt files [UniProt/]
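What symmetry=T implies for a PPI dictionary can be sketched as below, assuming the data are held as a mapping from each protein ID to a set of interactor IDs. The function name and data layout are assumptions for illustration, not the module's internal code.

```python
def enforce_symmetry(ppi):
    """Return a copy of ppi in which A -> B always implies B -> A."""
    symmetric = {hub: set(partners) for hub, partners in ppi.items()}
    for hub, partners in ppi.items():
        for partner in partners:
            # Add the reverse link, creating an entry for the partner if needed.
            symmetric.setdefault(partner, set()).add(hub)
    return symmetric

example = enforce_symmetry({"A": {"B", "C"}, "B": {"D"}})
```

The input dictionary is left unchanged, so an asymmetric dataset can still be produced from the same parsed data.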

Output Options

ppifas=T/F : Whether to output PPI datasets as fasta files into Species/BIOGRID_Datasets/ [True]
minseq=X : Minimum number of PPI sequences in order to output fasta file [3]
ppitab=T/F : Whether to output PPI table with aliases etc. [True]
alltypes=T/F : Output a full list of PPITypes. (Will populate the PPIType list) [False]
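The interaction of ppifas and minseq can be sketched as follows. The sketch assumes that minseq counts only interactors with a sequence available and that the hub itself is not counted; both points are assumptions, as are the function name and the choice to return fasta text rather than write files.

```python
def ppi_fasta(ppi, sequences, minseq=3):
    """Return {hub: fasta_text} for hubs with at least minseq interactor sequences.

    ppi maps hub ID -> set of interactor IDs; sequences maps ID -> sequence string.
    """
    datasets = {}
    for hub, partners in ppi.items():
        # Only interactors found in the sequence file can be output.
        found = sorted(p for p in partners if p in sequences)
        if len(found) < minseq:
            continue
        datasets[hub] = "".join(">%s\n%s\n" % (p, sequences[p]) for p in found)
    return datasets
```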

Special Options

hostvirus=T/F : Whether to pull out host-virus interactions only (MINT/IntAct only) [False]
vcodes=LIST : List/File of viral species codes for IntAct hostvirus=T []

History

Module Version History

    # 0.0 - Initial Compilation with BioGRID Flat-File parsing information and sequence extraction for yeast.
    # 1.0 - Added cross-referencing via GeneCards output to generate Human Datasets.
    # 1.1 - Added IntAct and MINT parsing.
    # 1.2 - Added option to pull out host-virus interactions.
    # 1.3 - Added Reactome & DIP parsing.
    # 1.4 - Added rje_genemap object functionality.
    # 1.5 - Added Domino parsing and tracking of evidence codes.
    # 1.6 - Updated BioGRID parsing to use mitab format.
