PINGU (Protein Interaction Network & GO Utility) is designed to be a general utility for protein-protein interaction
(PPI) and Gene Ontology (GO) analysis. Earlier versions of PINGU contained much of the code for processing PPI and
GO data, which has subsequently been moved to
PINGU 3.x was dominated by code to compile PPI data from multiple databases and map onto sequences from different
sources, in combination with
rje_dbase.py database downloads and processing. There was also a substantial amount of
code for mapping data with different IDs, including mapping peptide-based MS Ensembl identifications onto HGNC gene
identifiers. Some of this code is now handled by
rje_genemap.py, whilst some of it has been deprecated due to newer
(better) datasets and/or a shift in focus of the Edwards lab.
PINGU 4.x is designed to work in a more streamlined fashion with a more controlled subset of data, making
documentation and re-use a bit simpler and clearer. Many of the older functions are therefore run using
pingu_V3.py, in which case a
#PINGU log statement will be generated. Some older functions will only be possible
PINGU 4.0 is designed to work with HINT interaction data and UniProt sequences. The initial PPI download and
compilation is based on
PINGU 4.1 updated the PPI compilation methods of PINGU 3.x, which can be triggered using
ppicompile=T. This will
need a database cross-reference file (
PINGU 4.2 will download and use HGNC as a database xref file if
xrefdata=HGNC. Clearly, this will only work for
human data. Note that HINT is mapped to genes via Uniprot entries and does not use the xrefdata table.
PINGU 4.3 added domain-based domppi dataset generation (
domppi=T). This uses Pfam domain composition from Uniprot to
generate datasets of proteins that interact with hubs sharing a domain.
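As a rough illustration of the domppi idea, the sketch below pools the spokes of every hub that carries a given Pfam domain. It is not PINGU's own code: the (hub, spoke) pair list, the hub_domains dictionary and the toy identifiers are hypothetical, and the domain annotation is assumed to have already been parsed from Uniprot.

```python
from collections import defaultdict


def domain_ppi(pairs, hub_domains):
    """Group spoke proteins by the Pfam domains of the hubs they interact with.

    pairs: iterable of (hub, spoke) tuples (hypothetical input layout).
    hub_domains: dict of hub gene -> set of Pfam IDs (assumed pre-parsed).
    Returns dict of Pfam ID -> sorted spokes of any hub carrying that domain.
    """
    dom_spokes = defaultdict(set)
    for hub, spoke in pairs:
        # Hubs without annotated domains contribute to no domain dataset
        for pfam in hub_domains.get(hub, ()):
            dom_spokes[pfam].add(spoke)
    return {pfam: sorted(spokes) for pfam, spokes in dom_spokes.items()}


# Toy data: DomA/DomB stand in for real Pfam accessions
pairs = [("HubA", "S1"), ("HubA", "S2"), ("HubB", "S2"), ("HubB", "S3"), ("HubC", "S4")]
hub_domains = {"HubA": {"DomA"}, "HubB": {"DomA", "DomB"}, "HubC": {"DomB"}}
print(domain_ppi(pairs, hub_domains))
```

A spoke such as S2 lands in both domain datasets because its hubs between them carry both domains.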
PINGU 4.4.x replaced
Reactome will be recognised and parsed
using custom methods:
reactome=FILE must be set; HINT data will be read from
Otherwise, entries will be treated as files (wildcard lists allowed) and parsed either as a pairwise PPI file (if
Spoke fields are found) or as a MITAB file (see rje_mitab for advanced field settings). The compiled PPI data
will be output to
BASEFILE.pairwise.tdt and used as
ppisource=X for additional processing/output.
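The pairwise-versus-MITAB decision can be sketched as a check on the file header. This is an assumption-laden illustration: it treats a tab-delimited header containing both 'Hub' and 'Spoke' as a pairwise file and everything else as MITAB, and the helper names and demo files are hypothetical.

```python
import csv
import os
import tempfile


def detect_ppi_format(path):
    """Guess whether a PPI file is a pairwise table or MITAB (sketch only).

    Assumed rule: a tab-delimited header including both 'Hub' and 'Spoke'
    marks a pairwise file; anything else falls through to MITAB parsing.
    """
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    fields = {field.strip() for field in header}
    return "pairwise" if {"Hub", "Spoke"} <= fields else "mitab"


def write_temp(text):
    """Write text to a throwaway file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".tdt")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path


pairwise_path = write_temp("Hub\tSpoke\tSpokeUni\nGENE1\tGENE2\tACC2\n")
mitab_path = write_temp("#ID(s) interactor A\tID(s) interactor B\nuniprotkb:ACC1\tuniprotkb:ACC2\n")
print(detect_ppi_format(pairwise_path), detect_ppi_format(mitab_path))
```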
Default mapping fields for XRef mapping are:
Secondary will be added from Uniprot data if missing from the XRef table.
unifield=X will also be added to the
map fields if not included.
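The cross-referencing step can be pictured as an alias lookup built from the xrefdata rows, with unifield appended to the mapping fields when it is missing. The field names (Gene, Entrez, Uniprot), the row-of-dicts layout and the rule of rejecting identifiers that point at more than one gene are illustrative assumptions, not rje_xref behaviour.

```python
from collections import defaultdict


def build_xref_lookup(rows, keyfield, mapfields, unifield):
    """Map every identifier in the mapping fields back to its key gene.

    rows: list of dicts, one per xref table row (hypothetical layout).
    keyfield: column holding the gene ID to map onto (assumed).
    mapfields: columns accepted as aliases; unifield is added if absent.
    """
    fields = list(mapfields)
    if unifield not in fields:
        fields.append(unifield)
    lookup = defaultdict(set)
    for row in rows:
        gene = row[keyfield]
        lookup[gene].add(gene)  # the key identifier maps to itself
        for field in fields:
            value = row.get(field, "").strip()
            if value:
                lookup[value].add(gene)
    return lookup


def map_id(identifier, lookup):
    """Return the unique gene for an identifier, or None if unknown or ambiguous."""
    genes = lookup.get(identifier, set())
    return next(iter(genes)) if len(genes) == 1 else None


rows = [
    {"Gene": "GENE1", "Entrez": "E1", "Uniprot": "ACC1"},
    {"Gene": "GENE2", "Entrez": "E2", "Uniprot": "ACC2"},
    {"Gene": "GENE3", "Entrez": "E1", "Uniprot": "ACC3"},  # shares E1 with GENE1
]
lookup = build_xref_lookup(rows, "Gene", ["Entrez"], "Uniprot")
print(map_id("ACC2", lookup), map_id("E1", lookup))
```

Returning None for an ambiguous alias keeps a shared identifier from silently assigning interactions to the wrong gene.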
PINGU 4.6.x fixed/updated the PPI Fasta output methods (
ppifas=T). These will output to a directory named after the
ppisource file and
ppispec. Each hub gene will produce a fasta file,
fasid is set by the
ppisource unless changed with
combineppi=T, a single
spec.fasid.fas file will be created. The
xhubppi=T setting will generate a set of files containing spoke proteins that have X+ hub interactors. Note that
*.1hub.fas is essentially the same as the
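The xhubppi=T grouping can be sketched as nested spoke sets, where the X+ set holds every spoke with at least X distinct hub interactors. This minimal sketch assumes a plain list of (hub, spoke) pairs with hypothetical toy names, and it returns identifier lists rather than writing fasta files.

```python
from collections import defaultdict


def xhub_sets(pairs):
    """Return {X: sorted spokes with >= X distinct hubs} for X = 1..max."""
    hubs_per_spoke = defaultdict(set)
    for hub, spoke in pairs:
        hubs_per_spoke[spoke].add(hub)  # duplicate pairs count once
    max_hubs = max((len(hubs) for hubs in hubs_per_spoke.values()), default=0)
    return {
        x: sorted(s for s, hubs in hubs_per_spoke.items() if len(hubs) >= x)
        for x in range(1, max_hubs + 1)
    }


pairs = [("H1", "S1"), ("H2", "S1"), ("H3", "S1"),
         ("H1", "S2"), ("H2", "S2"), ("H1", "S3"), ("H1", "S3")]
print(xhub_sets(pairs))
```

Because the sets are nested, the 1+ set always contains every spoke that appears in the input.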
PINGU 4.7 added
ppidbreport=T/F output for PPI compilation, summarising the evidence codes and PPITypes read from
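One way to picture a ppidbreport-style summary: count evidence codes and PPITypes, and tally how many unique hub-spoke pairs are supported by one, two or more databases. The five-field record layout is hypothetical and the real report contents may differ.

```python
from collections import Counter, defaultdict


def ppi_report(records):
    """Summarise compiled PPI support (hypothetical record layout).

    records: iterable of (hub, spoke, db, evidence, ppitype) tuples.
    Returns (evidence counts, PPIType counts, DB overlap counts), where the
    overlap counter maps N databases -> number of unique pairs with N sources.
    """
    evidence = Counter()
    ppitypes = Counter()
    pair_dbs = defaultdict(set)
    for hub, spoke, db, evid, ptype in records:
        evidence[evid] += 1
        ppitypes[ptype] += 1
        pair_dbs[(hub, spoke)].add(db)
    overlap = Counter(len(dbs) for dbs in pair_dbs.values())
    return evidence, ppitypes, overlap


records = [
    ("A", "B", "BioGRID", "Two-hybrid", "physical"),
    ("A", "B", "IntAct", "two hybrid", "physical association"),
    ("A", "C", "BioGRID", "Affinity Capture-MS", "physical"),
]
evidence, ppitypes, overlap = ppi_report(records)
print(dict(evidence), dict(ppitypes), dict(overlap))
```

Different databases can label the same method differently ("Two-hybrid" versus "two hybrid"), so these raw counts are per source label.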
SOURCE DATA OPTIONS
sourcepath=PATH/ : Will look in this directory for input files if not found [
sourcedate=DATE : Source file date (YYYY-MM-DD) to preferentially use [
ppisource=X : Source of PPI data. (HINT/FILE) FILE needs 'Hub', 'Spoke' and 'SpokeUni' fields. [
ppispec=LIST : List of PPI files/species/databases to generate PPI datasets from [
download=T/F : Whether to download files directly from websites where possible if missing [
integrity=T/F : Whether to quit by default if source data integrity is breached [
xrefdata=LIST : List of files with delimited data of identifier cross-referencing (see rje_xref) 
unifield=X : Uniprot accession number field identifier for xrefdata [
mapfields=LIST : List of XRef fields to use for identifier mapping (plus unifield) [
resdir=PATH : Redirect output files/directories to specified directory [
basefile=X : Results file prefix [
acconly=T/F : Whether to output lists of Accession numbers only, rather than full fasta files [
fasid=X : Text ID for fasta files (*.X.fas) [default named after ppisource(+'-dom')]
PPI OUTPUT OPTIONS
ppiout=FILE : Save pairwise PPI file following processing (if
ppifas=T/F : Whether to output PPI fasta files [
domppi=T/F : Whether to generate Pfam Domain-based PPI files instead of protein-based PPI files [
minppi=X : Minimum number of PPI for file output [
combineppi=T/F : Whether to combine all spokes into a single fasta file [False]
xhubppi=T/F : Whether to generate PPI files of spokes interacting with X+ hubs [
queryppi=FILE : Load a file of 'Query','Hub' PPI and generate expanded PPI Datasets in PPI.*/ [
queryseq=FILE : Fasta file containing the Query protein sequences corresponding to QueryPPI [
allquery=T/F : Whether to include all the new Queries from QueryPPI in all files for a given hub [
sourceurl=CDICT : Dictionary of Source URL mapping (see code)
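A minimal sketch of how ppifas=T, minppi=X and combineppi=T might interact when building fasta output. The per-hub file name pattern HUB.fasid.fas, the function name and the in-memory return value are assumptions for illustration; only the spec.fasid.fas name for combined output comes from the description of PINGU 4.6.x.

```python
from collections import defaultdict


def ppi_fasta(pairs, seqs, spec, fasid, minppi=1, combine=False):
    """Build fasta text per hub, or one combined file (sketch only).

    pairs: (hub, spoke accession) tuples; seqs: accession -> sequence.
    Hubs with fewer than minppi spokes are skipped. With combine=True all
    remaining spokes go into one non-redundant '<spec>.<fasid>.fas' entry.
    Returns dict of hypothetical file name -> fasta text.
    """
    spokes = defaultdict(set)
    for hub, acc in pairs:
        if acc in seqs:  # spokes without a sequence cannot be written
            spokes[hub].add(acc)
    kept = {hub: accs for hub, accs in spokes.items() if len(accs) >= minppi}
    if combine:
        groups = {f"{spec}.{fasid}.fas": set().union(*kept.values())} if kept else {}
    else:
        groups = {f"{hub}.{fasid}.fas": accs for hub, accs in kept.items()}
    return {
        name: "".join(f">{acc}\n{seqs[acc]}\n" for acc in sorted(accs))
        for name, accs in groups.items()
    }


pairs = [("HUB1", "ACC1"), ("HUB1", "ACC2"), ("HUB2", "ACC2"), ("HUB2", "ACC9")]
seqs = {"ACC1": "MKTAYIAK", "ACC2": "MSEQ"}
print(ppi_fasta(pairs, seqs, "Human", "hint", minppi=2))
```

Here HUB2 drops out at minppi=2 because ACC9 has no sequence, leaving it with a single spoke.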
PPI COMPILATION/FILTERING OPTIONS
hublist=LIST : List of hub genes to restrict pairwise PPI to 
hubonly=T/F : Whether to restrict pairwise PPI to those with both hub and spoke in hublist [False]
ppicompile=CDICT : List of db:file PPI Sources to compile and generate *.pairwise.tdt 
ppidbreport=T/F : Summary output for PPI compilation of evidence/PPIType/DB overlaps [True]
symmetry=T/F : Whether to enforce Hub-Spoke symmetry during PPI compilation [
hprd=PATH : Path to HPRD flat files [
taxid=LIST : List of NCBI Taxa IDs to use [
badppi=LIST : PPI Types to be removed. Will only remove PPI if no support remains 
goodppi=LIST : Reduce PPI to those supported by listed types 
baddb=LIST : PPI source databases to be removed. Will only remove PPI if no support remains 
gooddb=LIST : Reduce PPI to those supported by listed databases 
biogrid=FILE : BioGRID flat file [
intact=FILE : IntAct flat file [
mint=FILE : MINT flat file [
reactome=FILE : Reactome interactions flat file [
dip=FILE : DIP interactions flat file [
domino=FILE : Domino interactions flat file [
evidence=FILE : Mapping file for evidence terms [None] #!# Not currently implemented! #!#
See also rje.py generic commandline options.
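The badppi rule (only remove a PPI if no support remains) and the goodppi reduction can be sketched as set operations on each PPI's supporting labels. The dictionary layout is hypothetical, and it is assumed rather than confirmed that baddb and gooddb apply the same logic to source database names.

```python
def filter_support(ppi, bad=(), good=()):
    """Filter PPI by their supporting evidence types (or source databases).

    ppi: dict of (hub, spoke) -> set of supporting labels (hypothetical).
    bad: labels to strip; a PPI is dropped only if no support remains.
    good: if given, reduce support to these labels and drop PPI left empty.
    """
    filtered = {}
    for pair, support in ppi.items():
        remaining = set(support) - set(bad)
        if good:
            remaining &= set(good)
        if remaining:
            filtered[pair] = remaining
    return filtered


ppi = {
    ("A", "B"): {"Two-hybrid", "Affinity Capture-MS"},
    ("A", "C"): {"Two-hybrid"},
    ("A", "D"): {"Reconstituted Complex"},
}
print(filter_support(ppi, bad=["Two-hybrid"]))
```

A-B survives the badppi filter because Affinity Capture-MS support remains, whereas A-C loses its only support.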
Module Version History
# 4.0 - Initial Compilation based on code from SLiMBench and PINGU 3.9 (inherited as pingu_V3).
# 4.1 - Adding compilation of PPI databases using new rje_xref V1.1 and older objects from PINGU V3.
# 4.2 - Bug fixes for use of PPISource to create PPI databases. Add HGNC to sourcedata (xrefdata=HGNC)
# 4.3 - Modified to use Pfam as hub field for DomPPI generation. Modified naming of PPI output after ppisource.
# 4.4.0 - Converted ppicompile=T to ppicompile=LIST.
# 4.5.0 - Added hublist=LIST : List of hub genes to restrict pairwise PPI to, and pairwise parsing.
# 4.5.1 - Debugging missing identifiers and indexing speed. Added good and bad DB.
# 4.5.2 - Fixed SIF output and changed names to sif-* for opening in browser.
# 4.5.3 - Updated REST output.
# 4.6.0 - Added hubonly=T/F : Whether to restrict pairwise PPI to those with both hub and spoke in hublist [False]
# 4.6.1 - Fixed some ppifas=T/F bugs and added combineppi=T/F : Whether to combine all spokes into a single fasta file [False]
# 4.6.2 - Added check/filter for multiple SpokeUni pointing to same sequence. (Compilation redundancy mapping failure!)
# 4.6.3 - Fixed issue with 1:many SpokeUni:Spoke mappings messing up XHub.
# 4.7.0 - Added ppidbreport=T/F : Summary output for PPI compilation of evidence/PPIType/DB overlaps [True]
# 4.8.0 - Fixed report duplication issue and added additional summary output
PINGU REST Output formats
for general options. Run with
to get full server output as text or
for more user-friendly formatted output. Individual outputs can be identified/parsed using
= main table of identified PPI for given
= non-redundant list of "spoke" genes that interact with hubs [list]
= non-redundant list of UniProt accession numbers for proteins that interact with hubs [list]
= Simple Interaction Format (SIF) output of gene identifiers for PPI [sif]
= Simple Interaction Format (SIF) output of UniProt identifiers for PPI [sif]
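For reference, a SIF line lists a source node, an interaction type and a target node. The sketch below writes unique hub-spoke pairs in that layout; the pp (protein-protein) interaction label is a common Cytoscape convention and is assumed here rather than taken from PINGU's actual output.

```python
def to_sif(pairs, interaction="pp"):
    """Return unique (hub, spoke) pairs as tab-separated SIF lines."""
    seen = set()
    lines = []
    for hub, spoke in pairs:
        if (hub, spoke) not in seen:  # keep first occurrence, preserve order
            seen.add((hub, spoke))
            lines.append(f"{hub}\t{interaction}\t{spoke}")
    return "\n".join(lines) + ("\n" if lines else "")


print(to_sif([("GENE1", "GENE2"), ("GENE1", "GENE2"), ("GENE1", "GENE3")]), end="")
```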