Function
PINGU (Protein Interaction Network & GO Utility) is designed to be a general utility for Protein-Protein Interaction
(PPI) and Gene Ontology (GO) analysis. Earlier versions of PINGU contained much of the code for processing PPI and GO
data, which has since been moved to the rje_ppi.py and rje_go.py libraries.
PINGU 3.x was dominated by code to compile PPI data from multiple databases and map it onto sequences from different
sources, in combination with rje_dbase.py database downloads and processing. There was a substantial amount of code for
mapping data with different IDs, including mapping peptide-based MS Ensembl identifications onto HGNC gene identifiers.
Some of this code is now handled by rje_genemap.py, whilst some of it has been deprecated due to newer (better) datasets
and/or a shift in focus of the Edwards lab.
PINGU 4.x is designed to work in a more streamlined fashion with a more controlled subset of data, making documentation
and re-use simpler and clearer. Many of the older functions are therefore run using pingu_V3.py, in which case a #PINGU
log statement will be generated. Some older functions are only available by running pingu_V3.py directly.
PINGU 4.0 is designed to work with HINT interaction data and UniProt sequences. The initial PPI download and compilation
is based on SLiMBench. This was updated in 4.9.0 following some changes to HINT downloads
(http://hint.yulab.org/download/); these changes may have additional unexpected or unwelcome consequences.
PINGU 4.1 updated the PPI compilation methods of PINGU 3.x, which can be triggered using ppicompile=T. This requires a
database cross-reference file (xrefdata=LIST).
PINGU 4.2 will download and use HGNC as a database xref file if xrefdata=HGNC. Clearly, this will only work for human
data. Note that HINT is mapped to genes via UniProt entries and does not use the xrefdata table.
PINGU 4.3 added domain-based domppi dataset generation (domppi=T). This uses Pfam domain composition from UniProt to
generate datasets of proteins that interact with hubs sharing a domain.
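The domppi=T grouping can be pictured as pooling the interactors of every hub that carries a given Pfam domain. The
sketch below is a minimal illustration under stated assumptions, not PINGU's implementation: the dictionaries `ppi`
(hub to spoke set) and `hub_domains` (hub to Pfam IDs) and the function name `domain_ppi` are hypothetical, and it does
not attempt to reproduce whether PINGU excludes the hubs themselves from a domain dataset.

```python
from collections import defaultdict


def domain_ppi(ppi, hub_domains):
    """Pool the spokes of every hub that contains each Pfam domain.

    ppi: dict mapping hub gene -> set of spoke genes (hypothetical input).
    hub_domains: dict mapping hub gene -> iterable of Pfam IDs (hypothetical).
    Returns a dict mapping Pfam ID -> set of spokes that interact with any hub
    having that domain. Hubs without domain annotation contribute nothing.
    """
    datasets = defaultdict(set)
    for hub, spokes in ppi.items():
        for domain in hub_domains.get(hub, ()):
            datasets[domain].update(spokes)
    return dict(datasets)


ppi = {"A": {"X", "Y"}, "B": {"Y", "Z"}, "C": {"W"}}
hub_domains = {"A": ["PF00001"], "B": ["PF00001", "PF00002"]}
print({d: sorted(s) for d, s in domain_ppi(ppi, hub_domains).items()})
# {'PF00001': ['X', 'Y', 'Z'], 'PF00002': ['Y', 'Z']}
```

Hub C has no Pfam annotation in this toy input, so its spoke W does not appear in any domain dataset.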
PINGU 4.4.x replaced ppicompile=T with ppicompile=CDICT. HPRD, HINT and Reactome will be recognised and parsed using
custom methods: hprd=PATH and reactome=FILE must be set; HINT data will be read from sourcepath=PATH/. Otherwise,
entries will be treated as files (wildcard lists allowed) and parsed either as a pairwise PPI file (if Hub and Spoke
fields are found) or else as a MITAB file (see rje_mitab for advanced field settings). The compiled PPI data will be
output to BASEFILE.pairwise.tdt and used as ppisource=X for additional processing/output.
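The source-handling rules above can be sketched as two small helpers. This is a hedged illustration, not PINGU's code:
it assumes that a CDICT value is a comma-separated list of db:file pairs (as the ppicompile=CDICT option description
suggests) and that pairwise and MITAB files are tab-delimited with a header line. The names `parse_cdict` and
`classify_ppi_file`, and the example database/file names, are hypothetical. The custom HPRD, HINT and Reactome parsers
and wildcard expansion are not covered.

```python
import csv
import io


def parse_cdict(value):
    """Split a 'db:file,db:file' string into an ordered dict (assumed format)."""
    sources = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        db, sep, path = entry.partition(":")
        # An entry without ':' is keyed by itself (assumption of this sketch).
        sources[db.strip()] = path.strip() if sep else db.strip()
    return sources


def classify_ppi_file(text):
    """Return 'pairwise' if the header has Hub and Spoke fields, else 'mitab'."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, [])
    fields = {field.strip() for field in header}
    return "pairwise" if {"Hub", "Spoke"} <= fields else "mitab"


print(parse_cdict("dbA:dbA_ppi.tdt,dbB:dbB.mitab"))  # {'dbA': 'dbA_ppi.tdt', 'dbB': 'dbB.mitab'}
print(classify_ppi_file("Hub\tSpoke\tSpokeUni\nA\tB\tP12345\n"))  # pairwise
```

Falling back to MITAB whenever the Hub/Spoke header test fails mirrors the "else a MITAB file" rule described above.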
Default mapping fields for XRef mapping are: Secondary,Ensembl,Aliases,Accessions,RefSeq,Previous Symbols,Synonyms.
Secondary will be added from UniProt data if missing from the XRef table. unifield=X will also be added to the map
fields if not included.
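A minimal sketch of the default mapping-field logic, plus one way such fields could be turned into a lookup from any
alias to a key identifier. Assumptions not taken from PINGU: the xref table is a list of dicts, multiple values within a
field are separated by '|', and the key field is called 'Gene'; `build_mapfields` and `build_xref_index` are
hypothetical names. Adding Secondary from UniProt data is not shown.

```python
DEFAULT_MAPFIELDS = ["Secondary", "Ensembl", "Aliases", "Accessions",
                     "RefSeq", "Previous Symbols", "Synonyms"]


def build_mapfields(mapfields=None, unifield="Uniprot"):
    """Return the mapping fields, appending unifield if it is missing."""
    fields = list(mapfields) if mapfields else list(DEFAULT_MAPFIELDS)
    if unifield not in fields:
        fields.append(unifield)
    return fields


def build_xref_index(rows, mapfields, key_field="Gene"):
    """Map each identifier found in mapfields to the row's key identifier.

    The first row to claim an identifier keeps it, so an ambiguous alias is
    never silently overwritten (a design choice of this sketch).
    """
    index = {}
    for row in rows:
        key = row[key_field]
        index.setdefault(key, key)
        for field in mapfields:
            for ident in (row.get(field) or "").split("|"):
                ident = ident.strip()
                if ident:
                    index.setdefault(ident, key)
    return index


rows = [{"Gene": "TP53", "Uniprot": "P04637", "Aliases": "p53|LFS1"}]
index = build_xref_index(rows, build_mapfields())
print(index["p53"], index["P04637"])  # TP53 TP53
```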
PINGU 4.6.x fixed/updated the PPI fasta output methods (ppifas=T). These will output to a directory named after the
ppisource file and ppispec. Each hub gene will produce a fasta file, gene.fasid.fas, where fasid is set by the ppisource
unless changed with fasid=X. If combineppi=T, a single spec.fasid.fas file will be created. The xhubppi=T setting will
generate a set of files containing spoke proteins that have x+ hub interactors. Note that *.1hub.fas is essentially the
same as the combineppi=T spec.fasid.fas file.
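The per-hub file naming and the xhubppi=T grouping can be sketched as follows. This is an illustration under
assumptions, not PINGU's code: the input is an iterable of (hub, spoke) pairs, `hub_fasta_name` and `xhub_sets` are
hypothetical names, and writing the sequences themselves is omitted. Counting spokes with at least x distinct hubs makes
the x=1 set the union of all spokes, which is consistent with the note that *.1hub.fas matches the combineppi=T file.

```python
from collections import defaultdict


def hub_fasta_name(gene, fasid):
    """Per-hub output file name, following the gene.fasid.fas pattern."""
    return f"{gene}.{fasid}.fas"


def xhub_sets(pairs):
    """Return {x: set of spokes with at least x distinct hub interactors}."""
    hubs_per_spoke = defaultdict(set)
    for hub, spoke in pairs:
        hubs_per_spoke[spoke].add(hub)
    max_hubs = max((len(hubs) for hubs in hubs_per_spoke.values()), default=0)
    return {
        x: {spoke for spoke, hubs in hubs_per_spoke.items() if len(hubs) >= x}
        for x in range(1, max_hubs + 1)
    }


pairs = [("A", "X"), ("B", "X"), ("A", "Y")]
print(hub_fasta_name("A", "HINT"))  # A.HINT.fas
print({x: sorted(s) for x, s in xhub_sets(pairs).items()})  # {1: ['X', 'Y'], 2: ['X']}
```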
PINGU 4.7 added ppidbreport=T/F output for PPI compilation, summarising the evidence codes and PPITypes read from
different sources.
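The kind of summary that ppidbreport=T describes (evidence/PPIType/DB overlaps) can be sketched with counters. This is a
hypothetical illustration, not the report PINGU writes: the record keys 'Hub', 'Spoke', 'DB' and 'PPIType', the example
type labels, and the name `ppidb_report` are all assumptions.

```python
from collections import Counter, defaultdict


def ppidb_report(records):
    """Summarise compiled PPI records by source database and PPI type.

    records: iterable of dicts with hypothetical keys 'Hub', 'Spoke', 'DB'
    and 'PPIType', one record per supporting observation.
    Returns (type_counts, overlap_counts):
      type_counts    - Counter of (DB, PPIType) pairs
      overlap_counts - Counter of PPI keyed by the sorted tuple of DBs
                       that support each (Hub, Spoke) pair
    """
    type_counts = Counter()
    dbs_per_ppi = defaultdict(set)
    for rec in records:
        type_counts[(rec["DB"], rec["PPIType"])] += 1
        dbs_per_ppi[(rec["Hub"], rec["Spoke"])].add(rec["DB"])
    overlap_counts = Counter(tuple(sorted(dbs)) for dbs in dbs_per_ppi.values())
    return type_counts, overlap_counts
```

Keying overlaps on the sorted tuple of databases gives one count per distinct combination of sources, which is a simple
way to see how much each database adds beyond the others.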
Commandline
SOURCE DATA OPTIONS
sourcepath=PATH/ : Will look in this directory for input files if not found ['SourceData/']
sourcedate=DATE : Source file date (YYYY-MM-DD) to preferentially use [None]
ppisource=X : Source of PPI data (HINT/FILE). FILE needs 'Hub', 'Spoke' and 'SpokeUni' fields. ['HINT']
ppispec=LIST : List of PPI files/species/databases to generate PPI datasets from [HUMAN]
download=T/F : Whether to download files directly from websites where possible if missing [True]
integrity=T/F : Whether to quit by default if source data integrity is breached [True]
xrefdata=LIST : List of files with delimited data of identifier cross-referencing (see rje_xref) []
unifield=X : Uniprot accession number field identifier for xrefdata ['Uniprot']
mapfields=LIST : List of XRef fields to use for identifier mapping (plus unifield) [see docs]
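The sourcepath=PATH/ fallback ("look in this directory for input files if not found") can be sketched as below. This is
a minimal illustration, assuming the file is first tried as given and then inside sourcepath; the name `resolve_input`
is hypothetical, and the download=T/F and sourcedate=DATE behaviours are not modelled.

```python
import os


def resolve_input(filename, sourcepath="SourceData/"):
    """Return filename if it exists, else its path inside sourcepath, else None.

    The lookup order (as given, then sourcepath) is an assumption of this sketch.
    """
    if os.path.exists(filename):
        return filename
    candidate = os.path.join(sourcepath, filename)
    if os.path.exists(candidate):
        return candidate
    return None
```

Returning None rather than raising leaves the caller free to decide whether a missing file should trigger a download or
a fatal error.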
OUTPUT OPTIONS
resdir=PATH : Redirect output files/directories to specified directory [./]
basefile=X : Results file prefix [pingu]
acconly=T/F : Whether to output lists of accession numbers only, rather than full fasta files [False]
fasid=X : Text ID for fasta files (*.X.fas) [default named after ppisource(+'-dom')]
PPI OUTPUT OPTIONS
ppiout=FILE : Save pairwise PPI file following processing (if rest=None) [None]
ppifas=T/F : Whether to output PPI fasta files [False]
domppi=T/F : Whether to generate Pfam domain-based PPI files instead of protein-based PPI files [False]
minppi=X : Minimum number of PPI for file output [0]
combineppi=T/F : Whether to combine all spokes into a single fasta file [False]
xhubppi=T/F : Whether to generate PPI files of spokes interacting with X+ hubs [False]
queryppi=FILE : Load a file of 'Query','Hub' PPI and generate expanded PPI datasets in PPI.*/ [None]
queryseq=FILE : Fasta file containing the Query protein sequences corresponding to queryppi=FILE [*.fas]
allquery=T/F : Whether to include all the new Queries from queryppi=FILE in all files for a given hub [True]
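One plausible reading of queryppi=FILE combined with minppi=X is sketched below: query proteins are added to the spoke
sets of the hubs they are paired with, and only hubs with at least minppi interactors are kept for output. The data
structures and the name `expand_and_filter` are hypothetical; whether PINGU counts queries towards minppi, and how it
treats queries for hubs it does not already know, are assumptions here (such queries are simply ignored).

```python
def expand_and_filter(ppi, query_pairs, minppi=0):
    """Add (query, hub) pairs to hub spoke sets, then drop small datasets.

    ppi: dict hub -> set of spokes; query_pairs: iterable of (query, hub).
    Queries whose hub is not already in ppi are ignored (assumption).
    Returns a new dict; the input ppi is not modified.
    """
    expanded = {hub: set(spokes) for hub, spokes in ppi.items()}
    for query, hub in query_pairs:
        if hub in expanded:
            expanded[hub].add(query)
    return {hub: spokes for hub, spokes in expanded.items() if len(spokes) >= minppi}
```

Copying each spoke set before adding queries keeps the original PPI data reusable for other output modes.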
ADVANCED/DEV OPTIONS
sourceurl=CDICT : Dictionary of source URL mapping (see code)
PPI COMPILATION/FILTERING OPTIONS
hublist=LIST : List of hub genes to restrict pairwise PPI to []
hubonly=T/F : Whether to restrict pairwise PPI to those with both hub and spoke in hublist [False]
hubfield=X : Hub field to use for hublist=LIST [Hub]
spokefield=X : Spoke field to use for hublist=LIST when hubonly=T [Spoke]
ppicompile=CDICT : List of db:file PPI sources to compile and generate *.pairwise.tdt []
ppidbreport=T/F : Summary output for PPI compilation of evidence/PPIType/DB overlaps [True]
symmetry=T/F : Whether to enforce Hub-Spoke symmetry during PPI compilation [True]
hprd=PATH : Path to HPRD flat files [None]
taxid=LIST : List of NCBI Taxa IDs to use [9606]
badppi=LIST : PPI Types to be removed. Will only remove PPI if no support remains []
goodppi=LIST : Reduce PPI to those supported by listed types []
baddb=LIST : PPI databases to be removed. Will only remove PPI if no support remains []
gooddb=LIST : Reduce PPI to those supported by listed databases []
OBSOLETE OPTIONS
biogrid=FILE : BioGRID flat file [None]
intact=FILE : IntAct flat file [None]
mint=FILE : MINT flat file [None]
reactome=FILE : Reactome interactions flat file [None]
dip=FILE : DIP interactions flat file [None]
domino=FILE : Domino interactions flat file [None]
evidence=FILE : Mapping file for evidence terms [None] #!# Not currently implemented! #!#
See also rje.py generic commandline options.
History
Module Version History
# 4.0 - Initial Compilation based on code from SLiMBench and PINGU 3.9 (inherited as pingu_V3).
# 4.1 - Adding compilation of PPI databases using new rje_xref V1.1 and older objects from PINGU V3.
# 4.2 - Bug fixes for use of PPISource to create PPI databases. Add HGNC to sourcedata (xrefdata=HGNC)
# 4.3 - Modified to use Pfam as hub field for DomPPI generation. Modified naming of PPI output after ppisource.
# 4.4.0 - Converted ppicompile=T to ppicompile=LIST.
# 4.5.0 - Added hublist=LIST : List of hub genes to restrict pairwise PPI to, and pairwise parsing.
# 4.5.1 - Debugging missing identifiers and indexing speed. Added good and bad DB.
# 4.5.2 - Fixed SIF output and changed names to sif-* for opening in browser.
# 4.5.3 - Updated REST output.
# 4.6.0 - Added hubonly=T/F : Whether to restrict pairwise PPI to those with both hub and spoke in hublist [False]
# 4.6.1 - Fixed some ppifas=T/F bugs and added combineppi=T/F : Whether to combine all spokes into a single fasta file [False]
# 4.6.2 - Added check/filter for multiple SpokeUni pointing to same sequence. (Compilation redundancy mapping failure!)
# 4.6.3 - Fixed issue with 1:many SpokeUni:Spoke mappings messing up XHub.
# 4.7.0 - Added ppidbreport=T/F : Summary output for PPI compilation of evidence/PPIType/DB overlaps [True]
# 4.8.0 - Fixed report duplication issue and added additional summary output.
# 4.9.0 - Updated HINT download and parsing details.
# 4.9.1 - Fixed Pairwise parsing and filtering for more flexibility of input. Fixed fasid=X bug and ppiseqfile names.
# 4.10.0 - Added hubfield and spokefield options for parsing hublist.
PINGU REST Output formats
Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format for more
user-friendly formatted output. Individual outputs can be identified/parsed using &rest=OUTFMT:
pairwise = main table of identified PPI for given hublist=LIST proteins. [tdt]
spokes = non-redundant list of "spoke" genes that interact with hubs [list]
uniprot = non-redundant list of UniProt accession numbers for proteins that interact with hubs [list]
sif-gene = simple interaction file (SIF) format of gene identifiers for PPI [sif]
sif-uni = simple interaction file (SIF) format of UniProt identifiers for PPI [sif]
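To illustrate how the list and SIF outputs relate to the pairwise table, here is a minimal sketch that derives them from
pairwise rows. It is not the server's code: the 'HubUni' field name, the 'pp' interaction label in the
source/interaction/target SIF layout, and the name `rest_outputs` are assumptions; Hub, Spoke and SpokeUni are the
pairwise fields named in the ppisource=X option.

```python
def rest_outputs(rows, interaction="pp"):
    """Derive spokes, uniprot, sif-gene and sif-uni style outputs from pairwise rows.

    rows: list of dicts with 'Hub', 'Spoke', 'SpokeUni' and (assumed) 'HubUni' keys.
    SIF lines use a 'source<TAB>interaction<TAB>target' layout; the 'pp'
    interaction label is an assumption of this sketch.
    """
    spokes = sorted({row["Spoke"] for row in rows})
    uniprot = sorted({row["SpokeUni"] for row in rows if row.get("SpokeUni")})
    sif_gene = [f"{row['Hub']}\t{interaction}\t{row['Spoke']}" for row in rows]
    sif_uni = [f"{row['HubUni']}\t{interaction}\t{row['SpokeUni']}"
               for row in rows if row.get("HubUni") and row.get("SpokeUni")]
    return {"spokes": spokes, "uniprot": uniprot,
            "sif-gene": sif_gene, "sif-uni": sif_uni}


rows = [{"Hub": "TP53", "HubUni": "P04637", "Spoke": "MDM2", "SpokeUni": "Q00987"}]
print(rest_outputs(rows)["sif-gene"])  # ['TP53\tpp\tMDM2']
```

Rows lacking a UniProt accession are skipped for the UniProt-based outputs rather than written with empty identifiers.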