Module:	PINGU
Description:	Protein Interaction Network & GO Utility
Version:	4.10.0
Last Edit:	21/05/19

Imported modules: rje rje_db rje_go rje_obj rje_ppi rje_seqlist rje_uniprot rje_xgmml rje_xref pingu_V3 rje_hprd rje_biogrid rje_mitab

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

PINGU (Protein Interaction Network & GO Utility) is designed to be a general utility for Protein-Protein Interaction (PPI) and Gene Ontology (GO) analysis. Earlier versions of PINGU contained a lot of the code for processing PPI and GO data, which has subsequently been moved to the rje_ppi.py and rje_go.py libraries.

PINGU 3.x was dominated by code to compile PPI data from multiple databases and map it onto sequences from different sources, in combination with rje_dbase.py database downloads and processing. There was a substantial amount of code for mapping data with different IDs, including peptide-based MS Ensembl identifications onto HGNC gene identifiers. Some of this code is now handled by rje_genemap.py, whilst some of it has been deprecated due to newer (better) datasets and/or a shift in focus of the Edwards lab.

PINGU 4.x is designed to work in a more streamlined fashion with a more controlled subset of data, making documentation and re-use a bit simpler and clearer. Many of the older functions are therefore run using pingu_V3.py, in which case a #PINGU log statement will be generated. Some older functions will only be possible by running pingu_V3.py directly.

PINGU 4.0 is designed to work with HINT interaction data and UniProt sequences. The initial PPI download and compilation is based on SLiMBench. This was updated in 4.9.0 following some changes to HINT downloads (http://hint.yulab.org/download/) - there may be some additional unexpected/unwelcome consequences of these changes.

PINGU 4.1 updated the PPI compilation methods of PINGU 3.x, which can be triggered using ppicompile=T. This will need a database cross-reference file (xrefdata=LIST).

PINGU 4.2 will download and use HGNC as a database xref file if xrefdata=HGNC. Clearly, this will only work for human data. Note that HINT is mapped to genes via Uniprot entries and does not use the xrefdata table.

PINGU 4.3 added domain-based domppi dataset generation (domppi=T). This uses Pfam domain composition from Uniprot to generate datasets of proteins that interact with hubs sharing a domain.
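The domain-based grouping described above can be pictured with a minimal sketch. It is not PINGU's own code: the function name `domain_ppi`, the `(hub, spoke)` tuple input and the `pfam` dictionary of hub domain composition are all hypothetical stand-ins for the data PINGU derives from Uniprot.

```python
from collections import defaultdict

def domain_ppi(pairwise, pfam):
    """Group spokes by the Pfam domains of the hubs they interact with.

    pairwise: iterable of (hub, spoke) tuples (hypothetical input format).
    pfam: dict of hub -> set of Pfam domain names (hypothetical input format).
    """
    dom_spokes = defaultdict(set)
    for hub, spoke in pairwise:
        # A hub without domain annotation contributes to no domain dataset.
        for domain in pfam.get(hub, ()):
            dom_spokes[domain].add(spoke)
    return dict(dom_spokes)

# Toy data: HUB1 and HUB2 share the SH3_1 domain; HUB3 has no annotation.
ppi = [("HUB1", "A"), ("HUB1", "B"), ("HUB2", "B"), ("HUB2", "C"), ("HUB3", "D")]
pfam = {"HUB1": {"SH3_1"}, "HUB2": {"SH3_1", "PDZ"}}
datasets = domain_ppi(ppi, pfam)
```

Each key of `datasets` then plays the role of a "domain hub", with the pooled spokes of every protein hub carrying that domain.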

PINGU 4.4.x replaced ppicompile=T with ppicompile=CDICT. HPRD, HINT and Reactome will be recognised and parsed using custom methods: hprd=PATH and reactome=FILE must be set; HINT data will be read from sourcepath=PATH/. Otherwise, entries will be treated as files (wildcard lists allowed) and parsed either as a pairwise PPI file (if Hub and Spoke fields are found) or as a MITAB file (see rje_mitab for advanced field settings). The compiled PPI data will be output to BASEFILE.pairwise.tdt and used as ppisource=X for additional processing/output.
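The routing of ppicompile entries described above can be summarised in a short sketch. This is an illustration only: `classify_source` and its return labels are hypothetical, and the case-insensitive match on the source name is an assumption rather than documented behaviour.

```python
# Sources the documentation says are parsed with custom methods.
CUSTOM_SOURCES = {"HPRD", "HINT", "REACTOME"}

def classify_source(entry, header_fields=()):
    """Return which kind of parser a ppicompile entry would be routed to.

    entry: source name or file name from the ppicompile list.
    header_fields: column headers read from the file (hypothetical argument).
    """
    # Assumption: recognition of named sources ignores case.
    if entry.upper() in CUSTOM_SOURCES:
        return "custom"
    # Files with both Hub and Spoke fields are treated as pairwise PPI.
    if "Hub" in header_fields and "Spoke" in header_fields:
        return "pairwise"
    # Anything else falls through to MITAB parsing.
    return "mitab"

routes = [
    classify_source("HINT"),
    classify_source("mydata.tdt", ["Hub", "Spoke", "SpokeUni"]),
    classify_source("other.txt", ["#ID(s) interactor A", "ID(s) interactor B"]),
]
```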

Default mapping fields for XRef mapping are: Secondary,Ensembl,Aliases,Accessions,RefSeq,Previous Symbols,Synonyms. Secondary will be added from Uniprot data if missing from the XRef table. unifield=X will also be added to the map fields if not included.
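As a rough illustration of how the effective list of map fields is assembled, the sketch below starts from the defaults listed above and adds the unifield if it is absent. The function name `effective_mapfields` is hypothetical, and appending the unifield at the end of the list is an assumption; the documentation only states that it is added.

```python
DEFAULT_MAPFIELDS = ["Secondary", "Ensembl", "Aliases", "Accessions",
                     "RefSeq", "Previous Symbols", "Synonyms"]

def effective_mapfields(mapfields=None, unifield="Uniprot"):
    """Return the XRef fields used for mapping, always including unifield."""
    fields = list(mapfields) if mapfields else list(DEFAULT_MAPFIELDS)
    if unifield not in fields:
        # Assumption: position in the list is not significant here.
        fields.append(unifield)
    return fields

default_fields = effective_mapfields()
custom_fields = effective_mapfields(["Uniprot", "Ensembl"])
```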

PINGU 4.6.x fixed/updated the PPI Fasta output methods (ppifas=T). These will output to a directory named after the ppisource file and ppispec. Each hub gene will produce a fasta file, gene.fasid.fas where fasid is set by the ppisource unless changed with fasid=X. If combineppi=T, a single spec.fasid.fas file will be created. The xhubppi=T setting will generate a set of files containing spoke proteins that have x+ Hub interactors. Note that *.1hub.fas is essentially the same as the combineppi=T spec.fasid.fas file.
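The relationship between the xhubppi=T sets and the combined output can be seen in a small sketch. It is not taken from PINGU: `xhub_sets` and `fasta_name` are hypothetical helpers, and the `(hub, spoke)` tuple input is an assumed simplification of the pairwise table.

```python
from collections import defaultdict

def xhub_sets(pairwise):
    """Map x -> set of spokes that have at least x distinct hub interactors."""
    hubs_per_spoke = defaultdict(set)
    for hub, spoke in pairwise:
        hubs_per_spoke[spoke].add(hub)
    max_x = max((len(hubs) for hubs in hubs_per_spoke.values()), default=0)
    return {x: {s for s, hubs in hubs_per_spoke.items() if len(hubs) >= x}
            for x in range(1, max_x + 1)}

def fasta_name(prefix, fasid):
    """Build a name of the form prefix.fasid.fas (prefix is a gene or spec)."""
    return "{0}.{1}.fas".format(prefix, fasid)

ppi = [("G1", "A"), ("G1", "B"), ("G2", "B"), ("G3", "B"), ("G2", "C")]
xhub = xhub_sets(ppi)
all_spokes = {spoke for _, spoke in ppi}
example_name = fasta_name("G1", "HINT")
```

With x=1 every spoke qualifies, which is why the 1hub set matches the set of all spokes that combineppi=T would pool.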

PINGU 4.7 added ppidbreport=T/F output for PPI compilation, summarising the evidence codes and PPITypes read from different sources.

Commandline

ADVANCED/DEV OPTIONS

sourcepath=PATH/ : Will look in this directory for input files if not found ['SourceData/']
: Source file date (YYYY-MM-DD) to preferentially use [None]
ppisource=X : Source of PPI data. (HINT/FILE) FILE needs 'Hub', 'Spoke' and 'SpokeUni' fields. ['HINT']
ppispec=LIST : List of PPI files/species/databases to generate PPI datasets from [HUMAN]
: Whether to download files directly from websites where possible if missing [True]
: Whether to quit by default if source data integrity is breached [True]
xrefdata=LIST : List of files with delimited data of identifier cross-referencing (see rje_xref) []
unifield=X : Uniprot accession number field identifier for xrefdata ['Uniprot']
: List of XRef fields to use for identifier mapping (plus unifield) [see docs]
: Redirect output files/directories to specified directory [./]
: Results file prefix [pingu]
: Whether to output lists of Accession numbers only, rather than full fasta files [False]
fasid=X : Text ID for fasta files (*.X.fas) [default named after ppisource(+'-dom')]
: Save pairwise PPI file following processing (if rest=None) [None]
ppifas=T/F : Whether to output PPI fasta files [False]
domppi=T/F : Whether to generate Pfam Domain-based PPI files instead of protein-based PPI files [False]
: Minimum number of PPI for file output [0]
combineppi=T/F : Whether to combine all spokes into a single fasta file [False]
xhubppi=T/F : Whether to generate PPI files of spokes interacting with X+ hubs [False]
: Load a file of 'Query','Hub' PPI and generate expanded PPI Datasets in PPI.*/ [None]
: Fasta file containing the Query protein sequences corresponding to QuerySeq [*.fas]
: Whether to include all the new Queries from QueryPPI in all files for a given hub [True]
sourceurl=CDICT : Dictionary of Source URL mapping (see code)
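Options are given on the command line in name=value form, with T/F for booleans. The sketch below shows one way to assemble such a command from a Python dictionary. The helper `build_command` is hypothetical, and the script name `pingu.py` and the `python` launcher are assumptions about the local installation; only option names that appear elsewhere in this documentation are used.

```python
def build_command(options, script="pingu.py"):
    """Return an argument list of the form: python script name=value ..."""
    args = ["python", script]
    for name in sorted(options):
        value = options[name]
        # Boolean settings are written as T or F.
        if isinstance(value, bool):
            value = "T" if value else "F"
        args.append("{0}={1}".format(name, value))
    return args

cmd = build_command({"ppisource": "HINT", "ppifas": True,
                     "combineppi": True, "fasid": "hint"})
command_line = " ".join(cmd)
```

The resulting list could be passed to `subprocess.run` if the script is on the working path.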

PPI COMPILATION/FILTERING OPTIONS

hublist=LIST : List of hub genes to restrict pairwise PPI to []
hubonly=T/F : Whether to restrict pairwise PPI to those with both hub and spoke in hublist [False]
hubfield=X : Hub field to use for hublist=LIST [Hub]
spokefield=X : Spoke field to use for hublist=LIST hubonly=T [Spoke]
ppicompile=CDICT : List of db:file PPI Sources to compile and generate *.pairwise.tdt []
ppidbreport=T/F : Summary output for PPI compilation of evidence/PPIType/DB overlaps [True]
symmetry=T/F : Whether to enforce Hub-Spoke symmetry during PPI compilation [True]
hprd=PATH : Path to HPRD flat files [None]
taxid=LIST : List of NCBI Taxa IDs to use [9606]
badppi=LIST : PPI Types to be removed. Will only remove PPI if no support remains []
goodppi=LIST : Reduce PPI to those supported by listed types []
baddb=LIST : PPI source databases to be removed. Will only remove PPI if no support remains []
gooddb=LIST : Reduce PPI to those supported by listed databases []
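The "will only remove PPI if no support remains" rule for the bad/good filters can be sketched as follows. This is an illustrative reading of the option descriptions, not PINGU's implementation: `filter_ppi` is a hypothetical name and the dictionary of supporting PPI types per pair is an assumed data structure. The same logic would apply to databases in place of PPI types.

```python
def filter_ppi(ppi_types, goodppi=(), badppi=()):
    """Filter PPI by supporting type.

    ppi_types: dict of (hub, spoke) -> set of supporting PPI types
               (hypothetical structure).
    A pair is dropped only when no supporting type survives the filters.
    """
    good, bad = set(goodppi), set(badppi)
    kept = {}
    for pair, types in ppi_types.items():
        support = set(types) - bad       # strip the unwanted types
        if good:
            support &= good              # keep only the listed types, if any
        if support:
            kept[pair] = support
    return kept

ppi_types = {
    ("A", "B"): {"two hybrid", "pull down"},
    ("A", "C"): {"two hybrid"},
    ("B", "D"): {"coip"},
}
no_y2h = filter_ppi(ppi_types, badppi=["two hybrid"])
only_coip = filter_ppi(ppi_types, goodppi=["coip"])
```

Here the A-B interaction survives the removal of "two hybrid" because "pull down" support remains, whereas A-C is dropped.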

OBSOLETE OPTIONS

biogrid=FILE : BioGRID flat file [None]
intact=FILE : IntAct flat file [None]
mint=FILE : MINT flat file [None]
reactome=FILE : Reactome interactions flat file [None]
dip=FILE : DIP interactions flat file [None]
domino=FILE : Domino interactions flat file [None]
evidence=FILE : Mapping file for evidence terms [None] #!# Not currently implemented! #!#

See also rje.py generic commandline options.

Module Version History

# 4.0 - Initial Compilation based on code from SLiMBench and PINGU 3.9 (inherited as pingu_V3).
# 4.1 - Adding compilation of PPI databases using new rje_xref V1.1 and older objects from PINGU V3.
# 4.2 - Bug fixes for use of PPISource to create PPI databases. Add HGNC to sourcedata (xrefdata=HGNC)
# 4.3 - Modified to use Pfam as hub field for DomPPI generation. Modified naming of PPI output after ppisource.
# 4.4.0 - Converted ppicompile=T to ppicompile=LIST.
# 4.5.0 - Added hublist=LIST : List of hub genes to restrict pairwise PPI to, and pairwise parsing.
# 4.5.1 - Debugging missing identifiers and indexing speed. Added good and bad DB.
# 4.5.2 - Fixed SIF output and changed names to sif-* for opening in browser.
# 4.5.3 - Updated REST output.
# 4.6.0 - Added hubonly=T/F : Whether to restrict pairwise PPI to those with both hub and spoke in hublist [False]
# 4.6.1 - Fixed some ppifas=T/F bugs and added combineppi=T/F : Whether to combine all spokes into a single fasta file [False]
# 4.6.2 - Added check/filter for multiple SpokeUni pointing to same sequence. (Compilation redundancy mapping failure!)
# 4.6.3 - Fixed issue with 1:many SpokeUni:Spoke mappings messing up XHub.
# 4.7.0 - Added ppidbreport=T/F : Summary output for PPI compilation of evidence/PPIType/DB overlaps [True]
# 4.8.0 - Fixed report duplication issue and added additional summary output.
# 4.9.0 - Updated HINT download and parsing details.
# 4.9.1 - Fixed Pairwise parsing and filtering for more flexibility of input. Fixed fasid=X bug and ppiseqfile names.
# 4.10.0 - Added hubfield and spokefield options for parsing hublist.

PINGU REST Output formats

Run with &rest=help for general options. Run with &rest=full to get full server output as text or &rest=format
for more user-friendly formatted output. Individual outputs can be identified/parsed using &rest=OUTFMT:

pairwise = main table of identified PPI for given hublist=LIST proteins. [tdt]
spokes = non-redundant list of "spoke" genes that interact with hubs [list]
uniprot = non-redundant list of uniprot accession numbers for proteins that interact with hubs [list]
sif-gene = simple interaction file (SIF) format of gene identifiers for PPI [sif]
sif-uni = simple interaction file (SIF) format of uniprot identifiers for PPI [sif]
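For readers unfamiliar with SIF, each line names a source node, a relationship type and a target node. The sketch below converts hub-spoke pairs into such lines. It is a generic illustration, not the server's code: `to_sif` is hypothetical, the `pp` relationship label and tab delimiter are assumptions, and collapsing the two directions of a symmetrical pair into one line is a design choice made here for brevity.

```python
def to_sif(pairs, relation="pp"):
    """Return SIF lines (node, relation, node) for an iterable of pairs."""
    seen = set()
    lines = []
    for a, b in pairs:
        # Treat A-B and B-A as the same undirected interaction.
        key = tuple(sorted((a, b)))
        if key in seen:
            continue
        seen.add(key)
        lines.append("\t".join((key[0], relation, key[1])))
    return lines

sif_lines = to_sif([("TP53", "MDM2"), ("MDM2", "TP53"), ("TP53", "EP300")])
```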
