This utility was originally created for handling proteomics data with EnsEMBL peptide IDs. The data needed to be
mapped onto genes, overlaps and redundancies identified, gene lists output for GO analysis with FatiGO, and PPI data
from HPRD and BioGRID used to identify potential complexes.
See rje_ensembl documentation for details of what to download for EnsGO files and how to make EnsLoci files etc. The
ens_SPECIES.GO.tdt file used for GO mapping should be suitable for the
ppioutdir=PATH will produce combined PPI sequence files for all genes. A pingu.combinedppi.tdt summary file
will be placed in resdir.
QSLiMFinder=FILE will perform an analysis for shared motifs between primary interactors of those genes identified
in a given sample and the original sequence used for the pulldown. FILE should be a fasta file where names of the
sequences match Sample names. Datasets will then be formed that contain that sequence plus the primary PPI of each
gene in that sample as a dataset named SAMPLE_GENE.fas in a directory RESDIR/SLiMFinder.
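
The dataset layout described above can be sketched as follows. This is only an illustration, not PINGU's own code: the function name and the two input dictionaries are hypothetical, and the sketch lists sequence names per file rather than writing fasta records.

```python
import os

def qslimfinder_datasets(resdir, sample_genes, primary_ppi):
    """Map each RESDIR/SLiMFinder/SAMPLE_GENE.fas path to the sequence names it would hold.

    sample_genes: {sample: [gene, ...]}       (hypothetical input structure)
    primary_ppi:  {gene: [interactor, ...]}   (hypothetical input structure)
    """
    datasets = {}
    for sample, genes in sample_genes.items():
        for gene in genes:
            path = os.path.join(resdir, "SLiMFinder", f"{sample}_{gene}.fas")
            # Query sequence (named after the sample) first, then the gene's primary interactors.
            datasets[path] = [sample] + sorted(primary_ppi.get(gene, []))
    return datasets
```

Each key is one dataset file and each value starts with the pulldown sequence whose name matches the Sample name.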
### Main Input Options ###
data=LIST : List of files of results containing "Sample" and "Identifier" columns
ensmap=FILE : Mappings from EnsEMBL - peptides, genes and HGNC IDs (from BioMart)
ipilinks=FILE : IPI Links file with 'IPI', 'Symbol' and 'EnsG' fields 
ensloci=FILE : File of EnsEMBL genome EnsLoci treatment 
baits=LIST : List of genes of interest for overlap analysis 
addbaits=T/F : Whether to add primary interactors of baits as additional samples [
combaits=X : Whether to combine bait PPIs into single sample (X) (if
controls=LIST : List of sample names that correspond to controls [
experiments=LIST: List of sample names that correspond to key samples of interest 
exponly=T/F : Limit analysis to samples listed as experiments (before baits added etc.) [
addalias=FILE : Extra (manual?) aliases to add to GeneMap object following loading of pickles etc. [
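
As an illustration of the data=LIST input, a results file with "Sample" and "Identifier" columns could be grouped by sample roughly as below. This is a minimal sketch: the function name is hypothetical and a tab delimiter is assumed, which the option description does not specify.

```python
import csv
from collections import defaultdict

def read_samples(handle, delimiter="\t"):
    """Group the "Identifier" column by "Sample" for one results file.

    The tab delimiter is an assumption; pass another one if the file differs.
    """
    samples = defaultdict(set)
    for row in csv.DictReader(handle, delimiter=delimiter):
        samples[row["Sample"]].add(row["Identifier"])
    return dict(samples)
```

Any additional columns in the file are simply ignored by this sketch.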
### Processing Options ###
hgnconly=T/F : Whether to restrict PPI data to only those proteins with Gene Symbol links [
pickle=T/F : Whether to save/load pickle of parsed/combined data rather than regenerating each time [
pingupickle=FILE: Full path to Pingu pickle file to look for/use/save [
nocontrols=T/F : Whether to remove genes found in designated controls from designated experiments [
gablam=T/F : Whether to run all-by-all GABLAM on EnsLoci and add homology to networks [
ppitype=LIST : List of acceptable interaction types to parse out 
badtype=LIST : List of bad interaction types, to exclude [
makefam=X : GABLAM Percentage identity threshold for grouping sequences into families [
gofilter=LIST : List of GO IDs to filter out of gene lists 
goexcept=LIST : List of GO ID exceptions to filtering 
remsticky=X : Remove "sticky" hubs as defined by >X known PPI [
stickyhubs=T/F : Only remove "sticky" spokes but keep sticky hubs [
stickyppi=T/F : Only remove "sticky" hubs from samples, not from total PPI [
addlinks=T/F : Add linking proteins (linking two Sample proteins) [
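
The remsticky=X idea above (removing hubs with more than X known PPI) might look like the following sketch. The function name and the dictionary-of-sets representation are assumptions for illustration; the finer behaviour controlled by stickyhubs and stickyppi is not modelled here.

```python
def remove_sticky(ppi, max_ppi):
    """Drop hubs with more than max_ppi interactors, and drop them as spokes too.

    ppi: {hub: set(spokes)}  (hypothetical in-memory representation)
    """
    # A hub is "sticky" if it has more than max_ppi known interactors.
    sticky = {hub for hub, spokes in ppi.items() if len(spokes) > max_ppi}
    return {hub: {s for s in spokes if s not in sticky}
            for hub, spokes in ppi.items() if hub not in sticky}
```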
### Main Output Options ###
resdir=PATH : Redirect output files to specified directory [
basefile=X : Results file prefix if no data file given with
fulloutput=T/F : Generate all possible outputs from one input [
genelists=T/F : Generate lists of genes for each sample (e.g. for FatiGO upload) [
gosummary=T/F : Make a GO summary table [
summaryhgnc=T/F : Generate a summary table of genes in dataset, including peptide lists for each sample [
mapout=T/F : Generate a summary table of full peptide mapping [
dbcomp=T/F : Comparison of PPI databases [
dbsizes=T/F : Outputs a file of PPI dataset sizes (histogram) [
allbyall=X : Generates an all-by-all table of PPI links up to X degrees of separation (sample only) [
pathfinder=X : Perform (lengthy) PathFinder analysis to link genes up to X degrees of separation (-1 = no limit) [
pathqry=LIST : Limit PathFinder analysis to start with given queries 
overlap=T/F : Produce a table of the overlap (mapped through HGNC) between samples (and bait primary PPI) [
cytoscape=T/F : Produce old Cytoscape input files from allbyall table (reads back in) [
xgmml=T/F : Produce an XGMML file with all Cytoscape data and more [
xgformat=T/F : Whether to add colour/shape formatting to XGMML output [
xgexpand=X : Expand XGMML network with additional levels of interactors [
xgcomplex=T/F : Restrict XGMML output (and expansion) to protein complex edges [
compresspp=T/F : Whether to compress multiple samples of interest into ShareX for Cytoscape [
seqfiles=T/F : Whether to generate protein sequence fasta files using EnsLoci [
goseqdir=PATH : Path to output full GO fasta files (No output if blank/none) 
ppioutdir=PATH : Path to output combined PPI files (No output if blank/none) 
acconly=T/F : Whether to output lists of Accession numbers only, rather than full fasta files [
ensdat=PATH : Path to EnsDAT files to use for making combined PPI datasets [
qslimfinder=FILE: File containing sequences matching Sample names for Query SLiMFinder runs [
screenddi=FILE : Whether to screen out probable domain-domain interactions from file [
domppidir=PATH : Produce domain-based PPI files and output into PATH (No output if blank/none) 
nocomplex=T/F : Perform crude screening of complexes (PPI triplets w/o homodimers) [
fasid=X : Text ID for PPI fasta files [
association=T/F : Perform experiment association analysis [
asscombo=T/F : Whether to subdivide genes further based on combinations of experiments containing them [
noshare=T/F : Whether to exclude those genes that are shared between samples when comparing those samples [
selfonly=T/F : Whether to only look at associations within experiments, not between [
randseed=X : Seed for randomiser [
randnum=X : Number of randomisations [
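
The "degrees of separation" used by allbyall=X and pathfinder=X above can be illustrated with a breadth-first search over a PPI network. This is a sketch under assumptions, not the module's actual algorithm: the function name and the adjacency-dictionary input are hypothetical.

```python
from collections import deque

def degrees_of_separation(ppi, start, max_degree):
    """Return {gene: degree} for genes within max_degree PPI links of start.

    ppi: {gene: iterable of interaction partners}  (hypothetical representation)
    A max_degree of -1 never matches a real degree, so the search is unlimited.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        if seen[gene] == max_degree:
            continue  # do not expand beyond the requested number of links
        for partner in ppi.get(gene, ()):
            if partner not in seen:
                seen[partner] = seen[gene] + 1
                queue.append(partner)
    return seen
```

Running this once per sample gene would give the rows of an all-by-all style table.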
### Database/Path Options ###
enspath=PATH : Path to EnsEMBL downloads
ensgopath=PATH : Path to EnsGO files (!!! Restricted to Humans Currently !!!)
unipath=PATH : Path to UniProt files [
hprd=PATH : Path to HPRD flat files [
biogrid=FILE : BioGRID flat file [
intact=FILE : IntAct flat file [
mint=FILE : MINT flat file [
reactome=FILE : Reactome interactions flat file [
dip=FILE : DIP interactions flat file [
domino=FILE : Domino interactions flat file [
pairwise=FILE : Load interaction data from existing Pingu Pairwise file [
addppi=FILE : Add additional PPI from a simple delimited file IDA,IDB,Evidence [
genepickle=FILE : Pickled GeneMap object. Alternatively, use below commands to make GeneMap object [
- hgncdata/sourcedata/pickledata/aliases : See rje_genemap docstring.
pfamdata=FILE : Delimited files containing domain organisation of sequences [
evidence=FILE : Mapping file for evidence terms [
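
For the addppi=FILE option above, reading a simple delimited file with IDA, IDB and Evidence fields might be sketched as follows. The function name is hypothetical, and the sketch assumes a comma delimiter, a header row naming the three fields, and symmetric interactions; none of these is confirmed by the option description.

```python
import csv

def read_addppi(handle, delimiter=","):
    """Build {protein: {partner: set(evidence)}} from IDA,IDB,Evidence rows.

    Assumes a header row with IDA, IDB and Evidence fields (an assumption).
    """
    ppi = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        a, b, evidence = row["IDA"], row["IDB"], row["Evidence"]
        # Store the interaction in both directions (assumed symmetric).
        ppi.setdefault(a, {}).setdefault(b, set()).add(evidence)
        ppi.setdefault(b, {}).setdefault(a, set()).add(evidence)
    return ppi
```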
### Module Version History ###
# 0.1 - Initial Compilation. Basic GO mapping for EnsEMBL data.
# 0.2 - Mapping of EnsEMBL genes onto Gene Symbols and summary data table output.
# 0.3 - Reading and collation of PPI data.
# 0.4 - All-by-all PPI and sample overlap analyses.
# 0.5 - Cytoscape output from all-by-all analysis.
# 0.6 - Option to add interactors of baits as additional samples. Added resdir=PATH option. Added pickling.
# 0.7 - Added generation of EnsLoci datasets
# 0.8 - Combined PPI Dataset output
# 0.9 - Added DAT output option.
# 1.0 - Full working version with XGMML output including GABLAM relationships.
# 1.1 - Added Reactome and DIP.
# 1.2 - Added an output of shared PPIs for clustering. (*.cluster.tdt)
# 2.0 - Replace GeneCards with GeneMap. Improved compatibility with APHID and functioning of new options.
# 2.1 - Added rje_go.GO Object to store GO mappings etc.
# 2.2 - Altered the Pingu data input to be a list of files, not just one file.
# 2.3 - Added acclist and GO dataset outputs.
# 2.4 - Added tracking of source databases and evidence codes.
# 2.5 - Added PNG visualisations.
# 2.6 - Added loading interactions from pingu.pairwise.tdt.
# 2.7 - Expanded XGMML options and added pathfinder output.
# 3.0 - Major tidying of gene/peptide mapping. Added extra bait, experiment and protein family options.
# 3.1 - Removal of sticky spokes/hubs
# 3.2 - Added Domain-based PPI output.
# 3.3 - Added crude Complex filtering.
# 3.4 - Updated GO stuff.
# 3.5 - Added Experiment association output.
# 3.6 - Added addlinks=T/F option.
# 3.7 - Improved XGMML output.
# 3.8 - Hopefully fixed issue of Fasta file generation log output writing to wrong log file.
# 3.9 - Tidied imports.
### PINGU REST Output Formats ###
for general options. Run with
to get full server output as text or
for more user-friendly formatted output. Individual outputs can be identified/parsed using
= main table of identified PPI for given
= non-redundant list of "spoke" genes that interact with hubs [list]
= non-redundant list of uniprot accession numbers for proteins that interact with hubs [list]
= simple interaction file (SIF) format of gene identifiers for PPI [sif]
= simple interaction file (SIF) format of uniprot identifiers for PPI [sif]
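
For readers unfamiliar with the SIF outputs listed above, the sketch below writes hub-spoke pairs as one "source, relationship, target" line each, which is the general shape of a simple interaction file. The function name is hypothetical, and the "pp" relationship label and tab separator are assumptions rather than confirmed details of the server output.

```python
def to_sif(ppi, relation="pp"):
    """Format {hub: iterable of spokes} as SIF text, one interaction per line.

    The "pp" relationship label is an assumed default, not taken from PINGU.
    """
    lines = []
    for hub in sorted(ppi):
        for spoke in sorted(ppi[hub]):
            lines.append(f"{hub}\t{relation}\t{spoke}")
    return "\n".join(lines)
```

The same function would serve for gene identifiers or UniProt accession numbers, since SIF treats node names as opaque strings.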