Module:	rje_misc
Description:	Miscellaneous script storage module
Version:	0.59.2
Last Edit:	15/04/19

Imported modules: rje rje_db rje_seq rje_seqlist rje_sequence rje_uniprot rje_xml rje_zen rje_blast_V1

See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This module stores "one-off" scripts for odd jobs that often need to be run multiple times and can then be forgotten forever:

- aicpaper = Conversion of AIC PATIS tables for Jan 2014 AIC paper.
- crisp = Analysis of Crisp et al. paper data
- testing = SWC boot camp testing
- holger = reformat fasta
- yunan = combine and normalise Yunan data
- sageshape = reshape the SuperSAGE data for ANOVA and FDR analysis
- 3did = generate 3DID benchmarking datasets
- sfstick = Stick together SF files
- dtfas = Reformat DT_Sequence data
- laavanya = Prepares PTPRD PPI datasets for SLiMFinder
- elm_presto_qsub = Makes a list of Python commands for qsub jobs
- elm_presto_compile = Compiles all the PRESTO results from ELM conservation analyses into a single file
- ppi_go_datafarm = Copies PPI and GO datasets into the current MotifAln folder
- minimotif = Reformats mini_motif.txt into standard motif RegExps
- wag = reconstruct a PAM1 matrix from the Goldman WAG matrix
- hprd_elm_copy = Copies HPRD ELM datasets for SLiMDisc analysis
- hprd_elm_comp = Extracts rows of *.compare.tdt files where the HPRD dataset matches its ELM
- neduvamotif = Reformats raw motif file from Neduva & Russell work
- disordertest = Tests the two disorder prediction servers and compares
- ygob = Makes orthologue alignments from YGOB files
- ygob2 = Makes orthologue alignments from YGOB files following the original ygob!
- ensloci = Counts EnsEMBL loci
- go_cc = Extracts a list of GO cellular component IDs into a file corresponding to SLiMDisc input
- elm_gcut = Cuts down the ELM GABLAM results as they *should* have been output!
- ensgo = Makes GO Datasets from EnsEMBL Loci data and BioMart download (& ensgo2)
- elmdup = Identifies "duplicate" proteins from different species using orthaln.fas files
- elmaln = Sorts ELM alignments
- elmFT = Makes a Table of SwissProt Annotations from mapping.tdt and UniProt
- termX = Reformats all fasta files in the directory to be UC for X aa at each end of the sequence and LC for rest
- 143ppi = Make a table of 14-3-3 interacting proteins from HPRD
- allslim = runs slimfinder on all datasets (*.dat & *.fas) in directory
- slimtest = Makes a bunch of random datasets to run SLiMFinder on
- slimtestsum = Makes a summary table of random datasets run with SLiMFinder
- sfvst = SLiMBuild vs. TEIRESIAS time test w/o ambiguity
- ensdat = EnsDat cleanup following rje_ensembl boo-boo
- slimcons = Custom assembly of EnsDatHuman SLiMFinder runs and CompariMotif results for data analysis in R
- locustdb = Temp analysis of Locust EST database
- phomo = PhosphoMotif Reformatter
- flyseq = Extract sequences from Flybase chado_xml
- arath2go = Convert EMBL_ARATH sequence to GO using Arabidopsis download
- intenrich = Reformat enriched integrin proteomics results for Pingu
- sftargz = TarGZ cleanup of SLiMFinder results
- taxadbsum = E hux Taxa DB summary cleanup
- bencog = Makes Ben COG links
- humsf09 = Human SF09 cleanup
- biol3050 = BIOL3050 clone vs ENST cDNA matchup
- jrjspf = Reformatting an SPF file for Joe Jenkins
- disjson = Parsing of disorder JSON file
- pop00freq
- snakencbi = Reformatting the fasta and GFF files for snake genomes
- starlingncbi = Reformatting the fasta and GFF files for starling genome
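
As an illustration of the kind of reformatting these jobs perform, the termX description (upper case for X amino acids at each end of the sequence, lower case for the rest) can be sketched as below. This is not the module's own code: the function name term_case is hypothetical, and the handling of sequences shorter than 2X is an assumption.

```python
def term_case(sequence, x):
    """Return sequence with the first and last x residues in upper case
    and everything in between in lower case (illustrative sketch only)."""
    seq = sequence.upper()
    if x <= 0:
        # No terminal region requested: the whole sequence is "the rest".
        return seq.lower()
    if 2 * x >= len(seq):
        # Assumption: when the two termini meet or overlap, keep it all upper case.
        return seq
    return seq[:x] + seq[x:-x].lower() + seq[-x:]


if __name__ == "__main__":
    print(term_case("acdefghiklmn", 3))
```

Applying such a function to every sequence in each fasta file of a directory would give the behaviour described for termX; file handling is left out to keep the sketch short.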

Commandline

job=X : Identifier for the job to be performed [None]
infile=FILE : Name of input file for relevant task [None]
batch=GLIST : List of files to process for relevant task []
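
The options above use a key=value form. A minimal sketch of how such arguments might be read is given below; it is not the rje parser. The function name parse_options is hypothetical, and treating GLIST as a comma-separated list of glob patterns is an assumption.

```python
import glob


def parse_options(argv):
    """Read job=X, infile=FILE and batch=GLIST from a list of arguments,
    starting from the documented defaults (illustrative sketch only)."""
    options = {"job": None, "infile": None, "batch": []}
    for arg in argv:
        key, sep, value = arg.partition("=")
        key = key.lower()
        if not sep or key not in options:
            continue  # skip anything that is not a recognised key=value pair
        if key == "batch":
            # Assumption: GLIST is a comma-separated list of glob patterns.
            files = []
            for pattern in value.split(","):
                files.extend(sorted(glob.glob(pattern)))
            options["batch"] = files
        else:
            options[key] = value  # job names are kept case-sensitive (e.g. elmFT)
    return options


if __name__ == "__main__":
    import sys
    print(parse_options(sys.argv[1:]))
```

For example, parse_options(["job=termX", "infile=seqs.fas"]) would set job and infile and leave batch as an empty list; the file name here is purely illustrative.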

History

Module version history:

    # 0.1 - Initial Compilation with elm_presto_compile job.
    # 0.2 - Added ppi_go_datafarm job
    # 0.3 - minimotif = Reformats mini_motif.txt into standard motif RegExps
    # 0.4 - Added elm_presto_qsub
    # 0.5 - wag = reconstruct a PAM1 matrix from the Goldman WAG matrix
    # 0.6 - hprd_elm_copy = Copies HPRD ELM datasets for SLiMDisc analysis
    # 0.7 - hprd_elm_comp = Extracts rows of *.compare.tdt files where the HPRD dataset matches its ELM
    # 0.8 - neduvamotif = Reformats raw motif file from Neduva & Russell work
    # 0.9 - disordertest = Tests the two disorder prediction servers and compares
    # 0.10 - ygob = Makes orthologue alignments from YGOB files
    # 0.11 - ensloci = Counts EnsEMBL loci
    # 0.12 - go_cc = Extracts a list of GO cellular component IDs into a file corresponding to SLiMDisc input
    # 0.13 - elm_gcut = Cuts down the ELM GABLAM results as they *should* have been output!
    # 0.14 - ensgo = Makes GO Datasets from EnsEMBL Loci data and BioMart download
    # 0.15 - elmdup = Identifies "duplicate" proteins from different species using orthaln.fas files
    # 0.16 - elmaln = Sorts ELM alignments
    # 0.17 - elmFT = Makes a Table of SwissProt Annotations from mapping.tdt and UniProt
    # 0.18 - termX = Reformats all fasta files in the directory to be UC for X aa at each end of the sequence and LC for rest
    # 0.19 - allslim = runs slimfinder on all datasets (*.dat & *.fas) in directory
    # 0.20 - slimtest = Makes a bunch of random datasets to run SLiMFinder on
    # 0.21 - slimtestsum = Makes a summary table of random datasets run with SLiMFinder
    # 0.22 - sfvst = SLiMBuild vs. TEIRESIAS time test w/o ambiguity
    # 0.23 - mailer = E-Mail test
    # 0.24 - ensdat = EnsDat cleanup following rje_ensembl boo-boo
    # 0.25 - slimcons = Custom assembly of EnsDatHuman SLiMFinder runs and CompariMotif results for data analysis in R
    # 0.26 - locustdb = Temp analysis of Locust EST database
    # 0.27 - phomo = PhosphoMotif Finder reformatting
    # 0.28 - flyseq = Extract sequences from Flybase chado_xml
    # 0.29 - arath2go = Convert EMBL_ARATH sequence to GO using Arabidopsis download
    # 0.30 - intenrich = Reformat enriched integrin proteomics results for Pingu
    # 0.31 - laavanya = Prepares PTPRD PPI datasets for SLiMFinder
    # 0.32 - sftargz = TarGZ cleanup of SLiMFinder results
    # 0.33 - taxadbsum = E hux Taxa DB summary cleanup
    # 0.34 - bencog = Makes Ben COG links
    # 0.35 - sftidy = SLiMFinder grand results tidy up.
    # 0.36 - prpsi = Make SI table for PRP project
    # 0.37 - humsf09 = Human SF09 cleanup
    # 0.38 - biol3050 = BIOL3050 clone vs ENST cDNA matchup
    # 0.39 - biol2018 = BIOL2018 domain identification.
    # 0.40 - dtfas = Reformat DT_Sequence data
    # 0.41 - sfstick = Stick together SF files
    # 0.42 - 3did = generate 3DID benchmarking datasets
    # 0.43 - sageshape = reshape the SuperSAGE data for ANOVA and FDR analysis
    # 0.44 - yunan = combine and normalise Yunan data
    # 0.45 - kroc = Kieren ROC analysis
    # 0.46 - holger = reformat fasta
    # 0.47 - testing = SWC boot camp misc testing code
    # 0.48 - aicpaper = Conversion of AIC PATIS tables for Jan 2014 AIC paper.
    # 0.49.0 - jrjspf = Reformatting an SPF file for Joe Jenkins
    # 0.50.0 - crisp = Analysis of Crisp et al paper data
    # 0.51.0 - dcfmsi = Generation of HTML table code for Manefield DCMF data
    # 0.52.0 - mbgsnp = Compilation of MBG SNP Tables
    # 0.53.0 - mbgSNPFreq = Generation of SNP Frequency Changes from SNP Table.
    # 0.54.0 - disjson = Parsing of disorder JSON file
    # 0.55.0 - diphap = Parsing of pseudodiploid fasta file and annotating with diploid or haploid status
    # 0.56.0 - ancNorm = Conversion of raw anchor scores to normalised scores.
    # 0.57.0 - occjoin = Table join for occbench complementary analysis.
    # 0.58.0 - pop00freq = Regenerate SNP frequencies. Modified mbgSNPFreq to handle pop00freq output.
    # 0.59.0 - snakencbi = Reformatting the fasta and GFF files for snake genomes + starlingncbi
    # 0.59.1 - Updated DCMF SI
    # 0.59.2 - Updated DCMF SI -> dcmfSIJGINCBI()
