Function
This module contains the MotifList Class, which is designed to replace many of the functions that previously formed
part of the Presto Class. This class will then be used by PRESTO, SLiMPickings and CompariMotif (and others?) to
control Motif loading, redundancy and storage. MotifOcc objects will replace the previous PrestoSeqHit objects and
contain improved data commenting and retrieval methods. The MotifList class will contain methods for filtering motifs
according to individual or combined MotifOcc data.
The options below are read in by the MotifList object when it is instantiated with a cmd_list. They therefore do not
need to be handled by any class that uses this object, unless that class has conflicting settings.
The Motif Stats options are used by MotifList to calculate statistics for motif occurrences, though this data will
actually be stored in the MotifOcc objects themselves. This includes conservation statistics.
Note. Additional output parameters, such as motifaln and proteinaln settings, and stat filtering/novel scores are not
stored in this object, as they will be largely dependent on the main programs using the class, and the output from
those programs. (This also enables statfilters etc. to be used with stats not related to motifs and their occurrences
if desired.)
MotifList Commands
## Basic Motif Input/Formatting Parameters ##
motifs=FILE
: File of input motifs/peptides [None]
Single line per motif format = 'Name Sequence #Comments' (Comments are optional and ignored)
Alternative formats include fasta, SLiMDisc output and raw motif lists.
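A minimal sketch of reading the single-line 'Name Sequence #Comments' format is shown below. The helper names (parse_motif_line, read_motifs) are hypothetical and are not the MotifList API. The sketch assumes that fields are whitespace-separated and that everything after the first '#' is a comment. Fasta, SLiMDisc and raw list inputs are not handled.

```python
def parse_motif_line(line):
    """Return (name, sequence) from a 'Name Sequence #Comments' line, or None."""
    # Comments are optional and ignored: drop everything after the first '#'.
    body = line.split("#", 1)[0].strip()
    if not body:
        return None  # blank or comment-only line
    fields = body.split()
    if len(fields) < 2:
        raise ValueError(f"Expected 'Name Sequence', got: {line!r}")
    return fields[0], fields[1]


def read_motifs(lines):
    """Collect motifs from an iterable of lines into a {name: sequence} dict."""
    motifs = {}
    for line in lines:
        parsed = parse_motif_line(line)
        if parsed:
            name, sequence = parsed
            motifs[name] = sequence
    return motifs
```

For example, read_motifs(open("motifs.txt")) would return one entry per motif line. Note that a later motif with a duplicate name silently replaces an earlier one in this sketch.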
minpep=X
: Min length of motif/peptide X aa [2]
minfix=X
: Min number of fixed positions for a motif to contain [0]
minic=X
: Min information content for a motif (1 fixed position = 1.0) [2.0]
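The sketch below illustrates how a minic threshold can be applied when one fixed position scores 1.0. It assumes uniform background frequencies over the 20 amino acids, so a position allowing n residues scores log(20/n)/log(20) and a wildcard scores 0. MotifList may well use real amino acid frequencies instead. The function names are hypothetical, and only single residues, wildcards ('.' or 'X') and bracketed choices such as [ST] are parsed.

```python
import math
import re

AA_COUNT = 20  # assumption: 20 standard amino acids at equal background frequency


def tokenize(motif):
    """Split a simple motif pattern into positions, e.g. 'R.[ST]' -> ['R', '.', '[ST]']."""
    return re.findall(r"\[[A-Z]+\]|[A-Z.]", motif)


def position_ic(token):
    """Information content of one position, scaled so a fixed residue = 1.0."""
    if token in (".", "X"):
        return 0.0  # a wildcard carries no information
    choices = len(token.strip("[]"))
    return math.log(AA_COUNT / choices) / math.log(AA_COUNT)


def motif_ic(motif):
    """Total information content of a motif: the sum over its positions."""
    return sum(position_ic(t) for t in tokenize(motif))
```

Under these assumptions, 'R..P' scores 2.0 and just passes the default minic=2.0, whereas 'R..[ST]' scores about 1.77 and would be rejected.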
trimx=T/F
: Trims Xs from the ends of a motif [False]
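A sketch of how trimx, minpep and minfix might combine when screening a motif is shown below. It assumes that minpep counts motif positions, that a "fixed" position is a single named residue, and that trimming happens before the length checks. These are illustrative assumptions, and passes_filters is a hypothetical name.

```python
import re


def tokenize(motif):
    """Split a simple motif pattern into positions: 'R', '.', 'X' or '[ST]'."""
    return re.findall(r"\[[A-Z]+\]|[A-Z.]", motif)


def is_wild(token):
    return token in (".", "X")


def is_fixed(token):
    # A fixed position is a single residue that is not a wildcard.
    return len(token) == 1 and not is_wild(token)


def trim_x(tokens):
    """Remove wildcard positions from both ends of the motif (trimx=T)."""
    start, end = 0, len(tokens)
    while start < end and is_wild(tokens[start]):
        start += 1
    while end > start and is_wild(tokens[end - 1]):
        end -= 1
    return tokens[start:end]


def passes_filters(motif, minpep=2, minfix=0, trimx=False):
    """Apply minpep and minfix checks, optionally after trimming end wildcards."""
    tokens = tokenize(motif)
    if trimx:
        tokens = trim_x(tokens)
    nfixed = sum(is_fixed(t) for t in tokens)
    return len(tokens) >= minpep and nfixed >= minfix
```

For example, 'X.R..' passes the default minpep=2 as it stands. With trimx=T it is reduced to 'R' and fails.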
nrmotif=T/F
: Whether to remove redundancy in input motifs [False]
minimotif=T/F
: Input file is in minimotif format and will be reformatted (PRESTO File format only) [False]
goodmotif=LIST
: List of text to match in Motif names to keep (can have wildcards) []
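The goodmotif filter can be pictured with Python's standard fnmatch module, as in the sketch below. This assumes shell-style wildcards ('*', '?') matched against the whole motif name, case-sensitively, and that an empty list means "keep everything". MotifList's actual matching rules may differ, and keep_motif is a hypothetical name.

```python
from fnmatch import fnmatchcase


def keep_motif(name, goodmotif):
    """Return True if the motif name matches any goodmotif pattern."""
    if not goodmotif:
        return True  # assumption: an empty goodmotif list applies no filter
    # fnmatchcase is used so matching does not depend on the operating system.
    return any(fnmatchcase(name, pattern) for pattern in goodmotif)
```

With goodmotif=LIG_*, for example, keep_motif("LIG_14-3-3", ["LIG_*"]) keeps the motif and keep_motif("DOC_WW", ["LIG_*"]) drops it.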
ambcut=X
: Cut-off for max number of choices in ambiguous position to be shown as variant [10]
reverse=T/F
: Reverse the motifs - good for generating a test comparison data set [False]
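When a motif is reversed, the positions should be reversed, not the raw characters, so that a choice such as [ST] stays intact. The sketch below shows this idea under stated assumptions: it only parses bracketed choices and single characters, and it does not handle repeat quantifiers such as .{0,2}. reverse_motif is a hypothetical name.

```python
import re


def reverse_motif(motif):
    """Reverse motif positions, keeping bracketed choices such as [ST] intact."""
    positions = re.findall(r"\[[^\]]+\]|.", motif)
    return "".join(reversed(positions))
```

For example, reverse_motif("R..[ST].P") gives "P.[ST]..R", which is a pattern of the same length and information content that is suitable for a test comparison set.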
msms=T/F
: Whether to include MSMS ambiguities when formatting motifs [False]
## Motif Occurrence Statistics Options ##
winsa=X
: Number of aa to extend Surface Accessibility calculation either side of motif [0]
winhyd=X
: Number of aa to extend Eisenberg Hydrophobicity calculation either side of motif [0]
windis=X
: Extend disorder statistic X aa either side of motif (use flanks *only* if negative) [0]
winchg=X
: Extend charge calculations (if any) to X aa either side of motif [0]
winsize=X
: Sets all of the above window sizes (use flanks *only* if negative) [0]
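The window options can be read as choosing which residues feed each occurrence statistic. The sketch below assumes a motif occupying the 0-based slice seq[start:end]. A positive window adds that many residues on each side of the motif. A negative window keeps only the abs(X) flanking residues on each side and excludes the motif itself. Windows are clipped at the sequence ends. window_positions is a hypothetical helper.

```python
def window_positions(start, end, win, seqlen):
    """0-based residue indices used for a statistic on a motif at seq[start:end].

    win >= 0: the motif plus up to win residues on each side.
    win < 0:  only up to abs(win) flanking residues on each side (no motif).
    """
    n = abs(win)
    left = list(range(max(0, start - n), start))
    right = list(range(end, min(seqlen, end + n)))
    if win >= 0:
        return left + list(range(start, end)) + right
    return left + right
```

A statistic such as mean hydrophobicity would then be averaged over the returned positions.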
slimchg=T/F
: Calculate Absolute, Net and Balance charge statistics (above) for occurrences [False]
iupred=T/F
: Run IUPred disorder prediction [False]
foldindex=T/F
: Run FoldIndex disorder prediction [False]
iucut=X
: Cut-off for IUPred results (0.0 will report mean IUPred score) [0.0]
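The two iucut modes are illustrated in the sketch below. With iucut=0.0, it reports the mean IUPred score over the residues. For a non-zero cut-off, it assumes that the statistic becomes the proportion of residues scoring at or above the cut-off. That second behaviour is an assumption made for illustration, and disorder_stat is a hypothetical name.

```python
def disorder_stat(scores, iucut=0.0):
    """Summarise per-residue IUPred scores for one occurrence window."""
    if not scores:
        raise ValueError("No disorder scores supplied")
    if iucut == 0.0:
        return sum(scores) / len(scores)  # mean IUPred score
    # Assumption: a non-zero cut-off reports the fraction of residues >= iucut.
    return sum(1 for s in scores if s >= iucut) / len(scores)
```

The scores would normally be taken from the window positions selected by windis.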
iumethod=X
: IUPred method to use (long/short) [short]
domfilter=FILE
: Use the DomFilter options, reading domains from FILE [None] ?? Check how this works ??
ftout=T/F
: Make a file of UniProt features for extracted parent proteins, where possible, incorporating SLiMs [*.features.tdt]
percentile=X
: Percentile steps to return in addition to mean [0]
## Conservation Parameters ## ??? Add separate SlimCons option ???
usealn=T/F
: Whether to search for and use alignments where present [False]
gopher=T/F
: Use GOPHER to generate missing orthologue alignments in alndir - see gopher.py options [False]
alndir=PATH
: Path to alignments of proteins containing motifs [./] * Use forward slashes (/)
alnext=X
: File extension of alignment files, accnum.X [aln.fas]
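Together, alndir and alnext define where an alignment for a protein is expected to be found. The sketch below builds the path with posixpath so that it always uses forward slashes, as recommended for alndir. It returns None when usealn is off or when no file exists. The function names are hypothetical, and the GOPHER fallback is not modelled.

```python
import os
import posixpath


def alignment_path(accnum, alndir="./", alnext="aln.fas"):
    """Expected alignment file for a protein: alndir/accnum.alnext."""
    return posixpath.join(alndir, f"{accnum}.{alnext}")


def find_alignment(accnum, alndir="./", alnext="aln.fas", usealn=True):
    """Return the alignment path if usealn is set and the file exists, else None."""
    if not usealn:
        return None
    path = alignment_path(accnum, alndir, alnext)
    return path if os.path.exists(path) else None
```

With the defaults, a protein with accession number P12345 is looked up as ./P12345.aln.fas.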
alngap=T/F
: Whether to count (True) or ignore (False) proteins in alignments that have 100% gaps over the motif, treating
ignored proteins as putative sequence fragments [False] (NB. All X regions are ignored as sequence errors.)
conspec=LIST
: List of species codes for conservation analysis. Can be name of file containing list. [None]
conscore=X
: Type of conservation score used: [pos]
- abs = absolute conservation of motif using RegExp over matched region
- pos = positional conservation: each position treated independently
- prop = conservation of amino acid properties
- all = all three methods for comparison purposes
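The sketch below contrasts the abs and pos scores and shows how alngap changes the set of homologues that is counted. It assumes that each homologue is supplied as the stretch of its alignment row lying under the motif, with one aligned character per motif position and '-' for gaps. These scores are unweighted. The prop score, information-content weighting, consamb and the handling of X regions are all omitted. All names are hypothetical.

```python
import re


def tokenize(motif):
    """Split a simple motif pattern into positions: 'R', '.', 'X' or '[ST]'."""
    return re.findall(r"\[[A-Z]+\]|[A-Z.]", motif)


def allowed(token, aa):
    """Does the aligned residue aa satisfy this motif position?"""
    if token in (".", "X"):
        return aa != "-"  # a wildcard needs a residue, not a gap
    return aa in token.strip("[]")


def usable(homologues, alngap=False):
    # alngap=False: ignore homologues that are 100% gaps over the motif.
    return [h for h in homologues if alngap or set(h) != {"-"}]


def cons_abs(motif, homologues, alngap=False):
    """abs: fraction of homologues whose ungapped segment matches the whole motif."""
    hom = usable(homologues, alngap)
    if not hom:
        return 0.0
    pattern = re.compile(motif.replace("X", "."))
    hits = sum(1 for h in hom if pattern.fullmatch(h.replace("-", "")))
    return hits / len(hom)


def cons_pos(motif, homologues, alngap=False):
    """pos: mean over motif positions of the fraction of homologues matching each."""
    tokens = tokenize(motif)
    hom = usable(homologues, alngap)
    if not hom:
        return 0.0
    per_pos = [sum(allowed(t, h[i]) for h in hom) / len(hom)
               for i, t in enumerate(tokens)]
    return sum(per_pos) / len(per_pos)
```

For 'R..P' against the segments RAAP, RGSP, KAAP and ----, abs scores 2/3 and pos scores 11/12 with alngap=F. This shows that pos gives partial credit where abs gives none.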
consamb=T/F
: Whether to calculate conservation allowing for degeneracy of motif (True) or of fixed variant (False) [True]
consinfo=T/F
: Weight positions by information content (does nothing for conscore=abs) [True]
consweight=X
: Weight given to global percentage identity for conservation, giving more weight to closer sequences [0]
- 0 gives equal weighting to all. Negative values will upweight distant sequences.
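The sketch below shows one weighting scheme that behaves as consweight is described. Each homologue's conservation score is weighted by (identity/100) raised to the power consweight. With 0, every weight is 1. Positive values favour close homologues and negative values favour distant ones. This power-law form is an assumption made for illustration, not necessarily the formula MotifList uses. weighted_conservation is a hypothetical name.

```python
def weighted_conservation(scores, identities, consweight=0.0):
    """Combine per-homologue conservation scores (0-1) into one weighted mean.

    identities: global percentage identity of each homologue to the query.
    Assumed weight: (identity / 100) ** consweight.
    """
    if any(pid <= 0 for pid in identities):
        raise ValueError("Percentage identities must be positive")
    weights = [(pid / 100.0) ** consweight for pid in identities]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

For a conserved close homologue (90% identity) and a non-conserved distant one (30%), this gives 0.5 at consweight=0, 0.75 at consweight=1 and 0.25 at consweight=-1.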
posmatrix=FILE
: Score matrix for amino acid combinations used in pos weighting. (conscore=pos builds from propmatrix) [None]
aaprop=FILE
: Amino Acid property matrix file. [aaprop.txt]
## Alignment Settings ##
protalndir=PATH
: Output path for Protein Alignments [ProteinAln/]
motalndir=PATH
: Output path for Motif Alignments []
flanksize=X
: Size of sequence flanks for motifs [30]
xdivide=X
: Size of dividing Xs between motifs [10]
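One way flanksize and xdivide might shape a motif alignment row is sketched below. Each occurrence is cut out with up to flanksize residues on each side, and consecutive segments are joined with a run of xdivide Xs. The sketch pads short flanks with '-' so that segments line up across rows. That padding, and both helper names, are assumptions for illustration.

```python
def flanked_segment(seq, start, end, flanksize=30):
    """Motif at seq[start:end] plus up to flanksize residues on each side.

    Assumption: flanks truncated by the sequence ends are padded with '-'
    so that every segment has the same length.
    """
    left = seq[max(0, start - flanksize):start].rjust(flanksize, "-")
    right = seq[end:end + flanksize].ljust(flanksize, "-")
    return left + seq[start:end] + right


def join_segments(segments, xdivide=10):
    """Join occurrence segments with a block of xdivide dividing Xs."""
    return ("X" * xdivide).join(segments)
```

For example, flanked_segment("AAAARSTPAA", 4, 8, 3) gives "AAARSTPAA-".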
## System Settings ##
iupath=PATH
: The full path to the IUPred executable [c:/bioware/iupred/iupred.exe]
?? memsaver=T/F
: Whether to store all results in Objects (False) or clear as search proceeds (True) [True] ??
?- should this be controlled purely by the calling program? Probably!
fullforce=T/F
: Whether to force regeneration of alignments using GOPHER