SLiMSuite REST Server


Links
REST Home
EdwardsLab Homepage
EdwardsLab Blog
SLiMSuite Blog
SLiMSuite
Webservers
Genomes
REST Pages
REST Status
REST Help
REST Tools
REST Alias Data
REST API
REST News
REST Sitemap

rje_mysql V1.2

RJE Module for converting delimited text files to MySQL tables

Module: rje_mysql
Description: RJE Module for converting delimited text files to MySQL tables
Version: 1.2
Last Edit: 05/09/11

Copyright © 2005 Richard J. Edwards - See source code for GNU License Notice


Imported modules: rje


See SLiMSuite Blog for further documentation. See rje for general commands.

Function

This module is designed to take a bunch of delimited text files and create appropriate build statements for making MySQL tables, checking the content if desired. The main MySQL class has the global parameters for the reading and handling of input files, which are converted into Table objects. Each Table object itself has a number of Class objects. Unless memsaver=T, rje_mysql will check to see if contents for each field are unique.

The script processes files and generates builds statements in the following way:

1. The next file to be processed is displayed along with the default table name, its first line (split into fields), the delimiter used, and the number of fields. The list of headers (set by header=T/F) are displayed along with the second line of the file, also split into fields. The delimiter is a tab for *.tdt files, a comma for *.csv files, else is set by delimit=X.

2. There is then a user menu with the following options: < S >kip file = Skip this file and move on to the next one. < T >able Name/Descriptions = option to change the table name (default = file name without extension) and the description for the table that is printed in the build file's comment lines < H >eader Names/Descriptions = option to manually change the names and descriptions for field headers < A >utoname Headers = if the file does not meet the given header=T/F option, this will automatically rename all field headers using the first line if desired, or simply "field1" to "fieldn". Change < D >elimiter = change the delimiter selected for the file and return to step 1 above. < P >roceed = keep current settings and process file

3. The file is then read through to assign Types to each field: (a) Field contents are scanned to see if they contain pure numeric values and, if so, pure integer values. Otherwise, string contents are assumed. (b) The presence of Null (empty) values are recorded. Also, unless memsaver=T, the field contents is analysed to determine whether it is unique for each row. (c) The minimum and maximum lengths are recorded for each field. Minimum and maximum values are recorded for numeric values

4. Field types are assigned according to the field content characteristics: (a) Numeric fields are set as "Unsigned" if the minimum value is >= 0. This is used with the minimum and maximum values to assign the integer type if an integer or FLOAT if non-integer. (b) String fields are assigned a type based on the minimum and maximum lengths of the contents and whether these two values are different. This way a CHAR, VARCHAR or BLOB is assigned of the correct length. If i=1+, an option is given to change the read characteristics of each field.

5. Unique fields without null entries are identified as potential Primary Keys. If i=-1 then the first is used, else the user can choose. If there are no potential fields, are the user chooses none, then an auto-incrementing INT(10) field is added as a primary key.

6. Each field can be selected to be indexed. By default, any non-numeric fields which have a maximum length in the range set by indexlen=X,Y are indexed.

7. Based on all the information given, the build statement is generated. This consists of: - Comment lines first identify the name, description, file and number of lines. - DROP TABLE IF EXISTS command to clear existing data - CREATE TABLE using the Types, Primary Key and Indexed fields determined above - LOAD DATA statement, including the list of fields in the file if an auto-incremented key was added - DELETE FROM statement to remove header lines, allowing for data truncation upon loading Upon hitting <ENTER>, this will be written to the given output file (buildfile=FILE & combine=T/F)

Commandline

filelist=FILE(s): Input file or files. May be comma-separated (FILE1,FILE2) and include wildcards. [*.tdt,*.csv]
subfolders=T/F : Whether to look in subfolders [False]
mysql=T/F : Whether to assing data types and check data (else just report lengths of field contents) [True]
checktypes=T/F : Whether to check Data Types for given file [True]
buildfile=FILE : Output file for MySQL Build statements [mysql_build.txt]
combine=T/F : Whether to combine build statements in one file (True) or have separate file per table (False) [True]
append=T/F : Whether to append output files or generate new [True]
header=T/F : By default, the first line will be read as a header [True]
memsaver=T/F : Will not check for Unique fields if True. [False]
indexlen=X,Y : Will index all fields with non-numerics between X and Y letters by default [4,30]
sqldump=FILE : Read in an SQL dump file and convert into CSV [None]


© 2015 RJ Edwards. Contact: richard.edwards@unsw.edu.au.