Home Page
Archive > Posts > 2013 > July
Search:

Transferring an Excel Spreadsheet to MySQL [Again]
Data manipulation primer

Sigh, I just realized after writing this post that I had already covered this topic... oh well, this version has some new information the other one is missing.


I find people very often asking me to move data from an Excel spreadsheet to a MySQL database, so I thought I’d write up the procedure I follow when doing so. This assumes no multi-line cells or tabs in the excel spreadsheet data.

  1. You need a good text editor with regular expression support. I highly recommend EditPad Pro (a free version is available too), and will be assuming you are using it for the steps below.
  2. Make sure all data in the Excel spreadsheet is formatted for SQL insertion, for example:
    To convert a date “mm/dd/yyyy” to SQL:
    1. Copy the entire row to your text editor
    2. Run the following regular expression replace:
      Find TextReplace Text
      ^(\d+)/(\d+)/(\d+)$$3-$1-$2
    3. Copy the text back to the spreadsheet row
  3. Copy all the data into the text editor, and run the following regular expressions:
    Find TextReplace TextExplanation
    \\\\\\Escape backslash
    '\\'Escape single quotation mark
    \t','Change separators so that all values are encased as strings
    ^('Line prefix to insert a row and stringify the first value
    $'),Line suffix to insert a row and stringify the last value
  4. Change the very last character on the last line from a comma to a semi colon to end the query
  5. Add the following to the top of the file:
    SET NAMES 'utf8' COLLATE 'utf8_general_ci';
    SET CHARACTER SET 'utf8';
    TRUNCATE TABLE TABLE_NAME;
    INSERT INTO TABLE_NAME (Field1, Field2, ...) VALUES
    		
  6. Make sure the file is saved as UTF8: Menu -> Convert -> Text Encoding -> (Encode the data with another character set ...) AND (Unicode, UTF-8)
  7. Make sure the file is saved with Unix line breaks: Menu -> Convert -> To Unix (LF Only)
  8. Save the file and run the following in your MySQL command line prompt to import it:
    \u DATABASE_NAME
    \. FILE_NAME
    		

There are of course easier solutions, but they can often be buggy, and I figured this is a good primer on regular expressions and simple data manipulation :-)

My MySQL Replication Ring Configuration

As of 2010, if you wanted to set up a MySQL replication configuration with multiple servers which could all update and send the updates to the other servers, a replication ring was the only solution (in which every server has a master and a slave in a ring configuration). While there are new (probably better) solutions as of late including using MariaDB’s multi-source replication, and tungsten-replicator (which I was referred to in late April and have not yet tried), I still find a replication ring to be an easy to use solution in some circumstances. However, there are some major disadvantages including:

  • If one node in the ring goes down, the entire ring stops replicating at that point until the node is brought back up
  • If a node goes down in the ring and has corrupted or incomplete data, say, from a power outdate, the entire ring may have to be painstakingly synced and rebuilt.
Anywho, the following is my basic MySQL configurations for setting up a replication ring, which needs to be put on every server node in the ring: (See MySQL docs for more information on each configuration)
[mysqld]
#---GENERAL REPLICATION VARIABLES--- (These should never change)
log_bin=mysql-bin.log                           #Turn on the binary log, which is used to hold queries that are propagated to other machines
slave-skip-errors               = 1062          #Do not stop replication if a duplicate key is found (which shouldn’t happen anyways). You may want to turn this off to verify integrity, but then your replication ring will stop if a duplicate key is found
#master-connect-retry           = 5             #How often to keep trying to reconnect to the master node of the current machine after a connection is lost. This has been removed as of MySQL 5.5, and needs to instead be included in the “CHANGE MASTER” command
sync_binlog                     = 100           #After how many queries, sync to the binlog
slave_net_timeout               = 120           #The number of seconds to wait for more data from a master/slave before restarting the connection
max_binlog_size                 = 1000M         #The maximum size the binlog can get before creating a new binlog
log-slave-updates                               #Log slave updates to the binary log. If there are only 2 nodes in the ring, this is not required
slave_exec_mode                 = IDEMPOTENT    #Suppression of duplicate-key and no-key-found errors
replicate-same-server-id        = 0             #Skip running SQL commands received via replication that were generated by the current server node

#---INDEPENDENT REPLICATION VARIABLES--- (These should change per setup)
binlog-do-db                   = DATABASE_NAME  #Only add statements from this database to the binlog. You can have multiple of these configurations
replicate-do-db                = DATABASE_NAME  #Only read statements from this database during replication. You can have multiple of these configurations
auto_increment_increment        = 2             #After ever auto-increment, add this to its number. Set this to the number of nodes. This helps assures no duplicate IDs

#---SERVER CONFIGURATION--- (These numbers should match)
server-id                       = 1             #The current node number in the ring. Every node needs to have its own incremental server number starting at 1
auto_increment_offset           = 1             #What to start auto-increment numbers at for this server. Set this to the server-id. This helps assures no duplicate IDs
Cygwin SIGINT fix for golang

Cygwin has had a long time problem that, depending on your configuration, may cause you to be unable to send a SIGINT (interrupt signal via Ctrl+C) to a native Windows command line executables. As a matter of fact, trying to do so may completely freeze up the console, requiring a process kill of the actual console, bash, and the executable you ran. This problem can crop up for many reasons including the version of Cygwin you are running and your terminal emulator. I specifically installed mintty as my default Cygwin console to get rid of this problem a long time ago (among many other features it had), and now it even has this problem.

While my normal solution is to try and steer clear of native Windows command line executables in Cygwin, this is not always an option. Golang was also causing me this problem every time I ran a network server, which was especially problematic as I would have to ALSO manually kill the server process or it would continue to hold the network port so another test of the code could not use it. An example piece of code is as follows:

package main
import ( "net/http"; "fmt" )
func main() {
	var HR HandleRequest
	if err := http.ListenAndServe("127.0.0.1:81", HR); err!=nil {
		fmt.Println("Error starting server") }
}

//Handle a server request
type HandleRequest struct{}
func (HR HandleRequest) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	fmt.Printf("Received connection from: %s\n", req.RemoteAddr)
}
---
go run example.go

The first solution I found to this problem, which is by far the best solution, was to build the executable and then run it, instead of just running it straight from go.
go build example.go && example.exe
However, as of this post, it seems to no longer work! The last time I tested it and confirmed it was working was about 3 months ago, so who knows what has changed since then.

The second solution is to just build in some method of killing the process that uses “os.Exit”. For example, the following will exit if the user types “exit”
func ListenForExitCommand() {
	for s:=""; s!="exit"; { //Listen for the user to type exit
		if _, err:=fmt.Scanln(&s); err!=nil {
			if err.Error()!="unexpected newline" {
				fmt.Println(err) }
		} else if s=="flush" || s=="exit" {
			//Clean up everything here
		}
	}
	fmt.Println("Exit received, closing process")
	os.Exit(1)
}
and then add the following at the top of the main function:
go ListenForExitCommand() //Listen for "exit" command
HTML5 Video in Chrome for Android
It never ends when programming for browsers

Due to the facts of life and economic demand, a majority of the work I’ve been doing over the past 10 years has been web work (PHP, JavaScript, SQL, HTML, CSS). When I first started in this field, browser incompatibilities were a MAJOR problem, but nowadays, it’s incredibly rare for me to write a piece of code in one browser that doesn’t automatically work in all the others without bugs. Maybe it’s just the fact I’ve been doing this for so long that I know what to avoid, but I’d like to think it’s mostly because the browser makers have gotten their acts together and subscribe to making everything compatible by following [W3C] guidelines.

Browser incompatibilities still creep up from time to time though, and I’ll be writing up about a few of them in some of my upcoming posts. The majority and biggest problems I seem to run into nowadays revolve around mobiles.


So in Google Chrome for Android, when trying to include an HTML5 video, the “autoplay” HTML5 attribute seems to be ignored. The obvious less-than-optimal hack would be to just include a “.play()” call in JavaScript, but this seems to only work if the action is triggered by the user (like when trying to create a popup window). So as far as I can see right now, there is no way to autoplay a video in Google Chrome for Android.

UTF8 fix for iOS mail
Thanks apple</sarcasm>

Nowadays, it just makes sense to do everything in utf8 from the start. Most everything of consequence now supports it, it keeps byte count down for most anyone using an alphabet that is primarily latin, it is compatible with a old of older software, and can help lead to some useful shortcuts in programming. A very good number of libraries and languages even assume it as the default character encoding now.

It’s especially nice for html and web pages, so content can just be pasted in as-is, except of course for single and double quotes (which I always replace with ’ “ ” anyways) and < > &. However, I was recently thrown for a loop when my utf8 encoding was not being read correctly in Apple’s iOS mail client. After lots of testing and research, I came to the conclusion that it just plain doesn’t support utf-8 by default! I think I read somewhere you have to install foreign language packs to make it available, but that doesn’t help our normal latin-based-alphabet users.


The simple solution for diacritic (and other html encodable) characters is to just encode them as html: (only supported for php>=5.3.4)
function EncodeForIOS($String) //This assumes the following characters are already encoded (line 2): & < >
{
	$MailHTMLConvertArray=get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'UTF-8');
	unset($MailHTMLConvertArray['&'], $MailHTMLConvertArray['<'], $MailHTMLConvertArray['>']);
	return strtr($String, $MailHTMLConvertArray);
}
Windows “ln” (symbolic linking) support for Cygwin
Cygwin requires lots of tweaks to really be usable
The “ln -s” command in Cygwin creates a fake symbolic link only supported in Cygwin. I whipped up the following script to create a true windows symbolic link that is supported in both Windows and Cygwin (it shows up as a symlink in Cygwin).
TARGET=`echo $1 | perl -pe 's/\\//\\\\/g'`; #Determine the target from the first parameter (where we are linking to). Change forward slashes to back slashes
LINK=`echo $2 | perl -pe 's/\\//\\\\/g'` #Determine the link name from the second parameter (where the symlink is made). Change forward slashes to back slashes
cmd /c mklink $3 $4 $5 $6 "$LINK" "$TARGET" #Perform the windows mklink command with optional extra parameters
Note that for the link name, you have to include the full filename, not just specify a directory.
See here for information on mklink and its switches. Specifically:
  • /d : directory symlink (more hard)
  • /j : directory junction (more soft)
This handles the problem a little more directly than my other post on the topic ("Symlinks in a Windows programming environment").

[Edit on 2016-01-12 @ 12:34am]
And once again, I have a new version of the code. This version has the following advantages:
  • No longer limited to just 4 extra parameters (dynamic instead of static)
  • Can now just specify your directory as the link location, and the filename will be automatically filled in
  • Handles spaces better
Do note, if you want to link directly to a hard drive letter, you must use "c:/" instead of "/cygdrive/c/"

#Get the target and link
TARGET="$1"
shift
LINK="$1"
shift

#If the link is already a directory, append the filename to the end of it
if [ -d "$LINK" ]; then
    #Get the file/directory name without the rest of the path
    ELEMENT_NAME=`echo "$TARGET" | perl -pe 's/^.*?\/([^\/]+)\/?$/$1/'`

    #Append the file name to the target, making sure there is only 1 separating "/"
    LINK=`echo "$LINK" | perl -pe 's/^(.*?)\/?$/$1/'`
    LINK="$LINK"/"$ELEMENT_NAME"
fi

#Replace forward slashes with back slashes
TARGET=`echo $TARGET | perl -pe 's/\\//\\\\/g'`
LINK=`echo $LINK | perl -pe 's/\\//\\\\/g'`

#Perform the windows mklink command with optional extra parameters
cmd /c mklink "$@" "$LINK" "$TARGET"