RABiD BUNNY FEVER
There are 3 different methods for updating multiple rows at once in MySQL with different values:
INSERT: INSERT with ON DUPLICATE KEY UPDATE
INSERT INTO FooBar (ID, foo)
VALUES (1, 5), (2, 8), (3, 2)
ON DUPLICATE KEY UPDATE foo=VALUES(foo);
TRANSACTION: Where you do an update for each record within a transaction (InnoDB or other DBs with transactions)
UPDATE FooBar SET foo=5 WHERE ID=1;
UPDATE FooBar SET foo=8 WHERE ID=2;
UPDATE FooBar SET foo=2 WHERE ID=3;
CASE: In which you use a CASE/WHEN for each different record within an UPDATE
UPDATE FooBar SET foo=CASE ID
WHEN 1 THEN 5
WHEN 2 THEN 8
WHEN 3 THEN 2
END
WHERE ID IN (1,2,3);
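Generating the INSERT form from application code is straightforward; here is a minimal sketch in Python (the helper name and the bare integer interpolation are my own illustration — for anything but trusted integers, use parameterized queries):

```python
def build_upsert(table, id_col, val_col, rows):
    """Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE statement.

    rows is a list of (id, value) pairs; both are assumed to be trusted
    integers here, so they are interpolated directly into the SQL.
    """
    values = ", ".join("(%d, %d)" % (i, v) for i, v in rows)
    return ("INSERT INTO %s (%s, %s) VALUES %s "
            "ON DUPLICATE KEY UPDATE %s=VALUES(%s);"
            % (table, id_col, val_col, values, val_col, val_col))

print(build_upsert("FooBar", "ID", "foo", [(1, 5), (2, 8), (3, 2)]))
```

However many rows need updating, the result is still 1 statement, which is where the speed win over the other two methods comes from.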
I feel knowing the speeds of the 3 different methods is important.
All of the following numbers apply to InnoDB.
I just tested this, and the INSERT method was 6.7x faster for me than the TRANSACTION method. I tried on a set of both 3,000 and 30,000 rows and got the same results.
The TRANSACTION method still has to run each query individually, which takes time, though it batches the results in memory, or something, while executing. The TRANSACTION method is also pretty expensive in both replication and query logs.
Even worse, the CASE method was 41.1x slower than the INSERT method w/ 30,000 records (6.1x slower than TRANSACTION), and 75x slower in MyISAM. The INSERT and CASE methods broke even at ~1,000 records. Even at 100 records, the CASE method is BARELY faster.
So in general, I feel the INSERT method is both best and easiest to use. The queries are smaller and easier to read, and only take 1 query to run. This applies to both InnoDB and MyISAM.
Using the INSERT method, there can be a problem when NOT NULL fields with no default value (in other words, required fields) are not included in the query. You will get an error like “Field 'fieldname' doesn't have a default value”. The solution is to temporarily turn off STRICT_TRANS_TABLES and STRICT_ALL_TABLES in the SQL mode: SET SESSION sql_mode=REPLACE(REPLACE(@@SESSION.sql_mode,"STRICT_TRANS_TABLES",""),"STRICT_ALL_TABLES",""). Make sure to save the sql_mode first if you plan on reverting it.
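If you plan on restoring the mode afterwards, the same stripping is easy to mirror in application code before issuing the SET SESSION; a sketch in Python (the helper name is mine, and splitting on commas instead of doing a raw string replace avoids leaving stray commas behind):

```python
def strip_strict(sql_mode):
    """Remove the two STRICT modes from a comma-separated sql_mode string."""
    keep = [m for m in sql_mode.split(",")
            if m not in ("STRICT_TRANS_TABLES", "STRICT_ALL_TABLES")]
    return ",".join(keep)

print(strip_strict("STRICT_TRANS_TABLES,NO_ZERO_DATE,STRICT_ALL_TABLES"))
# → NO_ZERO_DATE
```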
As for other comments I’ve seen that say the auto_increment goes up using the INSERT method, I tested that too and it seems to not be the case.
Code to run the tests is as follows:
(It also outputs .SQL files to remove PHP interpreter overhead)
//These 2 functions need to be filled in: a database connect, and RunQuery($Query), a placeholder used below that executes a query on that connection
function RunTest($TestNum, $NumRows) //Runs 1 of the 3 tests: 1=TRANSACTION, 2=INSERT, 3=CASE
{
	$TheQueries=Array(); $TableName='Test';
	$DoQuery=function($Query) use (&$TheQueries) { RunQuery($Query); $TheQueries[]=$Query; };
	$DoQuery('DROP TABLE IF EXISTS '.$TableName);
	$DoQuery('CREATE TABLE '.$TableName.' (i1 int NOT NULL AUTO_INCREMENT, i2 int NOT NULL, primary key (i1)) ENGINE=InnoDB');
	$DoQuery('INSERT INTO '.$TableName.' (i2) VALUES ('.implode('), (', range(2, $NumRows+1)).')');
	$Start=microtime(true);
	if($TestNum==1) { //TRANSACTION method: an UPDATE per row inside a transaction
		$DoQuery('START TRANSACTION');
		for($i=1;$i<=$NumRows;$i++)
			$DoQuery('UPDATE '.$TableName.' SET i2='.(($i+5)*1000).' WHERE i1='.$i);
		$DoQuery('COMMIT');
	} elseif($TestNum==2) { //INSERT method: one INSERT ... ON DUPLICATE KEY UPDATE
		for($i=1,$Query=Array();$i<=$NumRows;$i++)
			$Query[]=sprintf("(%d,%d)", $i, (($i+5)*1000));
		$DoQuery('INSERT INTO '.$TableName.' VALUES '.implode(', ', $Query).' ON DUPLICATE KEY UPDATE i2=VALUES(i2)');
	} else { //CASE method: one UPDATE with a WHEN per row
		for($i=1,$Query=Array();$i<=$NumRows;$i++)
			$Query[]=sprintf('WHEN %d THEN %d', $i, (($i+5)*1000));
		$DoQuery("UPDATE $TableName SET i2=CASE i1\n".implode("\n", $Query)."\nEND\nWHERE i1 IN (".implode(',', range(1, $NumRows)).')');
	}
	$TestName=Array(1=>'TRANSACTION', 2=>'INSERT', 3=>'CASE')[$TestNum];
	print "$TestName: ".(microtime(true)-$Start)."<br>\n";
	file_put_contents("./$TestName.sql", implode(";\n", $TheQueries).';');
}
I wrote the code at the end of this post to test the different forms of string concatenation and they really are all almost exactly equal in both memory and time footprints.
The two primary methods I used are concatenating strings onto each other, and filling an array with strings and then imploding them. I did 500 string additions with a 1MB string in PHP 5.6 (so the result is a 500MB string). At every iteration of the test, all memory and time footprints were very close (at ~$IterationNumber*1MB). The runtimes of the two tests were 50.398 seconds and 50.843 seconds respectively, which is most likely within an acceptable margin of error.
Garbage collection of strings that are no longer referenced seems to be pretty immediate, even without ever leaving the scope. Since the strings are mutable, no extra memory is really required after the fact.
However, the following tests showed that there IS a difference in peak memory usage WHILE the strings are being concatenated.
$Final=$OneMB.$OneMB.$OneMB.$OneMB.$OneMB;
Result=10,806,800 bytes (~10MB w/o the initial PHP memory footprint)
$Final=implode('', Array($OneMB, $OneMB, $OneMB, $OneMB, $OneMB));
Result=6,613,320 bytes (~6MB w/o the initial PHP memory footprint)
So there is in fact a difference that could be significant memory-wise in very, very large string concatenations (I have run into such examples when creating very large data sets or SQL queries).
But even this fact is disputable depending upon the data. For example, concatenating 1 character at a time onto a string to reach 50 million bytes (so 50 million iterations) peaked at 50,322,512 bytes (~48MB) and took 5.97 seconds, while the array method used 7,337,107,176 bytes (~6.8GB) to create the array in 12.1 seconds, and then took an extra 4.32 seconds to combine the strings from the array.
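For reference, the two approaches being compared look like this; a scaled-down sketch in Python rather than PHP (so the exact memory behavior above does not carry over — Python strings are immutable — but the shape of the two methods is the same):

```python
import time

def concat_build(piece, n):
    """Concatenate piece onto a single growing string n times."""
    s = ""
    for _ in range(n):
        s += piece
    return s

def join_build(piece, n):
    """Fill an array with n copies of piece, then join once at the end."""
    parts = []
    for _ in range(n):
        parts.append(piece)
    return "".join(parts)

piece = "x" * (1 << 20)  # a 1MB string, as in the tests above
for fn in (concat_build, join_build):
    start = time.time()
    result = fn(piece, 50)
    print(fn.__name__, len(result), round(time.time() - start, 3))
```

Both produce byte-identical results; only the intermediate memory behavior differs.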
Anywho... the below is the benchmark code I mentioned at the beginning which shows the methods are pretty much equal. It outputs a pretty HTML table.
//Please note, for the recursion test to go beyond 256, xdebug.max_nesting_level needs to be raised.
//You also may need to update your memory_limit depending on the number of iterations
//Output the start memory
print 'Start: '.memory_get_usage()."B<br><br>Below test results are in MB<br>";
//Our 1MB string and the number of test iterations
global $OneMB, $NumIterations;
$OneMB=str_repeat('0', 1024*1024);
$NumIterations=500;
//Run the tests
//Output the results in a table
OutputResults(
	Array('ConcatTest', 'ImplodeTest', 'RecurseTest'),
	Array($ConcatTest, $ImplodeTest, $RecurseTest)
);
//Start a test run by initializing the array that will hold the results and manipulating those results after the test is complete
RunTestReal($TestName, $CurrentTestNums, $StrLen);
//Subtract $TestStartMem from all other numbers
foreach($CurrentTestNums as &$Num)
//Initialize the test and store the memory allocated at the end of the test, with the result
function RunTestReal($TestName, &$CurrentTestNums, &$StrLen)
//Concatenate 1MB string over and over onto a single string
global $OneMB, $NumIterations;
//Create an array of 1MB strings and then join w/ an implode
global $OneMB, $NumIterations;
return implode('', $Result);
//Recursively add strings onto each other
function RecurseTest(&$CurrentTestNums, $TestNum=0)
global $OneMB, $NumIterations;
//Output the results in a table
function OutputResults($TestNames, $TestResults)
print '<table border=1 cellspacing=0 cellpadding=2><tr><th>Test Name</th><th>'.implode('</th><th>', $TestNames).'</th></tr>';
$FinalNames=Array('Final Result', 'Clean');
$TestName=($i<$NumIterations ? $i : $FinalNames[$i-$NumIterations]);
foreach($TestResults as $TR)
//Other result numbers
print '<tr><th>Final String Size</th>';
foreach($TestResults as $TR)
foreach($TestResults as $TR)
So I got a new computer back in April and have finally gotten around to doing some speed tests to see how different applications and settings affect performance/hard drive read speed.
The following is the (relevant) computer hardware configuration:
- Motherboard: MSI Z87-GD65
- CPU: Intel Core i7-4770K Haswell 3.5GHz
- GPU: GIGABYTE GV-N770OC-4GD GeForce GTX 770 4GB
- RAM: Crucial Ballistix Tactical 2*8GB
- 2*Solid state drives (SSD): Crucial M500 480GB SATA 2.5" 7mm
- 7200RPM hard drive (HDD): Seagate Barracuda 3TB ST3000DM001
- Power Supply: RAIDMAX HYBRID 2 RX-730SS 730W
- CPU Water Cooler: CORSAIR H100i
- Case Fans: 2*Cooler Master MegaFlow 200, 200mm case fan
I started with a completely clean install of Windows 7 Ultimate N x64 to gather these numbers.
The first column is the boot time, from the time the start of the "Starting Windows" animation shows to when the user login screen shows up, so the BIOS is not included. I used a stopwatch to get these boot numbers (in seconds), so they are not particularly accurate.
The second and third columns are the time (in seconds) to run "time md5sum" in cygwin64 on a 1.39GB file (1,503,196,839 bytes), on the solid state (SSD) and 7200RPM (HDD) drives respectively. They were taken immediately after boot so that caching and other applications using resources are not variables. I generally did not worry about running the tests multiple times and taking lowest-case numbers. The shown millisecond fluctuations are within the margin of error for software measurements due to context switches.
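The measurement itself is nothing fancy; the equivalent of "time md5sum" can be sketched in a few lines of Python (chunked reads keep memory flat on a 1.39GB file; the path is whatever file you want to hash):

```python
import hashlib
import time

def timed_md5(path, chunk_size=1 << 20):
    """Return (hex digest, elapsed seconds) for a file, read in 1MB chunks."""
    h = hashlib.md5()
    start = time.time()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest(), time.time() - start
```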
Boot times increase between the steps, as seen below, but not too badly. The only thing that affected the md5sum was adding the hardware mirror RAID on the SSDs, which cut the md5 time in half. So overall, antivirus and system encryption did not have any noticeable effect on the computer's performance (at least regarding IO on a single file and number crunching).
|What was added ||Boot ||SSD ||HDD ||Notes|
|Initial installation ||4 ||- ||- ||-|
|NIC Drivers and Cygwin ||7 ||4.664 ||8.393 ||I'm not sure why the boot time jumped so much at this point. The initial number might have been a fluke.|
|All Windows updates + drivers + 6 monitors ||14 ||4.618 ||8.393 ||The boot time jumped up a lot due to having to load all the monitors|
|Raid 1 mirror[Windows] on SSDs + no page file ||17 ||4.618 ||8.393 ||This was removed once I realized Truecrypt could not be used on a dynamic disk (Windows software) RAID|
|Raid 1 mirror[hardware] on SSDs + no page file ||17 ||2.246 ||8.408 ||-|
|Truecrypt System Volume Encryption (SSD Raid Only) ||17-18 ||2.278 ||8.424 ||-|
|Antivirus ||18 ||2.324 ||8.408 ||Kaspersky 2014|
I’ve wanted to create and test a large-scale application for a very long time but have never really had the chance until recently. The Vintage Experience project I did earlier this year finally gave me the opportunity. As one of many parts of the project, I was tasked to create a voting system that could handle 1 million votes via a web page in a 30 second time span. The final system was deployed successfully without any problems for Gala Artis 2013 (a French Canadian artist/TV awards show). The following are the results of my implementation and testing.
The main front-end was done via a static HTML page (smart-phone optimized) that was hosted by Amazon S3, where handling 33k requests/second is a drop in the bucket. All voting requests were done via AJAX from this web page to backend servers hosted by Amazon EC2.
The backend was programmed in GoLang as a simple web server, optimized for memory and speed, which spawned a new goroutine for each incoming request. The request returned a message to the user stating either the error message, or success if the vote was added to the database. Each server held a single persistent MySQL connection to an Amazon RDS “Large DB Instance” with the minimum IOPS (1000). Votes from a server were sent to the database in batches once a second, or earlier if 10,000 votes had been received.
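The flush rule (once a second, or earlier at 10,000 queued votes) can be sketched independently of the actual database write; a simplified single-threaded Python version (the real system was Go with goroutines, and flush_fn stands in for the batched MySQL insert):

```python
import time

class VoteBatcher:
    """Queue votes; flush when 10,000 are waiting or the oldest is 1s old."""

    def __init__(self, flush_fn, max_votes=10000, max_age=1.0):
        self.flush_fn = flush_fn
        self.max_votes = max_votes
        self.max_age = max_age
        self.votes = []
        self.oldest = None

    def add(self, vote, now=None):
        now = time.time() if now is None else now
        if not self.votes:
            self.oldest = now  # timestamp of the first vote in this batch
        self.votes.append(vote)
        if len(self.votes) >= self.max_votes or now - self.oldest >= self.max_age:
            self.flush_fn(self.votes)  # e.g. one multi-row INSERT
            self.votes = []
            self.oldest = None
```

Batching like this turns up to 10,000 round trips into one, which is what kept a single persistent MySQL connection per server sufficient.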
The servers were Amazon “M1 Standard Extra Large” (m1.xlarge) instances running Linux, of which there were 6 total, handling vote requests delegated by round-robin DNS on Amazon’s Route 53. During stress testing, each server was found to be able to handle 6800 requests/second while load stayed under 3, so there was probably another bottleneck. Running the same tests using php(sapi)+apache(fork), only 4500 requests/second could be executed, and there was a 16+ load.
On the servers, I found it necessary to set the following sysctl setting to increase performance “net.core.somaxconn=1024”. The following commands need to be run to execute this:
sysctl 'net.core.somaxconn=1024' #Store for the current computer session
echo 'net.core.somaxconn=1024' >> /etc/sysctl.conf #Set after a reboot
Stress test client instances were also run on Amazon as m1.xlarge instances, and were found to be able to push 5000-6000 requests/second. The GoLang test clients spawned 200 requests at a time and waited for them to finish before sending the next batch. The client system needed the following sysctl settings for optimal performance:
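The batching behavior of those test clients (spawn 200 requests, wait for all of them, repeat) can be sketched with a thread pool; Python stands in for the GoLang clients here, and request_fn is a stub for the real HTTP call:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batches(request_fn, total, batch_size=200):
    """Issue requests in fixed-size batches, waiting for each batch to finish
    before sending the next (as the test clients did)."""
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, total, batch_size):
            n = min(batch_size, total - start)
            futures = [pool.submit(request_fn, start + i) for i in range(n)]
            results.extend(f.result() for f in futures)  # block until the batch completes
    return results

# Stubbed "request" -- a real client would POST the vote over HTTP here.
print(len(run_batches(lambda i: i * 2, 450, batch_size=200)))
```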
Due to another new project I’ve been working on, I was able to test out different methods on rendering sprites on the Android platform.
- Method #1: Use OpenGL ES to render to a buffer in C via JNI. This method turned out to be the fastest if polygons are batched via lists in a single (or a few) draw calls, and if using fixed instead of floating point numbers. It also opens up a lot of other effects natively not easily implemented through the other two methods (native 3D and matrices). However, it does not support antialiasing natively as is advertised and as I expected. The OpenGL ES 1.0 specs report “antialiased polygon rendering (with alpha border fragments, not multisample)”, but I could not find any information on this at all, and no antialiasing calls that I tried actually worked. There are methods that would allow antialiasing, but none efficient enough for a phone.
- Method #2: Drawing directly to a pixel buffer in C via JNI. This method came in a very close second place if using memory copying functions to blit images, but that also removes a lot of important effects like translucency. Results were less than optimal when not using direct memory copies (I didn’t even worry about including them below) :-\.
- Method #3: Rendering through the Android API in Java. While this method is the most robust in terms of feature set, it is also by far the slowest.
For my personal goals, I’ve come to the conclusion it would be best to use Method #1 for my heavy rendering tasks, and use baked translucent pixels around the borders of the images to emulate 2D antialiasing. The translucent borders could also be added at load time. I think I may also use Method #3 for some other special effects and overlays; that will make it harder to port, but I think it’s worth it for now.
The following are the approximate average FPS results of the speed tests I performed (preceded by column descriptions):
- The Test: 45*45 quads (2*45*45=4050 polygons) drawn to cover the whole screen each frame, none overlapping
- Emulator/Droid: The tests were run on 2 separate systems: an Android emulator running v2.2 on my Intel Core i5 2.53GHz laptop, and my Motorola Droid, also running Android v2.2.
- FLOAT/FIXED: This is only used for Method #1. OpenGL ES supports (in different functions) both floating point and fixed point numbers. Since phones usually have no FPU, and the ones that do might have very slow ones, it’s much safer efficiency-wise not to use floating point numbers. The OpenGL ES solution was non-floating (fixed) point decimal numbers that act virtually the same as normal integers when used with arithmetic functions, but do not have nearly the range of either floating point numbers or normal integers. The gain from going from floating to fixed was marginal in my tests. 16 bit integers (shorts), which I prefer, can also be used and showed near-identical results to fixed point numbers.
- INDIVIDUAL/ALL: This is only used for Method #1. It indicates whether the quads were each drawn individually with a separate call (which draws 2 triangles to make a quad), or in 1 call from a list of all the triangles. The gain from going from INDIVIDUAL to ALL was ~1fps on the emulator and ~12fps on the Droid in my tests.
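For the curious, OpenGL ES fixed point is a 16.16 format (16 integer bits, 16 fraction bits), so the arithmetic is plain integer math with a shift after multiplies; a quick sketch in Python for illustration (real code would be C with 32 bit ints):

```python
ONE = 1 << 16  # 16.16 fixed point: the low 16 bits are the fraction

def to_fixed(x):
    return int(x * ONE)

def from_fixed(f):
    return f / ONE

def fixed_mul(a, b):
    # A plain integer multiply doubles the scale, so shift back down.
    return (a * b) >> 16

a, b = to_fixed(1.5), to_fixed(2.0)
print(from_fixed(fixed_mul(a, b)))  # → 3.0
```

Addition and subtraction need no adjustment at all, which is why fixed point behaves "virtually the same as normal integers" for most of the math a renderer does.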
| ||Emulator ||Droid|
|METHOD1+FLOAT+ALL ||11.2 ||43.4|
|METHOD1+FIXED+ALL ||11.5 ||44.2|
|METHOD1+FLOAT+INDIVIDUAL ||9.7 ||30.4|
|METHOD1+FIXED+INDIVIDUAL ||10.7 ||32.5|
|METHOD2 ||25.5 ||43.3|
|METHOD3 ||6.4 ||22.8|
I’m gonna cheat today since it is really late, as I spent a good amount of time organizing the 3D Engines update which pushed me a bit behind, and I’m also exhausted. Instead of writing some more content, I’m just linking to the “Utilized Optimization Techniques” section of the 3D Engines project, which I put up today.
It describes 4 programming speed optimization tricks: local variable assignment, precalculating index lookups, pointer traversing/addition, and loop unrolling. This project post also goes into some differences between the languages used [Flash, C++, and Java], especially when dealing with speed.
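As a taste of one of those tricks, precalculating index lookups just means hoisting a repeated index computation out of the inner loop. A toy sketch in Python (the project itself used Flash, C++, and Java, where the payoff is bigger; both versions compute the same thing):

```python
def scale_grid_naive(grid, w, h):
    out = [0] * (w * h)
    for y in range(h):
        for x in range(w):
            out[y * w + x] = grid[y * w + x] * 2  # y*w recomputed every iteration
    return out

def scale_grid_precalc(grid, w, h):
    out = [0] * (w * h)
    for y in range(h):
        row = y * w  # precalculate the row offset once per row
        for x in range(w):
            out[row + x] = grid[row + x] * 2
    return out
```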
TrueCrypt 6.0 [latest version] came out today, and I was looking at the version history. I mention this because I wrote a post about TrueCrypt 5.0 (3 days after it was released, on February 5th of this year) and the problems I was having with it. I was not aware that after I submitted the bugs to them, they fixed the 2 important ones I reported (see the 5.0a history) 4 days after I wrote the post, which were:
- On computers equipped with certain brands of audio cards, when performing the system encryption pretest or when the system partition/drive is encrypted, the sound card drivers failed to load. This will no longer occur. (Windows Vista/XP/2003)
- It is possible to access mounted TrueCrypt volumes over a network. (Windows)
I am quite impressed that they did this so quickly, and am sad I did not find out until now. They also fixed the other missing feature I reported to them within a month of that [version 5.1]:
- Support for hibernation on computers where the system partition is encrypted (previous versions of TrueCrypt prevented the system from hibernating when the system partition was encrypted). (Windows Vista/XP/2008/2003)
Also in the version history [5.1a], this little paragraph made me smile
- [Update 2008-04-02: Although we have not filed any complaint with Microsoft yet, we were contacted (on March 27) by Scott Field, a lead Architect in the Windows Client Operating System Division at Microsoft, who stated that he would like to investigate our requirements and look at possible solutions. We responded on March 31 providing details of the issues and suggested solutions.]
Other very important features they have added for version 6.0 that I am super happy about:
- Hidden operating systems, which are implemented really well.
- Embedded backup header (located at the end of the volume)
- Up to 20% faster resuming from hibernation when the system partition/drive is encrypted. (As I have always been super frustrated by super slow hibernation resume support on my now abandoned partition encryption software suite, BestCrypt.)
- Multithreading support (Faster parallel processing, yay)
I did some speed tests of hibernation support in XP and got the following numbers (results are averages of at least 5 tests, in seconds):
*VMWare was running with 256MB of RAM and 1 virtual CPU on Laptop**. VMWare results were not always stable due to other processes on the host machine, so I terminated the worst offenders.
|VMWare* w/ no encryption||~5.0||~6.1|
|VMWare* w/ TrueCrypt 6.0 full drive encryption||~7.5||~11|
|VMWare* w/ TrueCrypt 6.0 decoy & dummy encryption||~7.3||~13.2|
|Laptop** w/ no encryption||~12.8||4.8|
|Laptop** w/ BestCrypt Volume Encryption||~92.1||~16.6|
|Laptop** w/ TrueCrypt 6.0 full drive encryption||~12.5||~13.9|
|Laptop** w/ TrueCrypt 6.0 decoy & dummy encryption||-||-|
**Laptop is a 2.4GHz Pentium Core Duo with 2GB RAM and a 60GB hard drive running at 7200RPM
... The hidden operating system feature really excited me. Unfortunately, the documentation on it is quite cryptic itself, so I thought I’d try explaining it myself.
TrueCrypt hidden OS diagram taken from http://www.truecrypt.org/docs/hidden-operating-system.php on 7/5/2008 and belongs to TrueCrypt
The decoy (first) partition holds a decoy OS and is accessible from the password prompt (password #3) at bootup. You should not have any sensitive data in it, and can give out the password if need be. TrueCrypt recommends using this decoy OS at least as much as the hidden OS so that if someone checks out the decoy they are not suspicious of it. If the perpetrator is suspicious of the decoy due to non-use, the size of the partition, or just the fact that you have TrueCrypt installed, you may need to fall back on the second stage of security described in the paragraph below.
The outer (second) partition holds some decoy files and a hidden volume inside of it. It is accessible by either the decoy or hidden OS by opening the partition through a normal TrueCrypt device mounting (password #1). It is recommended to give out its password only if you have already been forced to mount your decoy OS and the perpetrator suspects a secure partition as is explained in the above paragraph. If any data is written to it after creation, it can destroy information at random within the Hidden OS (see “Partition Sizes” at the bottom).
The hidden partition holds its own OS and is hidden within the outer (second) partition. It is accessible from the password prompt (password #2) at bootup or by mounting the partition from TrueCrypt as a device when the decoy OS is open. The decoy partition/OS is NOT accessible while the hidden OS is open.
Basic installation procedure
- Create a computer with 2 partitions. The second (outer) partition must be 5% larger than the first (decoy) for a FAT file system, or 110% (2.1x) larger for an NTFS file system (see “Partition Sizes” at the bottom). You might as well make the outer partition FAT since it won’t be used much, if at all, and this won’t affect the hidden partition.
- Install your operating system on the first (decoy) partition with all of your applications and data that are not sensitive.
- Run the TrueCrypt hidden OS install, which does the following:
- Asks for outer volume password (Password #1). Creates and formats the second (outer) partition/volume.
- Lets you copy some “sensitive looking” files to the outer partition. Nothing should ever be changed or added to the outer partition after this, see “Partition Sizes” at the bottom.
- Asks for hidden volume password (Password #2). The hidden partition is created within the outer partition.
- Asks for decoy volume password (Password #3).
- Rescue disk is created
- All data from the first (decoy) partition is copied to the hidden partition, and then all data from the first (decoy) partition is encrypted.
And finally, things that bugged me, because I like to vent :-) :
- Forced creation of a rescue disk on full volume encryption. Having the ISO file is more than enough since it can be copied to other hard drives, but it wanted proof of the rescue disk creation, so I just mounted the ISO to a virtual drive.
- No customized pre-boot screens. This isn’t really important, but I loved my hokey ASCII art ^_^;.
- Partition sizes: The hidden OS partition will be the exact same size as the decoy and the outer partition must be at least 5% larger for FAT and 110% larger for NTFS than the decoy.
The hidden OS partition will be the exact same size as the decoy partition because they are originally duplicates of each other, including their original partition tables, which include the size of the partition.
The outer (second) partition that holds the hidden partition must be at least 5% larger for FAT and 110% larger for NTFS than the decoy. The reason for this is the file contents tables. NTFS, unfortunately in this case, stores its file table in the middle of the partition. The outer partition’s file table does not, however, affect the hidden partition in any way.
So, for example (these numbers are theoretical, I am not entirely sure if these are correct), if we have a 2GB decoy partition, the outer NTFS partition must be at least 4.2GB and the hidden partition will be 2GB. If we made the outer partition 6GB, then 0-3GB would be writable, 3.0GB-3.6GB would be used for the file table, 3.6GB-4.0GB would be writable, and 4.0GB-6.0GB would be used by the hidden operating system. So, theoretically, you could write 3.4GB to the outer volume before problems started occurring, but I wouldn’t trust NTFS to only write to the beginning of the drive.