Home Page
Archive > Posts > 2016 > October
Search:

Weird compiler problem

I wanted to write about a really weird problem I recently had while debugging in C++ (technically, it’s all C). Unfortunately, I was doing this in kernel debugging mode, which made life a bit harder, but it would have happened the same in userland.

I had an .hpp file (we’ll call it process_internal.hpp) that was originally an internal file just to be included from a .cpp file (we’ll call it process.cpp), so it contained global variables as symbols. I ended up needing to include this process_internal.hpp file elsewhere (for testing, we’ll call it test.cpp). Because of this, the same symbol was included in multiple files, so the separate .o builds were not properly interacting. I ended up using “#ifdef”s to only include the parts I needed in the test.cpp file, and doing “extern” defines of the global variables for it. It looked something like the following:

enum { FT_Inbound, FT_Outbound };
typedef struct FilteringLayer {
	int FilterTypeNum, OriginalID;
	const char *Name;
} FilteringLayer;
const int FT_NumTypes=2;

#ifdef _PROCESS_INTERNAL
	FilteringLayer FilterTypes[FT_NumTypes]={
		{FT_Inbound,  5, "Inbound"),
		{FT_Outbound, 8, "Outbound"),
	};
#else
	extern "C" FilteringLayer *FilterTypes;
#endif

So I was accessing this variable in test.cpp and getting a really weird problem. The code looked something like this:

struct foo { int a, b; };
foo Stuff[]={...};
void FunctionBar()
{
	for(int i=0;i<FT_NumTypes;i++)
		Stuff[FilterTypes[i].OriginalID].b=1;
}

This was causing an access exception, which blue screened my debug VM. I tried running the exact same statements in the visual studio debugger, and things were working just as they were supposed to! So I decided to go to the assembly level. It looked something like this: (I included descriptions)

L#CodeDescriptionCombined description
for(int i=0;i<FT_NumTypes;i++)
1 mov qword ptr [rsp+58h],0 int i=0
2 jmp MODULENAME!FunctionBar+0xef JUMP TO #LINE@6
3 mov rax,qword ptr [rsp+58h] RAX=i
4 inc rax RAX++ i++
5 mov qword ptr [rsp+58h],rax I=RAX
6 cmp qword ptr [rsp+58h],02h CMP=(i-FT_NumTypes)
7 jae MODULENAME!FunctionBar+0x11e IF(CMP>=0) GOTO #LINE@15 if(i>=FT_NumTypes) GOTO #LINE@15
Stuff[FilterTypes[i].OriginalID].b=i;
8 imul rax,qword ptr [rsp+58h],10h RAX=i*sizeof(FilterTypes)
9 mov rcx,[MODULENAME!FilterTypes ]RCX=(void**)&FilterTypes
10movzx eax,word ptr [rcx+rax+4] RAX=((UINT16*)(RCX+RAX+4) RAX=((FilteringLayer*)&FilterType)[i].OriginalID
11imul rax,rax,30h RAX*=sizeof(foo)
12lea rcx,[MODULENAME!Stuff ] RCX=(void*)&Stuff
13mov dword ptr [rcx+rax+04h],1 *(UINT32*)(RCX+RAX+0x4)=1 Stuff[RAX].b=1
14jmp MODULENAME!FunctionBar+0xe2 GOTO #LINE@3
15...

I noticed that line #9 was putting 0x0000000C`00000000 into RCX instead of &FilterTypes. I knew the instruction should have been an “lea” instead of a “mov” to fix this. My first thought was compiler bug, but as many programming mantras say, that is very very rarely the case. If you want to guess now what the problem is, now is the time. I’ve given you all the information (and more) to make the guess.



The answer: extern "C" FilteringLayer *FilterTypes; should have been extern "C" FilteringLayer FilterTypes[];. Oops! The debugger was getting it right because it had the extra information of the real definition of the FilterTypes variable.

MySQL: Update multiple rows with different values
There are 3 different methods for updating multiple rows at once in MySQL with different values:
  1. INSERT: INSERT with ON DUPLICATE KEY UPDATE
    			INSERT INTO FooBar (ID, foo)
    			VALUES (1, 5), (2, 8), (3, 2)
    			ON DUPLICATE KEY UPDATE foo=VALUES(foo);
    		
  2. TRANSACTION: Where you do an update for each record within a transaction (InnoDB or other DBs with transactions)
    			START TRANSACTION;
    			UPDATE FooBar SET foo=5 WHERE ID=1;
    			UPDATE FooBar SET foo=8 WHERE ID=2;
    			UPDATE FooBar SET foo=2 WHERE ID=3;
    			COMMIT;
    		
  3. CASE: In which you a case/when for each different record within an UPDATE
    			UPDATE FooBar SET foo=CASE ID
    				WHEN 1 THEN 5
    				WHEN 2 THEN 8
    				WHEN 3 THEN 2
    			END
    			WHERE ID IN (1,2,3);
    		

I feel knowing the speeds of the 3 different methods is important.

All of the following numbers apply to InnoDB.


I just tested this, and the INSERT method was 6.7x faster for me than the TRANSACTION method. I tried on a set of both 3,000 and 30,000 rows and got the same results.


The TRANSACTION method still has to run each individually query, which takes time, though it batches the results in memory, or something, while executing. The TRANSACTION method is also pretty expensive in both replication and query logs.


Even worse, the CASE method was 41.1x slower than the INSERT method w/ 30,000 records (6.1x slower than TRANSACTION). And 75x slower in MyISAM. INSERT and CASE methods broke even at ~1,000 records. Even at 100 records, the CASE method is BARELY faster.


So in general, I feel the INSERT method is both best and easiest to use. The queries are smaller and easier to read and only take up 1 query of action. This applies to both InnoDB and MyISAM.


Bonus stuff:

Using the INSERT method, there can be a problem in which NON-NULL fields with no default (in other words, required fields) are not being updated. You will get an error like “Field 'fieldname' doesn't have a default value”. The solution is to temporarily turn off STRICT_TRANS_TABLES and STRICT_ALL_TABLES in the SQL mode: SET SESSION sql_mode=REPLACE(REPLACE(@@SESSION.sql_mode,"STRICT_TRANS_TABLES",""),"STRICT_ALL_TABLES",""). Make sure to save the sql_mode first if you plan on reverting it.


As for other comments I’ve seen that say the auto_increment goes up using the INSERT method, I tested that too and it seems to not be the case.


Code to run the tests is as follows: (It also outputs .SQL files to remove PHP interpreter overhead)
<?
//Variables
$NumRows=30000;

//These 2 functions need to be filled in
function InitSQL()
{

}
function RunSQLQuery($Q)
{

}

//Run the 3 tests
InitSQL();
for($i=0;$i<3;$i++)
    RunTest($i, $NumRows);

function RunTest($TestNum, $NumRows)
{
    $TheQueries=Array();
    $DoQuery=function($Query) use (&$TheQueries)
    {
        RunSQLQuery($Query);
        $TheQueries[]=$Query;
    };

    $TableName='Test';
    $DoQuery('DROP TABLE IF EXISTS '.$TableName);
    $DoQuery('CREATE TABLE '.$TableName.' (i1 int NOT NULL AUTO_INCREMENT, i2 int NOT NULL, primary key (i1)) ENGINE=InnoDB');
    $DoQuery('INSERT INTO '.$TableName.' (i2) VALUES ('.implode('), (', range(2, $NumRows+1)).')');

    if($TestNum==0)
    {
        $TestName='Transaction';
        $Start=microtime(true);
        $DoQuery('START TRANSACTION');
        for($i=1;$i<=$NumRows;$i++)
            $DoQuery('UPDATE '.$TableName.' SET i2='.(($i+5)*1000).' WHERE i1='.$i);
        $DoQuery('COMMIT');
    }

    if($TestNum==1)
    {
        $TestName='Insert';
        $Query=Array();
        for($i=1;$i<=$NumRows;$i++)
            $Query[]=sprintf("(%d,%d)", $i, (($i+5)*1000));
        $Start=microtime(true);
        $DoQuery('INSERT INTO '.$TableName.' VALUES '.implode(', ', $Query).' ON DUPLICATE KEY UPDATE i2=VALUES(i2)');
    }

    if($TestNum==2)
    {
        $TestName='Case';
        $Query=Array();
        for($i=1;$i<=$NumRows;$i++)
            $Query[]=sprintf('WHEN %d THEN %d', $i, (($i+5)*1000));
        $Start=microtime(true);
        $DoQuery("UPDATE $TableName SET i2=CASE i1\n".implode("\n", $Query)."\nEND\nWHERE i1 IN (".implode(',', range(1, $NumRows)).')');
    }

    print "$TestName: ".(microtime(true)-$Start)."<br>\n";

    file_put_contents("./$TestName.sql", implode(";\n", $TheQueries).';');
}