
Weird compiler problem

I wanted to write about a really weird problem I recently ran into while debugging in C++ (technically, it’s all C). Unfortunately, I was doing this in kernel debugging mode, which made life a bit harder, but it would have happened the same way in userland.

I had an .hpp file (we’ll call it process_internal.hpp) that was originally an internal file meant to be included only from a single .cpp file (we’ll call it process.cpp), so it defined global variables directly. I ended up needing to include process_internal.hpp elsewhere too (for testing; we’ll call that file test.cpp). Because of this, the same symbols were defined in multiple files, so the separate .o files conflicted when linked together. I ended up using “#ifdef”s to include only the parts I needed in test.cpp, and declaring the global variables there with “extern”. It looked something like the following:

enum { FT_Inbound, FT_Outbound };
typedef struct FilteringLayer {
	int FilterTypeNum, OriginalID;
	const char *Name;
} FilteringLayer;
const int FT_NumTypes=2;

#ifdef _PROCESS_INTERNAL
	FilteringLayer FilterTypes[FT_NumTypes]={
		{FT_Inbound,  5, "Inbound"},
		{FT_Outbound, 8, "Outbound"},
	};
#else
	extern "C" FilteringLayer *FilterTypes;
#endif

So I was accessing this variable in test.cpp and getting a really weird problem. The code looked something like this:

struct foo { int a, b; };
foo Stuff[]={...};
void FunctionBar()
{
	for(int i=0;i<FT_NumTypes;i++)
		Stuff[FilterTypes[i].OriginalID].b=1;
}

This was causing an access violation, which blue-screened my debug VM. I tried running the exact same statements in the Visual Studio debugger, and everything worked just as it was supposed to! So I decided to drop down to the assembly level. It looked something like this (with my descriptions added):

| # | Code | Description | Combined description |
|---|------|-------------|----------------------|
| | for(int i=0;i<FT_NumTypes;i++) | | |
| 1 | mov qword ptr [rsp+58h],0 | int i=0 | |
| 2 | jmp MODULENAME!FunctionBar+0xef | JUMP TO #LINE@6 | |
| 3 | mov rax,qword ptr [rsp+58h] | RAX=i | |
| 4 | inc rax | RAX++ | i++ |
| 5 | mov qword ptr [rsp+58h],rax | i=RAX | |
| 6 | cmp qword ptr [rsp+58h],02h | CMP=(i-FT_NumTypes) | |
| 7 | jae MODULENAME!FunctionBar+0x11e | IF(CMP>=0) GOTO #LINE@15 | if(i>=FT_NumTypes) GOTO #LINE@15 |
| | Stuff[FilterTypes[i].OriginalID].b=1; | | |
| 8 | imul rax,qword ptr [rsp+58h],10h | RAX=i*sizeof(FilteringLayer) | |
| 9 | mov rcx,[MODULENAME!FilterTypes] | RCX=*(void**)&FilterTypes | |
| 10 | movzx eax,word ptr [rcx+rax+4] | RAX=*(UINT16*)(RCX+RAX+4) | RAX=((FilteringLayer*)&FilterTypes)[i].OriginalID |
| 11 | imul rax,rax,30h | RAX*=sizeof(foo) | |
| 12 | lea rcx,[MODULENAME!Stuff] | RCX=(void*)&Stuff | |
| 13 | mov dword ptr [rcx+rax+04h],1 | *(UINT32*)(RCX+RAX+0x4)=1 | Stuff[FilterTypes[i].OriginalID].b=1 |
| 14 | jmp MODULENAME!FunctionBar+0xe2 | GOTO #LINE@3 | |
| 15 | ... | | |

I noticed that line #9 was putting 0x0000000C`00000000 into RCX instead of &FilterTypes; the instruction should have been an “lea” instead of a “mov”. My first thought was a compiler bug, but as many programming mantras say, that is very, very rarely the case. If you want to guess what the problem is, now is the time. I’ve given you all the information (and more) you need to make the guess.



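The answer: extern "C" FilteringLayer *FilterTypes; should have been extern "C" FilteringLayer FilterTypes[];. Oops! Declared as a pointer, FilterTypes tells the compiler that the memory at that symbol holds a pointer to the data, so it loads 8 bytes from there (the mov on line #9) and uses them as the array’s address. But the symbol is actually the array itself, so those 8 bytes were just the first 8 bytes of the array’s contents. Declared as an array, the symbol’s address is the start of the data, which is exactly what an lea computes. The debugger got it right because it had the type information from the real definition of FilterTypes, not my incorrect extern declaration.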
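A minimal sketch of the fixed layout, with all three files collapsed into one listing (file boundaries are shown only as comments; the size of Stuff is a hypothetical stand-in for the elided initializer, and extern "C" is left out so everything uses ordinary C++ linkage). Because the extern declaration names an array, the compiler uses the symbol’s address as the base of the data instead of loading a pointer from it:

```cpp
// --- process_internal.hpp (shared part): declarations only ---
enum { FT_Inbound, FT_Outbound };
struct FilteringLayer {
    int FilterTypeNum, OriginalID;
    const char *Name;
};
const int FT_NumTypes = 2;

// Declared as an array. "FilteringLayer *FilterTypes" would instead tell the
// compiler that a pointer value is stored at the symbol's address.
extern FilteringLayer FilterTypes[FT_NumTypes];

// --- process.cpp: the single definition ---
FilteringLayer FilterTypes[FT_NumTypes] = {
    {FT_Inbound,  5, "Inbound"},
    {FT_Outbound, 8, "Outbound"},
};

// --- test.cpp ---
struct foo { int a, b; };
foo Stuff[16] = {}; // hypothetical size; the original initializer was elided

void FunctionBar()
{
    for (int i = 0; i < FT_NumTypes; i++)
        Stuff[FilterTypes[i].OriginalID].b = 1; // touches Stuff[5] and Stuff[8]
}
```

In a real build, only the declaration stays in the header and the definition lives in process.cpp, so every .o file refers to the same single array.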

Progress on Windows Logon Background Hacking
AND THEN I HOOKED THE MIGHTY OPCODE, AND SMOTE IT TO RUIN WITH INTERRUPTS AND NOPS!
Since my last post, I’ve made some progress in my spare time on working out how the Windows logon screen displays the background. I’ve gotten to try out a lot of new tools, since I haven’t done anything like this in over 10 years, and quite frankly, I’m mostly disappointed. All the tools I already had seem to do the job better than anything new I could find (though, granted, I need to learn WinDbg better). So at this point I can debug the DLL live while it’s running in Windows, via the following process:
  1. Run the Windows Terminal Services hack (WinXP version explanation) so you can have multiple desktops running at once on the virtual machine, which makes things a little easier, but is not necessary.
  2. Make a backup of C:\Windows\System32\LogonUI.exe and make it editable (see the previous post for risks and further info).
  3. Add an INT3 interrupt breakpoint near the beginning of LogonUI.exe. I just replaced the first conditional jump in the DLL startup code that is supposed to fail (i.e. not be taken) with an INT3, padded with NOPs.
  4. Set OllyDbg as the JIT debugger, so whenever LogonUI.exe runs and hits the interrupt, OllyDbg is automatically spawned and attached.
  5. Tell Windows to lock itself (Start>Shut Down>Lock [if available] or Win+L).
  6. As soon as OllyDbg is spawned with LogonUI.exe attached, also attach winlogon.exe in another debugger and keep it paused so it doesn’t keep trying to respawn LogonUI.exe when your attached copy doesn’t respond.
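The INT3 patch in step 3 can be sketched as a byte-level operation. This is a minimal sketch, assuming the file is already loaded into memory and that you know the offset of the conditional jump (for example, from the disassembler); PatchJccWithInt3 is a hypothetical helper, not part of any tool. Because the chosen jump is one that normally is not taken, continuing past the INT3 slides through the NOPs just like the original fall-through path:

```cpp
#include <cstddef>
#include <vector>

// Replaces the conditional jump starting at 'offset' with an INT3 (0xCC)
// followed by NOPs (0x90), so the patched code keeps the same length.
// Handles the two x86 Jcc encodings: short (70-7F rel8, 2 bytes) and
// near (0F 80-8F rel32, 6 bytes). Returns false if no Jcc starts there.
bool PatchJccWithInt3(std::vector<unsigned char> &code, std::size_t offset)
{
    std::size_t length = 0;
    if (offset < code.size() && code[offset] >= 0x70 && code[offset] <= 0x7F)
        length = 2;
    else if (offset + 1 < code.size() && code[offset] == 0x0F &&
             code[offset + 1] >= 0x80 && code[offset + 1] <= 0x8F)
        length = 6;
    if (length == 0 || offset + length > code.size())
        return false;

    code[offset] = 0xCC;             // INT3: traps into the JIT debugger
    for (std::size_t i = 1; i < length; i++)
        code[offset + i] = 0x90;     // NOP padding over the rest of the Jcc
    return true;
}
```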

It would be nice if I could find an [easy] way to make a spawned process automatically go into my debugger without the need to add an interrupt, especially to a remote debugger, but oh well.

So my plan of action after this is to:
  1. Get the handle or memory location where the image is stored by monitoring the GDI calls made after the string reference to C:\Windows\System32\oobe\info\backgrounds\backgroundDefault.jpg.
  2. Put a hardware breakpoint on that memory location/handle and find where it is used to draw to the screen background.
  3. At that point, the GDI call could be manipulated to not shrink the image to the primary monitor (easy), or multiple GDI calls could be made to each monitor for all the images (much harder).
The image might actually be shrunk before it is stored in memory, too, though from what I’ve gleaned from the disassembly so far, I do not believe this is the case.

More to come if and when I make progress.
Overcoming the 250KB Windows Login Background Cap

This year I needed to upgrade to a 6+1 monitor setup for some of the work I’ve been doing.

Home Office 1

Home Office 2

It took me a bit to get everything how I wanted, using Display Fusion for multi-monitor control and a customized version of Window Manager for window positioning. I am very happy with the final result.

However, there was one minor annoyance I decided to tackle as a fun get-back-into-reverse-engineering project (it’s been years since I’ve done any really fun programming, which saddens me greatly). On the Windows 7 lock/logon screen, only one monitor can show a background, and that background is limited to a file size of 250KB, which can greatly reduce the quality of the image.

The C:\Windows\System32\authui.dll file controls the lock screen behavior, so that is where I looked for a solution. Before I go on, there are 2 very important notes I should make:

  1. It can be very dangerous to modify system DLLs. This could crash your operating system, or even leave it unable to boot! Always back up the files you are modifying first, and make sure you are comfortable restoring them somehow (most likely using a separate operating system, like a Linux boot CD).
  2. Make sure you are actually editing the right file when you open it. While the file you want will always be in c:\Windows\System32, 64-bit Windows machines also have a c:\Windows\SysWOW64 directory that contains a 32-bit version of the file. (Brilliant naming scheme, Microsoft! 32-bit files in the “64” directory and vice versa.) Depending on the software you are using, when you try to access authui.dll in the System32 directory (~1.84MB), it may actually modify the file in SysWOW64 (~1.71MB) instead. This obfuscated Windows magic is WOW64 file system redirection, which silently redirects a 32-bit program’s accesses to System32 into SysWOW64.

After a little bit of playing, I’ve solved the 250KB size limitation so far, and I plan to keep tinkering until the other problem (only one monitor showing a background) is solved too. To start, you will need to give yourself file system access to modify c:\Windows\System32\authui.dll. To do so, open the file’s properties, change the owner to yourself, and then give your user the permissions needed to modify it.

Open the authui.dll in your favorite hex editor and replace:
41 B9 00 E8 03 00
with
41 B9 FF FF FF 00
This essentially raises the size cap to ~16MB. However, I haven’t tested anything larger than 280KB yet. There may be another size limitation somewhere that would be dangerous to breach, but from what I glean from the code, I do not think this is the case.
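The same edit can also be applied as a search-and-replace over the file’s bytes. A minimal sketch, assuming the DLL has already been read into memory (file I/O omitted); PatchSizeCap is a hypothetical name, and refusing to patch when the pattern appears more than once is an added safety check, not part of the original steps:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// "mov r9d, 3E800h" (256,000) and its patched form "mov r9d, 0FFFFFFh" (2^24-1).
static const unsigned char kOriginal[] = {0x41, 0xB9, 0x00, 0xE8, 0x03, 0x00};
static const unsigned char kPatched[]  = {0x41, 0xB9, 0xFF, 0xFF, 0xFF, 0x00};

// Replaces kOriginal with kPatched only if it occurs exactly once in 'image',
// so an unexpected file version is left untouched. Returns true on success.
bool PatchSizeCap(std::vector<unsigned char> &image)
{
    const unsigned char *first = kOriginal;
    const unsigned char *last = kOriginal + sizeof kOriginal;
    auto hit = std::search(image.begin(), image.end(), first, last);
    if (hit == image.end())
        return false;                                // pattern not found
    if (std::search(hit + 1, image.end(), first, last) != image.end())
        return false;                                // ambiguous: found twice
    std::copy(kPatched, kPatched + sizeof kPatched, hit);
    return true;
}
```

Running it a second time returns false, because the original bytes are gone after the first successful patch.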

What this change actually does is update the 256,000 (0x3E800) value to (2^24)-1 in the following code:
                     call __imp_GetFileSize
41 B9 00 E8 03 00    mov  r9d, 3E800h
41 3B C1             cmp  eax, r9d
                     jnb  short loc_xxx
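The numbers in this listing can be checked directly: the 4 bytes after 41 B9 are the little-endian imm32 operand, and cmp/jnb is an unsigned “greater than or equal” test. A small sketch with hypothetical function names, assuming (as the post implies) that the jnb branch is the path that rejects an oversized file:

```cpp
#include <cstdint>

// Decodes the little-endian 32-bit immediate that follows "41 B9" (mov r9d, imm32).
std::uint32_t DecodeImm32(const unsigned char *imm)
{
    return std::uint32_t(imm[0]) | std::uint32_t(imm[1]) << 8 |
           std::uint32_t(imm[2]) << 16 | std::uint32_t(imm[3]) << 24;
}

// C equivalent of "cmp eax, r9d / jnb": jnb (same as jae) jumps when
// eax >= r9d as unsigned values, i.e. when the file is at or over the cap.
bool ExceedsCap(std::uint32_t fileSize, std::uint32_t cap)
{
    return fileSize >= cap;
}
```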

It’s been a bit tedious working on authui.dll’s assembly code, as my favorite disassembler/debugger (OllyDbg) does not work with 64-bit files, and I am not very comfortable with the other disassemblers I have tried. :-\ Alas. Hopefully more coming soon on this topic.

NULL Pointer for C++
Extending a language for what it’s lacking

I’ve recently been frustrated by the fact that NULL in C++ is actually evaluated as integral 0 instead of a pointer with the value of 0. This problem can be seen in the following example:

class String
{
public:
	String(int i)  { /* ... */ } //Convert a number to a string
	String(char* i){ /* ... */ } //Copy a char* string directly into the class
};

String Foo(NULL); //With NULL defined as integral 0, this calls String(int) and gives Foo the value "0", instead of calling String(char*) with a null pointer

The solution I came up with, which my good friend Will Erickson (aka Sarev0k) helped me revise, is as follows:

#undef NULL //If NULL is already defined, get rid of it
struct NULL_STRUCT { template <typename T> operator T*() { return (T*)0; } }; //NULL_STRUCT will return 0 to any pointer
static NULL_STRUCT NULL; //NULL is of type NULL_STRUCT and static (local to the current file)

After coming up with this way of doing it, I found out this concept is already a part of the new C++0x standard as nullptr, but since it is not really out yet, I still need a solution for the current C++ standard.
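A minimal, self-contained sketch of the same technique, assuming a C++ compiler of any standard version; NullPointer and NULL_PTR are hypothetical names used instead of redefining NULL, so it can sit next to standard headers without the #undef. The String class mirrors the example above but records which constructor overload resolution picked:

```cpp
#include <string>

// Same idea as NULL_STRUCT, under a hypothetical name so it doesn't clash
// with the standard NULL macro.
struct NullPointer {
    // Converts to a null pointer of any object pointer type, but never to int.
    template <typename T> operator T*() const { return 0; }
};
static const NullPointer NULL_PTR = NullPointer();

// The String example from above, recording which constructor was chosen.
class String {
public:
    String(int)    : chosen("int") {}
    String(char *) : chosen("char*") {}
    std::string chosen;
};
```

Because the conversion operator only produces pointer types, String(int) is not viable for NULL_PTR, while a literal 0 still prefers String(int). C++0x’s nullptr (standardized in C++11) behaves the same way here: String s(nullptr) also selects the char* overload.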


After getting this working how I wanted, I tested it to make sure compilers optimize it correctly. When the compiler knows a value will be 0, it can apply lots of special assembly tricks.

Microsoft Visual C++ got it right by seeing that NULL was just 0 and applying the appropriate optimizations, but GCC missed an optimization step and didn’t carry the fact that it was 0 all the way down the pipeline. GCC, to my knowledge, however, isn’t exactly known for its optimization.


Example code:
BYTE* a=...; //Set a to an arbitrary value (best if brought in via an external method [i.e. stdin] so the compiler doesn’t make assumptions about the variable)
bool b=(a==NULL); //Set b to true if a is 0 (NULL)
What MSVC6 outputs (and what it should be after optimization):
test eax,eax	//Logical AND of a against itself to determine whether it is 0
sete al		//Set the lowest byte of eax to 1 if a is 0
What GCC gives:
xor edx,edx	//Temporarily store 0 in edx for the comparison. This is the usual trick for producing 0, but the 0 didn't need to be in a register at all (test eax,eax already checks against 0)
cmp edx,eax	//Compare a against edx (0)
sete al		//Set the lowest byte of eax to 1 if a equals the value in edx

On a side note, it has been quite painful switching from inline assembly in Microsoft Visual C++ to GCC, for 2 reasons:
  • I hate AT&T (as opposed to Intel) assembly syntax. It is rather clunky to use, and every tool I’ve ever used uses Intel syntax (including all the Intel reference documentation). I tried turning on Intel syntax through a flag when compiling with GCC, but it broke GCC. :-\
  • Having to list which assembly registers are modified/used in the extended assembly syntax. This interface is also very clunky and, I have found, prone to bugs and problems.
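To illustrate the second complaint, here is a minimal sketch (assuming GCC or Clang; IsNullAsm is a hypothetical name) of the test/sete sequence from the MSVC output above written as GCC extended inline assembly in AT&T syntax. Every operand and side effect has to be declared by hand in the constraint and clobber lists, and non-x86 targets fall back to plain C++:

```cpp
// Equivalent of "bool b = (a == NULL);" using GCC extended inline assembly.
bool IsNullAsm(const void *p)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned char result;
    __asm__("test %1, %1\n\t"   // AND p with itself; sets ZF if p is 0
            "sete %0"           // result = 1 if ZF is set, else 0
            : "=q"(result)      // output: a register with a byte-sized form
            : "r"(p)            // input: p in a general-purpose register
            : "cc");            // clobber list: the flags are modified
    return result != 0;
#else
    return p == 0;              // portable fallback for other targets
#endif
}
```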
OllyDbg 2.0
Reverse engineering is fun! :-D

OllyDbg is my favorite assembly editing environment for reverse engineering applications in Windows. I used it for all of my Ragnarok Online projects in 2002, and you can find a tutorial that uses it here (sorry, the writing in it is horrible x.x; ).

Ever since I started using it back then, the author had been talking about his complete rewrite of the program, dubbed version 2.0, which was supposedly going to be much, much better. I have been patiently waiting for it ever since :-). Rather randomly, I decided to check back on the website yesterday, after not having visited it for over a year, and lo and behold, the first beta of version 2.0 [self-mirror] was released yesterday! :-D Unfortunately, I’m not really doing any reverse engineering or assembly-level work right now, so I have no reason or need to test it :-\.


... So yes, just wanted to call attention to this wonderful program being updated, that’s all for today!

C Jump Tables
The unfortunate reality of different feature sets in different language implementations

I was thinking earlier today how neat it would be for C/C++ to be able to get the address of a label for use in jump tables, specifically for an emulator. A few seconds after a Google query, I found out it is possible in GCC (the open source native Linux compiler) through its “labels as values” extension, the unary “&&” operator. I am crushed that MSVC doesn’t have native support for such a concept :-(.

It would be great for an emulator’s CPU emulation, where, usually, the first byte of each CPU instruction (its opcode [see ASM]) determines what the instruction is supposed to do. An example to explain the usefulness of a jump table is as follows:

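void DoOpcode(int OpcodeNumber, ...)
{
	void *Opcodes[]={&&ADD, &&SUB, &&JUMP, &&MUL}; //assuming ADD=opcode 0 and so forth
	goto *Opcodes[OpcodeNumber];
	ADD:
		//...
		return; //Each handler returns so it doesn't fall through into the next label
	SUB:
		//...
		return;
	JUMP:
		//...
		return;
	MUL:
		//...
		return;
}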

Of course, this could still be done with virtual functions, function pointers, or a switch statement, but those are theoretically much slower. Putting each opcode in a separate function would also rule out sharing local variables between them.

Then again, theoretically, I believe function pointers with the _fastcall calling convention wouldn’t be too bad, and modern compilers SHOULD translate a switch like this into a jump table, but modern compilers are so opaque that you never really know what they are doing without checking.

It would probably be best to code such a case so that all 3 methods (function pointers, switch statement, jump table) could be selected through preprocessor definitions, and then profile to see which method is fastest and supported.

//Define the switch for which type of opcode picker we want
#define UseSwitchStatement
//#define UseJumpTable
//#define UseFunctionPointers

//Defines for how each opcode picker acts
#if defined(UseSwitchStatement)
	#define OPCODE(o) case OP_##o:
#elif defined(UseJumpTable)
	#define OPCODE(o) o:
	#define GET_OPCODE(o) &&o
#elif defined(UseFunctionPointers)
	#define OPCODE(o) void Opcode_##o()
	#define GET_OPCODE(o) (void*)&Opcode_##o
	//The above GET_OPCODE is actually a problem since the opcode functions aren't listed until after their ...
	//address is requested, but there are a couple of ways around that I'm not going to worry about going into here.
#endif

enum {OP_ADD=0, OP_SUB}; //assuming ADD=opcode 0 and so forth
void DoOpcode(int OpcodeNumber, ...)
{
	#ifndef UseSwitchStatement //If using JumpTable or FunctionPointers we need an array of the opcode jump locations
		void *Opcodes[]={GET_OPCODE(ADD), GET_OPCODE(SUB)}; //assuming ADD=opcode 0 and so forth
	#endif
	#if defined(UseSwitchStatement)
		switch(OpcodeNumber) { //Normal switch statement
	#elif defined(UseJumpTable)
		goto *Opcodes[OpcodeNumber]; //Jump to the proper label
	#elif defined(UseFunctionPointers)
		((void(*)(void))Opcodes[OpcodeNumber])(); //Call the proper function
		} //End the current function
	#endif

	//For testing under "UseFunctionPointers" (see GET_OPCODE comment under "defined(UseFunctionPointers)")
	//put the following OPCODE sections directly above this "DoOpcode" function
	OPCODE(ADD)
	{
		//...
	}
	OPCODE(SUB)
	{
		//...
	}

	#ifdef UseSwitchStatement //End the switch statement
	}
	#endif

#ifndef UseFunctionPointers //End the function
}
#endif
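As a smaller, self-contained illustration of the same compile-time selection, here is a hypothetical toy interpreter (the opcode numbers, Run, and USE_SWITCH_STATEMENT are invented for this sketch) that uses GCC’s labels-as-values when available and falls back to a plain switch otherwise. It covers only two of the three methods, to keep it short; each handler re-dispatches on its own, which sidesteps the fall-through between labels that the skeleton above leaves to the elided bodies:

```cpp
// Hypothetical opcode numbering for illustration only.
enum { OP_ADD = 0, OP_SUB = 1, OP_HALT = 2 };

// Runs a tiny accumulator machine: ADD and SUB take one operand byte, HALT
// returns the accumulator. Assumes a well-formed program (no validation).
int Run(const unsigned char *code)
{
    int acc = 0;
    const unsigned char *pc = code;
#if defined(__GNUC__) && !defined(USE_SWITCH_STATEMENT)
    // "Labels as values": one label address per opcode, indexed directly.
    static void *const labels[] = { &&op_add, &&op_sub, &&op_halt };
#define DISPATCH() goto *labels[*pc++]
    DISPATCH();
op_add:
    acc += *pc++;
    DISPATCH();
op_sub:
    acc -= *pc++;
    DISPATCH();
op_halt:
    return acc;
#undef DISPATCH
#else
    // Portable fallback; compilers often lower a dense switch to a jump table.
    for (;;) {
        switch (*pc++) {
        case OP_ADD: acc += *pc++; break;
        case OP_SUB: acc -= *pc++; break;
        case OP_HALT: return acc;
        }
    }
#endif
}
```

Compiling once normally and once with -DUSE_SWITCH_STATEMENT gives the two variants to profile against each other.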

After some tinkering, I did discover that, through inline assembly, it is possible to retrieve the offset of a label in MSVC, so with some more tinkering it could be utilized, though it might be a bit messy (and only in 32-bit x86 builds, since MSVC doesn’t support inline assembly on x64).
void ExamplePointerRetrieval()
{
	void *LabelPointer;
	TheLabel:
	_asm mov LabelPointer, offset TheLabel
}