Monitoring PHP calls

I recently had a Linux client that was, for whatever odd reason, making infinite recursive HTTP calls to a single script, which was making the server process count skyrocket. I decided to use the same module as I did in my Painless migration from PHP MySQL to MySQLi post, which is to say, overriding base functions for fun and profit using the PHP runkit extension. I did this so I could gather, for debugging, logs of when and where the calls causing this were being made.

The code below overrides all functions listed on the line that says “List of functions to intercept” [Line 9]. It works by first renaming these built-in functions to “OVERRIDE_$FuncName” [Line 12], and replacing them with a call to “GlobalRunFunc()” [Line 13], which receives the original function name and argument list. The GlobalRunFunc():

  1. Checks to see if it is interested in logging the call
    • In the case of this example, it will log the call if [Line 20]:
      • Line 21: curl_setopt is called with the CURLOPT_URL parameter (enum=10002)
      • Line 22: curl_init is called with a first parameter, which would be a URL
      • Line 23: file_get_contents or fopen is called with a path that is not absolute
        (WordPress calls everything absolutely. Normally I would have only checked for http[s] calls).
    • If it does want to log the call, it stores it in a global array (which holds all the calls we will want to log).
      The logged data includes [Line 25]:
      • The function name
      • The function parameters
      • 2 levels of the backtrace behind the call (which can often get quite large when stored in the log file)
  2. It then calls the original function, with parameters intact, and passes through the return [Line 27].

The “GlobalShutdown()” [Line 30] is then called when the script is closing [Line 38] and saves all the logs, if any exist, to “$GlobalLogDir/$DATETIME.srl”.

I have it using “serialize()” to encode the log data [Line 25], as opposed to “json_encode()” or “print_r()” calls, as the latter were getting too large for the logs. You may want to have it use one of these other encoding functions for easier log perusal, if running out of space is not a concern.

<?php
//The log data to save is stored here
global $GlobalLogArr, $GlobalLogDir;
$GlobalLogArr=Array();
$GlobalLogDir='/PATH/TO/LOGS/'; //Assumption: placeholder path, set to your log directory (with trailing slash)

//Override the functions here to instead have them call to GlobalRunFunc, which will in turn call the original functions
foreach(Array(
        'fopen', 'file_get_contents', 'curl_init', 'curl_setopt', //List of functions to intercept
) as $FuncName)
{
        runkit_function_rename($FuncName, "OVERRIDE_$FuncName");
        runkit_function_add($FuncName, '', "return GlobalRunFunc('$FuncName', func_get_args());");
}

//This optionally logs the call, and then runs the original function
function GlobalRunFunc($FuncName, $Args)
{
        global $GlobalLogArr;
        if(
                ($FuncName=='curl_setopt' && $Args[1]==10002) || //CURLOPT enumeration can be found at https://curl.haxx.se/mail/archive-2004-07/0100.html
                ($FuncName=='curl_init' && isset($Args[0])) ||
                (($FuncName=='file_get_contents' || $FuncName=='fopen') && $Args[0][0]!='/')
        )
                $GlobalLogArr[]=serialize(Array('FuncName'=>$FuncName, 'Args'=>$Args, 'Trace'=>array_slice(debug_backtrace(), 1, 2)));

        return call_user_func_array("OVERRIDE_$FuncName", $Args);
}

function GlobalShutdown()
{
        global $GlobalLogArr, $GlobalLogDir;
        $Time=microtime(true);
        if(count($GlobalLogArr))
                file_put_contents($GlobalLogDir.date('Y-m-d_H:i:s.'.substr($Time-floor($Time), 2, 3), floor($Time)).'.srl', implode("\n", $GlobalLogArr));
}

register_shutdown_function('GlobalShutdown');
?>

Painless migration from PHP MySQL to MySQLi

The PHP MySQL extension is being deprecated in favor of the MySQLi extension in PHP 5.5, and removed as of PHP 7.0. MySQLi was first referenced in PHP v5.0.0 beta 4 on 2004-02-12, with the first stable release in PHP 5.0.0 on 2004-07-13[1]. Before that, the PHP MySQL extension was by far the most popular way of interacting with MySQL on PHP, and still was for a very long time after. This website was opened only 2 years after the first stable release!

With the deprecation, problems from some websites I help host have popped up, many of these sites being very, very old. I needed a quick and dirty solution to monkey-patch these websites to use MySQLi without rewriting all their code. The obvious answer is to overwrite the functions with wrappers for MySQLi. The generally known way of doing this is with the Advanced PHP Debugger (APD). However, using this extension has a lot of requirements that are not appropriate for a production web server. Fortunately, another extension I recently learned of offers the renaming functionality: runkit. It was a super simple install for me.

  1. From the command line, run “pecl install runkit”
  2. Add “extension=runkit.so” and “runkit.internal_override=On” to the php.ini

Besides the ability to override these functions with wrappers, I also needed a way to make sure this file was always loaded before all other PHP files. The simple solution for that is adding “auto_prepend_file=/PATH/TO/FILE” to the “.user.ini” in the user’s root web directory.

The code for this script is as follows. It only contains a limited set of the MySQL functions, including some very esoteric ones that the web site used. This is not a foolproof script, but it gets the job done.

//Override the MySQL functions
foreach(Array(
    'connect', 'error', 'fetch_array', 'fetch_row', 'insert_id', 'num_fields', 'num_rows',
    'query', 'select_db', 'field_len', 'field_name', 'field_type', 'list_dbs', 'list_fields',
    'list_tables', 'tablename'
) as $FuncName)
    runkit_function_redefine("mysql_$FuncName", '',
        'return call_user_func_array("mysql_'.$FuncName.'_OVERRIDE", func_get_args());');

//If a connection is not explicitly passed to a mysql_ function, use the last created connection
global $SQLLink; //The remembered SQL Link
function GetConn($PassedConn)
{
    if($PassedConn)
        return $PassedConn;
    global $SQLLink;
    return $SQLLink;
}

//Override functions
function mysql_connect_OVERRIDE($Host, $Username, $Password) {
    global $SQLLink;
    return $SQLLink=mysqli_connect($Host, $Username, $Password);
}
function mysql_error_OVERRIDE($SQLConn=NULL) {
    return mysqli_error(GetConn($SQLConn));
}
function mysql_fetch_array_OVERRIDE($Result, $ResultType=MYSQL_BOTH) {
    return mysqli_fetch_array($Result, $ResultType);
}
function mysql_fetch_row_OVERRIDE($Result) {
    return mysqli_fetch_row($Result);
}
function mysql_insert_id_OVERRIDE($SQLConn=NULL) {
    return mysqli_insert_id(GetConn($SQLConn));
}
function mysql_num_fields_OVERRIDE($Result) {
    return mysqli_num_fields($Result);
}
function mysql_num_rows_OVERRIDE($Result) {
    return mysqli_num_rows($Result);
}
function mysql_query_OVERRIDE($Query, $SQLConn=NULL) {
    return mysqli_query(GetConn($SQLConn), $Query);
}
function mysql_select_db_OVERRIDE($DBName, $SQLConn=NULL) {
    return mysqli_select_db(GetConn($SQLConn), $DBName);
}
function mysql_field_len_OVERRIDE($Result, $Offset) {
    $Fields=mysqli_fetch_fields($Result); //Assumption: the field list comes from mysqli_fetch_fields
    return $Fields[$Offset]->length;
}
function mysql_field_name_OVERRIDE($Result, $Offset) {
    $Fields=mysqli_fetch_fields($Result); //Assumption: see mysql_field_len_OVERRIDE
    return $Fields[$Offset]->name;
}
function mysql_field_type_OVERRIDE($Result, $Offset) {
    $Fields=mysqli_fetch_fields($Result); //Assumption: see mysql_field_len_OVERRIDE
    return $Fields[$Offset]->type;
}
function mysql_list_dbs_OVERRIDE($SQLConn=NULL) {
    $Result=mysql_query('SHOW DATABASES', GetConn($SQLConn));
    $Tables=Array(); //Assumption: the names are collected into a plain array
    while($Row=mysqli_fetch_row($Result))
        $Tables[]=$Row[0];
    return $Tables;
}
function mysql_list_fields_OVERRIDE($DBName, $TableName, $SQLConn=NULL) {
    $CurDB=mysql_fetch_array(mysql_query('SELECT Database()', $SQLConn));
    $CurDB=$CurDB[0]; //Assumption: keep only the database name from the fetched row
    mysql_select_db($DBName, $SQLConn);
    $Result=mysql_query("SHOW COLUMNS FROM $TableName", $SQLConn);
    mysql_select_db($CurDB, $SQLConn);
    if(!$Result) {
        print 'Could not run query: '.mysql_error($SQLConn);
        return Array();
    }
    $Fields=Array(); //Assumption: the column names are collected into a plain array
    while($Row=mysqli_fetch_assoc($Result))
        $Fields[]=$Row['Field'];
    return $Fields;
}
function mysql_list_tables_OVERRIDE($DBName, $SQLConn=NULL) {
    $CurDB=mysql_fetch_array(mysql_query('SELECT Database()', $SQLConn));
    $CurDB=$CurDB[0]; //Assumption: keep only the database name from the fetched row
    mysql_select_db($DBName, $SQLConn);
    $Result=mysql_query("SHOW TABLES", $SQLConn);
    mysql_select_db($CurDB, $SQLConn);
    if(!$Result) {
        print 'Could not run query: '.mysql_error($SQLConn);
        return Array();
    }
    $Tables=Array(); //Assumption: the table names are collected into a plain array
    while($Row=mysqli_fetch_row($Result))
        $Tables[]=$Row[0];
    return $Tables;
}
function mysql_tablename_OVERRIDE($Result) {
    $Fields=mysqli_fetch_fields($Result); //Assumption: see mysql_field_len_OVERRIDE
    return $Fields[0]->table;
}

And here is some test code to confirm functionality:
global $MyConn, $TEST_Table;
function GetResult() {
    global $MyConn, $TEST_Table;
    return mysql_query('SELECT * FROM '.$TEST_Table.' LIMIT 1', $MyConn);
}
var_dump($MyConn=mysql_connect($TEST_Server, $TEST_UserName, $TEST_Password));
//Set $MyConn to NULL here if you want to test global $SQLLink functionality
var_dump(mysql_select_db($TEST_DB, $MyConn));
var_dump(mysql_query('SELECT * FROM INVALIDTABLE LIMIT 1', $MyConn));
$Result=GetResult(); var_dump(mysql_fetch_row($Result));
$Result=GetResult(); var_dump(mysql_num_fields($Result));
var_dump(mysql_field_len($Result, 0));
var_dump(mysql_field_name($Result, 0));
var_dump(mysql_field_type($Result, 0));
var_dump(mysql_list_fields($TEST_DB, $TEST_Table, $MyConn));
var_dump(mysql_list_tables($TEST_DB, $MyConn));
mysql_query('CREATE TEMPORARY TABLE mysqltest (i int auto_increment, primary key (i))', $MyConn);
mysql_query('INSERT INTO mysqltest VALUES ()', $MyConn);
mysql_query('INSERT INTO mysqltest VALUES ()', $MyConn);
mysql_query('DROP TEMPORARY TABLE mysqltest', $MyConn);
Cross site scripting solutions
When you are forced to break the security model

So I was recently hired to set up a go-between system that would allow two independent websites to directly communicate and transfer/copy data between each other via a web browser. This is normally not possible because of the browser’s same-origin policy (the security model that guards against cross-site scripting, or XSS), so I gave the client 2 possible solutions. Both of these solutions are written with the assumption that there is a go-between intermediary iframe/window, on a domain that they control, between the iframes/windows of the 2 independent sites. This would also work fine for one site you control against a site you do not control.

  1. Tell the browser to ignore this security requirement:
    • For example, if you add “--disable-web-security” to the Chrome command line arguments, cross-site security checks will be removed. However, Chrome will prominently display at the top of the browser, on the very first tab (which can be closed), “You are using an unsupported command-line flag: --disable-web-security. Stability and security will suffer”. This can be scary to the user, and could also allow security breaches if the user utilizes that browser [session] for anything except the application page.
  2. The more appropriate way to do it, which requires a bit of work on the administrative end, is having all 3 sites pretend to run off of the same domain. To do this:
    1. You must have a domain that you control, which we will call UnifyingDomain.com (This top level domain can contain subdomains)
    2. The 2 sites that YOU control would need a JavaScript line of  “document.domain='UnifyingDomain.com';” somewhere in them. These 2 sites must also be run off of a subdomain of UnifyingDomain.com, (which can also be done through apache redirect directives).
    3. The site that you do not control would need to be forwarded through your UnifyingDomain.com (not a subdomain) via an apache permanent redirect.
      • This may not work, if their site programmer is dumb and does not use proper relative links for everything (absolute links are the devil :-) ). If this is the case:
        • You can use a [http] proxy to pull in their site through your domain (in which case, if you wanted, you could inject a “domain=”)
        • You can use the domain that you do not control as the top level UnifyingDomain.com, and add rules into your computer’s hostname files to redirect its subdomains to your IPs.

This project is why I ended up making my HTTP Forwarders client in go (coming soon).

iGoogle Security Problems
For a company that stresses security...

I’ve recently been having problems using the Google Reader widget in iGoogle. Normally, when I clicked on an RSS Title, a “bubble” popped up with the post’s content. However recently when clicking on the titles, the original post’s source opened up in a new tab. I confirmed the settings for the widget were correct, so I tried to remember the last change I made in Firefox that could have triggered this problem, as it seems the problem was not widespread, and only occurred to a few other people with no solution found. I realized a little bit back that I had installed the HTTPS Everywhere Firefox plugin. As described on the EFF’s site “HTTPS Everywhere is a Firefox extension ... [that] encrypts your communications with a number of major websites”.

Once I disabled the plugin and found the problem went away, I started digging through Google’s JavaScript code with FireBug. It turns out the start of the problem was that the widgets in iGoogle are run in their own IFrames (which is a very secure way of doing a widget system like this). However, the Google Reader content was being pulled in through HTTPS secure channels (as it should thanks to HTTPS Everywhere), while the iGoogle page itself was pulled in through a normal HTTP channel! Separate windows/frames/tabs cannot interact with each other through JavaScript if they are not part of the same domain and protocol (HTTP/HTTPS), which is meant to prevent cross-site scripting hacks.

I was wondering why HTTPS Everywhere was not running iGoogle through an HTTPS channel, so I tried it myself and found out Google automatically redirects HTTPS iGoogle requests to non secure HTTP channels! So much for having a proper security model in place...

So I did a lot more digging and modifying of Google’s code to see if I couldn’t find out exactly where the problem was occurring and if it couldn’t be fixed with a hack. It seems the code to handle the RSS Title clicking is injected during the “onload” event of the widget’s IFrame. I believe this was the code that was hitting the security privilege error to make things not work. I attempted to hijack the Google Reader widget’s onload function and add special privileges using “netscape.security.PrivilegeManager.enablePrivilege”, but it didn’t seem to help the problem. I think with some more prodding I could have gotten it working, but I didn’t want to waste any more time than I already had on the problem.

The code that would normally be loaded into the widget’s IFrame window hooks the “onclick” event of all RSS Title links to both perform the bubble action and cancel the normal “click” action. Since the normal click action for the anchor links was not being canceled, the browser action of following the link occurred. In this case, the links also had a “target” set to open a new window/tab.

There is however a “fix” for this problem, though I don’t find it ideal. If you edit the “extensions\https-everywhere@eff.org\chrome\content\rules\GoogleServices.xml” file in your Firefox profile directory (most likely at “C:\Users\USERNAME\AppData\Roaming\Mozilla\Firefox\Profiles\PROFILENAME\” if running Windows 7), you can comment out or delete the following rule so Google Reader is no longer run through secure HTTPS channels:

<rule from="^http://(www\.)?google\.com/reader/" 

That being said, I’ve been having a plethora of problems with Facebook and HTTPS Everywhere too :-\ (which it actually mentions might happen in its options dialog). You’d think the largest sites on the Internet could figure out how to get their security right, but either they don’t care (the more likely option), or they don’t want the encryption overhead. Alas.

Executable Stubs
Win32 Executable Hacking

Executable stubs can be used by a compiler to create the header section (the beginning section) of an outputted executable by adding the “/stub” switch to the linker.

#pragma comment(linker, "/stub:Stub.exe")

The MSDN Library for MSVC6 has the following to say about it:

The MS-DOS Stub File Name (/STUB:filename) option attaches an MS-DOS stub program to a Win32 program.

A stub program is invoked if the file is executed in MS-DOS. It usually displays an appropriate message; however, any valid MS-DOS application can be a stub program.

Specify a filename for the stub program after a colon (:) on the command line. The linker checks filename to be sure that it is a valid MS-DOS executable file, and issues an error message if the file is not valid. The program must be an .EXE file; a .COM file is invalid for a stub program.

If this option is not used, the linker attaches a default stub program that issues the following message:
This program cannot be run in MS-DOS mode.

For the stub to work in XP, the following guidelines must be met:
  • The stub should be at least 64 bytes long
  • The first 2 bytes of the stub (Bytes 0-1) need to be “MZ”
  • Bytes 60-63 (4 bytes) are overwritten by the linker with the file offset of the PE header (I have a note that you might want to set this to 0x60000000 [big endian] for some reason)

As long as these guidelines are met, the rest of the stub can be whatever you want :-). For Small Projects, you can even put information here like strings for the executable, which are accessible through the executable virtual address space starting at 0x400000.
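As a rough illustration of the guidelines above, here is a minimal C sketch that checks whether a buffer holding a candidate stub meets them. The names CheckStub and StubHeaderOffset are hypothetical helpers made up for this example, not part of any Windows API, and the sketch only covers the size and signature rules listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the buffer satisfies the guidelines above, otherwise 0 */
static int CheckStub(const unsigned char *Stub, size_t Size)
{
	assert(Stub!=NULL);
	if(Size<64) /* The stub should be at least 64 bytes long */
		return 0;
	if(Stub[0]!='M' || Stub[1]!='Z') /* Bytes 0-1 need to be "MZ" */
		return 0;
	return 1;
}

/* Hypothetical helper: reads bytes 60-63 as a little-endian 32-bit value.
   Only call this on a buffer that CheckStub() has accepted */
static uint32_t StubHeaderOffset(const unsigned char *Stub)
{
	assert(Stub!=NULL);
	return (uint32_t)Stub[60] | ((uint32_t)Stub[61]<<8) |
	       ((uint32_t)Stub[62]<<16) | ((uint32_t)Stub[63]<<24);
}
```

With the bytes 60 00 00 00 stored at offsets 60-63 (the 0x60000000 from the note above, read in file order), StubHeaderOffset() returns 0x60.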
Language Optimization Techniques
A few tricks up the programmer’s sleeve

I’m gonna cheat today since it is really late, as I spent a good amount of time organizing the 3D Engines update which pushed me a bit behind, and I’m also exhausted. Instead of writing some more content, I’m just linking to the “Utilized Optimization Techniques” section of the 3D Engines project, which I put up today.

It describes 4 programming speed optimization tricks: local variable assignment, precalculating index lookups, pointer traversing/addition, and loop unrolling. This project post also goes into some differences between the languages used [Flash, C++, and Java], especially when dealing with speed.
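The linked section is not reproduced here, so the following is only a minimal C sketch of what those 4 tricks can look like, assuming a 2D table stored as one flat array. The function names SumRowPlain and SumRowTuned are made up for this example and do not come from the 3D Engines project.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward version: recomputes Row*Width+x on every pass */
static long SumRowPlain(const int *Table, size_t Width, size_t Row)
{
	long Sum=0;
	size_t x;
	assert(Table!=NULL);
	for(x=0;x<Width;x++)
		Sum+=Table[Row*Width+x];
	return Sum;
}

/* Same result using the 4 tricks named above */
static long SumRowTuned(const int *Table, size_t Width, size_t Row)
{
	long Sum=0; /* Local variable assignment: accumulate in a local */
	const int *Cur=Table+Row*Width; /* Precalculated index lookup: Row*Width is computed once */
	const int *End=Cur+Width;
	assert(Table!=NULL);
	while(End-Cur>=4) /* Loop unrolling: 4 elements per pass */
	{
		Sum+=(long)Cur[0]+Cur[1]+Cur[2]+Cur[3];
		Cur+=4; /* Pointer traversing/addition: advance the pointer instead of indexing */
	}
	while(Cur<End) /* Leftover elements when Width is not a multiple of 4 */
		Sum+=*Cur++;
	return Sum;
}
```

Whether the tuned version is actually faster depends on the compiler and its optimization settings, so it would need to be profiled rather than assumed.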

C Jump Tables
The unfortunate reality of different feature sets in different language implementations

I was thinking earlier today how it would be neat for C/C++ to be able to get the address of a jump-to label to be used in jump tables, specifically, for an emulator. A number of seconds after I did a Google query, I found out it is possible in gcc (the open source native Linux compiler) through the “label value operator” “&&”. I am crushed that MSVC doesn’t have native support for such a concept :-(.

The reason it would be great for an emulator is for emulating the CPU, in which, usually, each first byte of a CPU instruction’s opcode [see ASM] gives what the instruction is supposed to do. An example to explain the usefulness of a jump table is as follows:

void DoOpcode(int OpcodeNumber, ...)
{
	void *Opcodes[]={&&ADD, &&SUB, &&JUMP, &&MUL}; //assuming ADD=opcode 0 and so forth
	goto *Opcodes[OpcodeNumber];
	ADD: /*...*/ return; SUB: /*...*/ return; JUMP: /*...*/ return; MUL: /*...*/ return;
}

Of course, this could still be done with virtual functions, function pointers, or a switch statement, but those are theoretically much slower. Having them in separate functions would also remove the possibility of local variables.
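To make the comparison concrete, here is a small runnable sketch of the label jump table next to a plain switch, assuming a made-up toy machine with one accumulator and 2-byte instructions (opcode, operand). The opcode set and the name RunProgram are hypothetical. The label value operator is a gcc extension (clang accepts it too), so the sketch falls back to a switch on other compilers.

```c
#include <assert.h>
#include <stddef.h>

enum {OP_ADD=0, OP_SUB, OP_MUL, OP_HALT}; /* Hypothetical opcodes for a toy accumulator machine */

/* Runs (opcode, operand) byte pairs against an accumulator until OP_HALT and returns the result.
   The jump table version does no bounds checking, so every opcode must be one of the 4 above */
static int RunProgram(const unsigned char *Code, int Accumulator)
{
	const unsigned char *IP=Code; /* Instruction pointer */
#if defined(__GNUC__) /* Label value operator available */
	void *Opcodes[]={&&ADD, &&SUB, &&MUL, &&HALT}; /* Index = opcode number */
	assert(Code!=NULL);
	goto *Opcodes[*IP]; /* Jump straight to the first instruction's label */
	ADD:  Accumulator+=IP[1]; IP+=2; goto *Opcodes[*IP];
	SUB:  Accumulator-=IP[1]; IP+=2; goto *Opcodes[*IP];
	MUL:  Accumulator*=IP[1]; IP+=2; goto *Opcodes[*IP];
	HALT: return Accumulator;
#else /* Portable fallback: an ordinary switch inside a loop */
	assert(Code!=NULL);
	for(;;)
		switch(*IP)
		{
			case OP_ADD: Accumulator+=IP[1]; IP+=2; break;
			case OP_SUB: Accumulator-=IP[1]; IP+=2; break;
			case OP_MUL: Accumulator*=IP[1]; IP+=2; break;
			default: return Accumulator; /* OP_HALT (or anything unknown) stops the machine */
		}
#endif
}
```

Note that all opcode handlers share the locals IP and Accumulator, which is the advantage over separate functions mentioned above.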

Although, again, theoretically, it wouldn’t be too bad to use, I believe, the _fastcall function calling convention with function pointers, and modern compilers SHOULD translate switches to jump tables in an instance like this, but modern compilers are so obfuscated you never know what they are really doing.

It would probably be best to try and code such an instance so that all 3 methods (function pointers, switch statement, jump table) could be utilized through compiler definitions, and then profile for whichever method is fastest and supported.

//Define the switch for which type of opcode picker we want
#define UseSwitchStatement
//#define UseJumpTable
//#define UseFunctionPointers

//Defines for how each opcode picker acts
#if defined(UseSwitchStatement)
	#define OPCODE(o) case OP_##o:
#elif defined(UseJumpTable)
	#define OPCODE(o) o:
	#define GET_OPCODE(o) &&o
#elif defined(UseFunctionPointers)
	#define OPCODE(o) void Opcode_##o()
	#define GET_OPCODE(o) (void*)&Opcode_##o
	//The above GET_OPCODE is actually a problem since the opcode functions aren't listed until after their ...
	//address is requested, but there are a couple of ways around that I'm not going to worry about going into here.
#endif

enum {OP_ADD=0, OP_SUB}; //assuming ADD=opcode 0 and so forth
void DoOpcode(int OpcodeNumber, ...)
{
	#ifndef UseSwitchStatement //If using JumpTable or FunctionPointers we need an array of the opcode jump locations
		void *Opcodes[]={GET_OPCODE(ADD), GET_OPCODE(SUB)}; //assuming ADD=opcode 0 and so forth
	#endif
	#if defined(UseSwitchStatement)
		switch(OpcodeNumber) { //Normal switch statement
	#elif defined(UseJumpTable)
		goto *Opcodes[OpcodeNumber]; //Jump to the proper label
	#elif defined(UseFunctionPointers)
		((void(*)(void))Opcodes[OpcodeNumber])(); //Jump to the proper function
		} //End the current function
	#endif

	//For testing under "UseFunctionPointers" (see GET_OPCODE comment under "defined(UseFunctionPointers)")
	//put the following OPCODE sections directly above this "DoOpcode" function
	OPCODE(ADD)
	{
		//Placeholder: the ADD opcode body goes here
		return;
	}
	OPCODE(SUB)
	{
		//Placeholder: the SUB opcode body goes here
		return;
	}

	#ifdef UseSwitchStatement //End the switch statement
		}
	#endif

#ifndef UseFunctionPointers //End the function
}
#endif

After some tinkering, I did discover through assembly insertion it was possible to retrieve the offset of a label in MSVC, so with some more tinkering, it could be utilized, though it might be a bit messy.
void ExamplePointerRetreival()
{
	void *LabelPointer;
	TheLabel:
	_asm mov LabelPointer, offset TheLabel
}
Inlining Executable Resources
Do you suffer from OPC (Obsessive Perfection Complex)? If not, you aren’t an engineer :-)

I am somewhat obsessive about file cleanliness, and like to have everything I do well organized with any superfluous files removed. This especially translates into my source code, and even more so for released source code.

Before I zip up the source code for any project, I always remove the extraneous workspace compilation files. These usually include:

  • C/C++: Debug & Release directories, *.ncb, *.plg, *.opt, and *.aps
  • VB: *.vbw
  • .NET: *.suo, *.vbproj.user

Unfortunately, a new offender surfaced in the form of the Hyrulean Productions icon and Signature File for about pages. I did not want to have to have every source release include those 2 extra files, so I did research into inlining them in the resource script (.rc) file. Resources are just data directly compiled into an executable, and the resource script tells the executable all of these resources and how to compile them in. All my C projects include a resource script for at least the file version, author information, and Hyrulean Productions icon. Anyways, this turned out to be way more of a pain in the butt than intended.

There are 2 ways to load “raw data” (not a standard format like an icon, bitmap, string table, version information, etc) into a resource script. The first way is through loading an external file:
RESOURCEID RESOURCETYPE "FILENAME"
RESOURCEID and RESOURCETYPE are arbitrary and user defined. It is usually best to keep them in caps, as the compilers often seem to be picky about case.

The second way is through inlining the data:
RESOURCEID RESOURCETYPE
BEGIN
	DATA
END
for example:
DakSig	Sig
BEGIN
	0x32DA,0x2ACF,0x0306...
END
Getting the data in the right format for the resource script is a relatively simple task.
  • First, acquire the data in 16-bit encoded format (HEX). I suggest WinHex for this job.
    On a side note, I have been using WinHex for ages and highly recommend it. It’s one of the most well built and fully featured application suites I know of.
  • Lastly, convert the straight HEX DATA (“DA32CF2A0603...”) into an array of proper endian hex values (“0x32DA,0x2ACF,0x0306...”). This can be done with a global replace regular expression of “(..)(..)” to “0x$2$1,”. I recommend Editpad Pro for this kind of work, another of my favorite pieces of software. As a matter of fact, I am writing this post right now in it :-).
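If a hex editor and a regular expression are not at hand, the same conversion can be sketched in a few lines of C. This is only an illustration: BytesToRcWords is a hypothetical helper name, and the output format simply mirrors the “0x$2$1,” replacement described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes Size bytes of Data as comma separated 16-bit words
   ("0x32DA,0x2ACF,...") into Out, swapping each byte pair like the regular expression above.
   A trailing odd byte is ignored, as the regular expression would not match it either.
   Returns the number of characters written, or 0 if Out is too small */
static size_t BytesToRcWords(const unsigned char *Data, size_t Size, char *Out, size_t OutSize)
{
	size_t i, Written=0;
	assert(Data!=NULL && Out!=NULL);
	if(OutSize==0)
		return 0;
	Out[0]='\0';
	for(i=0;i+1<Size;i+=2)
	{
		if(OutSize-Written<8) /* Each word takes 7 characters ("0xHHHH,") plus the terminator */
			return 0;
		Written+=(size_t)sprintf(Out+Written, "0x%02X%02X,", Data[i+1], Data[i]);
	}
	return Written;
}
```

Feeding it the bytes DA 32 CF 2A 06 03 produces “0x32DA,0x2ACF,0x0306,”, matching the example above.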

Here is where the real caveats and problems start falling into place. First, I noticed the resource data was corrupt for a few bytes at a certain location. It turned out to be Visual Studio wanting line lengths in the resource file to be less than ~4175 characters, so I just added a line break at that point.

This idea worked great for the about page signature, which needed to be raw data anyways, but encoding the icon this way turned out to be impossible :-(. Visual Studio apparently requires external files be loaded if you want to use a pre-defined binary resource type (ICON, BITMAP, etc). The simple solution would be to inline the icon as a user defined raw data type, but unfortunately, the Win32 icon loading API functions (LoadIcon, CreateIconFromResource, LoadImage, etc) only seemed to work with properly defined ICONs. I believe the problem here is that when the compiler loads in the icon to include in the executable, it reformats it somewhat, so I would need to know this format. Again, unfortunately, Win32 APIs failed me. FindResource/FindResourceEx wouldn’t let me load the data for ICON types for direct copying (or reverse engineering) :-(. At this point, it wouldn’t be worth my time to try and get the proper format just to inline my Hyrulean Productions icon into resource scripts. I may come back to it later if I’m ever really bored.

This unfortunately brings back a lot of bad old memories regarding Win32 APIs. A lot of the Windows system is really outdated, not nearly robust enough, or just way too obfuscated, and it has caused, and still causes, me innumerable migraines trying to get things working with their system.

As an example, I just added the first about page to a C project, and getting fonts working on the form was not only a multi-hour, knock-down, drag-out fight due to multiple issues, but I also ended up having to jury-rig the final solution in exasperation due to time constraints. I wanted the C about pages to match the VB ones exactly, but font size numbers just wouldn’t conform between the VB GUI designer and Windows GDI (the Windows graphics API), so I just put in arbitrary font size numbers that matched visually instead of trying to find the right conversion process, as the documented font size conversion process was not yielding proper results. This is the main reason VB (and maybe .NET) are far superior in my book when dealing with GUIs (for ease of use at least, not necessarily ability and power). I know there are libraries out there that supposedly solve this problem, but I have not yet found one that I am completely happy with, which is why I had started my own fully fledged cross operating system GUI library a ways back, but it won’t be completed for a long time.

Project About Pages
Big things come in small packages
[Image: About Window Concept]

I’ve been thinking for a while that I need to add “about windows” to the executables of all my applications with GUIs. So I first made a test design [left, psd file attached]

Unfortunately, this requires at least 25KB for the background alone, and this is larger than many of my project executables themselves. This is a problem for me, as I like keeping executables small and simple.

I therefore decided to scratch the background and just go with normal “about windows” and my signature in a spiffy font [BlizzardD]: (white background added by web browser for visibility)
[Image: PNG Signature]
The above PNG signature is only 1.66KB, so “yay”, right? Wrong :-(, it quickly occurred to me that XP does not natively support PNG.

My next thought was “what about a GIF?” (GIF is the predecessor to PNG, also lossless): (1.82KB)
[Image: GIF Signature]
I remembered that GIF files worked for VB, so I thought that the native Windows API might support it too without adding in extra DLLs, but alas, I was wrong. This at least partially solves the problem for me in Visual Basic, but not fully, as GIF does not support translucency, but only 1 color of transparency, so the picture would look horribly aliased (pixelated).

The final solution I decided on is having a small translucency-mask and alpha-blending it and the primary signature color (RGB(6,121,6)) to the “about windows’ ” background.
Since alpha-blending/translucency is an 8 bit value, a gray-scale (also 8 bits per pixel) image is perfect for a translucency mask format for VB: (1.82KB, GIF)
[Image: GIF Signature Mask]
You may note that this GIF is the exact same size as the previous GIF, which makes sense as it is essentially the exact same picture, just with swapped color palettes.
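The blending step itself is small. Below is a minimal C sketch of it, assuming the mask value is an opacity (255 = fully signature colored, 0 = untouched background); if white means transparent in your mask, invert the value first. BlendChannel and BlendSignaturePixel are hypothetical names made up for this example, not the functions used in the actual about window code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: blends one 8-bit color channel.
   Mask is assumed to be an opacity: 255 = all Foreground, 0 = all Background */
static unsigned char BlendChannel(unsigned char Foreground, unsigned char Background, unsigned char Mask)
{
	return (unsigned char)((Foreground*Mask + Background*(255-Mask) + 127)/255); /* +127 rounds to nearest */
}

/* Hypothetical helper: blends the signature color RGB(6,121,6) into one 3 byte background pixel.
   Red and blue are both 6, so this works for RGB and BGR byte orders alike */
static void BlendSignaturePixel(unsigned char *Pixel, unsigned char Mask)
{
	static const unsigned char Signature[3]={6, 121, 6};
	int i;
	assert(Pixel!=NULL);
	for(i=0;i<3;i++)
		Pixel[i]=BlendChannel(Signature[i], Pixel[i], Mask);
}
```

Running this over every pixel under the signature, with the matching mask value, gives the translucent edges that a 1 color transparency GIF cannot.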

The final hurdle is how to import the picture into C with as little space wasted as possible. The solution to this is to create an easily decompressable alpha-mask (alpha means translucency).
I started with the bitmap mask: (25.6KB, BMP)
[Image: BMP Signature Mask]
From there, I figured there would be 2 easy formats for compression that would take very little code to decompress:
  • Number of Transparent Pixels, Number of Foreground Pixels in a Row, List of Foreground Pixel Masks, REPEAT... (This is a form of “Run-length encoding”)
  • Start the whole image off as transparent, and then list each group of foreground pixels with: X Start Coordinate, Y Start Coordinate, Number of Pixels in a Row, List of Foreground Pixel Masks
It also helped that there were only 16 different alpha-masks, not including the fully transparent mask, so each alpha-mask could be fit within half a byte (4 bits). I only did the first option because I’m pretty sure the second one would be larger because it would take more bits for an x/y location than for a transparent run length number.
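The first format can be sketched as follows. This is a simplified illustration rather than the encoder from this post: every field is stored as a whole byte instead of being bit packed, runs are capped at 255, and the names TRANSP_PIXEL, EncodeRuns, and DecodeRuns are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TRANSP_PIXEL 255 /* Assumed marker value for a fully transparent pixel */

/* Encodes Count pixels as repeated groups of:
   [transparent run length] [foreground run length] [foreground values...]
   Returns the number of bytes written, or 0 if Out is too small */
static size_t EncodeRuns(const unsigned char *In, size_t Count, unsigned char *Out, size_t OutSize)
{
	size_t i=0, o=0;
	assert(In!=NULL && Out!=NULL);
	while(i<Count)
	{
		size_t Transp=0, Transl=0;
		while(i+Transp<Count && In[i+Transp]==TRANSP_PIXEL && Transp<255) /* Count transparent pixels */
			Transp++;
		i+=Transp;
		while(i+Transl<Count && In[i+Transl]!=TRANSP_PIXEL && Transl<255) /* Count foreground pixels */
			Transl++;
		if(o+2+Transl>OutSize)
			return 0;
		Out[o++]=(unsigned char)Transp;
		Out[o++]=(unsigned char)Transl;
		while(Transl--) /* Copy the foreground pixel masks */
			Out[o++]=In[i++];
	}
	return o;
}

/* Reverses EncodeRuns. Returns the number of pixels written, or 0 on malformed input or a full buffer */
static size_t DecodeRuns(const unsigned char *In, size_t InSize, unsigned char *Out, size_t OutSize)
{
	size_t i=0, o=0;
	assert(In!=NULL && Out!=NULL);
	while(i+2<=InSize)
	{
		size_t Transp=In[i], Transl=In[i+1];
		i+=2;
		if(o+Transp+Transl>OutSize || i+Transl>InSize)
			return 0;
		while(Transp--)
			Out[o++]=TRANSP_PIXEL;
		while(Transl--)
			Out[o++]=In[i++];
	}
	return o;
}
```

For example, the 7 pixels 255,255,255,10,20,255,30 encode to the 7 bytes 3,2,10,20,1,1,30. The savings only appear once the transparent runs get long and the fields are packed into fewer bits.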

Other variants could be used too, like counting the background as a normal mask index and just do straight run length encoding with indexes, but I knew this would make the file much larger for 2 reasons: this would add a 17th alpha-mask which would push index sizes up to 5 bits, and background run lengths are much longer (in this case 6 bits), so all runs would need to be longer (non-background runs are only 3 bits in this case). Anyways, it ended up creating a 1,652 byte file :-).
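The bit counts in the last two paragraphs can be checked with a few lines of C. This is a sketch of my own for illustration, not the NumBitsRequired function declared in the code further down (its body is not shown in this part of the post); BitsRequired and RunBits are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: number of bits needed to store any value from 0 to MaxValue */
static unsigned BitsRequired(unsigned MaxValue)
{
	unsigned Bits=0;
	while(MaxValue)
	{
		Bits++;
		MaxValue>>=1;
	}
	return Bits;
}

/* Hypothetical helper: size in bits of one group in the chosen format, which is
   a transparent run length, a foreground run length, and one index per foreground pixel */
static unsigned RunBits(unsigned TranspBits, unsigned TranslBits, unsigned IndexBits, unsigned ForegroundPixels)
{
	assert(TranslBits<32 && ForegroundPixels<(1u<<TranslBits)); /* The run length must fit in its field */
	return TranspBits+TranslBits+IndexBits*ForegroundPixels;
}
```

16 alpha-masks use indexes 0 through 15, so BitsRequired(15) gives 4 bits, while a 17th mask pushes the highest index to 16 and BitsRequired(16) gives 5. With the 6 bit and 3 bit run lengths mentioned above, a group holding 5 foreground pixels costs RunBits(6, 3, 4, 5) = 29 bits.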

This could also very easily be edited to input/output 8-bit indexed bitmaps, or full color bitmaps even (with a max of 256 colors, or as many as you wanted with a few more code modifications). If one wanted to use this for normal pictures with a solid background instead of an alpha-mask, just know that the word “Transparent” means “Background” and the word “Translucent” means “Non-Background” in the code.

GIF and PNG file formats actually use similar techniques, but including the code for their decoders would cause a lot more code bloat than I wanted, especially since they [theoretically] include many more compression techniques than just run-length encoding. Programming for specific cases will [theoretically] always be smaller and faster than programming for general cases. On a side note, from past research I’ve done on the JPEG format, along with programming my NES Emulator, Hynes, they [JPEG & NES] share the same main graphical compression technique [grouping colors into blocks and only recording color variations].

The following is the code to create the compressed alpha-mask stream: [Direct link to C file with all of the following code blocks]
//** Double stars denotes changes for custom circumstance [The About Window Mask]
#include <windows.h>
#include <stdio.h>
#include <conio.h>
#include <stdlib.h> //malloc, free
#include <string.h> //memset

//Our encoding functions
int ErrorOut(char* Error, FILE* HandleToClose); //If an error occurs, output
UINT Encode(UCHAR* Input, UCHAR* Output, UINT Width, UINT Height); //Encoding process
UCHAR NumBitsRequired(UINT Num); //Tests how many bits are required to represent a number
void WriteToBits(UCHAR* StartPointer, UINT BitStart, UINT Value); //Write Value to Bit# BitStart after StartPointer - Assumes more than 8 bits are never written

//Program constants
const UCHAR BytesPerPixel=3, TranspMask=255; //24 bits per pixel, and white = transparent background color

//Encoding file header
typedef struct
{
	USHORT DataSize; //Data size in bits - **Should be UINT
	UCHAR Width, Height; //**Should be USHORTs
	UCHAR TranspSize, TranslSize; //Largest number of bits required for a run length for Transp[arent] and Transl[ucent]
	UCHAR NumIndexes, Indexes[0]; //Number and list of indexes
} EncodedFileHeader;

int main()
{
	UCHAR *InputBuffer, *OutputBuffer; //Where we will hold our input and output data
	FILE *File; //Handle to current input or output file
	UINT FileSize; //Holds input and output file sizes

	//The bitmap headers tell us about its contents
	BITMAPFILEHEADER BitmapFileHead;
	BITMAPINFOHEADER BitmapHead;

	//Read in bitmap header and confirm file type
	File=fopen("AboutWindow-Mask.bmp", "rb"); //Normally you'd read in the filename from passed arguments (argv)
	if(!File) //Confirm file open
		return ErrorOut("Cannot open file for reading", NULL);
	fread(&BitmapFileHead, sizeof(BITMAPFILEHEADER), 1, File);
	if(BitmapFileHead.bfType!=*(WORD*)"BM" || BitmapFileHead.bfReserved1 || BitmapFileHead.bfReserved2) //Confirm we are opening a bitmap
		return ErrorOut("Not a bitmap", File);

	//Read in the rest of the data
	fread(&BitmapHead, sizeof(BITMAPINFOHEADER), 1, File);
	if(BitmapHead.biPlanes!=1 || BitmapHead.biBitCount!=24 || BitmapHead.biCompression!=BI_RGB) //Confirm bitmap type - this code would probably have been simpler if I did an 8 bit indexed file instead... oh well, NBD.  **It has also been programmed for easy transition to 8 bit indexed files via the "BytesPerPixel" constant.
		return ErrorOut("Bitmap must be in 24 bit RGB format", File);
	FileSize=BitmapFileHead.bfSize-sizeof(BITMAPINFOHEADER)-sizeof(BITMAPFILEHEADER); //Size of the data portion
	InputBuffer=malloc(FileSize);
	fread(InputBuffer, FileSize, 1, File);
	fclose(File);

	//Run Encode
	OutputBuffer=malloc(FileSize); //We should only ever need at most FileSize space for output (output should always be smaller)
	memset(OutputBuffer, 0, FileSize); //Needs to be zeroed out due to how writing of data file is non sequential
	FileSize=Encode(InputBuffer, OutputBuffer, BitmapHead.biWidth, BitmapHead.biHeight); //Encode the file and get the output size

	//Write encoded data out
	File=fopen("Output.msk", "wb");
	fwrite(OutputBuffer, FileSize, 1, File);
	fclose(File);
	printf("File %d written with %d bytes\n", 1, FileSize);

	//Free up memory and wait for user input
	free(InputBuffer);
	free(OutputBuffer);
	getch(); //Pause for user input
	return 0;
}

int ErrorOut(char* Error, FILE* HandleToClose) //If an error occurs, output
{
	if(HandleToClose) //Close the file if one was passed in
		fclose(HandleToClose);
	printf("%s\n", Error);
	getch(); //Pause for user input
	return 1;
}

UINT Encode(UCHAR* Input, UCHAR* Output, UINT Width, UINT Height) //Encoding process
{
	UCHAR Indexes[256], NumIndexes, IndexSize, RowPad; //The index re-mappings, number of indexes, number of bits an index takes in output data, padding at input row ends for windows bitmaps
	USHORT TranspSize, TranslSize; //Largest number of bits required for a run length for Transp[arent] (zero) and Transl[ucent] (non zero) - should be UCHAR's, but these are used as explained under "CurTranspLen" below
	UINT BitSize, x, y, ByteOn, NumPixels; //Current output size in bits, x/y coordinate counters, current byte location in Input, number of pixels in mask

	//Calculate some stuff
	NumPixels=Width*Height; //Number of pixels in mask
	RowPad=4-(Width*BytesPerPixel%4); //Account for windows DWORD row padding - see declaration comment
	RowPad=(RowPad==4 ? 0 : RowPad);

	{ //Do a first pass to find number of different mask values, run lengths, and their encoded values
		const UCHAR UnusedIndex=255; //In our index list, unused indexes are marked with this constant
		USHORT CurTranspLen, CurTranslLen; //Keep track of the lengths of the current transparent & translucent runs - TranspSize and TranslSize are temporarily used to hold the maximum run lengths
		//Zero out all index references and counters
		memset(Indexes, UnusedIndex, 256);
		NumIndexes=0;
		TranspSize=TranslSize=CurTranspLen=CurTranslLen=0;
		//Start gathering data
		for(y=ByteOn=0;y<Height;y++) //Column
		{
			for(x=0;x<Width;x++,ByteOn+=BytesPerPixel) //Row
			{
				UCHAR CurMask=Input[ByteOn]; //Current alpha mask
				if(CurMask!=TranspMask) //Translucent value?
				{
					//Determine if index has been used yet
					if(Indexes[CurMask]==UnusedIndex) //We only need to check 1 byte per pixel as they are all the same for gray-scale **This would need to change if using non 24-bit or non gray-scale
					{
						((EncodedFileHeader*)Output)->Indexes[NumIndexes]=CurMask; //Save mask number in the index header
						Indexes[CurMask]=NumIndexes++; //Save index number to the mask
					}

					//Length of current transparent run
					TranspSize=(CurTranspLen>TranspSize ? CurTranspLen : TranspSize); //Max(CurTranspLen, TranspSize)
					CurTranspLen=0;

					//Length of current translucent run
					CurTranslLen++;
				}
				else //Transparent value?
				{
					//Length of current translucent run
					TranslSize=(CurTranslLen>TranslSize ? CurTranslLen : TranslSize);  //Max(CurTranslLen, TranslSize)
					CurTranslLen=0;

					//Length of current transparent run
					CurTranspLen++;
				}
			}

			ByteOn+=RowPad; //Account for windows DWORD row padding
		}
		//Determine number of required bits per value
		printf("Number of Indexes: %d\nLongest Transparent Run: %d\nLongest Translucent Run: %d\n", NumIndexes,
			TranspSize=CurTranspLen>TranspSize ? CurTranspLen : TranspSize, //Max(CurTranspLen, TranspSize)
			TranslSize=CurTranslLen>TranslSize ? CurTranslLen : TranslSize  //Max(CurTranslLen, TranslSize)
		);
		IndexSize=NumBitsRequired(NumIndexes);
		TranspSize=NumBitsRequired(TranspSize); //**This is currently overwritten a few lines down
		TranslSize=NumBitsRequired(TranslSize); //**This is currently overwritten a few lines down
		printf("Bit Lengths of - Indexes, Transparent Run Length, Translucent Run Length: %d, %d, %d\n", IndexSize, TranspSize, TranslSize);
	}

	//**Modify run sizes (custom) - this function could be run multiple times with different TranspSize and TranslSize until the best values are found - the best values would always be a weighted average

	//Start processing data
	BitSize=(sizeof(EncodedFileHeader)+NumIndexes)*8; //Skip the file+bitmap headers and measure in bits
	x=ByteOn=0;
	do
	{
		//Transparent run
		UINT CurRun=0;
		while(Input[ByteOn]==TranspMask && x<NumPixels && CurRun<(UINT)(1<<TranspSize)-1) //Final 2 checks are for EOF and capping run size to max bit length
		{
			x++;
			CurRun++;
			ByteOn+=BytesPerPixel;
			if(x%Width==0) //Account for windows DWORD row padding
				ByteOn+=RowPad;
		}
		WriteToBits(Output, BitSize, CurRun);
		BitSize+=TranspSize;

		//Translucent run
		CurRun=0;
		BitSize+=TranslSize; //Prepare to start writing masks first
		while(x<NumPixels && Input[ByteOn]!=TranspMask && CurRun<(UINT)(1<<TranslSize)-1) //Final 2 checks are for EOF and capping run size to max bit length
		{
			WriteToBits(Output, BitSize+CurRun*IndexSize, Indexes[Input[ByteOn]]);
			x++;
			CurRun++;
			ByteOn+=BytesPerPixel;
			if(x%Width==0) //Account for windows DWORD row padding
				ByteOn+=RowPad;
		}
		WriteToBits(Output, BitSize-TranslSize, CurRun); //Write the mask before the indexes
		BitSize+=CurRun*IndexSize;
	} while(x<NumPixels);

	{ //Output header
		EncodedFileHeader *OutputHead;
		OutputHead=(EncodedFileHeader*)Output;
		OutputHead->DataSize=(USHORT)(BitSize-(sizeof(EncodedFileHeader)+NumIndexes)*8); //Length of file in bits not including header
		OutputHead->Width=(UCHAR)Width;
		OutputHead->Height=(UCHAR)Height;
		OutputHead->TranspSize=(UCHAR)TranspSize;
		OutputHead->TranslSize=(UCHAR)TranslSize;
		OutputHead->NumIndexes=NumIndexes;
	}
	return BitSize/8+(BitSize%8 ? 1 : 0); //Return entire length of file in bytes
}

UCHAR NumBitsRequired(UINT Num) //Tests how many bits are required to represent a number
{
	UCHAR RetNum;
	_asm //Find the most significant bit
	{
		xor eax, eax //eax=0
		bsr eax, Num //Find most significant bit in eax
		mov RetNum, al
	}
	return RetNum+((UCHAR)(1<<RetNum)==Num ? 0 : 1); //Test if the most significant bit is the only one set, if not, at least 1 more bit is required
}

void WriteToBits(UCHAR* StartPointer, UINT BitStart, UINT Value) //Write Value to Bit# BitStart after StartPointer - Assumes more than 8 bits are never written
{
	*(WORD*)&StartPointer[BitStart/8]|=Value<<(BitStart%8); //Relies on the output buffer having been zeroed out
}
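
The `_asm` block in `NumBitsRequired` only compiles with 32-bit MSVC. As a hedged alternative, here is a minimal portable sketch that returns the same results for the small values used here (number of indexes, run lengths). The name `NumBitsRequiredPortable` is hypothetical, and it assumes a 32-bit `unsigned int`.

```c
#include <assert.h>

/* Returns the smallest Bits such that (1u<<Bits) >= Num.
   For example 16 needs 4 bits and 17 needs 5. */
unsigned char NumBitsRequiredPortable(unsigned int Num)
{
	unsigned char Bits=0;
	assert(Num<=0x80000000u); /* Anything larger would need 1u<<32, which does not fit in 32 bits */
	while(Bits<31 && (1u<<Bits)<Num)
		Bits++;
	return Bits;
}
```

Unlike the `bsr` instruction, whose result is undefined for an input of 0, this version simply returns 0 for both 0 and 1.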

The code to decompress the alpha mask in C is as follows: (Shares some header information with above code)
void Decode(UCHAR* Input, UCHAR* Output); //Decoding process
UCHAR ReadBits(UCHAR* StartPointer, UINT BitStart, UCHAR BitSize); //Read value from Bit# BitStart after StartPointer - Assumes more than 8 bits are never read
UCHAR NumBitsRequired(UINT Num); //Tests how many bits are required to represent a number --In Encoding Code--

int main()
{
	//--Encoding Code--
		UCHAR *InputBuffer, *OutputBuffer; //Where we will hold our input and output data
		FILE *File; //Handle to current input or output file
		UINT FileSize; //Holds input and output file sizes
		//The bitmap headers tell us about its contents
		//Read in bitmap header and confirm file type
		//Read in the rest of the data
		//Run Encode
		//Write encoded data out
	//--END Encoding Code--

	//Run Decode
	UCHAR* O2=(BYTE*)malloc(BitmapFileHead.bfSize);
	Decode(OutputBuffer, O2);

/*	//If writing back out to a 24 bit windows bitmap, this adds the row padding back in
	File=fopen("output.bmp", "wb");
	fwrite(&BitmapFileHead, sizeof(BITMAPFILEHEADER), 1, File);
	fwrite(&BitmapHead, sizeof(BITMAPINFOHEADER), 1, File);
	fwrite(O2, BitmapFileHead.bfSize-sizeof(BITMAPINFOHEADER)-sizeof(BITMAPFILEHEADER), 1, File);*/

	//Free up memory and wait for user input --In Encoding Code--
	return 0;
}

void Decode(UCHAR* Input, UCHAR* Output) //Decoding process
{
	EncodedFileHeader H=*(EncodedFileHeader*)Input; //Save header locally so we have quick memory lookups
	UCHAR Indexes[256], IndexSize=NumBitsRequired(H.NumIndexes); //Save indexes locally so we have quick lookups, use 256 index array so we don't have to allocate memory
	UINT BitOn=0; //Bit we are currently on in reading
	memcpy(Indexes, ((EncodedFileHeader*)Input)->Indexes, 256); //Save the indexes
	Input+=(sizeof(EncodedFileHeader)+H.NumIndexes); //Start reading input past the header

	//Unroll/unencode all the pixels
	do
	{
		UINT i, l; //index counter, length (transparent and then index)
		//Transparent pixels
		memset(Output, TranspMask, l=ReadBits(Input, BitOn, H.TranspSize)*BytesPerPixel);
		Output+=l;

		//Translucent pixels
		l=ReadBits(Input, BitOn+=H.TranspSize, H.TranslSize);
		BitOn+=H.TranslSize;
		for(i=0;i<l;i++) //Write the gray scale out to the 3 pixels, this should technically be done in a for loop, which would unroll itself anyways, but this way ReadBits+index lookup is only done once - ** Would need to be in a for loop if not using gray-scale or 24 bit output
			Output[i*BytesPerPixel]=Output[i*BytesPerPixel+1]=Output[i*BytesPerPixel+2]=Indexes[ReadBits(Input, BitOn+i*IndexSize, IndexSize)];
		Output+=l*BytesPerPixel;
		BitOn+=l*IndexSize;
	} while(BitOn<H.DataSize);

/*	{ //If writing back out to a 24 bit windows bitmap, this adds the row padding back in
		UINT i;
		UCHAR RowPad=4-(H.Width*BytesPerPixel%4); //Account for windows DWORD row padding
		RowPad=(RowPad==4 ? 0 : RowPad);
		Output-=H.Width*H.Height*BytesPerPixel; //Restore original output pointer
		for(i=H.Height;i>0;i--) //Go backwards so data doesn't overwrite itself
			memcpy(Output+(H.Width*BytesPerPixel+RowPad)*i, Output+(H.Width*BytesPerPixel)*i, H.Width*BytesPerPixel);
	}*/
}

UCHAR ReadBits(UCHAR* StartPointer, UINT BitStart, UCHAR BitSize) //Read value from Bit# BitStart after StartPointer - Assumes more than 8 bits are never read
{
	return (*(WORD*)&StartPointer[BitStart/8]>>BitStart%8)&((1<<BitSize)-1);
}
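
`ReadBits` reads a 16-bit word through a pointer cast at an arbitrary byte offset, which is fine on x86 but depends on little-endian byte order and tolerance for unaligned reads. The following is a minimal byte-wise sketch of the same bit layout (least significant bit first, at most 8 bits per value). The names `WriteBitsPortable` and `ReadBitsPortable` are hypothetical, and the explicit `BitSize` parameter on the writer is an addition for this example.

```c
#include <assert.h>

/* ORs the low BitSize bits of Value into Buffer at bit offset BitStart.
   Buffer must be zeroed beforehand and have one spare byte past the last bit written. */
void WriteBitsPortable(unsigned char *Buffer, unsigned int BitStart, unsigned char BitSize, unsigned int Value)
{
	unsigned int Shifted;
	assert(BitSize<=8);
	Shifted=(Value & ((1u<<BitSize)-1)) << (BitStart%8); /* At most 15 bits wide */
	Buffer[BitStart/8]  |=(unsigned char)(Shifted & 0xFF); /* Low byte */
	Buffer[BitStart/8+1]|=(unsigned char)(Shifted>>8);     /* Spill over into the next byte */
}

/* Reads BitSize bits starting at bit offset BitStart */
unsigned char ReadBitsPortable(const unsigned char *Buffer, unsigned int BitStart, unsigned char BitSize)
{
	unsigned int Word;
	assert(BitSize<=8);
	Word=Buffer[BitStart/8] | ((unsigned int)Buffer[BitStart/8+1]<<8); /* Assemble 16 bits, low byte first */
	return (unsigned char)((Word>>(BitStart%8)) & ((1u<<BitSize)-1));
}
```

Because the writer only ORs bits in, values can be written in any order, which matches how the encoder writes a run length after the indexes that follow it.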

Of course, I added some minor assembly and optimized the decoder code to get it from 335 to 266 bytes, which is only 69 bytes less :-\, but it’s something (measured using my Small project). There is no real reason to include it here, as it’s in many of my projects and the included C file for this post.

And then some test code just for kicks...
//Confirm Decoding
BOOL CheckDecode(UCHAR* Input1, UCHAR* Input2, UINT Width, UINT Height); //Confirm Decoding

//---- Put in main function above "//Free up memory and wait for user input" ----
printf(CheckDecode(InputBuffer, O2, BitmapHead.biWidth, BitmapHead.biHeight) ? "good" : "bad");

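BOOL CheckDecode(UCHAR* Input1, UCHAR* Input2, UINT Width, UINT Height) //Confirm Decoding
{
	UINT x,y,i;
	UCHAR RowPad=4-(Width*BytesPerPixel%4); //Account for windows DWORD row padding
	RowPad=(RowPad==4 ? 0 : RowPad);

	for(y=0;y<Height;y++) //Input1 (original bitmap) rows are DWORD padded, Input2 (decoded output) rows are not
		for(x=0;x<Width;x++)
			for(i=0;i<BytesPerPixel;i++)
				if(Input1[y*(Width*BytesPerPixel+RowPad)+x*BytesPerPixel+i]!=Input2[(y*Width+x)*BytesPerPixel+i])
					return FALSE;
	return TRUE;
}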
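
The `RowPad` calculation shows up in the encoder, the decoder, and the check above, because Windows bitmaps pad every row to a multiple of 4 bytes. As a small sketch of that arithmetic, with the hypothetical helper names `BitmapRowPad` and `BitmapPixelOffset`:

```c
#include <assert.h>

/* Padding bytes appended to each row so that its length is a multiple of 4 */
unsigned int BitmapRowPad(unsigned int Width, unsigned int BytesPerPixel)
{
	return (4-(Width*BytesPerPixel)%4)%4;
}

/* Byte offset of pixel (x, y) within a padded pixel array */
unsigned int BitmapPixelOffset(unsigned int x, unsigned int y, unsigned int Width, unsigned int BytesPerPixel)
{
	unsigned int Stride=Width*BytesPerPixel+BitmapRowPad(Width, BytesPerPixel); /* Bytes per padded row */
	assert(x<Width);
	return y*Stride+x*BytesPerPixel;
}
```

For a 207 pixel wide, 24-bit bitmap, each row holds 621 bytes of pixel data plus 3 bytes of padding, giving a stride of 624 bytes.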

From there, it just has to be loaded into a bit array for manipulation and set back into a bitmap device context, and it’s done!
VB Code: (Add the signature GIF as a picture box where it is to show up and set its “Visible” property to “false” and “Appearance” to “flat”)
'Swap in and out bits
Private Declare Function GetDIBits Lib "gdi32" (ByVal aHDC As Long, ByVal hBitmap As Long, ByVal nStartScan As Long, ByVal nNumScans As Long, lpBits As Any, lpBI As BITMAPINFOHEADER, ByVal wUsage As Long) As Long
Private Declare Function SetDIBitsToDevice Lib "gdi32" (ByVal hdc As Long, ByVal x As Long, ByVal y As Long, ByVal dx As Long, ByVal dy As Long, ByVal SrcX As Long, ByVal SrcY As Long, ByVal Scan As Long, ByVal NumScans As Long, Bits As Any, BitsInfo As BITMAPINFOHEADER, ByVal wUsage As Long) As Long
Private Type RGBQUAD
		b As Byte
		g As Byte
		r As Byte
		Reserved As Byte
End Type
Private Type BITMAPINFOHEADER '40 bytes
		biSize As Long
		biWidth As Long
		biHeight As Long
		biPlanes As Integer
		biBitCount As Integer
		biCompression As Long
		biSizeImage As Long
		biXPelsPerMeter As Long
		biYPelsPerMeter As Long
		biClrUsed As Long
		biClrImportant As Long
End Type
Private Const DIB_RGB_COLORS = 0 '  color table in RGBs

'Prepare colors
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)
Private Declare Function GetBkColor Lib "gdi32" (ByVal hdc As Long) As Long

Public Sub DisplaySignature(ByRef TheForm As Form)
    'Read in Signature
    Dim BitmapLength As Long, OutBitmap() As RGBQUAD, BitInfo As BITMAPINFOHEADER, Signature As PictureBox
    Set Signature = TheForm.Signature
    BitmapLength = Signature.Width * Signature.Height
    ReDim OutBitmap(0 To BitmapLength - 1) As RGBQUAD
    With BitInfo
            .biSize = 40
            .biWidth = Signature.Width
            .biHeight = -Signature.Height
            .biPlanes = 1
            .biBitCount = 32
            .biCompression = 0 'BI_RGB
            .biSizeImage = .biWidth * 4 * -.biHeight
    End With
    GetDIBits Signature.hdc, Signature.Image, 0, Signature.Height, OutBitmap(0), BitInfo, DIB_RGB_COLORS
    'Alpha blend signature
    Dim i As Long, Alpha As Double, BackColor As RGBQUAD, ForeColor As RGBQUAD, OBC As Long, OFC As Long
    OFC = &H67906
    OBC = GetBkColor(TheForm.hdc)
    CopyMemory BackColor, OBC, 4
    CopyMemory ForeColor, OFC, 4
    For i = 0 To BitmapLength - 1
        Alpha = 1 - (CDbl(OutBitmap(i).r) / 255)
        OutBitmap(i).r = ForeColor.r * Alpha + BackColor.r * (1 - Alpha)
        OutBitmap(i).g = ForeColor.g * Alpha + BackColor.g * (1 - Alpha)
        OutBitmap(i).b = ForeColor.b * Alpha + BackColor.b * (1 - Alpha)
    Next i
    SetDIBitsToDevice TheForm.hdc, Signature.Left, Signature.Top, Signature.Width, Signature.Height, 0, 0, 0, Signature.Height, OutBitmap(0), BitInfo, DIB_RGB_COLORS
End Sub

C Code
//Prepare to decode signature
	//const UCHAR BytesPerPixel=4, TranspMask=255; //32 bits per pixel (for quicker copies and such - variable not used due to changing BYTE*s to DWORD*s), and white=transparent background color - also not used anymore since we directly write in the background color
	//Load data from executable
	HGLOBAL GetData=LoadResource(NULL, FindResource(NULL, "DakSig", "Sig")); //Load the resource from the executable
	BYTE *Input=(BYTE*)LockResource(GetData); //Get the resource data

	//Prepare header and decoding data
	UINT BitOn=0; //Bit we are currently on in reading
	EncodedFileHeader H=*(EncodedFileHeader*)Input; //Save header locally so we have quick memory lookups
	DWORD *Output=Signature=new DWORD[H.Width*H.Height]; //Allocate signature memory

	//Prepare the index colors
	DWORD Indexes[17], IndexSize=NumBitsRequired(H.NumIndexes); //Save full color indexes locally so we have quick lookups, use 17 index array so we don't have to allocate memory (since we already know how many there will be), #16=transparent color
	DWORD BackgroundColor=GetSysColor(COLOR_BTNFACE), FontColor=0x067906;
	BYTE *BGC=(BYTE*)&BackgroundColor, *FC=(BYTE*)&FontColor;
	for(UINT i=0;i<16;i++) //Alpha blend the indexes
	{
		float Alpha=((EncodedFileHeader*)Input)->Indexes[i] / 255.0f;
		BYTE IndexColor[4];
		for(int n=0;n<3;n++)
			IndexColor[n]=(BYTE)(BGC[n]*Alpha + FC[n]*(1-Alpha));
		//IndexColor[3]=0; //Don't really need to worry about the last byte as it is unused
		Indexes[i]=*(DWORD*)IndexColor; //Save the blended color for this index
	}
	Indexes[16]=BackgroundColor; //Transparent background = window background color

//Unroll/unencode all the pixels
Input+=(sizeof(EncodedFileHeader)+H.NumIndexes); //Start reading input past the header
do
{
	UINT l; //Length (transparent and then index)
	//Transparent pixels
	memsetd(Output, Indexes[16], l=ReadBits(Input, BitOn, H.TranspSize));
	Output+=l;

	//Translucent pixels
	l=ReadBits(Input, BitOn+=H.TranspSize, H.TranslSize);
	BitOn+=H.TranslSize;
	for(UINT i=0;i<l;i++) //Each index maps straight to a pre-blended 32 bit color, so only one write per pixel is needed
		Output[i]=Indexes[ReadBits(Input, BitOn+i*IndexSize, IndexSize)];
	Output+=l;
	BitOn+=l*IndexSize;
} while(BitOn<H.DataSize);

//Output the signature
const BITMAPINFOHEADER MyBitmapInfo={sizeof(BITMAPINFOHEADER), 207, 42, 1, 32, BI_RGB, 0, 0, 0, 0, 0};
SetDIBitsToDevice(MyDC, x, y, MyBitmapInfo.biWidth, MyBitmapInfo.biHeight, 0, 0, 0, MyBitmapInfo.biHeight, Signature, (BITMAPINFO*)&MyBitmapInfo, DIB_RGB_COLORS);
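
Both the VB and C versions mix the font color with the window background, using the mask value as the weight. Here is a minimal integer-only sketch of that per-channel blend. The name `BlendChannel` is hypothetical, and it rounds to nearest, whereas the float cast in the C code above truncates, so results can differ by 1.

```c
#include <assert.h>

/* Mask=255 yields Background, Mask=0 yields Foreground, values in between are mixed linearly.
   Integer form of Background*(Mask/255) + Foreground*(1-Mask/255), rounded to nearest. */
unsigned char BlendChannel(unsigned char Foreground, unsigned char Background, unsigned char Mask)
{
	return (unsigned char)((Background*Mask + Foreground*(255-Mask) + 127)/255);
}
```

It would be called once each for the blue, green, and red bytes of a color. Since there are only a handful of distinct mask values, blending the index table once (as the C code does) avoids repeating this for every pixel.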

This all adds ~3.5KB to each VB project, and ~2KB to each C/CPP project. Some other recent additions to all project executables include the Hyrulean Productions icon (~1KB) and file version information (1-2KB). I know that a few KB doesn’t seem like much, but when executables are often around 10KB, it can almost double their size.

While I’m on the topic of project sizes, I should note that I always compress their executables with UPX, a very nifty executable compressor. It would often be more prudent to use my Small project, but I don’t want to complicate my open-source code.

One other possible solution I did not pursue would be to take the original font and create a subset font of it with only the letters (and font size?) I need, and see if that file is smaller. I doubt it would have worked well though.