Home Page
Archive > Posts > Tags > Formats
Search:

Pulling HTML from Github markdown for external use
Although, converting to markdown is a time consuming pain

So I started getting on the Github bandwagon FINALLY. I figured that while I was going to the trouble of remaking readme files for the projects into github markdown files, I might as well duplicate the compiled HTML for my website.

The below code is a simple PHP script to pull in the converted HTML from Github’s API and then do some more modifications to facilitate directly inserting it into a website.


Usage:
  • The variables that can be updated are all at the top of the file.
  • The script will always output the finished result to the user’s browser, but can also optionally save it to an external file by setting the $SaveFileName variable.
  • Stylesheet:
    • The script automatically includes a specified stylesheet from the $StylesheetLocation variable.
    • The stylesheet I used is from https://gist.github.com/somebox/1082608. I’m not too happy with its coloring scheme, but it’ll do for now.
    • The required modifications that need to be made to the css are to change “body” to “.GHMarkdown”, and then add “.GHMarkdown” before all other rules.
    • This is the one I am currently using for my website, but it also has a few modifications made specifically for my layouts.
  • Modifications
    • In my markdowns, I like to link to internal sections by first creating a bookmark as “<div name="BOOKMARK_NAME">...</div>” and then linking via “[LinkName](#BOOKMARK_NAME)”. While this works on github, the bookmark’s names are actually changed to something like “user-content-BOOKMARK-NAME”, which is not useable outside of github. The first $RegexModifications item therefore updates the bookmarks back to their original name, and turns them into <span>s (which github does not support).
    • The second rule just removes the “aria-hidden” attributes, which my W3C checking scripts throw a warning on.
  • Note that sometimes, the script may return an error of “transfer closed with XXX bytes remaining to read”. This means that github denied the request (probably due to too many requests in too short a timespan), but the input is too large so github prematurely terminated the connection. If this happens, try sending a tiny input and see if you get back a proper error.

<?php
//Variables
$SaveFileName='Output.html'; //Optionally save output to a file. Comment out to not save
$InputFile='Input.md';
$StylesheetLocation='github-markdown.css';
$RegexModifications=Array(
        '/<div name="user-content-(.*?)"(.*?)<\/div>/s'=>'<span id="$1"$2</span>', //Change <div name="user-contentXXX ---TO--- <span name="XXX
        '/ ?aria-hidden="true"/'=>'' //Remove aria-hidden attribute
);

//Set the curl options
$CurlHandle=curl_init(); //Init curl
curl_setopt_array($CurlHandle, Array(
        CURLOPT_URL=>           'https://api.github.com/markdown/raw', //Markdown/raw takes and returns plain text input and output
        CURLOPT_FAILONERROR=>   false,
        CURLOPT_FOLLOWLOCATION=>1,
        CURLOPT_RETURNTRANSFER=>1, //Return result as a string
        CURLOPT_TIMEOUT=>       300,
        CURLOPT_POST=>          1,
        CURLOPT_POSTFIELDS=>    file_get_contents($InputFile), //Pull in the requested file
        CURLOPT_HTTPHEADER=>    Array('Content-type: text/plain'), //Github expects the given data to be plaintext
        CURLOPT_SSL_VERIFYPEER=>0, //In case there are problems with the PHP ssl chain (often the case in Windows), ignore the error
        CURLOPT_USERAGENT=>     'Curl/PHP' //Github now requires a useragent to process the request
));

//Pull in the html converted markdown from Github
$Return=curl_exec($CurlHandle);
if(curl_errno($CurlHandle)) //Check for error
        $Return=curl_error($CurlHandle);
curl_close($CurlHandle);

//Make regex modifications
$Return=preg_replace(array_keys($RegexModifications), array_values($RegexModifications), $Return);

//Generate the final HTML. It will also be output here if not saving to a file
header('Content-Type: text/html; charset=utf-8');
if(isset($SaveFileName)) //If saving to a file, buffer output
        ob_start();
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Markdown pull</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="<?=$StylesheetLocation?>" rel=stylesheet type="text/css">
</head><body><div class=GHMarkdown>
<?=$Return?>
</div></body></html>
<?php
//Save to a file if requested
if(isset($SaveFileName))
        file_put_contents($SaveFileName, ob_get_flush()); //Actual output happens here too when saving to a file
?>
Format Text [to HTML] Script

After writing the documentation in plaintext format for DSQL just now, I needed to convert it into HTML for the project’s page. I’ve done this before manually and it’s always very daunting, so I decided to really quickly write a script to do most of the work for me, which can be downloaded here, or the code seen below.

It has the following:
  • Input text box with HTML data that is instantly shown as HTML in a below section when modified.
    Both sections take up half the vertical screen space
  • Undo/redo buffer for the text box (very primitive functionality)
  • “Open in new page” button, which opens a new window with just the HTML data (useful for validation [W3C or whatnot]).
    This is disabled by default because it is a dangerous option (XSS exploitable, so the script would need to be secured/password protected if this was on)
  • “Escape HTML” escapes HTML characters so they are not improperly interpreted (e.x. “<” becomes “&lt;”)
  • Listize:
    • Turns tabbed lists into HTML
    • For example:
      1
      	2
      	3
      		4
      5
      would become:
      1
      • 2
      • 3
        • 4
      5


I realized while making the script that I should probably instead just start making my documentation in a markup (like GitHub’s) and then have that converted to HTML and text files. Oh well.



Code:
<? header('Content-Type: text/html; charset=utf-8'); ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Format Text</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<?
$AllowRenderText=true; //Set to true only if this is in a secure environment, as directly outputting a given value can lead to XSS
if(isset($_REQUEST['RenderText']))
	return print '</head><body>'.($AllowRenderText ? $_REQUEST['RenderText'] : 'Rendering of text not allowed').'</body></html>';
?>
<style type="text/css">
html, body { width:100%; height:100%; margin:0; padding:0; }
.HalfScreen { display:block; width:calc(100% - 2px); height:calc(50% - 2px - 30px/2); margin:0; border:1px solid black; }
#RenderForm { overflow:hidden; }
#RenderText { margin:0; border:0; width:100%; height:100%; }
#RenderHTML { overflow-x:hidden; overflow-y:scroll; }
.TopBar { height:30px; background-color:grey; }
.Hide { position:absolute; visibility:hidden; top:-10000px; }
</style>

<script type="text/javascript" src="https://code.jquery.com/jquery-2.1.1.min.js"></script>
<script type="text/javascript">$(document).ready(function() {

//History for undoing
var UndoBuf=[], RedoBuf=[];
function Undo()
{
	if(!UndoBuf.length)
		return;
	RedoBuf.push(UndoBuf.pop());
	$('#RenderText').val(UndoBuf[UndoBuf.length-1]);
	$('#RenderHTML').html(UndoBuf[UndoBuf.length-1]);
}
function Redo()
{
	if(!RedoBuf.length)
		return;
	$('#RenderText').val(RedoBuf[RedoBuf.length-1]);
	$('#RenderHTML').html(RedoBuf[RedoBuf.length-1]);
	UndoBuf.push(RedoBuf.pop());
}
$('#Undo').click(function(e) { e.preventDefault(); Undo(); });
$('#Redo').click(function(e) { e.preventDefault(); Redo(); });

//Render HTML
function Render() {
	//Do the render
	var MyVal=$('#RenderText').val();
	$('#RenderHTML').html(MyVal);

	//Save current value to the history
	//*Better history functionality here would be real nice (using smart currentTarget.selectionStart/End calculations), along with an undo/redo button, but not within the scope of this project
	if(RedoBuf.length) //Empty redo buffer
		RedoBuf=[];
	UndoBuf.push(MyVal);
	if(UndoBuf.length>100) //Limit history buffer
		UndoBuf.shift();
}
$('#RenderText').on('keypress paste', function(e) { setTimeout(Render, 1); }); //Automatic update on paste requires a timeout

//Open in new page
$('#OpenInNewPage').click(function(e) {
	e.preventDefault();
	$('#RenderForm').submit();
});

//Escape HTML
$('#EscapeHTML').click(function(e) {
	e.preventDefault();
	$('#RenderText').val(function(index, value) {
		$.each({"&amp;":/&/g, "&lt;":/</g, "&gt;":/>/g, "&quot;":/"/g, "&#039;":/'/g}, function(HTMLStr, ReplStr) {
			value=value.replace(ReplStr, HTMLStr); });
		return value;
	});
	Render();
});

//Listize based on tabbing
//If a successive line is tabbed over beyond the current, it is made inside a new nested list.
//Tabbing over more than once on a successive line will create multiple nests
//Having @@@ at the beginning of a line will include it in the previous line item, no matter the tabbing
//Make sure to have @@@ blank lines tabbed over to the proper nested level
$('#Listize').click(function(e) {
	//Get the text to replace
	e.preventDefault();
	var T=$('#RenderText').val();

	//Go over each line and if the next line is tabbed beyond it, make it a new nested list. Blank
	var CurTabLevel=0, NewLines=[]; //NewLines is 2 items per line: the original string and the new html tags
	$.each(T.split(/\r?\n/), function(Index, Str) {
		//Check for a continued line item
		if(Str.substr(0, 3)=='@@@')
			return NewLines.push('<br>', Str.substr(3));

		//In/de-dent as needed
		var Tags='';
		var NewTabLevel=/^\t*/.exec(Str)[0].length, PreLevel=CurTabLevel; //Get the nested level
		for(;NewTabLevel>CurTabLevel;CurTabLevel++)
			Tags+='<ul><li>';
		for(;NewTabLevel<CurTabLevel;CurTabLevel--)
			Tags+='</li></ul>';

		//Fill out the rest of the line
		if(NewTabLevel==0) //Breaks between top level new lines
			Tags+=(Index && PreLevel==0 ? '<br>' : '');
		else if(PreLevel>=NewTabLevel) //If previous item needs to be ended (new level is not greater and not 0)
			Tags+='</li><li>';

		NewLines.push(Tags, Str);
	});

	//Finish de-dent as needed
	var Final=[NewLines.shift()];
	var EndLine='';
	while(CurTabLevel--)
		EndLine+='</li></ul>';
	NewLines.push(EndLine);

	//Combine each line with the tags
	for(var i=0;i<NewLines.length;i+=2)
		Final.push(NewLines[i+0]+NewLines[i+1]);



	//Update from the replaced text
	$('#RenderText').val(Final.join("\n"));
	Render();
});

});</script>

</head>
<body>
	<div class=TopBar>
		<input type=button id=EscapeHTML value="Escape HTML">
		<input type=button id=Listize value="Listize">
		<? if($AllowRenderText) { ?> <input type=button id=OpenInNewPage value="Open In New Page"> <? } ?>
		<input type=button id=Undo value="Undo">
		<input type=button id=Redo value="Redo">
	</div>
	<form action="FormatText.php" method=post id=RenderForm target="_blank" class=HalfScreen>
		<textarea id=RenderText name=RenderText></textarea>
		<input type=submit class=Hide>
	</form>
	<div id=RenderHTML class=HalfScreen></div>
</body>
</html>
Setting the time zone through a numeric offset
They never make it easy

I had the need today to be able to set the current time zone for an application in multiple computer languages by the hourly offset from GMT/UTC, which turned out to be a lot harder than I expected. It seems most time zone related functions, at least in Linux, expect you to use full location strings to set the current time zone offset (i.e. America/Chicago).


After a lot of research and experimenting, I came up with the following results. All of these are confirmed working in Linux, and most or all of them should work in Windows too.

Language Format Note Format for GMT+5 Format for GMT-5
C Negate GMT-5 GMT5
Perl Negate GMT-5 GMT5
SQL Requires Sign +5:00 -5:00
PHP Negate, Requires Sign Etc/GMT-5 Etc/GMT+5

And here are examples of using this in each language. The “TimeZone” string variable should be a 1-2 digit integer with an optional preceding negative sign:
Language Example
C
#include <stdio.h> //snprintf
#include <stdlib.h> //setenv, atoi
#include <time.h> //tzset

...

char Buffer[10];
snprintf(Buffer, 10, "GMT%i", -atoi(TimeZone));
setenv("TZ", Buffer, 1);
tzset();
		
Perl
use POSIX qw/tzset/;
$ENV{TZ}='GMT'.(-$TimeZone);
tzset;
		
SQL [Query string created via Perl]
$Query='SET time_zone="'.($TimeZone>=0 ? '+' : '').$TimeZone.':00"';
		
PHP
date_default_timezone_set('Etc/GMT'.($TimeZone<=0 ? '+' : '').(-$TimeZone));
		
OpenSSH RSA Authentication public key file format
Curiosity as always

There are two primary authentication methods for logging onto an SSH server as a user. The first is password based authentication, and the second is public key authentication. The public/private RSA key pair for public key authentication can be created using OpenSSH’s “ssh-keygen” application.

I’m not going to go into the exact method on accomplishing this because instructions can be found on countless other places on the internet. However, I was curious yesterday as to what exactly was in the public key (.pub) files created by ssh-keygen, as the data payload was larger than I expected (2232 bits for a 2048 bit key). I couldn’t find documentation on this ANYWHERE on the internet, so I downloaded the OpenSSH source code and looked at the generation code of the files. The format of the files is as follows:

  • The public key files are ASCII based text files with each public key taking up exactly one line.
  • Each line is formatted with 2 pieces of data as follows:
    KEY_TYPE DATA_PAYLOAD
  • KEY_TYPE is the type of public key, which in our case (and most cases nowadays) is “ssh-rsa”.
  • DATA_PAYLOAD contains the actual public key information encoded in base64 with the following format:
TypeByte lengthNameDescriptionDefault Value
unsigned int4KEY_TYPE_LENGTHLength of the next entry7
StringSee previousKEY_TYPESee abovessh-rsa
unsigned int4E_LENGTHLength of the next entry3
BigIntSee previousethis is the public key exponent in RSA65537
unsigned int4N_LENGTHLength of the next entryKEY_BIT_SIZE/8 (optional +1)
BigIntSee previousnthis is the “modulus for both the public and private keys” in RSAKey dependent

I also checked putty public key authentication files and they seemed to contain the exact same DATA_PAYLOAD.

Inlining Executable Resources
Do you suffer from OPC (Obsessive Perfection Complex)? If not, you aren’t an engineer :-)

I am somewhat obsessive about file cleanliness, and like to have everything I do well organized with any superfluous files removed. This especially translates into my source code, and even more so for released source code.

Before I zip up the source code for any project, I always remove the extraneous workspace compilation files. These usually include:

  • C/C++: Debug & Release directories, *.ncb, *.plg, *.opt, and *.aps
  • VB: *.vbw
  • .NET: *.suo, *.vbproj.user

Unfortunately, a new offender surfaced in the form of the Hyrulean Productions icon and Signature File for about pages. I did not want to have to have every source release include those 2 extra files, so I did research into inlining them in the resource script (.rc) file. Resources are just data directly compiled into an executable, and the resource script tells the executable all of these resources and how to compile them in. All my C projects include a resource script for at least the file version, author information, and Hyrulean Productions icon. Anyways, this turned out to be way more of a pain in the butt than intended.


There are 2 ways to load “raw data” (not a standard format like an icon, bitmap, string table, version information, etc) into a resource script. The first way is through loading an external file:
RESOURCEID RESOURCETYPE DISCARDABLE "ResourceFileName"
for example:
DAKSIG	SIG	DISCARDABLE	"Dakusan.sig"
RESOURCEID and RESOURCETYPE are arbitrary and user defined, and it should also be noted to usually have them in caps, as the compilers seem to often be picky about case.

The second way is through inlining the data:
RESOURCEID	RESOURCETYPE
BEGIN
	DATA
END
for example:
DakSig	Sig
BEGIN
	0x32DA,0x2ACF,0x0306,...
END
Getting the data in the right format for the resource script is a relatively simple task.
  • First, acquire the data in 16-bit encoded format (HEX). I suggest WinHex for this job.
    On a side note, I have been using WinHex for ages and highly recommend it. It’s one of the most well built and fully featured application suites I know if.
  • Lastly, convert the straight HEX DATA (“DA32CF2A0603...”) into an array of proper endian hex values (“0x32DA,0x2ACF,0x0306...”). This can be done with a global replace regular expression of “(..)(..)” to “0x$2$1,”. I recommend Editpad Pro for this kind of work, another of my favorite pieces of software. As a matter of fact, I am writing this post right now in it :-).

Here is where the real caveats and problems start falling into place. First, I noticed the resource data was corrupt for a few bytes at a certain location. It turned out to be Visual Studio wanting line lengths in the resource file to be less than ~4175 characters, so I just added a line break at that point.

This idea worked great for the about page signature, which needed to be raw data anyways, but encoding the icon this way turned out to be impossible :-(. Visual Studio apparently requires external files be loaded if you want to use a pre-defined binary resource type (ICON, BITMAP, etc). The simple solution would be to inline the icon as a user defined raw data type, but unfortunately, the Win32 icon loading API functions (LoadIcon, CreateIconFromResource, LoadImage, etc) only seemed to work with properly defined ICONs. I believe the problem here is that when the compiler loads in the icon to include in the executable, it reformats it somewhat, so I would need to know this format. Again, unfortunately, Win32 APIs failed me. FindResource/FindResourceEx wouldn’t let me load the data for ICON types for direct coping (or reverse engineering) :-(. At this point, it wouldn’t be worth my time to try and get the proper format just to inline my Hyrulean Productions icon into resource scripts. I may come back to it later if I’m ever really bored.


This unfortunately brings back a lot of bad old memories regarding Win32 APIs. A lot of the Windows system is really outdated, not nearly robust enough, or just way too obfuscated, and has, and still does, cause me innumerable migraines trying to get things working with their system.

As an example, I just added the first about page to a C project, and getting fonts working on the form was not only a multi-hour long knockdown drag out due to multiple issues, I ended up having to jury rig the final solution in exasperation due to time constraints. I wanted the C about pages to match the VB ones exactly, but font size numbers just wouldn’t conform between the VB GUI designer and Windows GDI (the Windows graphics API), so I just put in arbitrary font size numbers that matched visually instead of trying to find the right conversion process, as the documented font size conversion process was not yielding proper results. This is the main reason VB (and maybe .NET) are far superior in my book when dealing with GUIs (for ease of use at least, not necessarily ability and power). I know there are libraries out that supposedly solve this problem, but I have not yet found one that I am completely happy with, which is why I had started my own fully fledged cross operating system GUI library a ways back, but it won’t be completed for a long time.

Project About Pages
Big things come in small packages
About Window Concept

I’ve been thinking for a while that I need to add “about windows” to the executables of all my applications with GUIs. So I first made a test design [left, psd file attached]

Unfortunately, this requires at least 25KB for the background alone, and this is larger than many of my project executables themselves. This is a problem for me, as I like keeping executables small and simple.

PNG Signature I therefore decided to scratch the background and just go with normal “about windows” and my signature in a spiffy font [BlizzardD]: (white background added by web browser for visibility)
The above PNG signature is only 1.66KB, so “yay”, right? Wrong :-(, it quickly occurred to me that XP does not natively support PNG.

GIF SignatureMy next though is “what about a GIF?” (GIF is the predecessor to PNG, also lossless): (1.82KB)
I remembered that GIF files worked for VB, so I thought that native Windows’ API might support it too without adding in extra DLLs, but alas, I was wrong. This at least partially solves the problem for me in Visual Basic, but not fully, as GIF does not support translucency, but only 1 color of transparency, so the picture would look horribly aliased (pixilated).

The final solution I decided on is having a small translucency-mask and alpha-blending it and the primary signature color (RGB(6,121,6)) to the “about windows’ ” background.
GIF Signature MaskSince alpha-blending/translucency is an 8 bit value, a gray-scale (also 8 bits per pixel) image is perfect for a translucency mask format for VB: (1.82KB, GIF)
You may note that this GIF is the exact same size as the previous GIF, which makes sense as it is essentially the exact same picture, just with swapped color palettes.

The final hurdle is how to import the picture into C with as little space wasted as possible. The solution to this is to create an easily decompressable alpha-mask (alpha means translucency).
BMP Signature Mask I started with the bitmap mask: (25.6KB, BMP)
From there, I figured there would be 2 easy formats for compression that would take very little code to decompress:
  • Number of Transparent Pixels, Number of Foreground Pixels in a Row, List of Foreground Pixel Masks, REPEAT... (This is a form of “Run-length encoding”)
  • Start the whole image off as transparent, and then list each group of foreground pixels with: X Start Coordinate, Y Start Coordinate, Number of Pixels in a Row, List of Foreground Pixel Masks
It also helped that there were only 16 different alpha-masks, not including the fully transparent mask, so each alpha-mask could be fit within half a byte (4 bits). I only did the first option because I’m pretty sure the second one would be larger because it would take more bits for an x/y location than for a transparent run length number.

Other variants could be used too, like counting the background as a normal mask index and just do straight run length encoding with indexes, but I knew this would make the file much larger for 2 reasons: this would add a 17th alpha-mask which would push index sizes up to 5 bits, and background run lengths are much longer (in this case 6 bits), so all runs would need to be longer (non-background runs are only 3 bits in this case). Anyways, it ended up creating a 1,652 byte file :-).


This could also very easily be edited to input/output 8-bit indexed bitmaps, or full color bitmaps even (with a max of 256 colors, or as many as you wanted with a few more code modifications). If one wanted to use this for normal pictures with a solid background instead of an alpha-mask, just know the words “Transparent” means “Background” and “Translucent” means “Non-Background” in the code.

GIF and PNG file formats actually use similar techniques, but including the code for their decoders would cause a lot more code bloat than I wanted, especially since they [theoretically] include many more compression techniques than just run-length encoding. Programming for specific cases will [theoretically] always be smaller and faster than programming for general cases. On a side note, from past research I’ve done on the JPEG format, along with programming my NES Emulator, Hynes, they [JPEG & NES] share the same main graphical compression technique [grouping colors into blocks and only recording color variations].


The following is the code to create the compressed alpha-mask stream: [Direct link to C file with all of the following code blocks]
//** Double stars denotes changes for custom circumstance [The About Window Mask]
#include <windows.h>
#include <stdio.h>
#include <conio.h>

//Our encoding functions
int ErrorOut(char* Error, FILE* HandleToClose); //If an error occurs, output
UINT Encode(UCHAR* Input, UCHAR* Output, UINT Width, UINT Height); //Encoding process
UCHAR NumBitsRequired(UINT Num); //Tests how many bits are required to represent a number
void WriteToBits(UCHAR* StartPointer, UINT BitStart, UINT Value); //Write Value to Bit# BitStart after StartPointer - Assumes more than 8 bits are never written

//Program constants
const UCHAR BytesPerPixel=3, TranspMask=255; //24 bits per pixel, and white = transparent background color

//Encoding file header
typedef struct
{
	USHORT DataSize; //Data size in bits - **Should be UINT
	UCHAR Width, Height; //**Should be USHORTs
	UCHAR TranspSize, TranslSize; //Largest number of bits required for a run length for Transp[arent] and Transl[ucent]
	UCHAR NumIndexes, Indexes[0]; //Number and list of indexes
} EncodedFileHeader;

int main()
{
	UCHAR *InputBuffer, *OutputBuffer; //Where we will hold our input and output data
	FILE *File; //Handle to current input or output file
	UINT FileSize; //Holds input and output file sizes

	//The bitmap headers tell us about its contents
	BITMAPFILEHEADER BitmapFileHead;
	BITMAPINFOHEADER BitmapHead;

	//Read in bitmap header and confirm file type
	File=fopen("AboutWindow-Mask.bmp", "rb"); //Normally you'd read in the filename from passed arguments (argv)
	if(!File) //Confirm file open
		return ErrorOut("Cannot open file for reading", NULL);
	fread(&BitmapFileHead, sizeof(BITMAPFILEHEADER), 1, File);
	if(BitmapFileHead.bfType!=*(WORD*)"BM" || BitmapFileHead.bfReserved1 || BitmapFileHead.bfReserved2) //Confirm we are opening a bitmap
		return ErrorOut("Not a bitmap", File);

	//Read in the rest of the data
	fread(&BitmapHead, sizeof(BITMAPINFOHEADER), 1, File);
	if(BitmapHead.biPlanes!=1 || BitmapHead.biBitCount!=24 || BitmapHead.biCompression!=BI_RGB) //Confirm bitmap type - this code would probably have been simpler if I did an 8 bit indexed file instead... oh well, NBD.  **It has also been programmed for easy transition to 8 bit indexed files via the "BytesPerPixel" constant.
		return ErrorOut("Bitmap must be in 24 bit RGB format", File);
	FileSize=BitmapFileHead.bfSize-sizeof(BITMAPINFOHEADER)-sizeof(BITMAPFILEHEADER); //Size of the data portion
	InputBuffer=malloc(FileSize);
	fread(InputBuffer, FileSize, 1, File);
	fclose(File);

	//Run Encode
	OutputBuffer=malloc(FileSize); //We should only ever need at most FileSize space for output (output should always be smaller)
	memset(OutputBuffer, 0, FileSize); //Needs to be zeroed out due to how writing of data file is non sequential
	FileSize=Encode(InputBuffer, OutputBuffer, BitmapHead.biWidth, BitmapHead.biHeight); //Encode the file and get the output size

	//Write encoded data out
	File=fopen("Output.msk", "wb");
	fwrite(OutputBuffer, FileSize, 1, File);
	fclose(File);
	printf("File %d written with %d bytes\n", 1, FileSize);

	//Free up memory and wait for user input
	free(InputBuffer);
	free(OutputBuffer);
	getch(); //Pause for user input
	return 0;
}

int ErrorOut(char* Error, FILE* HandleToClose) //If an error occurs, output
{
	if(HandleToClose)
		fclose(HandleToClose);
	printf("%s\n", Error);
	getch(); //Pause for user input
	return 1;
}

UINT Encode(UCHAR* Input, UCHAR* Output, UINT Width, UINT Height) //Encoding process
{
	UCHAR Indexes[256], NumIndexes, IndexSize, RowPad; //The index re-mappings, number of indexes, number of bits an index takes in output data, padding at input row ends for windows bitmaps
	USHORT TranspSize, TranslSize; //Largest number of bits required for a run length for Transp[arent] (zero) and Transl[ucent] (non zero) - should be UCHAR's, but these are used as explained under "CurTranspLen" below
	UINT BitSize, x, y, ByteOn, NumPixels; //Current output size in bits, x/y coordinate counters, current byte location in Input, number of pixels in mask

	//Calculate some stuff
	NumPixels=Width*Height; //Number of pixels in mask
	RowPad=4-(Width*BytesPerPixel%4); //Account for windows DWORD row padding - see declaration comment
	RowPad=(RowPad==4 ? 0 : RowPad);

	{ //Do a first pass to find number of different mask values, run lengths, and their encoded values
		const UCHAR UnusedIndex=255; //In our index list, unused indexes are marked with this constant
		USHORT CurTranspLen, CurTranslLen; //Keep track of the lengths of the current transparent & translucent runs - TranspSize and TranslSize are temporarily used to hold the maximum run lengths
		//Zero out all index references and counters
		memset(Indexes, UnusedIndex, 256);
		NumIndexes=0;
		TranspSize=TranslSize=CurTranspLen=CurTranslLen=0;
		//Start gathering data
		for(y=ByteOn=0;y<Height;y++) //Column
		{
			for(x=0;x<Width;x++,ByteOn+=BytesPerPixel) //Row
			{
				UCHAR CurMask=Input[ByteOn]; //Curent alpha mask
				if(CurMask!=TranspMask) //Translucent value?
				{
					//Determine if index has been used yet
					if(Indexes[CurMask]==UnusedIndex) //We only need to check 1 byte per pixel as they are all the same for gray-scale **This would need to change if using non 24-bit or non gray-scale
					{
						((EncodedFileHeader*)Output)->Indexes[NumIndexes]=CurMask; //Save mask number in the index header
						Indexes[CurMask]=NumIndexes++; //Save index number to the mask
					}

					//Length of current transparent run
					TranspSize=(CurTranspLen>TranspSize ? CurTranspLen : TranspSize); //Max(CurTranspLen, TranspSize)
					CurTranspLen=0;

					//Length of current translucent run
					CurTranslLen++;
				}
				else //Transparent value?
				{
					//Length of current translucent run
					TranslSize=(CurTranslLen>TranslSize ? CurTranslLen : TranslSize);  //Max(CurTranslLen, TranslSize)
					CurTranslLen=0;

					//Length of current transparent run
					CurTranspLen++;
				}
			}

			ByteOn+=RowPad; //Account for windows DWORD row padding
		}
		//Determine number of required bits per value
		printf("Number of Indexes: %d\nLongest Transparent Run: %d\nLongest Translucent Run: %d\n", NumIndexes,
			TranspSize=CurTranspLen>TranspSize ? CurTranspLen : TranspSize, //Max(CurTranspLen, TranspSize)
			TranslSize=CurTranslLen>TranslSize ? CurTranslLen : TranslSize  //Max(CurTranslLen, TranslSize)
			);
		IndexSize=NumBitsRequired(NumIndexes);
		TranspSize=NumBitsRequired(TranspSize); //**This is currently overwritten a few lines down
		TranslSize=NumBitsRequired(TranslSize); //**This is currently overwritten a few lines down
		printf("Bit Lengths of - Indexes, Trasparent Run Length, Translucent Run Length: %d, %d, %d\n", IndexSize, TranspSize, TranslSize);
	}

	//**Modify run sizes (custom) - this function could be run multiple times with different TranspSize and TranslSize until the best values are found - the best values would always be a weighted average
	TranspSize=6;
	TranslSize=3;

	//Start processing data
	BitSize=(sizeof(EncodedFileHeader)+NumIndexes)*8; //Skip the file+bitmap headers and measure in bits
	x=ByteOn=0;
	do
	{
		//Transparent run
		UINT CurRun=0;
		while(Input[ByteOn]==TranspMask && x<NumPixels && CurRun<(UINT)(1<<TranspSize)-1) //Final 2 checks are for EOF and capping run size to max bit length
		{
			x++;
			CurRun++;
			ByteOn+=BytesPerPixel;
			if(x%Width==0) //Account for windows DWORD row padding
				ByteOn+=RowPad;
		}
		WriteToBits(Output, BitSize, CurRun);
		BitSize+=TranspSize;

		//Translucent run
		CurRun=0;
		BitSize+=TranslSize; //Prepare to start writing masks first
		while(x<NumPixels && Input[ByteOn]!=TranspMask && CurRun<(UINT)(1<<TranslSize)-1) //Final 2 checks are for EOF and and capping run size to max bit length
		{
			WriteToBits(Output, BitSize+CurRun*IndexSize, Indexes[Input[ByteOn]]);
			x++;
			CurRun++;
			ByteOn+=BytesPerPixel;
			if(x%Width==0) //Account for windows DWORD row padding
				ByteOn+=RowPad;
		}
		WriteToBits(Output, BitSize-TranslSize, CurRun); //Write the mask before the indexes
		BitSize+=CurRun*IndexSize;
	} while(x<NumPixels);

	{ //Output header
		EncodedFileHeader *OutputHead;
		OutputHead=(EncodedFileHeader*)Output;
		OutputHead->DataSize=BitSize-(sizeof(EncodedFileHeader)+NumIndexes)*8; //Length of file in bits not including header
		OutputHead->Width=Width;
		OutputHead->Height=Height;
		OutputHead->TranspSize=(UCHAR)TranspSize;
		OutputHead->TranslSize=(UCHAR)TranslSize;
		OutputHead->NumIndexes=NumIndexes;
	}
	return BitSize/8+(BitSize%8 ? 1 : 0); //Return entire length of file in bytes
}

UCHAR NumBitsRequired(UINT Num) //Tests how many bits are required to represent a number
{
	UCHAR RetNum;
	_asm //Find the most significant bit
	{
		xor eax, eax //eax=0
		bsr eax, Num //Find most significant bit in eax
		mov RetNum, al
	}
	return RetNum+((UCHAR)(1<<RetNum)==Num ? 0 : 1); //Test if the most significant bit is the only one set, if not, at least 1 more bit is required
}

void WriteToBits(UCHAR* StartPointer, UINT BitStart, UINT Value) //Write Value to Bit# BitStart after StartPointer - Assumes more than 8 bits are never written
{
	*(WORD*)(&StartPointer[BitStart/8])|=Value<<(BitStart%8);
}

The code to decompress the alpha mask in C is as follows: (Shares some header information with above code)
//Decode
void Decode(UCHAR* Input, UCHAR* Output); //Decoding process
UCHAR ReadBits(UCHAR* StartPointer, UINT BitStart, UCHAR BitSize); //Read value from Bit# BitStart after StartPointer - Assumes more than 8 bits are never read
UCHAR NumBitsRequired(UINT Num); //Tests how many bits are required to represent a number --In Encoding Code--

int main()
{
	//--Encoding Code--
		UCHAR *InputBuffer, *OutputBuffer; //Where we will hold our input and output data
		FILE *File; //Handle to current input or output file
		UINT FileSize; //Holds input and output file sizes
	
		//The bitmap headers tell us about its contents
		//Read in bitmap header and confirm file type
		//Read in the rest of the data
		//Run Encode
		//Write encoded data out
	//--END Encoding Code--

	//Run Decode
	UCHAR* O2=(BYTE*)malloc(BitmapFileHead.bfSize);
	Decode(OutputBuffer, O2);

/*	//If writing back out to a 24 bit windows bitmap, this adds the row padding back in
	File=fopen("output.bmp", "wb");
	fwrite(&BitmapFileHead, sizeof(BITMAPFILEHEADER), 1, File);
	fwrite(&BitmapHead, sizeof(BITMAPINFOHEADER), 1, File);
	fwrite(O2, BitmapFileHead.bfSize-sizeof(BITMAPINFOHEADER)-sizeof(BITMAPFILEHEADER), 1, File);*/

	//Free up memory and wait for user input --In Encoding Code--
	return 0;
}

//Decoding
void Decode(UCHAR* Input, UCHAR* Output) //Decoding process
{
	EncodedFileHeader H=*(EncodedFileHeader*)Input; //Save header locally so we have quick memory lookups
	UCHAR Indexes[256], IndexSize=NumBitsRequired(H.NumIndexes); //Save indexes locally so we have quick lookups, use 256 index array so we don't have to allocate memory
	UINT BitOn=0; //Bit we are currently on in reading
	memcpy(Indexes, ((EncodedFileHeader*)Input)->Indexes, 256); //Save the indexes
	Input+=(sizeof(EncodedFileHeader)+H.NumIndexes); //Start reading input past the header

	//Unroll/unencode all the pixels
	do
	{
		UINT i, l; //index counter, length (transparent and then index)
		//Transparent pixels
		memset(Output, TranspMask, l=ReadBits(Input, BitOn, H.TranspSize)*BytesPerPixel);
		Output+=l;

		//Translucent pixels
		l=ReadBits(Input, BitOn+=H.TranspSize, H.TranslSize);
		BitOn+=H.TranslSize;
		for(i=0;i<l;i++) //Write the gray scale out to the 3 pixels, this should technically be done in a for loop, which would unroll itself anyways, but this way ReadBits+index lookup is only done once - ** Would need to be in a for loop if not using gray-scale or 24 bit output
			Output[i*BytesPerPixel]=Output[i*BytesPerPixel+1]=Output[i*BytesPerPixel+2]=Indexes[ReadBits(Input, BitOn+i*IndexSize, IndexSize)];
		Output+=l*BytesPerPixel;
		BitOn+=l*IndexSize;
	} while(BitOn<H.DataSize);

/*	{ //If writing back out to a 24 bit windows bitmap, this adds the row padding back in
		UINT i;
		UCHAR RowPad=4-(H.Width*BytesPerPixel%4); //Account for windows DWORD row padding
		RowPad=(RowPad==4 ? 0 : RowPad);
		Output-=H.Width*H.Height*BytesPerPixel; //Restore original output pointer
		for(i=H.Height;i>0;i--) //Go backwards so data doesn't overwrite itself
			memcpy(Output+(H.Width*BytesPerPixel+RowPad)*i, Output+(H.Width*BytesPerPixel)*i, H.Width*BytesPerPixel);
	}*/
}

UCHAR ReadBits(UCHAR* StartPointer, UINT BitStart, UCHAR BitSize) //Read value from Bit# BitStart after StartPointer - Assumes more than 8 bits are never read
{
	return (*(WORD*)&StartPointer[BitStart/8]>>BitStart%8)&((1<<BitSize)-1);
}

Of course, I added some minor assembly and optimized the decoder code to get it from 335 to 266 bytes, which is only 69 bytes less :-\, but it’s something (measured using my Small project). There is no real reason to include it here, as it’s in many of my projects and the included C file for this post.

And then some test code just for kicks...
//Confirm Decoding
BOOL CheckDecode(UCHAR* Input1, UCHAR* Input2, UINT Width, UINT Height); //Confirm Decoding

//---- Put in main function above "//Free up memory and wait for user input" ----
printf(CheckDecode(InputBuffer, O2, BitmapHead.biWidth, BitmapHead.biHeight) ? "good" : "bad");

BOOL CheckDecode(UCHAR* Input1, UCHAR* Input2, UINT Width, UINT Height) //Confirm Decoding
{
	UINT x,y,i;
	UCHAR RowPad=4-(Width*BytesPerPixel%4); //Account for windows DWORD row padding
	RowPad=(RowPad==4 ? 0 : RowPad);

	for(y=0;y<Height;y++)
		for(x=0;x<Width;x++)
			for(i=0;i<BytesPerPixel;i++)
				if(Input1[y*(Width*BytesPerPixel+RowPad)+x*BytesPerPixel+i]!=Input2[y*(Width*BytesPerPixel)+x*BytesPerPixel+i])
					return FALSE;
	return TRUE;
}

From there, it just has to be loaded into a bit array for manipulation and set back a bitmap device context, and it’s done!
VB Code: (Add the signature GIF as a picture box where it is to show up and set its “Visible” property to “false” and “Appearance” to “flat”)
'Swap in and out bits
Private Declare Function GetDIBits Lib "gdi32" (ByVal aHDC As Long, ByVal hBitmap As Long, ByVal nStartScan As Long, ByVal nNumScans As Long, lpBits As Any, lpBI As BITMAPINFOHEADER, ByVal wUsage As Long) As Long
Private Declare Function SetDIBitsToDevice Lib "gdi32" (ByVal hdc As Long, ByVal x As Long, ByVal y As Long, ByVal dx As Long, ByVal dy As Long, ByVal SrcX As Long, ByVal SrcY As Long, ByVal Scan As Long, ByVal NumScans As Long, Bits As Any, BitsInfo As BITMAPINFOHEADER, ByVal wUsage As Long) As Long
lpBits As Any, lpBitsInfo As BITMAPINFOHEADER, ByVal wUsage As Long, ByVal dwRop As Long) As Long
Private Type RGBQUAD
		b As Byte
		g As Byte
		r As Byte
		Reserved As Byte
End Type
Private Type BITMAPINFOHEADER '40 bytes
		biSize As Long
		biWidth As Long
		biHeight As Long
		biPlanes As Integer
		biBitCount As Integer
		biCompression As Long
		biSizeImage As Long
		biXPelsPerMeter As Long
		biYPelsPerMeter As Long
		biClrUsed As Long
		biClrImportant As Long
End Type
Private Const DIB_RGB_COLORS = 0 '  color table in RGBs

'Prepare colors
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)
Private Declare Function GetBkColor Lib "gdi32" (ByVal hdc As Long) As Long

Public Sub DisplaySignature(ByRef TheForm As Form)
    'Read in Signature
    Dim BitmapLength As Long, OutBitmap() As RGBQUAD, BitInfo As BITMAPINFOHEADER, Signature As PictureBox
    Set Signature = TheForm.Signature
    BitmapLength = Signature.Width * Signature.Height
    ReDim OutBitmap(0 To BitmapLength - 1) As RGBQUAD
    With BitInfo
            .biSize = 40
            .biWidth = Signature.Width
            .biHeight = -Signature.Height
            .biPlanes = 1
            .biBitCount = 32
            .biCompression = 0 'BI_RGB
            .biSizeImage = .biWidth * 4 * -.biHeight
    End With
    GetDIBits Signature.hdc, Signature.Image, 0, Signature.Height, OutBitmap(0), BitInfo, DIB_RGB_COLORS
    
    'Alpha blend signature
    Dim i As Long, Alpha As Double, BackColor As RGBQUAD, ForeColor As RGBQUAD, OBC As Long, OFC As Long
    OFC = &H67906
    OBC = GetBkColor(TheForm.hdc)
    CopyMemory BackColor, OBC, 4
    CopyMemory ForeColor, OFC, 4
    For i = 0 To BitmapLength - 1
        Alpha = 1 - (CDbl(OutBitmap(i).r) / 255)
        OutBitmap(i).r = ForeColor.r * Alpha + BackColor.r * (1 - Alpha)
        OutBitmap(i).g = ForeColor.g * Alpha + BackColor.g * (1 - Alpha)
        OutBitmap(i).b = ForeColor.b * Alpha + BackColor.b * (1 - Alpha)
    Next i
    
    SetDIBitsToDevice TheForm.hdc, Signature.Left, Signature.Top, Signature.Width, Signature.Height, 0, 0, 0, Signature.Height, OutBitmap(0), BitInfo, DIB_RGB_COLORS
    TheForm.Refresh
End Sub

C Code
//Prepare to decode signature
	//const UCHAR BytesPerPixel=4, TranspMask=255; //32 bits per pixel (for quicker copies and such - variable not used due to changing BYTE*s to DWORD*s), and white=transparent background color - also not used anymore since we directly write in the background color
	//Load data from executable
	HGLOBAL GetData=LoadResource(NULL, FindResource(NULL, "DakSig", "Sig")); //Load the resource from the executable
	BYTE *Input=(BYTE*)LockResource(GetData); //Get the resource data

	//Prepare header and decoding data
	UINT BitOn=0; //Bit we are currently on in reading
	EncodedFileHeader H=*(EncodedFileHeader*)Input; //Save header locally so we have quick memory lookups
	DWORD *Output=Signature=new DWORD[H.Width*H.Height]; //Allocate signature memory

	//Prepare the index colors
	DWORD Indexes[17], IndexSize=NumBitsRequired(H.NumIndexes); //Save full color indexes locally so we have quick lookups, use 17 index array so we don't have to allocate memory (since we already know how many there will be), #16=transparent color
	DWORD BackgroundColor=GetSysColor(COLOR_BTNFACE), FontColor=0x067906;
	BYTE *BGC=(BYTE*)&BackgroundColor, *FC=(BYTE*)&FontColor;
	for(UINT i=0;i<16;i++) //Alpha blend the indexes
	{
		float Alpha=((EncodedFileHeader*)Input)->Indexes[i] / 255.0f;
		BYTE IndexColor[4];
		for(int n=0;n<3;n++)
			IndexColor[n]=(BYTE)(BGC[n]*Alpha + FC[n]*(1-Alpha));
		//IndexColor[3]=0; //Don't really need to worry about the last byte as it is unused
		Indexes[i]=*(DWORD*)IndexColor;
	}
	Indexes[16]=BackgroundColor; //Translucent background = window background color

//Unroll/unencode all the pixels
Input+=(sizeof(EncodedFileHeader)+H.NumIndexes); //Start reading input past the header
do
{
	UINT l; //Length (transparent and then index)
	//Transparent pixels
	memsetd(Output, Indexes[16], l=ReadBits(Input, BitOn, H.TranspSize));
	Output+=l;

	//Translucent pixels
	l=ReadBits(Input, BitOn+=H.TranspSize, H.TranslSize);
	BitOn+=H.TranslSize;
	for(i=0;i<l;i++) //Write the gray scale out to the 3 pixels, this should technically be done in a for loop, which would unroll itself anyways, but this way ReadBits+index lookup is only done once - ** Would need to be in a for loop if not using gray-scale or 24 bit output
		Output[i]=Indexes[ReadBits(Input, BitOn+i*IndexSize, IndexSize)];
	Output+=l;
	BitOn+=l*IndexSize;
} while(BitOn<H.DataSize);

//Output the signature
const BITMAPINFOHEADER MyBitmapInfo={sizeof(BITMAPINFOHEADER), 207, 42, 1, 32, BI_RGB, 0, 0, 0, 0, 0};
SetDIBitsToDevice(MyDC, x, y, MyBitmapInfo.biWidth, MyBitmapInfo.biHeight, 0, 0, 0, MyBitmapInfo.biHeight, Signature, (BITMAPINFO*)&MyBitmapInfo, DIB_RGB_COLORS);

This all adds ~3.5KB to each VB project, and ~2KB to each C/CPP project. Some other recent additions to all project executables include the Hyrulean Productions icon (~1KB) and file version information (1-2KB). I know that a few KB doesn’t seem like much, but when executables are often around 10KB, it can almost double their size.

While I’m on the topic of project sizes, I should note that I always compress their executables with UPX, a very nifty executable compressor. It would often be more prudent to use my Small project, but I don’t want to complicate my open-source code.


One other possible solution I did not pursue would be to take the original font and create a subset font of it with only the letters (and font size?) I need, and see if that file is smaller. I doubt it would have worked well though.
W3C and web standards
And steps backwards in software evolution

It’s great to have standards so everything can play together nicely.  I’ve even heard IE8 should pass the Acid2 test with “Web Standard Compatibility” mode turned on, and it has been confirmed for a long time that FireFox3 will (finally) pass it.  Microsoft, of course, has a bit of a problem with backwards compatibility when everyone had to use hacks in the past to “conform” to their old IE software, which was, and still is, filled with bugs and errors; and with IE version upgrades, they need to not break those old websites.  This really technically shouldn’t be a problem if people properly mark their web pages with compatible versions of HTML, XHTML, etc, but who wants to deal with that? Compatibility testing and marking, especially in the web world, is a serious pain in the ass, which I can attest to after working with web site creation for many years, something I am not very proud of :-).  I am a C++ advocate, and Java/.NET hater, and yes, I’ve worked heavily in all of them.


Anyways, some new web standards even break old ones, for example:
<font><center></font></center>
is no longer allowed.  Non nested (ending child elements before the parent) is no longer possible in certain circumstances in HTML4, and definitely not allowed in XHTML, as that would be specifically against what XML was designed for.  This was one of my favorite parts of original HTML too, in that you could easily combine formatting elements in different sections and orders without having to redefine all previous formats each time.  Though CSS does help with this, it has its own quirks too that I consider to be a rather large failing in its design.  I should be expanding more on that later on.

And then there’s this one other oddity that has always bugged me.  Two standard HTML colors are “gray” and “lightgrey”... if that’s not a little confusing... and for the record, “grey” and “lightgray” do not work in IE.

Further, XML, while it has its place and reasons, really really bugs me.  Just the fact that it really slows things up and is overused where it’s not needed because it is the “popular” thing to do.  Come on people, is it that hard to create and use interfaces for binary compiled data?  Or even ini-type files for crying out loud... Until we have specific hardware designed and implemented to parse XML, or better text parsing in general, I will continue to consider XML a step backwards, a very unfortunate reoccurring reality in the software world.