Home Page
Archive > Posts > 2008 > June
Search:

FoxPro Table Memo Corruption
Data integrity loss is such a drag :-(

My father’s optometric practice has been using an old DOS database called “Eyecare” since the (I believe) early 80’s. For many years, he has been programming a new, very customized, database up from scratch in Microsoft Access which is backwards compatible with “Eyecare”, which uses a minor variant of FoxPro databases. I’ve been helping him with minor things on it for a number of years, and more recently I’ve been giving a lot more help in getting it secured and migrated from Microsoft Access databases (.mdb) into MySQL.

A recent problem cropped up in that one of the primary tables started crashing Microsoft Access when it was opened (through a FoxPro ODBC driver). Through some tinkering, he discovered that the memo file (.fpt) for the table was corrupted, as trying to view any memo fields is what crashed Access. He asked me to see if I could help in recovering the file, which fortunately I can do at my leisure, as he keeps paper backups of everything for just such circumstances. He keeps daily backups of everything too… but for some reason that’s not an option.


I went about trying to recover it through the easiest means first, namely, trying to open and export the database through FoxPro, which only recovered 187 of the ~9000 memo records. Next, I tried finding a utility online that did the job, and the first one I found that I thought should work was called “FoxFix”, but it failed miserably. There are a number of other Shareware utilities I could try, but I decided to just see how hard it would be to fix myself first.


I opened the memo file up in a HEX editor, and after some very quick perusing and calculations, it was quite easy to determine the format:

So I continued on the path of seeing what I could do to fix the file.
  • First, I had it jump to the header of each record and just get the record data length, and I very quickly found multiple invalid record lengths.
  • Next, I had it attempt to fix each of these by determining the real length of the memo by searching for the first null terminator (“\0”) character, but I quickly discovered an oddity. There are weird sections in many of the memo fields in the format BYTE{0,0,0,1,0,0,0,1,x}, which is 2 little endian DWORDS which equal 1, and a final byte character (usually 0).
  • I added to the algorithm to include these as part of a memo record, and many more original memo lengths then agreed with my calculated memo lengths.
  • The final thing I did was determine how many invalid (non keyboard) characters there were in the memo data fields. There were ~3500 0x8D characters, which were usually always followed by 0xA, so I assume these were supposed to be line breaks (Windows line breaks are denoted by [0xD/new line/\r],[0xA/carriage return/\n]). There were only 5 other invalid characters, so I just changed these to question marks ‘?’.

Unfortunately, Microsoft Access still crashed when I tried to access the comments fields, so I will next try to just recover the data, tie it to its primary keys (which I will need to determine through the table file [.dbf]), and then rebuild the table. I should be making another post when I get around to doing this.


The following code which “fixes” the table’s memo file took about 2 hours to code up.
//Usually included in windows.h
typedef unsigned long DWORD;
typedef unsigned char BYTE;

//Includes
#include <iostream.h> //cout
#include <stdio.h> //file io
#include <conio.h> //getch
#include <ctype.h> //isprint

//Memo file structure
#pragma warning(disable: 4200) //Remove zero-sized array warning
const MemoFileHeadLength=512;
const RecordBlockLength=32; //This is actually found in the header at (WORD*)(Start+6)
struct MemoRecord //Full structure must be padded at end with \0 to RecordBlockLength
{
	DWORD Type; //Type in little endian, 1=Memo
	DWORD Length; //Length in little endian
	BYTE Data[0];
};
#pragma warning(default: 4200)

//Input and output files
const char *InFile="EXAM.Fpt.old", *OutFile="EXAM.Fpt";

//Assembly functions
__forceinline DWORD BSWAP(DWORD n) //Swaps endianness
{
	_asm mov eax,n
	_asm bswap eax
	_asm mov n, eax
	return n;
}

//Main function
void main()
{
	//Read in file
	const FileSize=6966592; //This should actually be found when the file is opened...
	FILE* MyFile=fopen(InFile, "rb");
	BYTE *MyData=new BYTE[FileSize];
	fread(MyData, FileSize, 1, MyFile);
	fclose(MyFile);

	//Start checking file integrity
	DWORD FilePosition=MemoFileHeadLength; //Where we currently are in the file
	DWORD RecordNum=0, BadRecords=0, BadBreaks=0, BadChars=0; //Data Counters
	const DWORD OneInLE=0x01000000; //One in little endian
	while(FilePosition<FileSize) //Loop until EOF
	{
		FilePosition+=sizeof(((MemoRecord*)NULL)->Type); //Advanced passed record type (1=memo)
		DWORD CurRecordLength=BSWAP(*(DWORD*)(MyData+FilePosition)); //Pull in little endian record size
		cout << "Record #" << RecordNum++ << " reports " << CurRecordLength << " characters long. (Starts at offset " << FilePosition << ")" << endl; //Output record information

		//Determine actual record length
		FilePosition+=sizeof(((MemoRecord*)NULL)->Length); //Advanced passed record length
		DWORD RealRecordLength=0; //Actual record length
		while(true)
		{
			for(;MyData[FilePosition+RealRecordLength]!=0 && FilePosition+RealRecordLength<FileSize;RealRecordLength++) //Loop until \0 is encountered
			{
#if 1 //**Check for valid characters might not be needed
				if(!isprint(MyData[FilePosition+RealRecordLength])) //Makes sure all characters are valid
					if(MyData[FilePosition+RealRecordLength]==0x8D) //**0x8D maybe should be in ValidCharacters string? - If 0x8D is encountered, replace with 0xD
					{
						MyData[FilePosition+RealRecordLength]=0x0D;
						BadBreaks++;
					}
					else //Otherwise, replace with a "?"
					{
						MyData[FilePosition+RealRecordLength]='?';
						BadChars++;
					}
#endif
			}

			//Check for inner record memo - I'm not really sure why these are here as they don't really fit into the proper memo record format.... Format is DWORD(1), DWORD(1), BYTE(0)
			if(((MemoRecord*)(MyData+FilePosition+RealRecordLength))->Type==OneInLE && ((MemoRecord*)(MyData+FilePosition+RealRecordLength))->Length==OneInLE /*&& ((MemoRecord*)(MyData+FilePosition+RealRecordLength))->Data[0]==0*/) //**The last byte seems to be able to be anything, so I removed its check
			{ //If inner record memo, current memo must continue
				((MemoRecord*)(MyData+FilePosition+RealRecordLength))->Data[0]=0; //**This might need to be taken out - Force last byte back to 0
				RealRecordLength+=sizeof(MemoRecord)+1;
			}
			else //Otherwise, current memo is finished
				break;
		}
		if(RealRecordLength!=CurRecordLength) //If given length != found length
		{
			//Tell the user a bad record was found
			cout << "   Real Length=" << RealRecordLength << endl;
			CurRecordLength=RealRecordLength;
			BadRecords++;
			//getch();

			//Update little endian bad record length
			((MemoRecord*)(MyData+FilePosition-sizeof(MemoRecord)))->Length=BSWAP(RealRecordLength);
		}

		//Move to next record - Each record, including RecordLength is padded to RecordBlockLength
		DWORD RealRecordSize=sizeof(MemoRecord)+CurRecordLength;
		FilePosition+=CurRecordLength+(RealRecordSize%RecordBlockLength==0 ? 0 : RecordBlockLength-RealRecordSize%RecordBlockLength);
	}

	//Tell the user file statistics
	cout << "Total bad records=" << BadRecords << endl << "Total bad breaks=" << BadBreaks << endl << "Total bad chars=" << BadChars << endl;

	//Output fixed data to new file
	MyFile=fopen(OutFile, "wb");
	fwrite(MyData, FileSize, 1, MyFile);
	fclose(MyFile);

	//Cleanup and wait for user keystroke to end
	delete[] MyData;
	getch();
}
Inlining Executable Resources
Do you suffer from OPC (Obsessive Perfection Complex)? If not, you aren’t an engineer :-)

I am somewhat obsessive about file cleanliness, and like to have everything I do well organized with any superfluous files removed. This especially translates into my source code, and even more so for released source code.

Before I zip up the source code for any project, I always remove the extraneous workspace compilation files. These usually include:

  • C/C++: Debug & Release directories, *.ncb, *.plg, *.opt, and *.aps
  • VB: *.vbw
  • .NET: *.suo, *.vbproj.user

Unfortunately, a new offender surfaced in the form of the Hyrulean Productions icon and Signature File for about pages. I did not want to have to have every source release include those 2 extra files, so I did research into inlining them in the resource script (.rc) file. Resources are just data directly compiled into an executable, and the resource script tells the executable all of these resources and how to compile them in. All my C projects include a resource script for at least the file version, author information, and Hyrulean Productions icon. Anyways, this turned out to be way more of a pain in the butt than intended.


There are 2 ways to load “raw data” (not a standard format like an icon, bitmap, string table, version information, etc) into a resource script. The first way is through loading an external file:
RESOURCEID RESOURCETYPE DISCARDABLE "ResourceFileName"
for example:
DAKSIG	SIG	DISCARDABLE	"Dakusan.sig"
RESOURCEID and RESOURCETYPE are arbitrary and user defined, and it should also be noted to usually have them in caps, as the compilers seem to often be picky about case.

The second way is through inlining the data:
RESOURCEID	RESOURCETYPE
BEGIN
	DATA
END
for example:
DakSig	Sig
BEGIN
	0x32DA,0x2ACF,0x0306,...
END
Getting the data in the right format for the resource script is a relatively simple task.
  • First, acquire the data in 16-bit encoded format (HEX). I suggest WinHex for this job.
    On a side note, I have been using WinHex for ages and highly recommend it. It’s one of the most well built and fully featured application suites I know if.
  • Lastly, convert the straight HEX DATA (“DA32CF2A0603...”) into an array of proper endian hex values (“0x32DA,0x2ACF,0x0306...”). This can be done with a global replace regular expression of “(..)(..)” to “0x$2$1,”. I recommend Editpad Pro for this kind of work, another of my favorite pieces of software. As a matter of fact, I am writing this post right now in it :-).

Here is where the real caveats and problems start falling into place. First, I noticed the resource data was corrupt for a few bytes at a certain location. It turned out to be Visual Studio wanting line lengths in the resource file to be less than ~4175 characters, so I just added a line break at that point.

This idea worked great for the about page signature, which needed to be raw data anyways, but encoding the icon this way turned out to be impossible :-(. Visual Studio apparently requires external files be loaded if you want to use a pre-defined binary resource type (ICON, BITMAP, etc). The simple solution would be to inline the icon as a user defined raw data type, but unfortunately, the Win32 icon loading API functions (LoadIcon, CreateIconFromResource, LoadImage, etc) only seemed to work with properly defined ICONs. I believe the problem here is that when the compiler loads in the icon to include in the executable, it reformats it somewhat, so I would need to know this format. Again, unfortunately, Win32 APIs failed me. FindResource/FindResourceEx wouldn’t let me load the data for ICON types for direct coping (or reverse engineering) :-(. At this point, it wouldn’t be worth my time to try and get the proper format just to inline my Hyrulean Productions icon into resource scripts. I may come back to it later if I’m ever really bored.


This unfortunately brings back a lot of bad old memories regarding Win32 APIs. A lot of the Windows system is really outdated, not nearly robust enough, or just way too obfuscated, and has, and still does, cause me innumerable migraines trying to get things working with their system.

As an example, I just added the first about page to a C project, and getting fonts working on the form was not only a multi-hour long knockdown drag out due to multiple issues, I ended up having to jury rig the final solution in exasperation due to time constraints. I wanted the C about pages to match the VB ones exactly, but font size numbers just wouldn’t conform between the VB GUI designer and Windows GDI (the Windows graphics API), so I just put in arbitrary font size numbers that matched visually instead of trying to find the right conversion process, as the documented font size conversion process was not yielding proper results. This is the main reason VB (and maybe .NET) are far superior in my book when dealing with GUIs (for ease of use at least, not necessarily ability and power). I know there are libraries out that supposedly solve this problem, but I have not yet found one that I am completely happy with, which is why I had started my own fully fledged cross operating system GUI library a ways back, but it won’t be completed for a long time.

Secure way of proving IP ownership
Proving that you did what you say you did

So I was thinking of a new project that might be fun, useful, and possibly even turn a little profit, but I was talked out of it by a friend due to the true complexity of the prospect past the programming part. The concept isn’t exactly new by a long shot, but my idea for the implementation is, at least I would like to think, novel.

For a very long time, it has been important to be able to prove, without a doubt, that you have the oldest copy of some IP to prove you are the original creator. The usual approach to this is storing copies of the IP at a secure location with the storage time recorded. This is, I am told, very often used in the film industry, as well as many others.

The main downside to this for the subscriber, IMO, is having their IP, which may be confidential, stored by a third party, and entrusting their secrets to an outsider’s security. Of course, if these services are done properly and are ISO certified for non-breachable secure storage, this shouldn’t be a problem as they are probably more secure than anything the user has themselves. One would like to think, though, that entrusting your IP to no one but yourself is the most secure method.

The out-of-house storage method may also require that there be records accessible by others telling that you stored your IP elsewhere, and that it exists, which you may not want known either. This is not always a problem though because some places allow completely anonymous storage.

A large downside for the provider is having to supply and maintain the medium for the secure storage, whether it be vaults for physical property, or hard drives for virtual property.


My solution to this problem, for virtual property anyways, is to not have the provider permanently store the user’s data at all, but provide a means by which the provider can authenticate a set of the user’s data as being unchanged since a certain date. This would be accomplished by hashing the user’s data against a random salt. The salt would be determined by the date and would only be known by the provider.


This would work as follows:
  • Every day, the server would create a completely random salt string of a fixed length, probably 128 bits. This random salt would be the only thing the server would need to remember and keep secret. This process could also be done beforehand for many days or years.
  • As the user uploaded the data through a secure connection, the server would process it in chunks, probably 1MB at a time, through the hash function.
  • The hash function, probably a standard one like MD5, would be slightly modified to multiply the current hash on each block iteration against the daily random salt. The salt would also be updated per upload by a known variable, like multiplying the salt against the upload size, which would be known beforehand, or the exact time of upload.
  • A signature from a public-key certificate of a combined string of the time of upload and the hash would be calculated.
  • The user would be returned a confirmation string, which they would need to keep, that contained the time of upload the signature.

Whenever the user wanted to verify their data, they would just have to resend their data and the confirmation string, and the server would return if it is valid or not.

I was thinking the service would be free for maybe 10MB a day. Different account tiers with appropriate fees would be available that would give the user 1 month of access and an amount of upload bandwidth credits, with would roll over each month. Unlimited verifications would also be allowed to account holders, though these uploads would still be applied towards the user’s credits. Verifications without an account would be a nominal charge.

The only thing keeping me from pursuing this idea is that for it to be truly worth it to the end user’s, the processing site and salt tables would have to be secured and ISO certified as such, which would be a lot more costly and trouble than the initial return would provide, I believe, and I don’t have the money to invest in it right now.


I do need to find one of these normal storage services soon for myself soon though. I’ll post again about this when I do.



[edit on 6/15/08 @ 5:04pm]
Well, this isn’t exactly the same thing, but a lot like it.
http://www.e-timestamp.com