Software likes hiding sensitive information and keeping it persistent :-(
Since version 3 of Firefox, the browser has moved from using flat files for keeping track of browsing history (history.dat) and bookmarks (bookmarks.html) to using SQLite databases (places.sqlite). This changeover was required because the old flat file formats were badly implemented, clunky, and not able to handle the new demands of the location bar and browser history. Using a SQL database was the perfect solution for the complexity brought in with the new location bar and its dynamic searching of previous URLs, as SQL is easy to implement, is mostly compatible across multiple SQL implementations (removing dependency on a single product), and is powerful for cross-referencing lookups. As a matter of fact, most of the data Firefox keeps now is stored in SQLite databases.
SQLite was also a good choice for the SQL solution because it can be implemented minimally straight into a product without needing a large install and a lot of bloat. While I like SQLite for this purpose and its ease of implementation, it lacks a lot of base SQL functionality that would be nice, like TABLE JOINS inside of DELETE statements, among many other language abilities. I wouldn’t suggest using it for large database driven products that require high optimization, which I believe it can’t handle. It’s meant as a simpler SQL implementation.
Anyways, I was very happy to see that when you delete URLs from the history in the newest version of Firefox, it actually deletes them out of the database as opposed to just hiding them, like it used to. The history manager actually seems to do its job quite well now, but I noticed one big problem. After attempting to delete all the URLs from a specific site out of the Firefox history manager, I noticed there were still some entries from that site in the SQLite database, which is a privacy problem.
After some digging, I realized that there are “hidden” entries inside of the history manager. A hidden entry is created when a URL is loaded in a frame or IFrame that you do not directly navigate to. These entries cannot be viewed through the history manager, and because of this, cannot be easily deleted from the history database without wiping the whole history.
At this point, I decided to go ahead and look at all the table structures for the history manager and figure out how they interact. Hidden entries are marked in places.sqlite::moz_places.hidden with the value “1”. According to a Firefox wiki, “A hidden URL is one that the user did not specifically navigate to. These are commonly embedded pages, i-frames, RSS bookmarks and javascript calls.” So after figuring all of this out, I came up with some SQL commands to delete all hidden entries, which don’t really do anything anyways inside the database. Do note that Firefox has to be closed while working on the database so that it is not locked.
sqlite3 places.sqlite
DELETE FROM moz_annos WHERE place_id IN (SELECT ID FROM moz_places WHERE hidden=1);
DELETE FROM moz_inputhistory WHERE place_id IN (SELECT ID FROM moz_places WHERE hidden=1);
DELETE FROM moz_historyvisits WHERE place_id IN (SELECT ID FROM moz_places WHERE hidden=1);
DELETE FROM moz_places WHERE hidden=1;
.exit
This could all be done in 1 SQL statement in MySQL, but again, SQLite is not as robust :-\. There is also a “Favorite’s Icon” table in the database that might keep an icon stored as long as a hidden entry for the domain still exists, but I didn’t really look into it.
I’ve been delving into the Perl language more lately for a job, and have found out some interesting things about it. Perl itself is a bit shrouded in mysticism, with it often being said that it runs on “magic”. The original Perl engine, written by Larry Wall, has never been duplicated due to its incredible complexity and hacked together nature.
One funny little thing I noticed is that an arrow “=>” and comma “,” are almost completely synonymous in the language (the only difference being that the arrow also quotes a bare word on its left side). For example, this is how you SHOULD declare a hash and an array, because it just looks better and follows proper coding standards:
@MyArray=('a',1,'b',2); #An array with values a,1,b,2
%MyHash=(a=>1, b=>2); #A hash with keys a,b that contain the values 1,2
but you can actually declare the exact same array and hash objects like this
@MyArray=('a'=>1=>'b'=>2); #An array with values a,1,b,2
%MyHash=(a,1,b,2); #A hash with keys a,b that contain the values 1,2
It’s also easy to find the length of a non referenced array in Perl as follows:
print $#MyArray; #Index of the last element, so add 1 to get length
or
$ArrayLength=@MyArray;
print $ArrayLength;
There are two ways to do it with a referenced array:
$MyRefArray=[1,2,3];
print scalar @$MyRefArray;
print $#$MyRefArray; #Index of the last element, so add 1 to get length
Moral of the story: there are many ways to do things in Perl.
After now having delved a bit more into how Perl works, I still like PHP better as a strictly quick scripting language. Oh well.
JavaScript is a neat little scripting language and does the job it is intended for very well. The prototype system is very useful too, but has one major drawback. First, however, a very quick primer on how objects are made in JavaScript and what prototyping is.
An object is made in JavaScript by calling a named function with the keyword “new”.
function FooBar(ExampleArgument)
{
this.Member1=ExampleArgument;
this.AnotherMember='Blah';
}
var MyObject=new FooBar(5);
This code creates a FooBar object in the variable MyObject with 2 members: Member1=5, and AnotherMember='Blah' .
Prototyping adds members to all objects of a certain type, without having to add the member to it manually. This also allows you to change the value of a member of all objects of a single type at once. For example (all examples are continued from above examples):
FooBar.prototype.NewMember=7;
var SecondObject=new FooBar();
Now both MyObject and SecondObject have a member NewMember with value 7, which can be changed easily for both objects like this:
FooBar.prototype.NewMember=9;
The way to detect if an object has a member is to use the “in” operator, and then, to determine whether the member is the object’s own rather than prototyped, the hasOwnProperty function is used. For example:
'NewMember' in MyObject; //Returns true
MyObject.hasOwnProperty('NewMember'); //Returns false
'Member1' in MyObject; //Returns true
MyObject.hasOwnProperty('Member1'); //Returns true
'UnknownMember' in MyObject; //Returns false
MyObject.hasOwnProperty('UnknownMember'); //Returns false
Now, the problem starts coming into play when using foreach loops.
for(var i in MyObject)
console.log( i + '=' + MyObject[i].toString() ); //console.log is a function provided by FireBug for FireFox, and Google Chrome
This would output:
Member1=5
AnotherMember=Blah
NewMember=9
So if you wanted to do something on all members of an object and skip the prototype members, you would have to add a line of code to each foreach loop as follows:
for(var i in MyObject)
if(MyObject.hasOwnProperty(i))
console.log(i+'='+MyObject[i].toString());
This would output:
Member1=5
AnotherMember=Blah
This isn’t too bad if you are using prototyping yourself on your objects, but sometimes you might make objects that you wouldn’t expect to have prototypes. For good coding practice, you should really do the prototype check for every foreach loop because you can never assume that someone else will not add a prototype to an object type, even if your object type is private. This is especially true because all objects inherit from the actual Object object including its prototypes. So if someone does the following, which is considered very bad practice, every foreach loop will pick up this added member for all objects.
Object.prototype.GlobalMember=10;
You might ask “Why would anyone do this?”, but it could be useful for an instance like this...
Object.prototype.indexOf=function(Value)
{
for(var i in this)
if(this.hasOwnProperty(i) && this[i]===Value)
return i;
return undefined;
}
This function will search for the first member that contains the given value and return the member’s name.
It would be really nice if “for(x in y)” only returned non-prototype members and there was another type of foreach loop like “for(x inall y)” that also returned prototype members :-\.
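For what it’s worth, ECMAScript 5 and later provide Object.keys, which returns only an object’s own enumerable member names, so it comes close to the behavior wished for above. Here is a minimal sketch, reusing the FooBar example from earlier (redeclared so the snippet stands on its own):

```javascript
function FooBar(ExampleArgument)
{
	this.Member1=ExampleArgument;
	this.AnotherMember='Blah';
}
FooBar.prototype.NewMember=9; //Prototyped member shared by all FooBar objects

var MyObject=new FooBar(5);

//Object.keys skips prototype members, so no hasOwnProperty check is needed
var OwnMembers=Object.keys(MyObject);
for(var n=0;n<OwnMembers.length;n++)
	console.log(OwnMembers[n]+'='+MyObject[OwnMembers[n]]);

//A plain for...in loop, for comparison, also visits the prototyped NewMember
var AllMembers=[];
for(var i in MyObject)
	AllMembers.push(i);
```

The trade-off is that Object.keys builds an array first, which is fine for small objects but worth keeping in mind for very large ones.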
This is especially important for Array objects. Arrays are like any other object but they come naturally with the JavaScript language. For Arrays, it is most appropriate to use
for(var i=0;i<ArrayObject.length;i++)
instead of
for(var i in ArrayObject)
loops. Also, in my own code, I often add the following because the “indexOf” function for Arrays is not available in IE, as it is not W3C standard. It is in Firefox though... but I’m not sure if this is a good thing, as it is not a standard.
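A fallback of that kind might look like the following sketch. This is an assumption about the general shape of such a shim and not necessarily the exact code referred to above; ArrayIndexOf is a hypothetical helper name, and it only handles the simple case of searching from the start of the array.

```javascript
//Hypothetical helper: return the index of the first element strictly equal to Value, or -1 if not found
function ArrayIndexOf(TheArray, Value)
{
	for(var i=0;i<TheArray.length;i++) //Index based loop, so prototype members are never visited
		if(TheArray[i]===Value)
			return i;
	return -1;
}

//Only install the fallback when the browser does not already provide indexOf
if(!Array.prototype.indexOf)
	Array.prototype.indexOf=function(Value) { return ArrayIndexOf(this, Value); };
```

Note that adding a member to Array.prototype this way is exactly the kind of thing that shows up in “for(var i in ArrayObject)” loops, which is one more reason to prefer the index based loop for Arrays.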
I’m not going to go into how JavaScript stores the prototypes or how to find out all prototype members of an object, as that is a bit beyond what I wanted to talk about in this post, and it’s pretty self explanatory if you think about it.
function ClearCookies() //Clear all the cookies on the current website
{
var MyCookies=document.cookie; //Remember the original cookie string since it will be changing soon
var StartAt=0; //The current string pointer in MyCookies
do //Loop through all cookies
{
var CookieName=MyCookies.substring(StartAt, MyCookies.indexOf('=', StartAt)).replace(/^ /,''); //Get the next cookie name in the list, and strip off leading white space
document.cookie=CookieName+"=;expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/"; //Erase the cookie
StartAt=MyCookies.indexOf(';', StartAt)+1; //Move the string pointer to the end of the current cookie
} while(StartAt!=0)
}
I went a little further with the script after finishing this to add a bit of a visual aspect.
The following adds a textarea box which displays the current cookies for the site, and also displays the cookie names when they are erased.
<input type=button value="Clear Cookies" onclick="ClearCookies()">
<input type=button value="View Cookies" onclick="ViewCookies()">
<textarea id=CookieBox style="width:100%;height:100%"></textarea>
<script type="text/javascript">
function ViewCookies() //Output the current cookies in the textbox
{
document.getElementById('CookieBox').value=document.cookie.replace(/;/g,';\n\n');
}
function ClearCookies() //Clear all the cookies on the current website
{
var CookieNames=[]; //Remember the cookie names as we erase them for later output
var MyCookies=document.cookie; //Remember the original cookie string since it will be changing soon
var StartAt=0; //The current string pointer in MyCookies
do //Loop through all cookies
{
var CookieName=MyCookies.substring(StartAt, MyCookies.indexOf('=', StartAt)).replace(/^ /,''); //Get the next cookie name in the list, and strip off leading white space
CookieNames.push(CookieName); //Remember the cookie name
document.cookie=CookieName+"=;expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/"; //Erase the cookie
StartAt=MyCookies.indexOf(';', StartAt)+1; //Move the string pointer to the end of the current cookie
} while(StartAt!=0)
document.getElementById('CookieBox').value='Clearing: '+CookieNames.join("\nClearing: "); //Output the erased cookie names
}
</script>
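The name extraction part of ClearCookies can also be written with split instead of a moving string pointer. Here is a minimal sketch; GetCookieNames is a hypothetical helper name, and it takes the cookie string as a parameter so it can be tried outside of a browser:

```javascript
//Hypothetical helper: extract the cookie names from a document.cookie style string
function GetCookieNames(CookieString)
{
	var Names=[];
	var Parts=CookieString.split(';'); //Cookies are separated by semicolons
	for(var i=0;i<Parts.length;i++)
	{
		var Name=Parts[i].split('=')[0].replace(/^\s+|\s+$/g,''); //Everything before the first "=", with white space trimmed
		if(Name!=='') //Skip empty pieces (for example when there are no cookies at all)
			Names.push(Name);
	}
	return Names;
}
```

In a browser it would be called as GetCookieNames(document.cookie), and each returned name could then be expired the same way as in the loop above.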
Never rely solely on information you receive from untrusted sources
One of the most laughable aspects of client/server* systems is client side based security access restrictions. What I mean by this is when credentials and actions are not checked and restricted on the server side of the equation, only on the client side, which can ALWAYS be bypassed.
To briefly explain why it is basically insane to trust a client computer: ANY multimedia, software, data, etc. that has touched a person’s computer is essentially now their property. Once something has been on or through a person’s computer, the user can make copies, modify it, and do whatever the heck they want with it. This is how the digital world works. There are ways to help stop copying and modification, like hashes and encryption, but most of the ways in which things are implemented nowadays are quite fallible. There may be, for example, safeguards in place to only allow a user to use a piece of software on one certain computer or for a certain amount of time (DRM [Digital Rights Management]), but these methods are ALWAYS bypassable. The only true security comes by not letting information which people aren’t supposed to have access to cross through their computer, and keeping track of all verifiable factual information on secure servers.
A long time ago at an IGDA [International Game Developers Association] meeting (I only ever went to the one unfortunately :-\), I learned an interesting truth from the lecturer that hadn’t occurred to me before. That is, companies that make games and other software [usually] know it will sooner or later be pirated/cracked**. The true intention of software DRM is to make it hard enough to crack to discourage the crackers into giving up, and to make it take long enough that people hopefully stop waiting for a free copy and go ahead and buy it. By the time a piece of software is cracked (if it takes as long as they hope), the companies know the majority of the remainder of the people usually wouldn’t have bought it anyways.
Now that I’m done with the basic explanation of client side insecurities, back to the real reason for this post.
While it is actually proper to program safeguards into client side software, you can never rely on them for true security. Security measures should always be duplicated in both client and server software. There are two reasons off the top of my head for implementing security access restrictions into the client side of software. The first is to help remove strain on servers. There is no point in asking a server if something is valid when the client can immediately confirm that it isn’t. The second reason is speed. It’s MUCH quicker if a client can detect a problem and instantly inform the user than having to wait for a server to answer. Though this wait is usually imperceptible to the user, it can really add up.
So I thought I’d give a couple of examples of this to help you understand more where I’m coming from. This is a very big problem in the software industry. I find exploitable instances of this kind of thing on a very regular basis. However, I generally don’t take advantage of such holes, and try to inform the companies/programmers if they’ll listen. The term for this is white hat hacking, as opposed to black hat.
First, a very basic example. Let’s say you have a folder on your website “/PersonalPictures” that you wanted to restrict access to with a password. The proper way to do it would be to restrict access to the whole folder and all files in it on the server side, requiring a password be sent to the server to view the contents of each file. This is normally done through Apache httpd (the most utilized web server software) with an “.htaccess” file and the mod_auth (authentication) module. The improper way to do it would be a page that forwarded to the “hidden” section with a JavaScript script like the following.
if(prompt('Please enter the password')=='SecretPassword')
document.location.href='/PersonalPictures';
The problem with this code is twofold (besides the fact it pops up a request window :-) ). First, the password is exposed in plain text to the user. Fortunately, passwords are usually not as easy to find as this, but I have found passwords in web pages and Flash code before with some digging (yes, Flash files (and Java!) are 100% decompilable to their original source code, sans comments). The second problem is that once the person goes to the URL “/PersonalPictures”, they can get back there and to all files inside it without the password, and also give it freely to others (no need to mention the fact that the URL is written in plain text here, as it’s the same as with the password). This specific problem with JavaScript was much more prevalent in the old days when people ran their web pages through free hosting sites like Geocities (now owned and operated by Yahoo) which didn’t allow for proper password protection.
This kind of problem is still around on the web, though it morphed with the times into a new form. Many server side scripts I have found across the Internet assume their client side web pages can take care of security and ignore the necessary checks in the server scripts. For example, very recently I was on a website that only allowed me to add a few items to a list. The way it was done is that there was a form with a textbox that you submitted every time you wanted to add an entry to the list. After submitting, the page was reloaded with the updated list. After you added the maximum allowed number of items to the list, when the page refreshed, the form to add more was gone. This is incredibly easy to bypass however. The normal way to do this would be to just send the modified packets directly to the server with whatever information you want in it. The easier method would be to make your own form submission page and just submit to the proper URL all you want. The Firebug extension for Firefox however makes this kind of thing INCREDIBLY easy. All that needs to be done is to add an attribute to the form to send the requests to a new window “<form action=... method=... target=_blank>”, so the form is never erased/overwritten and you can keep sending requests all you want. Using Firebug, you can also edit the values of hidden input boxes for this kind of thing.
AJAX (Asynchronous JavaScript and XML - A tool used in web programming to send and receive data from a server without having to refresh a page) has often been lampooned as insecure for this kind of reason. In reality, the medium itself is not insecure at all; it’s just how people use it.
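To make the list example concrete, here is a minimal sketch of the kind of check the server script should have been doing itself. Everything in it is an assumption for illustration: the limit of 3, the AddItemOnServer name, and the idea that the user’s list is a plain array are all hypothetical, as I don’t know how that site actually stored its data.

```javascript
var MAX_ITEMS=3; //Assumed limit; the real site's limit is unknown

//Hypothetical server side handler: the limit is enforced here no matter what the client page shows or hides
function AddItemOnServer(UserList, NewItem)
{
	if(typeof NewItem!=='string' || NewItem==='') //Never assume the client sent something sane
		return {Success:false, Error:'Invalid item'};
	if(UserList.length>=MAX_ITEMS) //The check that was being left to the client side form
		return {Success:false, Error:'List is full'};
	UserList.push(NewItem);
	return {Success:true};
}
```

With a check like this in place, hiding the form on the client becomes a convenience only, and resubmitting the request by hand just gets the “List is full” error back.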
As a matter of fact, the majority of my best and most fun Ragnarok hacking was done with these methods. I just monitored the packets that came in and out of the system, reverse engineered how they were all structured, then made modifications and resent them myself to see what I could do. With this, I was able to do things like (These should be most of the exploits; listed in descending order of usefulness & severity):
Duplicate items
Crash the server (It was never fixed AFAIK, but I stopped playing 5+ years ago. I just put that it was fixed on my site so people wouldn’t look for it ^_^; )
Warp to any map from any warp location (warp locations are only supposed to link to 1 other map)
Spoof your name during chats (so you could pretend someone else was saying something - Ender’s game, anyone? ^_^)
Use certain skills of other classes (I have up pictures of my swordsman using merchant skills to house a selling shop)
Add skill points to an item on your skill tree that is not yet available (and use it immediately)
Warp back to save point without dying
Talk to NPCs on a map from any location on that map, and sometimes from other maps (great for selling items when in a dungeon)
Attack with weapons much quicker than was supposed to be allowed
Use certain skills on creatures from any location on a map no matter how far they are
Equip any item in any spot (so you could equip body armor on your head slot and get much more free armor defense points)
Run commands on your party/guild and in chat rooms as if you were the leader/admin
Roll back a character’s stats to when you logged on that session (part of the dupe hack)
Bypass text repetition, length, and curse filters
Find out user account names
The original list is here; it should contain most of what I found. I took it down very soon after putting it up (replacement here) because I didn’t want to explicitly screw the game over with people finding out about these hacks (I had a lot of bad encounters with the company that ran the game, they refused to acknowledge or fix existing bugs when I reported them). There were so many things the server didn’t check just because the client wasn’t allowed to do them naturally.
Here are some very old news stories I saved up for when I wrote about this subject:
Just because you don’t give someone a way to do something doesn’t mean they won’t find a way.
*A server is a computer you connect to and a client is the connecting computer. So all you people connecting to this website are clients connecting to my web server.
**“Cracked” usually means to make a piece of software usable when it is not supposed to be, bypassing the DRM
Bad Programming: Only using file extensions as an indicator
According to a Microsoft KB article titled “Virtual directory names with executable extensions are not used correctly”, using a virtual folder ending in an executable extension (like .com, .exe, .dll, or .sh) under the web server for IIS [Microsoft’s Internet Information Services server suite] makes the contents inside the folder unviewable. This behavior itself is kind of silly, as you’d assume a web server would always check to see if something was a file or folder first.
Unfortunately, this doesn’t apply to just virtual folders, but all folders under an IIS web server, as I found out a few years ago when I backed up a site that I knew would be taken down very soon (ironically, because the company [SysInternals] was being taken over by Microsoft) and mirrored it on my Home Server, which runs IIS.
The solution I used was to add a character (in my case an underscore “_”) to the end of all the directory names ending in “.com”, and then do a global regular expression replace through all files in the mirror to update any references to these directories.
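As a rough illustration of that replace step, here is a minimal JavaScript sketch. It is not the tool or expression actually used for the mirror; RenameDirReferences and the example directory name are hypothetical, and it assumes the renamed directories always show up in links as a whole path segment between two slashes.

```javascript
//Hypothetical helper: append "_" to every reference to the given directory names inside a file's text
function RenameDirReferences(FileText, DirNames)
{
	for(var i=0;i<DirNames.length;i++)
	{
		var Escaped=DirNames[i].replace(/[.*+?^${}()|[\]\\]/g,'\\$&'); //Escape regex special characters such as "."
		var Finder=new RegExp('/'+Escaped+'(?=/)','g'); //Only match the name as a whole path segment
		FileText=FileText.replace(Finder, '/'+DirNames[i]+'_');
	}
	return FileText;
}
```

For example, RenameDirReferences('<a href="/utilities/tools.com/index.html">', ['tools.com']) changes the link to point at “/utilities/tools.com_/index.html”, while a host name like “www.example.com” is left alone because it does not match a listed directory name.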
A very important part of the design world is fonts, but it is an unfortunately annoying part of web browser land. There are very few fonts that come by default with OSs, and even fewer default ones that match each other across all OSs, so your website won’t look the same across all platforms unless you use the right combinations. It’s pretty much guaranteed that if you want anything even remotely special in terms of a font somewhere on your website, you will be out of luck matching it across all platforms.
The commonplace solution for this is, of course, creating images for whenever you need special fonts displayed. While this is the most elegant solution, it is only appropriate for special circumstances, and not normal site content, as image file sizes can get ridiculous, and you lose plain text advantages like searchability and search engine recognition. Another solution is to request the user to download the font, like here. While this is a valid solution, the vast majority of users would not download the font because, firstly, they don’t care enough, and secondly, people generally know not to download unfamiliar files from the internet when they don’t have to, for security reasons.
This has actually been a problem for me recently as I realized some of the default fonts I use for my site, which have always come with Windows, do not have default equivalents that come with most Linux distributions, as I had assumed. That’s a topic for a different day though.
So I had a customer recently request the ability to dynamically display some text in a certain font, so I told him there are 2 solutions. The first would be to use JavaScript to load translucent PNG images, the second would be to embed a Flash applet, as Flash can store font files internally for use. So here are instructions and examples of both:
JavaScript + PNG Translucency (alpha blending) Method
There are 2 ways to create the PNG translucency in Photoshop; one easier but less effective way that doesn’t maintain quality, and a slightly more complex path with better results.
To start off for both paths, a screenshot (ALT+PRINT SCREEN to take only the current window) will need to be taken of the font rendered in black against a white background. This can be done in your favorite word processor as long as it properly renders with translucency, or (for Windows) by just going to the font file in “c:\windows\fonts” and opening it, which uses “fontview.exe”.
After you have the screenshot, open a new file in Photoshop (File > New or CTRL+N) and paste the screenshot into a new layer (Edit > Paste or CTRL+V)
Delete the background layer, which requires that the Layers window be open (Window > Layers or F7 to toggle its display). Right click the text portion “Background” of the background layer, and choose “Delete Layer”.
Select the region that contains your font’s alphabet (M for selection tool) and crop it (Image > Crop).
You might want to zoom in at this point for easier viewing (CTRL++ for in, CTRL+- for out).
The easy way from there:
Deselect the area (Select > Deselect or CTRL+D).
Select the Magic Wand tool (W), set Tolerance to 0, check Anti-Aliased, and uncheck Contiguous
Select a pure white pixel and then delete the selection (DELETE)
You now have a translucent image that you can save and use, but the translucency isn’t that of the original font, as that is not how the magic wand tool works.
Example using “Aeolus True Type Font” (Set against a green background via HTML for example sake)
The better way:
Add a mask to your current layer (Layer > Add Layer Mask > Reveal All)
Go to the channels window (Window > Channels to toggle its display, it should be in the same window as Layers, in a separate tab) and select either the red, green, or blue channel. It doesn’t matter which as they should all hold the exact same values (grayscale [white-black colors] have the same red, green, and blue values), so red channel (CTRL+1) is fine.
Copy the channel (CTRL+C) (the entire workspace should still be selected after the crop)
Select the mask channel (CTRL+\), and you also need to make it visible (toggle the little eyeball icon beside it)
Paste into the mask channel (CTRL+V), invert it (Image > Adjustments > Invert or CTRL+I), and then make it invisible again (untoggle the little eyeball icon beside it)
Reselect the RGB contents (CTRL+~) and flood fill it with black [or your color of choice]: Paint Bucket Tool (G), 255 tolerance, no antialias
You now have a translucent image of the font that you can save and use that has the original font quality. You can test it by adding a white layer below it.
Example using “Aeolus True Type Font” (Set against a green background via HTML for example sake)
From there the image file can be split up into individual images called “a.png”, “b.png”, etc, and a simple JavaScript replace call could be used to convert a string into picture text, like “'MyString'.replace(/(.)/g, '<img src="$1.png">')”.
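A slightly more careful version of that one-liner is sketched below. TextToImages is a hypothetical helper name, and it assumes only the lowercase letters a.png through z.png exist, so anything else (spaces, digits, punctuation) is passed through as normal text instead of pointing at an image that isn’t there.

```javascript
//Hypothetical helper: turn a string into a run of <img> tags, one per letter
function TextToImages(Text)
{
	//Only lowercase letters are assumed to have a matching image file (a.png ... z.png)
	//"$&" in the replacement stands for the matched letter
	return Text.toLowerCase().replace(/[a-z]/g, '<img src="$&.png" alt="$&">');
}
```

The alt attribute keeps the letters readable for search engines and for browsers with images turned off, which gets back a little of the plain text advantage that pictures normally lose.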
Example (this is produced by JavaScript):
Internet Explorer 6 also has the added problem of not allowing translucent images, so a hack is needed for this. Basically, an element (like a blank image) needs to have its filter style set like the following (JavaScript DirectX hack...)
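The commonly used form of this hack is Internet Explorer’s AlphaImageLoader filter. The following is a minimal sketch of it, not necessarily the exact snippet meant above; AlphaFilterFor is a hypothetical helper name that just builds the filter value for a given image.

```javascript
//Build the IE6 filter value that loads a PNG with its alpha channel intact
function AlphaFilterFor(ImageURL)
{
	return "progid:DXImageTransform.Microsoft.AlphaImageLoader(src='"+ImageURL+"', sizingMethod='scale')";
}

//Intended use in IE6 only (the element itself should be showing a blank/transparent image):
//  ImageElement.style.filter=AlphaFilterFor('a.png');
```

Other browsers ignore the filter style, so the assignment would normally be wrapped in a check for IE6.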
Flash Method
While this method is much quicker to complete and easier to pull off than the previous method, it is also more prone to problems and browser incompatibility. Flash and JavaScript never got along well enough in my book. Anywho, here’s the process. (Source file here)
In a new Flash document (v5.0+), create a text box with the following properties:
Type: “Dynamic Text”
var: MyText
Font: YOURFONTCHOICE
Embed (button): Select the set of characters the dynamic text box might display. The fewer glyphs you select, the smaller the output file will be. I included all alpha-numeric+punctuation in the below example (24.3KB).
That’s all you need for the Flash file, so all that’s left now is the JavaScript. The following function will set the text for you inside the movie. Also, you should set the embed (for normal browsers) and object (for IE) tags as different “id”s. The wmode is an important parameter here too, in that it makes the background invisible and the Flash applet more a part of the web page (not a “separate window”).
So for reasons I’m not going to go into, today I had to compare some log files. I was tempted to write the code in C, just because I miss it so much these days x.x;, but laziness won out, especially as there weren’t that many log files and they weren’t that large, so I wrote it in PHP.
Nothing else to this post except the working code, which took me about 5 minutes to type out... The function goes through one directory and all of its subdirectories and checks all files against the same path in a second directory. If the file doesn’t exist in the second directory, or its contents don’t match the first file up to the first file’s length, a message is output.
//Start the log run against 2 root directories
TestLogs("/DIR1", "/DIR2");
function TestLogs($RootDir1, $RootDir2, $CurDir="")
{
//Iterate through the first directory
$Dir1=opendir("$RootDir1$CurDir");
$SubDirs=Array(); //Holds subdirectories
while(($File=readdir($Dir1))!==false) //Compare against false so a file named "0" does not end the loop early
if($File=="." || $File=="..") //Skip . and ..
continue;
else if(is_dir("$RootDir1$CurDir/$File")) //Do not try to compare directory entries
$SubDirs[]=$File; //Remember subdirectories
else if(!file_exists("$RootDir2$CurDir/$File"))
print "File '$CurDir/$File' does not exist in second directory.<br>";
else if(file_get_contents("$RootDir1$CurDir/$File")!=substr(file_get_contents("$RootDir2$CurDir/$File"),0,filesize("$RootDir1$CurDir/$File"))) //Both files exist, so compare them - if first file does not equal second file up to the same length, output error
print "'$CurDir/$File' does not match.<br>";
closedir($Dir1); //Release the directory handle before recursing
//Run subdirectories recursively after current directories' file-run so directories do not get split up
foreach($SubDirs as $NewDir)
TestLogs($RootDir1, $RootDir2, "$CurDir/$NewDir");
}
Today I thought I’d give a demonstration on the use of regular expressions [reference page here]. Regular expressions are basically a simplified scripting language for finding and replacing complex text strings, and are implemented in much of today’s software that involves a lot of text editing. They are a fabulously handy tool for computer users and are especially useful for programmers. I believe RegExps actually originally gained their notoriety through the Perl programming language. I also recently heard that it is definite that the new version of C++ (C++0x) will have native library support for regular expressions, yay!
Since I posted yesterday on DNS stuff, and have the examples from it handy, I figured I’d use those :-).
Let’s say you had a group of .com domains and wanted to find out their name servers (I’ve had to do this when switching to new name servers to make sure all the domains we did not control at the registrar level had their name servers set to the new ones). For this example, we will use the following domains “castledragmire.com”, “riaboy.com”, “NonExistantDomainA.com”, and “dakusan.com”.
First, we’d need to have the list of the domains, for this example, one domain per line is used.
Next, we need to turn them into a bash (Linux) script to grab all the information we need.
Replace: “^(.*)$”
With: “echo '!?$1?!'; host -t ns $1 a.gtld-servers.net | grep ' name server ';”
Sample output: (The !? ?! stuff are markers for easier viewing and parsing)
echo '!?castledragmire.com?!'; host -t ns castledragmire.com a.gtld-servers.net | grep ' name server ';
echo '!?riaboy.com?!'; host -t ns riaboy.com a.gtld-servers.net | grep ' name server ';
echo '!?NonExistantDomainA.com?!'; host -t ns NonExistantDomainA.com a.gtld-servers.net | grep ' name server ';
echo '!?dakusan.com?!'; host -t ns dakusan.com a.gtld-servers.net | grep ' name server ';
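Any editor or language with regular expression support can run this replacement. As a minimal sketch, here is the same step in JavaScript (the Domains and Script variable names are just for this example):

```javascript
var Domains=['castledragmire.com','riaboy.com','NonExistantDomainA.com','dakusan.com'].join('\n'); //One domain per line

//The "m" flag makes ^ and $ match at every line, and "g" replaces every line instead of just the first
var Script=Domains.replace(/^(.*)$/gm, "echo '!?$1?!'; host -t ns $1 a.gtld-servers.net | grep ' name server ';");
console.log(Script);
```

Without the “m” flag, ^ and $ would only match at the very start and end of the whole list, and since “.” does not match line breaks, nothing would be replaced.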
Next, we run the script, and it would output the following:
!?castledragmire.com?!
castledragmire.com name server ns3.deltaarc.com.
castledragmire.com name server ns4.deltaarc.com.
!?riaboy.com?!
riaboy.com name server ns3.deltaarc.com.
riaboy.com name server ns4.deltaarc.com.
!?NonExistantDomainA.com?!
!?dakusan.com?!
dakusan.com name server ns3.deltaarc.com.
dakusan.com name server ns4.deltaarc.com.
Next, we would keep running the following regular expression until no more replacements are found.
This would combine all domains with multiple name servers onto one line with name servers separated by spaces.
Replace: “(.*?) name server (.*)\n\1 name server (.*)”
With: “$1 name server $2 $3”
It would output the following:
!?castledragmire.com?!
castledragmire.com name server ns3.deltaarc.com. ns4.deltaarc.com.
!?riaboy.com?!
riaboy.com name server ns3.deltaarc.com. ns4.deltaarc.com.
!?NonExistantDomainA.com?!
!?dakusan.com?!
dakusan.com name server ns3.deltaarc.com. ns4.deltaarc.com.
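Running a replacement “until no more replacements are found” is really just a loop. Here is a rough sed sketch of it, assuming a sed that supports “-E” and back references (GNU and BSD sed both do). The function name merge_ns is made up for the example, and the “ta” command is what repeats the replacement for as long as it keeps succeeding.

```shell
# Hypothetical helper: joins consecutive "DOMAIN name server X" lines that
# share the same domain into one line, separating name servers with spaces.
merge_ns() {
  sed -E \
    -e ':a' \
    -e '$!N' \
    -e 's/^(.*) name server (.*)\n\1 name server (.*)$/\1 name server \2 \3/' \
    -e 'ta' \
    -e 'P' \
    -e 'D'
}

# Example input, in the same format as the script output above
printf '%s\n' \
  '!?castledragmire.com?!' \
  'castledragmire.com name server ns3.deltaarc.com.' \
  'castledragmire.com name server ns4.deltaarc.com.' \
  '!?NonExistantDomainA.com?!' |
  merge_ns
```

The “$!N” appends the next input line to the current one, the replacement merges the pair if both lines belong to the same domain, and “P” and “D” print and drop the first line when there is nothing left to merge.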
The final regular expression turns the output into a single line per domain, followed by its name servers. The marker line was kept before each list of name servers up to this point to help spot any domains that did not provide us with name servers.
Replace: “!\?(.*?)\?!\n\1 name server (.*)”
With: “#$1\t$2”
Which would output the following final data:
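As a rough sketch, here is that final step done with sed on the merged output from the previous step. The function name collapse_markers is made up for this example, and the tab is passed in through a shell variable since not every sed understands “\t” in the replacement.

```shell
# Hypothetical helper: turns each "!?DOMAIN?!" marker line plus the name
# server line that follows it into "#DOMAIN<TAB>NAMESERVERS".
# Marker lines that have no name server line after them are left alone.
collapse_markers() {
  tab=$(printf '\t')
  sed -E \
    -e '$!N' \
    -e 's/^!\?(.*)\?!\n\1 name server (.*)$/#\1'"$tab"'\2/' \
    -e 'P' \
    -e 'D'
}

printf '%s\n' \
  '!?castledragmire.com?!' \
  'castledragmire.com name server ns3.deltaarc.com. ns4.deltaarc.com.' \
  '!?NonExistantDomainA.com?!' \
  '!?dakusan.com?!' \
  'dakusan.com name server ns3.deltaarc.com. ns4.deltaarc.com.' |
  collapse_markers
# Prints (<TAB> stands for a real tab character):
#   #castledragmire.com<TAB>ns3.deltaarc.com. ns4.deltaarc.com.
#   !?NonExistantDomainA.com?!
#   #dakusan.com<TAB>ns3.deltaarc.com. ns4.deltaarc.com.
```

Any line still wrapped in the “!?” and “?!” markers at the end is a domain that returned no name servers.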
Yesterday I wrote a bit about the DNS system being rather fussy, so I thought today I’d go a bit more into how DNS works, and some good tools for problem solving in this area.
First, some technical background on the subject is required.
A network is simply a group of computers hooked together to communicate with each other. In the old days, all networking was done through physical wires (called the medium), but nowadays much of it is done through wireless connections. Wired networking is still required for the fastest communications, and is especially important for major backbones (the super highly utilized lines that connect networks together across the world).
A LAN is a local network of all computers connected together in one physical location, whether it be a single room, a building, or a city. Technically, a LAN doesn’t have to be localized in one area, but it is preferred, and we will just assume it is so for argument’s sake :-).
A WAN is a Wide (Area) Network that connects multiple LANs together. This is what the Internet is.
The way one computer finds another computer on a network is through its IP address [hereafter referred to as an IP, in this post only]. There are other protocols, but this one (TCP/IP) is by far the most widely utilized and is the true backbone of the Internet. IPs are like a house’s address (123 Fake Street, Theoretical City, Made Up Country). To explain it in a very simplified manner (this isn’t even remotely accurate, as networking is a complicated topic, but it is a good generalization), IPs have 4 sections of numbers, each ranging from 0-255 (1 byte). For example, 67.45.32.28 is a (version 4) IP. Each number in that address is a broader location, so the “28” is like a street address, “32” is the street, “45” is the city, and “67” is the country. When you send a packet from your computer, it goes to your local (street) router, which then passes it to the city router, and so on until it reaches its destination. If you are in the same city as the final destination of the packet, then it wouldn’t have to go to the country level.
The final important part of networking (for this post) is the domain system (DNS) itself. A domain is a label for an IP Address, like calling “1600 Pennsylvania Avenue” as “The White House”. As an example, “www.castledragmire.com” just maps to my web server at “209.85.115.128” (this is the current IP, it will change if the site is ever moved to a new server).
Next is a brief lesson on how DNS itself works:
The root DNS servers (a.root-servers.net through m.root-servers.net) point to the servers that hold top-level-domain information (.com, .org, .net, .jp, etc).
Examples of these servers are as follows:
TLD        Server
au         ns1.audns.net.au
biz        E.GTLD.biz
ca         CA04.CIRA.ca
cn         A.DNS.cn
com&net    A.GTLD-SERVERS.NET
de         Z.NIC.de
eu         U.NIC.eu
info       B9.INFO.AFILIAS-NST.ORG
org        TLD1.ULTRADNS.NET
tv         C5.NSTLD.COM
Next, these top-level-domain name servers (like A.GTLD-SERVERS.NET through M.GTLD-SERVERS.NET for .com) hold two main pieces of information for ALL domains under their top-level-domain jurisdiction:
The registrar where the domain was registered
The name server(s) that are responsible for the domain
Only registrars can talk to these top-level-domain servers, so you have to go through the registrar to change the name server information.
The final lowest rung in the DNS hierarchy is name servers. Name servers hold all the actual addressing information for a domain and can be run by anyone. The 2 most important (or maybe relevant is a better word...) types of DNS records are:
A: There are usually many of these, each pointing a domain or subdomain (castledragmire.com, www.castledragmire.com, info.castledragmire.com, ...) to a specific IP address (version 4)
SOA: Start of Authority - There is only one of these records per domain, and it specifies authoritative information including the primary name server, the domain administrator’s email, the domain serial number, and several timeout values relating to refreshing domain information.
Now that we have all the basics down, on to the actual reason for this post. It’s really a nuisance trying to explain to people why their domain isn’t working, or is pointing to the wrong place. So here’s why it happens!
Back in the old days, it often took days for DNS propagation to happen after you made changes at your registrar or elsewhere, but fortunately, this problem is a thing of the past. The delay happened because ISPs and/or routers cached domain lookups and only refreshed them according to the metrics in the SOA record mentioned above, as they were supposed to. This was done for network speed reasons, as I believe older OSs might not have cached domains (wild speculation), and ISPs didn’t want to look up the address for a domain every time it was requested. Now, though, I rarely see caching on any level except at the local computer, and not only at the OS level; even some programs, like Firefox, cache domains.
So the answer for when a person is getting the wrong address for a domain, and you know it is set correctly, is usually to just reboot. Clearing the DNS cache works too (for the OS level), but explaining how to do that is harder than saying “just reboot” ^_^;.
To clear the DNS cache in XP, enter the following into your “run” menu or in the command prompt: “ipconfig /flushdns”. This does not ALWAYS work, but it should work.
If your domain is still resolving to the wrong address when you ping it after your DNS cache is cleared, the next step is to see what name servers are being used for the information. You can do a whois on your domain to get the information directly from the registrar who controls the domain, but be careful where you do this, as you never know what people are doing with the information. For a quick and secure whois, you can use “whois” from your Linux command line, which I have patched through to a web script here. This script gives both normal and extended information, FYI.
Whois just tells you the name servers that you SHOULD be contacting. It doesn’t mean these are the ones you are actually asking, as the top-level-domain DNS servers may not have updated the information yet. This is where our command line programs come into play.
In XP, you can use “nslookup -query=hinfo DOMAINNAME” and “nslookup -query=soa DOMAINNAME” to get a domain’s name servers, and then “nslookup NAMESERVERDOMAINNAME” to get the IP the name server points to. For example:
Nslookup is also available in Linux, but Linux has a better tool for this, as nslookup itself doesn’t always seem to give the correct answers, for some reason. So I recommend you use dig if you have it or Linux available to you. With dig, we just start at the root name servers and work our way down to the domain’s own (SOA) name server to get the real information on where the domain is resolving to and why.
root@www [~]# dig @a.root-servers.net castledragmire.com
; <<>> DiG 9.2.4 <<>> @a.root-servers.net castledragmire.com
; (2 servers found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5587
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 14
;; QUESTION SECTION:
;castledragmire.com. IN A
;; AUTHORITY SECTION:
com. 172800 IN NS H.GTLD-SERVERS.NET.
com. 172800 IN NS I.GTLD-SERVERS.NET.
com. 172800 IN NS J.GTLD-SERVERS.NET.
com. 172800 IN NS K.GTLD-SERVERS.NET.
com. 172800 IN NS L.GTLD-SERVERS.NET.
com. 172800 IN NS M.GTLD-SERVERS.NET.
com. 172800 IN NS A.GTLD-SERVERS.NET.
com. 172800 IN NS B.GTLD-SERVERS.NET.
com. 172800 IN NS C.GTLD-SERVERS.NET.
com. 172800 IN NS D.GTLD-SERVERS.NET.
com. 172800 IN NS E.GTLD-SERVERS.NET.
com. 172800 IN NS F.GTLD-SERVERS.NET.
com. 172800 IN NS G.GTLD-SERVERS.NET.
;; ADDITIONAL SECTION:
A.GTLD-SERVERS.NET. 172800 IN A 192.5.6.30
A.GTLD-SERVERS.NET. 172800 IN AAAA 2001:503:a83e::2:30
B.GTLD-SERVERS.NET. 172800 IN A 192.33.14.30
B.GTLD-SERVERS.NET. 172800 IN AAAA 2001:503:231d::2:30
C.GTLD-SERVERS.NET. 172800 IN A 192.26.92.30
D.GTLD-SERVERS.NET. 172800 IN A 192.31.80.30
E.GTLD-SERVERS.NET. 172800 IN A 192.12.94.30
F.GTLD-SERVERS.NET. 172800 IN A 192.35.51.30
G.GTLD-SERVERS.NET. 172800 IN A 192.42.93.30
H.GTLD-SERVERS.NET. 172800 IN A 192.54.112.30
I.GTLD-SERVERS.NET. 172800 IN A 192.43.172.30
J.GTLD-SERVERS.NET. 172800 IN A 192.48.79.30
K.GTLD-SERVERS.NET. 172800 IN A 192.52.178.30
L.GTLD-SERVERS.NET. 172800 IN A 192.41.162.30
;; Query time: 240 msec
;; SERVER: 198.41.0.4#53(198.41.0.4)
;; WHEN: Sat Aug 23 04:15:28 2008
;; MSG SIZE rcvd: 508
root@www [~]# dig @a.gtld-servers.net castledragmire.com
; <<>> DiG 9.2.4 <<>> @a.gtld-servers.net castledragmire.com
; (2 servers found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35586
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 2
;; QUESTION SECTION:
;castledragmire.com. IN A
;; AUTHORITY SECTION:
castledragmire.com. 172800 IN NS ns3.deltaarc.com.
castledragmire.com. 172800 IN NS ns4.deltaarc.com.
;; ADDITIONAL SECTION:
ns3.deltaarc.com. 172800 IN A 216.127.92.71
ns4.deltaarc.com. 172800 IN A 209.85.115.181
;; Query time: 58 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)
;; WHEN: Sat Aug 23 04:15:42 2008
;; MSG SIZE rcvd: 113
root@www [~]# dig @ns3.deltaarc.com castledragmire.com
; <<>> DiG 9.2.4 <<>> @ns3.deltaarc.com castledragmire.com
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26198
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
;; QUESTION SECTION:
;castledragmire.com. IN A
;; ANSWER SECTION:
castledragmire.com. 14400 IN A 209.85.115.128
;; AUTHORITY SECTION:
castledragmire.com. 14400 IN NS ns4.deltaarc.com.
castledragmire.com. 14400 IN NS ns3.deltaarc.com.
;; Query time: 1 msec
;; SERVER: 216.127.92.71#53(216.127.92.71)
;; WHEN: Sat Aug 23 04:15:52 2008
;; MSG SIZE rcvd: 97
Linux also has the “host” command, but I prefer and recommend “dig”.
And that’s how you diagnose DNS problems! :-). For reference, two common DNS configuration problems are not having your SOA and NS records properly set for the domain on your name server.
I’m gonna cheat today since it is really late, as I spent a good amount of time organizing the 3D Engines update which pushed me a bit behind, and I’m also exhausted. Instead of writing some more content, I’m just linking to the “Utilized Optimization Techniques” section of the 3D Engines project, which I put up today.
It describes 4 programming speed optimization tricks: local variable assignment, precalculating index lookups, pointer traversing/addition, and loop unrolling. This project post also goes into some differences between the languages used [Flash, C++, and Java], especially when dealing with speed.
I am often asked to transfer data sets into MySQL databases, or other formats. In this case, I’ll use the conversion of a Microsoft Excel file (without line breaks in the fields) to MySQL as an example. While there are many programs out there to do this kind of thing, this method doesn’t take too long and is a good example use of regular expressions.
First, select all the data in Excel (ctrl+a) and copy (ctrl+c) it to a text editor with regular expression support. I recommend EditPad Pro as a very versatile and powerful text editor.
Next, we need to turn each row into the format “('FIELD1','FIELD2','FIELD3',...),”. Four regular expressions are needed to format the data:
Search    Replace    Explanation
'         \\'        Escape single quotes
\t        ','        Separate fields and quote as strings
^         ('         Start of row
$         '),        End of row
From there, there are only 2 more steps to complete the query.
Add the start of the query: “INSERT INTO TABLENAME VALUES”
End the query by changing the last row's comma “,” at the very end of the line to a semi-colon “;”.
For example:
a b c
d e f
g h i
would be converted to
INSERT INTO MyTable VALUES
('a','b','c'),
('d','e','f'),
('g','h','i');
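For anyone without a regular expression capable text editor handy, the four replacements and the “2 more steps” can be sketched with sed. This is only an illustration of the steps above: the function name rows_to_insert is made up, and it assumes the copied Excel data arrives as tab separated text on standard input.

```shell
# Hypothetical helper: rows_to_insert TABLENAME < tab_separated_rows
rows_to_insert() {
  tab=$(printf '\t')                        # A literal tab for the sed script
  printf 'INSERT INTO %s VALUES\n' "$1"     # Start of the query
  sed -e "s/'/\\\\'/g" \
      -e "s/${tab}/','/g" \
      -e "s/^/('/" \
      -e "s/\$/'),/" \
      -e '$ s/,$/;/'
  # The five sed expressions, in order: escape single quotes, separate and
  # quote the fields, start the row, end the row, and turn the very last
  # row's trailing comma into a semi-colon.
}

printf 'a\tb\tc\nd\te\tf\ng\th\ti\n' | rows_to_insert MyTable
```

The order matters here just like in the editor: the single quotes in the data have to be escaped before the replacements start adding quotes of their own.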
Sometimes a query may get too long, in which case you will need to split the rows into multiple queries by performing the “2 more steps to complete the query” from above on each group of rows.
After doing one of these conversions recently, I was also asked to make the data searchable, so I made a very simple PHP script for this.
This script lets you search through all the fields and lists all matches. The fields are listed on the 2nd line in an array as "SQL_FieldName"=>"Viewable Name". If the “Viewable Name” contains a pound sign “#” it is matched exactly, otherwise, only part of the search string needs to be found.
<?php
$Fields=Array('ClientNumber'=>'Client #', 'FirstName'=>'First Name', 'LastName'=>'Last Name', ...); //Field list
print '<form method=post action=index.php><table>'; //Form action needs to point to the current file
foreach($Fields as $Name => $Value) //Output search text boxes
	print "<tr><td>$Value</td><td><input name=\"$Name\" style='width:200px;' value=\"".
		(isset($_POST[$Name]) ? htmlentities($_POST[$Name], ENT_QUOTES) : '').'"></td></tr>'; //Text boxes w/ POSTed values, if set
print '</table><input type=submit value=Search></form>';
if(!isset($_POST[key($Fields)])) //If search data has not been POSTed, stop here
	return;

//Connect to MySQL. This is done before building the search because the connection is needed to escape values.
//The mysqli functions are used since the older mysql_* functions no longer exist in current PHP versions.
$DB=mysqli_connect('SQL_HOST', 'SQL_USERNAME', 'SQL_PASSWORD', 'SQL_DATABASE');

$SearchArray=Array('1=1'); //Search parameters are stored here. 1=1 is passed in case no POSTed search parameters are ...
	//... requested so there is at least 1 WHERE parameter, and is optimized out by MySQL anyways.
foreach($Fields as $Name => $Value) //Check each POSTed search parameter
	if(isset($_POST[$Name]) && trim($_POST[$Name])!='') //If the POSTed search parameter is empty, do not use it as a search parameter
	{
		$V=mysqli_real_escape_string($DB, $_POST[$Name]); //Prepare for SQL insertion
		$SearchArray[]=$Name.(strpos($Value, '#')===FALSE ? " LIKE '%$V%'" : "='$V'"); //Pound sign in the Viewable Name=exact ...
			//... value, otherwise, just a partial match
	}

//Get data from MySQL
$q=mysqli_query($DB, 'SELECT * FROM TABLENAME WHERE '.implode(' AND ', $SearchArray));

//Output retrieved data
$i=0;
while($d=mysqli_fetch_assoc($q)) //Iterate through found rows
{
	if(!($i++)) //If this is the first row found, output header
	{
		print '<table border=1 cellpadding=0 cellspacing=0><tr><td>Num</td>'; //Start table and output first column header (row #)
		foreach($Fields as $Name => $Value) //Output the rest of the column headers (Viewable Names)
			print "<td>$Value</td>";
		print '</tr>'; //Finish header row
	}
	print '<tr bgcolor='.($i&1 ? 'white' : 'gray')."><td>$i</td>"; //Start the data field's row. Row's colors are alternating white and gray.
	foreach($Fields as $Name => $Value) //Output row data
		print '<td>'.htmlentities($d[$Name], ENT_QUOTES).'</td>'; //Escape the data so it cannot break the HTML
	print '</tr>'; //End data row
}
print ($i==0 ? 'No records found.' : '</table>'); //If no records are found, output an error message, otherwise, end the data table
?>
The unfortunate reality of different feature sets in different language implementations
I was thinking earlier today how it would be neat for C/C++ to be able to get the address of a jump-to label to be used in jump tables, specifically, for an emulator. A number of seconds after I did a Google query, I found out it is possible in gcc (the open source native Linux compiler) through the “label value operator” “&&”. I am crushed that MSVC doesn’t have native support for such a concept :-(.
It would be great for emulating a CPU, where the first byte of each instruction’s opcode [see ASM] usually determines what the instruction is supposed to do. An example to explain the usefulness of a jump table is as follows:
Of course, this could still be done with virtual functions, function pointers, or a switch statement, but those are theoretically much slower. Having them in separate functions would also remove the possibility of local variables.
Although, again, theoretically, it wouldn’t be too bad to use, I believe, the _fastcall function calling convention with function pointers, and modern compilers SHOULD translate switches to jump tables in an instance like this, but modern compilers are so obfuscated you never know what they are really doing.
It would probably be best to try and code such an instance so that all 3 methods (function pointers, switch statement, jump table) could be utilized through compiler definitions, and then profile for whichever method is fastest and supported.
//Define the switch for which type of opcode picker we want
#define UseSwitchStatement
//#define UseJumpTable
//#define UseFunctionPointers

//Defines for how each opcode picker acts
#if defined(UseSwitchStatement)
	#define OPCODE(o) case OP_##o:
#elif defined(UseJumpTable)
	#define OPCODE(o) o:
	#define GET_OPCODE(o) &&o
#elif defined(UseFunctionPointers)
	#define OPCODE(o) void Opcode_##o()
	#define GET_OPCODE(o) (void*)&Opcode_##o
	//The above GET_OPCODE is actually a problem since the opcode functions aren't listed until after their ...
	//address is requested, but there are a couple of ways around that I'm not going to worry about going into here.
#endif

enum {OP_ADD=0, OP_SUB}; //assuming ADD=opcode 0 and so forth
void DoOpcode(int OpcodeNumber, ...)
{
	#ifndef UseSwitchStatement //If using JumpTable or FunctionPointers we need an array of the opcode jump locations
		void *Opcodes[]={GET_OPCODE(ADD), GET_OPCODE(SUB)}; //assuming ADD=opcode 0 and so forth
	#endif

	#if defined(UseSwitchStatement)
		switch(OpcodeNumber) { //Normal switch statement
	#elif defined(UseJumpTable)
		goto *Opcodes[OpcodeNumber]; //Jump to the proper label
	#elif defined(UseFunctionPointers)
		((void(*)(void))Opcodes[OpcodeNumber])(); //Call the proper function
	} //End the current function
	#endif

	//For testing under "UseFunctionPointers" (see GET_OPCODE comment under "defined(UseFunctionPointers)")
	//put the following OPCODE sections directly above this "DoOpcode" function
	OPCODE(ADD)
	{
		//...
	}
	OPCODE(SUB)
	{
		//...
	}

	#ifdef UseSwitchStatement //End the switch statement
		}
	#endif
#ifndef UseFunctionPointers //End the function
}
#endif
After some tinkering, I did discover through assembly insertion it was possible to retrieve the offset of a label in MSVC, so with some more tinkering, it could be utilized, though it might be a bit messy.
A friend just asked me to write a PHP function to list all the contents of a directory and its sub-directories.
Nothing special here... just a simple example piece of code and boredom...
It wouldn’t be a bad idea to turn off PHP’s “output buffering” and turn on “implicit flush” when running something like this for larger directories. Example output for “ListContents('c:\\temp');”:
A.BMP [230]
Dir1 [D]
    codeblocks-1.0rc2_mingw.exe [13,597,181]
    Dir1a [D]
        DEBUGUI.C [25,546]
Dir2 [D]
Dir3 [D]
    HW.C [12,009]
    INIFILE.C [9,436]
NTDETECT.COM [47,564]
I decided to make it a little nicer afterwards by bolding the directories, adding their total size, and changing sizes to a human readable format. This function is a lot more memory intensive because it holds data in strings instead of immediately outputting.
function HumanReadableSize($Size)
{
	$MetricSizes=Array('Bytes', 'KB', 'MB', 'GB', 'TB');
	for($SizeOn=0;$Size>=1024 && $SizeOn<count($MetricSizes)-1;$SizeOn++) //Loops until Size is < a binary thousand (1,024) or we have run out of listed Metric Sizes
		$Size/=1024;
	return preg_replace('/\\.?0+$/', '', number_format($Size, 2, '.', ',')).' '.$MetricSizes[$SizeOn]; //Forces to a maximum of 2 decimal places, adds comma at thousands place, appends metric size
}

function ListContents2($DirName, &$RetSize)
{
	$Output='<ul>';
	$dir=opendir($DirName);
	$TotalSize=0;
	while(($file=readdir($dir))!==false) //Compare against false so a file named "0" does not end the loop early
		if($file!='.' && $file!='..')
		{
			$FilePath="$DirName/$file";
			if(is_dir($FilePath)) //Is directory
			{
				$DirContents=ListContents2($FilePath, $DirSize);
				$Output.="<li><b>$file</b> [".HumanReadableSize($DirSize)."]$DirContents</li>";
				$TotalSize+=$DirSize;
			}
			else //Is file
			{
				$FileSize=filesize($FilePath);
				$Output.="<li>$file [".HumanReadableSize($FileSize).']</li>';
				$TotalSize+=$FileSize;
			}
		}
	closedir($dir);
	$RetSize=$TotalSize;
	$Output.='</ul>';
	return $Output;
}
Example output for “print ListContents2('c:\\temp', $Dummy);”:
A.BMP [230 Bytes]
Dir1 [12.99 MB]
    codeblocks-1.0rc2_mingw.exe [12.97 MB]
    Dir1a [24.95 KB]
        DEBUGUI.C [24.95 KB]
Dir2 [0 Bytes]
Dir3 [20.94 KB]
    HW.C [11.73 KB]
    INIFILE.C [9.21 KB]
NTDETECT.COM [46.45 KB]
The memory problem can be rectified through a little extra IO by calculating the size of a directory before its contents are listed, thereby not needing to keep everything in a string.
Of course, after all this, my friend took the original advice I gave him before writing any of this code, which was that using bash commands might get him to his original goal much easier.
First, to find out more about any bash command, use
man COMMAND
Now, a primer on the three most useful bash commands (IMO):
find:
Find will search through a directory and its subdirectories for objects (files, directories, links, etc) satisfying its parameters.
Parameters are written like a math query, with parentheses for order of operations (make sure to escape them with a “\”!), -a for boolean “and”, -o for boolean “or”, and ! for “not”. If neither -a nor -o is specified, -a is assumed.
For example, to find all files that contain “conf” but do not contain “.bak” as the extension, OR are greater than 5MB:
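One way to write that query, as a sketch: the function name find_conf_or_big is made up for this example, “conf” is matched against the object’s name, and 5MB is written as 5120k since -size rounds in the unit it is given.

```shell
# Hypothetical helper: find_conf_or_big DIRECTORY
# Lists objects whose name contains "conf" and does not end in ".bak",
# OR whose size is greater than 5MB (5120 kilobytes).
find_conf_or_big() {
  (cd "$1" && find . \( -name '*conf*' ! -name '*.bak' \) -o -size +5120k)
}

# Usage: find_conf_or_big /etc
```

The escaped parentheses group the two name tests together, so the implied -a between them is evaluated before the -o.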
Some useful parameters include:
-maxdepth & -mindepth: only look through certain levels of subdirectories
-name: name of the object (-iname for case insensitive)
-regex: name of object matches regular expression
-size: size of object
-type: type of object (block special, character special, directory, named pipe, regular file, symbolic link, socket, etc)
-user & -group: object is owned by user/group
-exec: exec a command on found objects
-print0: output each object separated by a null terminator (great so other programs don’t get confused from white space characters)
-printf: output specified information on each found object (see the man page)
For any number operations, use:
+n for greater than n
-n for less than n
n for exactly n
For a complete reference, see your find’s man page.
xargs:
xargs passes piped arguments to another command as trailing arguments.
For example, to list information on all files in a directory greater than 1MB: (Note this will not work with paths with spaces in them, use “find -print0” and “xargs -0” to fix this)
find -size +1024k | xargs ls -l
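The null terminated variation mentioned in the note would look something like the following sketch (the function name list_big is made up for the example). It is the same command, just with “-print0” and “-0” added so paths with spaces survive the trip through the pipe.

```shell
# Hypothetical helper: list_big DIRECTORY
# Lists information on all files greater than 1MB, including files whose
# paths contain spaces.
list_big() {
  find "$1" -type f -size +1024k -print0 | xargs -0 ls -l
}

# Usage: list_big /var/log
```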
Some useful parameters include:
-0: piped arguments are separated by null terminators
-n: max arguments passed to each command
-i: replaces “{}” with the piped argument(s)
So, for example, if you had 2 mirrored directories, and wanted to sync their modification timestamps:
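A sketch of how that could be done follows. The function name sync_mtimes and the directory arguments are made up for the example, the mirror directory needs to be given as an absolute path, and “-I{}” is the current spelling of the “-i” parameter listed above.

```shell
# Hypothetical helper: sync_mtimes SOURCE_DIR ABSOLUTE_MIRROR_DIR
# Copies the modification time of every file under SOURCE_DIR onto the file
# with the same relative path under the mirror directory.
sync_mtimes() {
  (cd "$1" && find . -type f -print0 |
    xargs -0 -I{} touch -c -m -r {} "$2/{}")
}

# Usage: sync_mtimes /home/me/original /home/me/mirror
```

“touch -r” takes its timestamp from the reference file, “-m” limits the change to the modification time, and “-c” keeps touch from creating files that are missing from the mirror.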
grep:
grep is used to search through data for plain text, regular expression, or other pattern matches. You can use it to search through both pipes and files.
For example, to get your number of CPUs and their speeds:
cat /proc/cpuinfo | grep MHz
Some useful parameters include:
-E: use extended regular expressions
-P: use Perl regular expressions
-l: output files with at least one match (-L for no matches)
-o: show only the matching part of the line
-r: recursively search through directories
-v: invert to only output non-matching lines
-Z: separates output file names with a null terminator (useful with -l)
So, for example, to list all files under your current directory that contain “foo1”, “foo2”, or “bar”, you would use:
grep -rlE "foo(1|2)|bar" .
For a complete reference, see your grep’s man page.
And now some useful commands and scripts:
List size of subdirectories:
du --max-depth=1
The --max-depth parameter specifies how many sub levels to list.
-h can be added for more human readable sizes.
List number of files in each subdirectory*:
#!/bin/bash
export IFS=$'\n' #Forces only newlines to be considered argument separators
for dir in `find -maxdepth 1 -type d`
do
	a=`find "$dir" -type f | wc -l`;
	if [ "$a" != "0" ]
	then
		echo "$dir" "$a"
	fi
done
and to sort those results
SCRIPTNAME | sort -n -k2
List number of different file extensions in current directory and subdirectories:
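A sketch of one way to do this, with the function name count_extensions made up for the example. Files without a dot in their name are skipped, and the extension is taken as everything after the last dot.

```shell
# Hypothetical helper: count_extensions DIRECTORY
# Prints "COUNT EXTENSION" for every extension found in the directory and
# its subdirectories, most common first.
count_extensions() {
  (cd "$1" && find . -type f -name '*.*' |
    sed 's/.*\.//' |
    sort | uniq -c | sort -rn |
    awk '{print $1, $2}')
}

# Usage: count_extensions .
```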
When editing files in place with the “-i” parameter of sed or perl, include an extension after “-i”, like “-i.orig”, if you want to make pre-edit backups.
Perform operations in directories with too many files to pass as arguments: (in this example, remove all files from a directory 100 at a time instead of using “rm -f *”)
find -type f | xargs -n100 rm -f
Force kill all processes named STRING: (to match any process whose name contains the string, use “pkill -9 STRING” instead)
killall -9 STRING
Transfer MySQL databases between servers: (Works in Windows too)
Some lesser known commands that are useful:
screen: This opens up a virtual console session that can be disconnected from and reconnected to without stopping the session. This is great when connecting to console through SSH so you don’t lose your progress if disconnected.
htop: An updated version of top, which is a process information viewer.
iotop: A process I/O (input/output - hard drive access) information viewer. Requires Python ≥ 2.5 and I/O accounting support compiled into the Linux kernel.
dig: Domain information retrieval. See “Diagnosing DNS Problems” Post for more information.
More to come later...
*Anything starting with “#!/bin/bash” is intended to be put into a script.
A few days ago I threw together a script for a friend in Greasemonkey (a Firefox extension) that removes the side banner from Demonoid. It was as follows (JavaScript).
var O1=document.getElementById('navtower').parentNode;
O1.parentNode.removeChild(O1);
This simple snippet is a useful example that is used for a lot of webpage operations. Most web page scripting just involves finding objects and then manipulating them and their parent objects. There are two common ways to get the reference to objects on a web page. One is document.getElementById, and another is through form objects in the DOM.
With the first, getElementById, you can get any object by passing its id attribute, for example,
<div id=example></div>
<script language=JavaScript>
var MyObject=document.getElementById('example');
</script>
This function is used so often, many frameworks also abbreviate it with a function:
function GE(Name) { return document.getElementById(Name); }
I know of at least one framework that actually names the function as just a dollar sign $.
The second way is through the name attribute on objects, which both the form and any of its form elements require. Only form elements like input, textarea, and select can be reached this way.
<body>
<form name=MyForm>
<input type=text name=ExampleText value=Example>
</form>
<script language=JavaScript>
document.MyForm.ExampleText.value='New Example'; //Must use format document.FormName.ObjectName
</script>
</body>
This is the very basis of all JavaScript/web page (client side only) programming. The rest is just learning all the types of objects with their functions and properties.
So, anyways, yesterday, Demonoid changed their page so the script no longer worked. All that needed to be done was change the 'navtower' to 'smn' because they renamed the object (and made it an IFrame). This kind of information is very easy to find and edit using a very nice and useful Firefox extension called Firebug. I have been using this for a while to develop web pages and do editing (for both designing and JavaScript coding) and highly recommend it.