RABiD BUNNY FEVER
K.T.K
UTF8 BOM: When a good idea is still considered too much by some
While UTF-8 has almost universally been accepted as the de facto standard for Unicode character encoding in most non-Windows systems (mmmmmm Plan 9 ^_^), the BOM (Byte Order Mark) still has large adoption problems. While I have been allowing my text editors to add the UTF8 BOM to the beginning of all my text files for years, I have finally decided to stop this practice for compatibility reasons.
While the UTF8 BOM is useful because it lets editors know for sure what the character encoding of a file is instead of having to guess, it is not really supported in Unixland, which has its own reasons for that. Having to code solutions around this was becoming cumbersome. Programs like vi and pico/nano seem to ignore a file’s character encoding anyways and adopt the character encoding of the current terminal session.
The main place I kept running into this problem was PHP. The funny thing about it too was that I had a solution for it working properly in Linux, but not Windows :-).
Web browsers do not expect to receive the BOM at the beginning of files, and if they encounter it, may have serious problems. For example, in a certain browser (*cough*IE*cough*) having a BOM on a file will cause the browser to not properly read the DOCTYPE, which can cause all sorts of nasty compatibility issues.
Something in my LAMP setup on my cPanel systems was removing the initial BOM at the beginning of outputted PHP contents, but through some preliminary research I could not find out why this was not occurring in Windows. However, both systems were receiving multiple BOMs at the beginning of the output due to PHP’s include/require functions not stripping the BOM from those included files. My solution to this was a simple overload of these include functions as follows (only required when called from any directly opened [non-included] PHP file):
<?php
/*Safe include/require functions that make sure UTF8 BOM is not output
Use like: eval(safe_INCLUDETYPE($INCLUDE_FILE_NAME));
where INCLUDETYPE is one of the following: include, require, include_once, require_once
An eval statement is used to maintain current scope
*/
//The different include type functions
function safe_include($FileName) { return real_safe_include($FileName, 'include'); }
function safe_require($FileName) { return real_safe_include($FileName, 'require'); }
function safe_include_once($FileName) { return real_safe_include($FileName, 'include_once'); }
function safe_require_once($FileName) { return real_safe_include($FileName, 'require_once'); }
//Start the processing and return the eval statement
function real_safe_include($FileName, $IncludeType)
{
    ob_start(); //Buffer everything the included file outputs
    //Escape backslashes and single quotes so the file name is safe inside a single-quoted string
    return "$IncludeType('".strtr($FileName, array("\\"=>"\\\\", "'"=>"\\'"))."'); safe_output_handler();";
}
//Do the actual processing and return the include data
function safe_output_handler()
{
    $Output=ob_get_clean();
    while(substr($Output, 0, 3)=="\xEF\xBB\xBF") //Remove all instances of UTF8 BOM at the beginning of the output
        $Output=substr($Output, 3);
    print $Output;
}
?>
I would have liked to use PHP’s output_handler ini setting to catch even the root file’s BOM and not require include function overloads, but, as php.net puts it, “Only built-in functions can be used with this directive. For user defined functions, use ob_start().”
As a bonus, the following bash command can be used to find all PHP files in the current directory tree with a UTF8 BOM:
grep -rlP "^\xef\xbb\xbf" . | grep -iP "\.php\$"
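Note that the grep pattern matches the BOM bytes at the start of any line, not only at the start of the file. A minimal sketch of a stricter check, which looks only at the first three bytes of each file, might look like the following. The function names has_bom and find_bom_php are made up for this example, and it assumes the usual head, od, and find tools.

```shell
# has_bom FILE: succeeds if FILE starts with the three UTF8 BOM bytes (EF BB BF).
# (has_bom and find_bom_php are hypothetical helper names, not standard tools.)
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# find_bom_php DIR: print each .php file (case insensitive) under DIR that starts with a BOM.
# Note: file names containing newlines are not handled.
find_bom_php() {
    find "$1" -type f -iname '*.php' | while IFS= read -r file
    do
        if has_bom "$file"; then
            echo "$file"
        fi
    done
}
```

Running `find_bom_php .` would then list only the PHP files whose very first bytes are the BOM.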
I think it was actually much higher than this, but it wouldn’t let me log in to find out! >:-( . Wish I could easily make SSH and everything I do in it have priority over other processes... but then again I probably wouldn’t be able to do anything to fix the load when this sometimes happens anyways. *sighs*
I’ll explain more about “load” in an upcoming post.

I am still, very unfortunately, looking into the problem I talked about way back here :-( [not a lot, but it still persists]. This time I decided to try to boot the OS into a “Safe Mode” with nothing running that could hinder performance tests (like hundreds of HTTP and MySQL sessions). Fortunately, a friend who is a Linux server admin for a tech firm was able to point me in the right direction after my own research on the topic proved frustratingly fruitless.
Linux has “runlevels” it can run at, which are listed in “/etc/inittab” as follows:
# Default runlevel. The runlevels used by RHS are:
# 0 - halt (Do NOT set initdefault to this)
# 1 - Single user mode
# 2 - Multiuser, without NFS (The same as 3, if you do not have networking)
# 3 - Full multiuser mode
# 4 - unused
# 5 - X11
# 6 - reboot (Do NOT set initdefault to this)
So I needed to get into “Single user mode” to run the tests, which could be done two ways. Before I tell you how though, it is important to note that if you are trying to do something like this remotely, normal SSH/Telnet will not be accessible, so you will need either physical access to the computer, or something like a serial console connection, which can be routed through networks.
So the two ways are:
- Through the “init” command. Running “init #” at the console, where # is the runlevel number, will bring you into that runlevel. However, this might not kill all currently unneeded running processes when going to a lower level, but it should get the majority of them, I believe.
- Append “s” (for single user mode) to the grub configuration file (/boot/grub/grub.conf on my system) at the end of the line starting with “kernel”, then reboot. I am told appending a runlevel number may also work.
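To check which runlevel a system boots into by default, the initdefault entry in the same file can be read. This is a rough sketch that assumes the traditional SysV “id:RUNLEVEL:initdefault:” line format; the function name default_runlevel is made up for illustration.

```shell
# default_runlevel [FILE]: print the default runlevel from an inittab-style file.
# Assumes a traditional SysV "id:RUNLEVEL:initdefault:" entry.
# FILE defaults to /etc/inittab; commented-out lines are skipped.
default_runlevel() {
    awk -F: '!/^[[:space:]]*#/ && $3 == "initdefault" { print $2; exit }' "${1:-/etc/inittab}"
}
```

On systems that use this style of init, the “runlevel” and “who -r” commands report the runlevel currently in use.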

If you ever find a file named “core.#” when running Linux, where # is replaced by a number, it means something crashed at some point. Most of the time, you will probably just want to delete the file, but sometimes you may wonder what crashed. To find out, you can use gdb (the GNU debugger), a very powerful tool, to analyze the core dump file.
gdb --core=COREFILENAME
Near the very bottom of the blob of outputted text after running this command, you should see a line that says “Core was generated by `...'.”. This tells you the command line of what crashed. To exit gdb, enter “quit”. You can also use gdb to find out what actually happened and troubleshoot/debug the problem, but that’s a very long and complex topic.
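When there are many core files to go through, picking that one line out of the output can be scripted. The following is a sketch only: core_command is a made-up helper name, and it assumes gdb prints the line in exactly the format quoted above.

```shell
# core_command: read gdb's output on stdin and print the command line that
# produced the core dump (the text between the backtick and the closing quote).
# (core_command is a hypothetical helper name.)
core_command() {
    sed -n "s/^Core was generated by \`\(.*\)'\.\$/\1/p"
}

# Usage (assumes gdb is installed; COREFILENAME is a placeholder):
#   gdb --batch --core=COREFILENAME 2>&1 | core_command
```

The --batch option makes gdb exit on its own after loading the core file, so there is no need to enter “quit”.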
Recently, I started seeing hundreds of core dump files taking up gigabytes of space showing up in “/usr/local/cpanel/whostmgr/docroot/” on several of our web servers. According to several online sources, it seems cPanel (web hosting made easy!) likes to dump many, if not all, of its programs' core files into this directory. In our case, it has been “dnsadmin” doing the crashing. We’ve been having some pretty major DNS problems lately, this time at the name server level, so I may have to rebuild our DNS cluster in the next few days. Joy.

First, to find out more about any bash command, use
man COMMAND
Now, a primer on the three most useful bash commands: (IMO)
find:
Find will search through a directory and its subdirectories for objects (files, directories, links, etc) satisfying its parameters.
Parameters are written like a math query, with parentheses for order of operations (make sure to escape them with a “\”!), -a for boolean “and”, -o for boolean “or”, and ! for “not”. If neither -a nor -o is specified, -a is assumed.
For example, to find all files whose names contain “conf” but do not have “.bak” as the extension, OR that are larger than 5MB:
find -type f \( \( -name "*conf*" ! -name "*.bak" \) -o -size +5120k \)
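To see how the grouping behaves, the expression can be tried on a scratch directory. This is just a sketch: the function name demo_find and all of the file names in it are made up for the demonstration.

```shell
# demo_find: build a scratch directory of made-up files, run the expression
# from the example against it, and print the matches (sorted).
demo_find() {
    demo=$(mktemp -d)
    touch "$demo/httpd.conf" "$demo/httpd.conf.bak" "$demo/notes.txt"
    # Create a 6MB file so the -size +5120k branch has something to match
    dd if=/dev/zero of="$demo/big.bin" bs=1024 count=6144 2>/dev/null
    find "$demo" -type f \( \( -name "*conf*" ! -name "*.bak" \) -o -size +5120k \) | sort
    rm -rf "$demo"
}

demo_find
```

This should list httpd.conf (the name matches and it is not a .bak) and big.bin (over 5MB), but neither httpd.conf.bak nor notes.txt.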
Some useful parameters include:
- -maxdepth & -mindepth: only look through certain levels of subdirectories
- -name: name of the object (-iname for case insensitive)
- -regex: name of object matches regular expression
- -size: size of object
- -type: type of object (block special, character special, directory, named pipe, regular file, symbolic link, socket, etc)
- -user & -group: object is owned by user/group
- -exec: exec a command on found objects
- -print0: output each object separated by a null terminator (great so other programs don’t get confused from white space characters)
- -printf: output specified information on each found object (see the man page)
For any number operations, use:
| +n | for greater than n |
| -n | for less than n |
| n | for exactly n |
For a complete reference, see your find’s man page.
xargs:
xargs passes piped arguments to another command as trailing arguments.
For example, to list information on all files in a directory greater than 1MB: (Note this will not work with paths with spaces in them, use “find -print0” and “xargs -0” to fix this)
find -size +1024k | xargs ls -l
Some useful parameters include:
- -0: piped arguments are separated by null terminators
- -n: max arguments passed to each command
- -i: replaces “{}” with the piped argument(s)
So, for example, if you had 2 mirrored directories, and wanted to sync their modification timestamps:
cd /ORIGINAL_DIRECTORY
find -print0 | xargs -0 -i touch -m -r "{}" "/MIRROR_DIRECTORY/{}"
For a complete reference, see your xargs’s man page.
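The whitespace-safe variant of the earlier “files greater than 1MB” example can be sketched as follows. The function name list_big_files is made up, and the -r option (do not run the command when there is no input) assumes GNU xargs.

```shell
# list_big_files DIR: long-list every regular file under DIR larger than 1MB.
# Null-separated arguments (-print0 / -0) keep names with spaces intact.
# -r (assumes GNU xargs) skips running ls entirely when nothing is found.
# (list_big_files is a hypothetical helper name.)
list_big_files() {
    find "$1" -type f -size +1024k -print0 | xargs -0 -r ls -l
}
```

Without -r, an empty result would make xargs run a bare “ls -l”, which lists the current directory instead of nothing.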
grep:
grep is used to search through data for plain text, regular expression, or other pattern matches. You can use it to search through both pipes and files.
For example, to get your number of CPUs and their speeds:
cat /proc/cpuinfo | grep MHz
Some useful parameters include:
- -E: use extended regular expressions
- -P: use Perl regular expressions
- -l: output files with at least one match (-L for no matches)
- -o: show only the matching part of the line
- -r: recursively search through directories
- -v: invert to only output non-matching lines
- -Z: outputs a null terminator after each file name instead of a newline (useful with -l)
So, for example, to list all files under your current directory that contain “foo1”, “foo2”, or “bar”, you would use:
grep -rlE "foo(1|2)|bar" .
For a complete reference, see your grep’s man page.
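As a quick way to try the alternation pattern, here is a small sketch that wraps it in a function. The name files_matching is made up, and the directory is passed explicitly rather than relying on grep’s default search path.

```shell
# files_matching DIR: list files under DIR that contain "foo1", "foo2", or "bar".
# -r recurses, -l prints only file names, -E enables the (1|2) alternation.
# (files_matching is a hypothetical helper name.)
files_matching() {
    grep -rlE "foo(1|2)|bar" "$1" | sort
}
```

A file containing only “foo3” would not be listed, since neither side of the alternation matches it.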
And now some useful commands and scripts:
List size of subdirectories:
du --max-depth=1
The --max-depth parameter specifies how many sub levels to list.
-h can be added for more human readable sizes.
List number of files in each subdirectory*:
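#!/bin/bash
export IFS=$'\n' #Forces only newlines to be considered argument separators
#-maxdepth goes before -type so GNU find does not warn about option order
for dir in `find . -maxdepth 1 -type d`
do
    a=`find "$dir" -type f | wc -l` #Number of files under this directory
    if [ "$a" != "0" ]
    then
        echo "$dir $a"
    fi
done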
and to sort those results
SCRIPTNAME | sort -n -k2
List the number of files for each file extension in the current directory and subdirectories:
find -type f | grep -Eo "\.[^./]+$" | sort | uniq -c | sort -nr
Replace text in file(s):
perl -i -pe 's/search1/replace1/g; s/search2/replace2/g' FILENAMES
If you want to make pre-edit backups, include an extension after “-i” like “-i.orig”
Perform operations in directories with too many files to pass as arguments: (in this example, remove all files from a directory 100 at a time instead of using “rm -f *”)
find -type f -print0 | xargs -0 -n100 rm -f
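Force kill all processes with a given name: (killall matches the process name exactly; to match a string anywhere in the command line, use “pkill -9 -f STRING” instead)
killall -9 NAME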
Transfer MySQL databases between servers: (Works in Windows too)
mysqldump -u LOCAL_USER_NAME -p LOCAL_DATABASE | mysql -u REMOTE_USER_NAME -p -D REMOTE_DATABASE -h REMOTE_SERVER_ADDRESS
“-p” specifies a password is needed
Some lesser known commands that are useful:
screen: This opens up a virtual console session that you can disconnect from and reconnect to without stopping the session. This is great when connecting to a console through SSH so you don’t lose your progress if disconnected.
htop: An updated version of top, which is a process information viewer.
iotop: A process I/O (input/output - hard drive access) information viewer. Requires Python ≥ 2.5 and I/O accounting support compiled into the Linux kernel.
dig: Domain information retrieval. See “Diagnosing DNS Problems” Post for more information.
More to come later...
* Anything starting with “#!/bin/bash” is intended to be put into a script.

So I have been having major speed issues with one of our servers. After countless hours of diagnosis, I determined the bottleneck was always I/O (input/output, accessing the hard drive). For example, when running an MD5 hash on a 600MB file, load would jump up to 31 with 4 logical CPUs and it would take 5-10 minutes to complete. When performing the same test on the same machine on a second drive, it finished within seconds.
Replacing the hard drive itself is a last resort for a live production server, and a friend suggested the drive controller could be the problem, so I confirmed that the drive controller for our server was not on-board (it is on its own card), and I attempted to convince the company hosting our server of the problem so they would replace the drive controller. I ran my own tests first with an iostat check while doing a read of the main hard drive (cat /dev/sda > /dev/null). This produced steadily worsening results the longer the test went on, and always much worse than our secondary drive. I passed these results on to the hosting company, and they replied that a “badblocks -vv” produced results that showed things looked fine.
So I was about to go run his test to confirm his findings, but decided to check parameters first, as I always like to do before running new Linux commands. Thank Thor I did. The admin had meant to write “badblocks -v” (verbose) and typoed with a double key stroke. The two v’s looked like a w due to the font, and had I run a “badblocks -w” (write-mode test), I would have wiped out the entire hard drive.
Anyways, the test outputted the same basic results as my iostat test with throughput results very quickly decreasing from a remotely acceptable level to almost nil. Of course, the admin only took the best results of the test, ignoring the rest.
I had them swap out the drive controller anyways, and it hasn’t fixed things, so a hard drive replacement will probably be needed soon. This kind of problem would be trivial if I had access to the server and could just test the hardware myself, but that is a price to pay for proper security at a server farm.