Home Page
Archive > Posts > Tags > NTFS
Search:

Weird filename encoding issues on windows

So somehow all of the file names in my Rammstein music directory, and some in my Daft Punk, had characters with diacritics replaced with an invalid character. I pasted one of such filenames into a hex editor to evaluate what the problem was. First, I should note that Windows encodes its filenames (and pretty much everything) in UTF16. Everything else in the world (mostly) has settled on UTF8, which is a much better encoding for many reasons. So during some file copy/conversion at some point in the directories’ lifetime, the file names had done a freakish (utf16*)(utf16->utf8) rename, or something to that extent. I had noticed that all I needed to do was to replace the first 2 bytes of the diacritic character with a different byte. Namely “EF 8x” to “Cx”, and the rest of the bytes for the character were fine. So if anyone ever needs it, here is the bash script.

LANG=;
IFS=$'\n'
for i in `find -type f | grep -P '\xEF[\x80-\x8F]'`; do
	FROM="$i";
	TO=$(echo "$i" | perl -pi -e 's/\xEF([\x80-\x8F])/pack("C", ord($1)+(0xC0-0x80))/e');
	echo Renaming "'$FROM'" to "'$TO'"
	mv "$FROM" "$TO"
done

I may need to expand the range beyond the x80-x8F range, but am unsure at this point. I only confirmed the range x82-x83.

Symlinks in a Windows programming environment
Windows will get it right one day

I have been having some problems regarding symlinks (symbolic links) for a project that I’ve been working on recently which is requiring work in at least 5 very different operating systems (and about a dozen programming languages). Not many programs support symlinks properly that I have the need to because support for it wasn’t added for NTFS until Windows Vista, and it still has some problems.

It is really great that Windows Vista and Windows 7 now support native symlinks so they can be utilized by programs out of the box. For example, one such instance where I need this a lot is directory relinking in Apache. While Apache’s mod_alias can duplicate the functionality of symlinks for many needs, creating special cases for this one piece of software when distributing a code repository is just not practical, and having proper symlinks natively followed without the program knowing they aren’t the actual file/directory is really the best solution so everything works without special cases.

The way to create NTFS symlinks in Windows Vista+ is through the “mklink” command, which is unfortunately implemented directly in the Window’s command shell, and not a separate executable, so it is not accessible to Cygwin. Further, Cygwin has made a stance to only support reading NTFS symlinks, and not creating them, because they can only be created by administrators, and require specification as to whether the link’s target is a directory or file. Cygwin itself in Windows has had support for symlinks for a long time, but these are not compatible with any program run outside of the Cygwin environment.

Now, my real problem started occurring when trying to use these NTFS symlinks with GIT. While GIT natively supports symlinks, TortoiseGIT doesn’t really support them at all, and throws errors when they are encountered. This is still a big problem that I am going to have to think about :-\. Fortunately, when working with GIT in Cygwin they still work, with caveats. As previously mentioned, only reading the NTFS symlinks in Cygwin work, so when you fetch/pull from a repository and it creates Cygwin style symlinks, Windows still does not read them properly. The following is a script I wrote to change the Cygwin style symlinks into NTFS style symlinks. It can be run from the root folder of the GIT project.

#!/bin/bash
IFS=$'\n' #Spaces do not count as new delimiters

function makewinlink
{
	LINK=$1
	OPTIONS=$2
	TARGET=`find $LINK -maxdepth 0 -printf %l`
	LASTMODTIME=`find $LINK -maxdepth 0 -printf "%t"`
	LINKDIR=`find $LINK -maxdepth 0 -printf %h`
	TARGET=`echo $LINKDIR/$TARGET`
	rm -f $LINK
	cmd /c mklink $OPTIONS "$(cygpath -wa $LINK)" "$(cygpath -wa $TARGET)"
	touch -h -d $LASTMODTIME $LINK
}

#Relink all directories
FILES=`find -type l -print0 | xargs -0 -i find -L {} -type d -maxdepth 0`
for f in $FILES
do
	makewinlink $f /D
done

#Relink all files
FILES=`find -type l -print0 | xargs -0 -i find -L {} -type f -maxdepth 0`
for f in $FILES
do
	makewinlink $f
done

Make sure when committing symlinks in a GIT repository in Windows to use Cygwin with Cygwin style symlinks instead of TortoiseGIT. Also, as previously mentioned, after running this script, TortoiseGIT will show these symlinks as modified :-\. If this is a problem, you can always reverse the process in Cygwin by changing the “cmd /c mklink $OPTIONS” line to a “ln -s” in the above script (note that “target” and “symlink’s name” need to be switched) along with a few other changes.


[EDIT ON 2011-01-03 @ 6:30am] See here for a better example of symlinking in Windows that uses relative paths. [/EDIT]