Home Page
Archive > Posts > Tags > FileSync
Search:

Automatically resuming rsync
The old network file copy problem

Rsync is a spectacular bash utility for doing file copying and/or syncing operations. It has a multitude of switches to help optimize and handle any requirements for file copy operation over a local computer or network. However, sometimes networks are less than stable and stalls can happen during an rsync (or scp). This is quite the nuisance when doing very large (i.e. GB) transfers. To solve this, the following script can be used to auto resume a stalled rsync.


export Result=1;
while [ $Result -ne 0 ]; do
  echo "STARTING ($Result) @" `date`;
  rsync -Pza --timeout=10 COPY_FROM COPY_TO_USER@COPY_TO_HOST:COPY_TO_LOCATION;
  Result=$?;
  sleep 1;
done

  • The -P switch is highly suggested as it activates:
    • --partial: This keeps a file even if it doesn’t finish transferring so it can be resumed when rsync is restarted. This is especially important if you have very large files.
    • --progress: This shows you the progress of the current file copy operation.
  • The -z switch turns on gzip compression during the file transfer, so may only be useful depending on the circumstances.
  • The -a switch stands for “archive” and is generally a good idea to use. It includes the switches:
    • -r: Recurse into folders
    • -t: Preserves file modification time stamp. This is highly recommended for this, and incremental backups, as rsync, by default, skips files whose file sizes and modification times match.
    • -l and -D: Preserve file type (i.e. symlinks)
    • -p: Preserve file (chmod) permissions
    • -g and -o: Preserve file owners.
  • The --timeout is the crux of the script, in that if an I/O timeout of 10 seconds occurs, the rsync exits prematurely so it can be restarted.

For more useful switches and information, see the rsync man page.



Script with comments:
export Result=1; #This will hold the result of the rsync. Set to 1 so the first loop check will fail.
while [ $Result -ne 0 ]; do #Loop until rsync result is successful
  echo "STARTING ($Result) @" `date`; #Inform the user of the time an rsync is starting and the last rsync failure code
  rsync -Pza --timeout=10 COPY_FROM COPY_TO_USER@COPY_TO_HOST:COPY_TO_LOCATION; #See rest of post for switch information
  Result=$?; #Store the result of the rsync
  sleep 1; #This is an optional 1 second timeout between attempts
done
Directory Difference in Web Browser
Another quick and dirty solution

Since my FileSync Project is still a long way to being where I want it to be and is in a state that makes it annoying to use, I decided to throw together a script that essentially emulates the primary functions of it for what I need. I find it a bit annoying that I’ve never been able to find another good project that does exactly what I want for quick syncing of files over networks :-\. I used to think rsync would be a good solution for it but it’s very... quirky and unstable in certain ways. Not as flexible as I would like. Alas.

Anywho, this PHP script takes 2 file lists and gives you back their differences in a directory tree view. Each file has a data string after it used to tell if the files are different. The data string can be anything you want, but will usually probably be a timestamp or data checksum. The tree view lets you hide files/directories depending on whether each item is only on one side, different, or the same. An example is as follows (List1, List2):

Dir1.txt filesData String (Timestamps)
Same.txt9999-99-99 99:99:99
Diff.txt9999-99-99 99:99:99
LeftOnly.txt9999-99-99 99:99:99
PartialDir/Same.txt9999-99-99 99:99:99
PartialDir/Diff.txt9999-99-99 99:99:99
PartialDir/LeftOnly.txt9999-99-99 99:99:99
SameDir/Same.txt9999-99-99 99:99:99
LeftOnlyDir/Left.txt9999-99-99 99:99:99
LeftOnlyDir/Left2.txt9999-99-99 99:99:99
DiffDir/Diff.txt9999-99-99 99:99:99
Dir2.txt filesData String (Timestamps)
Same.txt9999-99-99 99:99:99
Diff.txt1111-11-11 11:11:11
RightOnly.txt9999-99-99 99:99:99
PartialDir/Same.txt9999-99-99 99:99:99
PartialDir/Diff.txt1111-11-11 11:11:11
PartialDir/RightOnly.txt9999-99-99 99:99:99
SameDir/Same.txt9999-99-99 99:99:99
RightOnlyDir/Right.txt9999-99-99 99:99:99
  
DiffDir/Diff.txt1111-11-11 11:11:11


Example Output:
Legend:
  • Differences:
    • No differences detected
    • File is different, or directory contains differences
    • File is only found on left side, when this happens to a directory, it is only counted as 1 difference
    • File is only found on right side, when this happens to a directory, it is only counted as 1 difference
  • Type:
    • This is a directory with sub items. After the directory name, it lists as info the number of differences over the total number of sub items
    • This is a file. If both sides contain the file but are different, it lists as info the different strings
  • Info about the file/directory listed in between parenthesis.
Options:
Left Side: ./Dir1.txt
Right Side: ./Dir2.txt
Total Differences: 9
Total Items: 13
  • DiffDir (1/1)
    • Diff.txt (9999-99-99 99:99:99 :: 1111-11-11 11:11:11)
  • LeftOnlyDir (1/2)
    • Left.txt
    • Left2.txt
  • PartialDir (3/4)
    • Diff.txt (9999-99-99 99:99:99 :: 1111-11-11 11:11:11)
    • LeftOnly.txt
    • RightOnly.txt
    • Same.txt
  • RightOnlyDir (1/1)
    • Right.txt
  • SameDir (0/1)
    • Same.txt
  • Diff.txt (9999-99-99 99:99:99 :: 1111-11-11 11:11:11)
  • LeftOnly.txt
  • RightOnly.txt
  • Same.txt

Some example bash commands used to create file lists:
  • Output all files and their timestamps to “Dir1.txt”: find -type f -printf '%P\t%T+\n' > Dir1.txt
  • Output all files and their md5sums to “Dir1.txt”: find -type f -print0 | xargs -0 md5sum | perl -pe 's/^(.*?) (.*)$/$2\t$1/g' > Dir1.txt