Off-site backups

This long weekend's task has been to set up some form of off-site backup for my dad's business. He has a relatively small set of data (less than 2GB in total) that is not going to change a huge amount, but it will continue to grow over time. Previously dad had been burning backups to DVDs every night, but as well as being wasteful, generating large numbers of half-full DVDs, it was slow and required somebody to remember to put a fresh disc in the drive at the end of every day.

Dad has a couple of different machines that need backing up:

  • An ancient Novell 4.x machine running some legacy software
  • A new Windows 2008 Server running a database
  • A small Ubuntu Linux box acting as NAT gateway and mail server

So the plan was to come up with something that would meet the following criteria:

  • Automatic
  • Simple
  • Support very varied sources

I looked at a couple of the companies offering hosted online backup but decided to roll my own. I didn't want to worry about a monthly bill, and I knew that if something went wrong and the backup was actually needed, it would be quicker to carry whatever held the backup to the office and restore from that than to wait for it all to download again.

Since there was such a big difference between the types of machine that needed backing up, and the Novell stuff is so out of date that nobody was likely to support it any more, I decided the best bet was to mount the bits that needed backing up from the Windows and Novell machines on the Linux box and then back everything up from there. Accessing the Windows share from Linux was easy enough using Samba and the CIFS filesystem mount options. Getting at the Novell mount was a little harder.
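For the record, the CIFS mount looked something like this; the server name, share and credentials below are placeholders rather than the real ones:

```shell
# mount the Windows 2008 server's share over CIFS (names invented)
mkdir -p /mnt/winserver
mount -t cifs //winserver/data /mnt/winserver \
    -o username=backupuser,password=secret,ro
```

Mounting it read-only is a cheap safety net, since the backup only ever needs to read from the share.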

The old NetWare Novell server doesn't talk IP like everything else on the network; it uses IPX instead. So the first task was to configure the Linux machine to join the IPX network. Detailed instructions for setting up IPX can be found here, but the basics are: assuming your Linux distribution has IPX support compiled in, you should be able to just run the following command as root.

ipx_configure --auto_interface=on --auto_primary=on

Once the network was up and running the next job was to mount the share, for which I needed the NCPFS package. This package is very similar to Samba and allows you to search for and mount Novell NetWare shares. There are two useful commands: the first is slist, which lists all the Novell servers that can be seen on the network; the second is ncpmount, which mounts the share.

ncpmount -S <server> <mount point> -U <Novell user>
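In practice that went something like this; the server and user names here are invented for illustration:

```shell
# list the NetWare servers visible on the IPX network
slist

# mount the server's volume somewhere the backup can see it
mkdir -p /mnt/novell
ncpmount -S NWSERVER /mnt/novell -U backupuser
```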

Now I had access to all the data, I needed somewhere to send it. While dad has a desktop and a laptop at home, neither is on all the time or has large amounts of spare disk space, so I needed to find something to store the backups on. I was looking for a low-power NAS with rsync support. I had a hunt round and found several, but most of them were big multi-disk systems intended for mid-sized office use and cost several hundred pounds. A bit more searching turned up the Buffalo LinkStation Live 500GB, which is basically a 500GB disk with a mini ARM-based Linux machine strapped to the side. There is a reasonably sized community online hacking these little boxes to do all sorts of things, and there are detailed instructions on how to enable telnet and SSH access, which in turn allows rsync to be used. Peak power draw is about 20W, and significantly less when idle, so it's not going to cost much to run.

So I sent dad off to order one of these LinkStations and it was waiting for me when I got home. I initially started off following this set of instructions but had no joy. It turns out that with the latest version of the firmware (1.24 at the time of writing) the ACP commander application no longer works. I started to search round some of the forums and came up with this new set of instructions on how to patch the firmware to add SSH and allow console access. The instructions are good, but it still took two attempts to get the firmware to upload; once it did, it all worked like a dream.

At this point I had access to all the data that needed backing up on the Linux box at the office, and the LinkStation was ready to go at home, so it was time to set up a way to get the data from one to the other. Rsync is perfect for this as it only copies the differences between two directories; the directories don't even have to be on the same machine, and all the communication can be tunnelled over SSH. The basic starting point for rsync over SSH is:

rsync -az -e ssh /path/to/src user@machine:/path/to/destination

Let's just step through the options being used here.

  • -a – This tells rsync to “archive” the src directory, which basically means preserve all the file access permissions, timestamps and symlinks, and recurse into directories
  • -z – This tells rsync to compress all the data passed back and forth to save bandwidth
  • -e ssh – This tells rsync to use ssh to connect to the target machine

That's OK to start with: it will copy anything new in the source directory to the destination directory, but it will not delete things from the destination if they disappear from the source. That is fine for this situation, since dad is not likely to be deleting anything on the source side. So I wrapped it up in a script to run as a cron job after everybody should have gone home for the night.


rsync -az -e ssh $SRC $USER@$SERVER:$DST
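Filled out, the whole script (saved as /usr/local/backup) is only a handful of lines; the paths, user and host here are placeholders for the real values:

```shell
#!/bin/sh
# nightly off-site backup - pushed to the LinkStation at home
# (all of the values below are placeholders)
SRC=/srv/backup/          # local directory holding the mounted shares
USER=backup               # account on the LinkStation
SERVER=home.example.com   # dad's home broadband address
DST=/mnt/disk1/backup     # destination directory on the LinkStation

rsync -az -e ssh $SRC $USER@$SERVER:$DST
```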

The following line in the crontab runs the backup at 5 past midnight on Tuesday through Saturday, to pick up the changes for each of the working days.

5 0 * * 2-6 /usr/local/backup

At the moment, every time the script runs it asks for the password for the SSH connection to the LinkStation at home. To get round this I set up an SSH key. There are lots of examples online of creating and using SSH keys, so I'll skip past that now.
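The only wrinkle worth noting is that, because the key gets used unattended from cron, it needs an empty passphrase. The short version looks something like this (the key path and LinkStation address are placeholders):

```shell
# create a passphrase-less key pair for the backup job
mkdir -p "$HOME/.ssh"
ssh-keygen -t rsa -b 2048 -N "" -f "$HOME/.ssh/backup_key"

# then install the public half on the LinkStation, e.g.
#   ssh-copy-id -i "$HOME/.ssh/backup_key.pub" backup@home.example.com
# and point the backup script at the key by changing the rsync option to
#   -e "ssh -i $HOME/.ssh/backup_key"
```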

So that is nearly that, except that having set this all up I started to think about how to recover should a problem happen. What happens if an important file gets trashed at some point and nobody notices for a day or two? If the current script runs every day at midnight, then the safe copy of the trashed file will be overwritten as soon as the script runs on the day it was damaged. To get round this I needed some sort of rolling incremental backup. Luckily rsync supports this as well, and there is an example on the rsync page of how to set it up. I've modified it a little bit to give me this:

INCDIR=`date --date=yesterday +%A`
OPTS="--backup --backup-dir=$DSTDIR/$INCDIR -az -e ssh"

# the following lines clear out last week's incremental directory
[ -d $HOME/emptydir ] || mkdir $HOME/emptydir
rsync --delete -a $HOME/emptydir/ $USER@$SERVER:$DSTDIR/$INCDIR/
rmdir $HOME/emptydir

# back up into "current", moving the old copy of anything that has
# changed into the day's incremental directory
rsync $OPTS $SRC $USER@$SERVER:$DSTDIR/current


This will create a directory called "current", which holds the most up-to-date version of everything backed up. It will also create a directory named after each day of the week the script runs, holding the previous version of any files that changed that day. This should allow any change in the last week to be reverted.
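Getting a file back is then just rsync pointed in the other direction; for example, to recover the copy of a file as it stood before Tuesday's changes (the paths and host here are invented):

```shell
# pull the pre-change version out of Tuesday's incremental directory
rsync -az -e ssh \
    backup@home.example.com:/mnt/disk1/backup/Tuesday/docs/invoices.doc \
    /srv/backup/docs/invoices.doc
```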

In order to avoid copying the initial 2GB over dad's broadband, I took the LinkStation to the office, set it up on the local network and ran the backup script with the local IP address in place of the SERVER entry.

For normal operation I also had to set up dad's router at home to assign the LinkStation a static IP address and forward port 22 to it so that it was visible from the outside world.