1. Introduction
Taking backups is important if you don't want to lose your work, digital photos etc. I propose a simple system that makes it easy to regularly take backups on one or more USB or eSATA external disks.
Keep in mind that a working backup should obey a few rules:
- Make backups regularly
- Keep your backups 'off line'
- Keep a backup at a different location
- Test your backups: verify that you can read or restore them
2. Linux related technology
2.1. Hard Links
A hard link is a second name for the same file. Think of it as two names for the same instance of a file.
Let us look at an example.
:$ echo a1 >> a.txt
:$ ln a.txt b.txt
:$ echo b1 >> b.txt
:$ echo a2 >> a.txt
The first command creates a file named 'a.txt' and appends 'a1' to its (empty) content. The second command creates a hard link, that is, an alternative name for the same file. The third command appends 'b1' through the name 'b.txt', and the fourth appends 'a2' through the name 'a.txt'.
Because both names point to the same file, opening either of them shows the same content:
:$ more a.txt
a1
b1
a2
As you can see, changing a.txt affects b.txt because it is the same file. We use hard links so that we do not have to keep several identical copies of the same file: an unchanged file can appear in every backup while being stored only once.
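To check the example above on your own machine, here is a minimal sketch. It assumes GNU coreutils (for stat -c) and works in a scratch directory created with mktemp; the variable names are only illustrative.

```shell
#!/bin/bash
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)

echo a1 >> "$workdir/a.txt"
ln "$workdir/a.txt" "$workdir/b.txt"   # second name for the same file
echo b1 >> "$workdir/b.txt"

# %i prints the inode number, %h the number of hard links (GNU stat)
inode_a=$(stat -c %i "$workdir/a.txt")
inode_b=$(stat -c %i "$workdir/b.txt")
link_count=$(stat -c %h "$workdir/a.txt")
echo "inode a.txt: $inode_a, inode b.txt: $inode_b, links: $link_count"

# Removing one name leaves the data reachable through the other name
rm "$workdir/a.txt"
remaining=$(cat "$workdir/b.txt")
echo "$remaining"
```

Both names report the same inode number, and the content is only freed once the last name is removed.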
2.2. Copying with rsync
The rsync tool is intended for copying files efficiently between two hosts, or between two directories on the same host as we do here. It only transfers files that have changed or are not yet present on the destination.
We use rsync primarily because of the following options:
- --delete causes rsync to delete files on the destination that are not present on the source
- --link-dest $latest_link instructs rsync to create hard links to the unchanged files present in $latest_link instead of making a new copy. This is what actually does the magic...
- -a preserves most file properties like permissions, timestamps etc.
- -H preserves hard links between two files on the source when copying to the destination.
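The effect of --link-dest can be tried out safely in a scratch directory. The sketch below assumes rsync and GNU stat are installed; the directory names snap1 and snap2 are made up for the illustration and are not the names the backup script uses.

```shell
#!/bin/bash
# Scratch area: 'src' plays the role of the data, snap1/snap2 are two backups
work=$(mktemp -d)
mkdir -p "$work/src"
echo "unchanged" > "$work/src/keep.txt"
echo "v1" > "$work/src/edit.txt"

# First (full) backup
rsync -a "$work/src/" "$work/snap1/"

# Change one file, then take a second backup that links against the first
echo "v2 with more text" > "$work/src/edit.txt"
rsync -a --delete --link-dest="$work/snap1" "$work/src/" "$work/snap2/"

# Unchanged files share an inode between snapshots, changed files do not
keep_1=$(stat -c %i "$work/snap1/keep.txt")
keep_2=$(stat -c %i "$work/snap2/keep.txt")
edit_1=$(stat -c %i "$work/snap1/edit.txt")
edit_2=$(stat -c %i "$work/snap2/edit.txt")
echo "keep.txt inodes: $keep_1 $keep_2"
echo "edit.txt inodes: $edit_1 $edit_2"
```

Each snapshot directory looks like a complete copy, yet only the changed file takes up new space on the disk.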
3. Preparing the external disk
Before using the backup script, you need a file system on the backup disk that supports hard links and soft links. Most USB disks are sold formatted with FAT32, which supports neither and is therefore not sufficient for our purposes.
You can repartition and reformat the disk with the command-line tool parted, or with its graphical frontend, GParted.
The script checks for a plugged-in disk with a file system of type ext3, so you need to format the disk as such, or adapt the script if you want to use a different file system like ext4 or reiser4.
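For the command-line route, the following sketch only prints the commands instead of running them, because formatting destroys everything on the disk. The device name /dev/sdX, the label 'backup' and the helper function name are placeholders; look up the real device with lsblk first. It also assumes a disk whose first partition is named by appending 1 to the device name.

```shell
#!/bin/bash
# Print (do not run) the commands that would reformat a disk as ext3.
# /dev/sdX is a placeholder, not a real device name.
print_format_commands() {
    local device=$1
    local label=${2:-backup}
    echo "parted -s $device mklabel msdos"
    echo "parted -s $device mkpart primary ext3 1MiB 100%"
    echo "mkfs.ext3 -L $label ${device}1"
}

commands=$(print_format_commands /dev/sdX)
echo "$commands"
```

After checking the printed commands against your own device, run them as superuser one at a time.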
4. The Script
The script contains a few variables that you can adapt as you see fit.
The script works by looking at the dynamically mounted disks. In Ubuntu these are mounted under /media. On other Linux distributions, check where removable disks are mounted and adapt the script accordingly.
Since the base directory is /home, you need to run the script as superuser. If you don't like that, just set data_dir to your own home directory.
An advantage of using this script is that file permissions are preserved too.
Since each file is stored only once and shared between backups through hard links, editing it changes it in every backup that contains it. NEVER edit files directly on the backup disk. Copy them to your normal hard disk first.
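The copy-first rule can be illustrated with a small sketch. The scratch directories below merely stand in for the backup disk and the normal disk; all names are made up for the example.

```shell
#!/bin/bash
# Scratch directories stand in for the backup disk and the normal disk
scratch=$(mktemp -d)
mkdir -p "$scratch/backup/snap1" "$scratch/backup/snap2" "$scratch/restore"

echo "original" > "$scratch/backup/snap1/notes.txt"
# snap2 shares the file with snap1 through a hard link, as the backups do
ln "$scratch/backup/snap1/notes.txt" "$scratch/backup/snap2/notes.txt"

# Restore: cp -a makes an independent copy, keeping permissions and timestamps
cp -a "$scratch/backup/snap2/notes.txt" "$scratch/restore/notes.txt"

# Edit the restored copy only
echo "edited" >> "$scratch/restore/notes.txt"

in_snap1=$(cat "$scratch/backup/snap1/notes.txt")
in_restore=$(cat "$scratch/restore/notes.txt")
echo "backup still holds: $in_snap1"
```

Had the edit been made on snap2/notes.txt directly, snap1/notes.txt would have changed as well.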
Here is the script. Copy it to a file backup.sh and make it executable.
#!/bin/bash
############################################################################
#
# Adapt this script for your situation, in particular 'data_dir' and
# 'subdir'
#
# Test first with some simple testdata!!!
#
# Removing the 'latest' link will trigger a full backup the next time, using the same
# space as a complete copy. Incremental backups take far less space
#
# Manually triggered backup
# Target should be a (USB) disk mounted somewhere under "/media" with ext3 filesystem.
# The first disk found that does not start with cdrom or floppy, is used
#
# Backup uses hardlinks. Remove the 'latest' symbolic link if you want a new copy (full backup)
# With hardlinks, each file exists only once on the disk.
#
####################################################################################
data_dir="/home"
hostname="$(hostname -s)"
# %H gives a zero-padded hour; %k pads with a space, which ends up in the directory name
date_dir=$(date +%F-%H%M%S)
subdir="backup-incr"
#We only want ext3 filesystem supporting hardlinks
LIST=$(find -L /media -maxdepth 1 -type d -fstype ext3 -regex '/media/.*')
#Filter out cdrom and floppy, they are always present and not that relevant...
FILTERED_LIST=()
for name in $LIST
do
    if [[ $name != /media/cdrom* && $name != /media/floppy* ]]
    then
        FILTERED_LIST+=("$name")
    fi
done
#Check we have at least one backup medium
if [ ${#FILTERED_LIST[@]} -gt 0 ]
then
    echo -e '\nAvailable media'
    PS3='Choose the backup media number (or type any other number to quit):'
    select name in "${FILTERED_LIST[@]}"
    do
        usbdisk=$name
        break
    done
    # Double quotes, so that $usbdisk is expanded
    echo -e "\nyour choice: $usbdisk"
    if [ -n "$usbdisk" ]
    then
        backup_dir=$usbdisk/$subdir/$hostname/$date_dir
        latest_link=$usbdisk/$subdir/$hostname/latest
        echo
        echo '-----------------------------------------'
        echo ' Backup:'
        echo ' from:' "$data_dir"
        echo ' to:' "$backup_dir"
        echo ' latest:' "$latest_link"
        echo '-----------------------------------------'
        echo
        if test ! -d "$data_dir"
        then
            echo "ERROR: Data directory does not exist, stopping now"
        else
            echo "Data directory exists"
            mkdir -p "$backup_dir"
            if [ -h "$latest_link" ]
            then
                echo "Doing incremental backup"
                # --link-dest hard-links unchanged files to the previous backup,
                # so no separate 'cp -al' pass is needed
                rsync -aH --delete --link-dest "$latest_link" "$data_dir/" "$backup_dir"
                rm "$latest_link"
            else
                echo "Doing full backup"
                rsync -aH --delete "$data_dir/" "$backup_dir"
            fi
            ln -s "$backup_dir" "$latest_link"
            echo -e '\n\tbackup ended' "$hostname/$date_dir" 'at' "$(date +%F-%H%M%S)"
        fi
    else
        echo "No backup disk selected"
    fi
else
    echo "No backup disk detected in /media (type ext3 - symbolic link allowed)"
fi
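The introduction asks for tested backups. One possible way to compare a backup with the data is sketched below; the function name verify_backup is hypothetical and not part of the script above, and the demo runs on scratch directories only.

```shell
#!/bin/bash
# Return 0 when every file under $1 matches its counterpart under $2
verify_backup() {
    local source_dir=$1
    local snapshot_dir=$2
    diff -rq "$source_dir" "$snapshot_dir" > /dev/null
}

# Demo on scratch directories
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/snapshot"
echo "photo" > "$demo/data/img.txt"
cp -a "$demo/data/." "$demo/snapshot/"

verify_backup "$demo/data" "$demo/snapshot" && status_same=0 || status_same=$?

# A file changed after the backup is reported as a difference
echo "changed" > "$demo/data/img.txt"
verify_backup "$demo/data" "$demo/snapshot" && status_changed=0 || status_changed=$?
echo "identical: $status_same, after a change: $status_changed"
```

In real use you would pass data_dir and the 'latest' link on the backup disk as the two arguments. Files modified since the last backup will show up as differences, so run the check right after a backup.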