Choosing a new backup solution: Duplicity, rdiff-backup or Rsnapshot

Outdated content
Posted on 30.10.2008 by Kim N. Lesmer.
I currently use the Mercurial revision control system to version my programming files, written articles and so on. Other important files I manually tar up, encrypt and upload to my backup server using sftp from OpenSSH. I decided that it was time to look for a better way to deal with my personal backups.

Update 10.02.2012: Duplicity is still in beta and still contains some bugs. Before deciding to use Duplicity for any important data, check the bug reports. Also, I no longer use Mercurial; I have long since switched to Git.

Update 03.12.2008: Duplicity is beta and it still contains some bugs. Don't use Duplicity for any important data.

A new approach

After some consideration I would prefer to use this approach:

  • Use Mercurial version control on my programming files and the stuff I write.
  • Use a backup solution for all other important files.

I want the backup solution to provide an easy way to upload the files over SSH.

A lot of my files go into Mercurial version control that really don't belong there, but I use this solution because it is very convenient. I would like to find another solution that can do incremental backups over SSH, possibly coupled with encryption. The incremental part is important because I am currently wasting a lot of time backing up files that haven't changed since the last backup.

I have looked at the following solutions:

  • rdiff-backup
  • Rsnapshot
  • Duplicity

I have ruled out other solutions either because they lack features I want, or because they somehow don't provide what I want in an easy-and-simple-to-use manner.

All three solutions use the rsync algorithm for bandwidth and space efficiency, and all three support incremental backups, which is very important if you don't want to back up the same files again and again.

rdiff-backup and Duplicity resemble each other a lot. Duplicity encrypts the data whereas rdiff-backup creates a mirror on the remote system without any encryption.

One major disadvantage of rdiff-backup, besides the lack of encryption, is that it requires the server to have the exact same version of rdiff-backup installed.

rdiff-backup and Duplicity resemble a version control system in a lot of ways, whereas Rsnapshot is closer to a traditional backup solution.

Rsnapshot creates a "virtual look" where it appears that each backup is a full backup. Rsnapshot uses hard links to achieve the "virtual look" of full backups. The disk space required is just a little more than the space of one full backup, plus incrementals. This is an important feature if disk space is an issue.

Rsnapshot doesn't support encryption.

An example of what the backups would look like after using Rsnapshot three days in a row:

With Rsnapshot:

12G     backups/daily.0
592M    backups/daily.1
352M    backups/daily.2
13G     backups

A normal backup using tar:

12G     backups/daily.0
12G     backups/daily.1
6.8G    backups/daily.2
29G     backups

As mentioned, Rsnapshot uses hard links for files that haven't changed. So if the file "my_picture.png" goes into the first daily backup, a hard link pointing to the original file from the first backup is created in the second and third daily backups, as long as the file hasn't been changed.

When you use Rsnapshot, each backup you take will appear as one full backup of its own.
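
The hard-link idea can be illustrated with a minimal Python sketch. This is not Rsnapshot's own code, only an illustration under assumptions: the directory names snapshot_1 and snapshot_2 and the helper link_snapshot are hypothetical, and the file system holding the temporary directory must support hard links.

```python
import os
import tempfile


def link_snapshot(previous: str, current: str) -> None:
    """Recreate the tree under `previous` in `current`, hard-linking every file.

    Hypothetical helper: every file is treated as unchanged, so every file
    in the new snapshot is a hard link to the old one and uses no extra space.
    """
    for root, _dirs, files in os.walk(previous):
        relative = os.path.relpath(root, previous)
        target_dir = os.path.normpath(os.path.join(current, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target_dir, name))


base = tempfile.mkdtemp()
snapshot_1 = os.path.join(base, "snapshot_1")
snapshot_2 = os.path.join(base, "snapshot_2")

# First "full" backup containing a single file.
os.makedirs(snapshot_1)
with open(os.path.join(snapshot_1, "my_picture.png"), "wb") as handle:
    handle.write(b"placeholder image bytes")

# Second backup: the file is unchanged, so it is only linked, not copied.
link_snapshot(snapshot_1, snapshot_2)

first = os.stat(os.path.join(snapshot_1, "my_picture.png"))
second = os.stat(os.path.join(snapshot_2, "my_picture.png"))

# Same device and inode means both names refer to one file on disk.
same_file = (first.st_dev, first.st_ino) == (second.st_dev, second.st_ino)
print(same_file, first.st_nlink)  # True 2
```

Both directories can be browsed as if they were complete backups, yet the file's data is stored only once.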

rdiff-backup and Duplicity use an approach similar to revision control, so that only changed files are added to the backup. No hard links are created and you can't look at the backup as a set of different full backups. Rather, you look at a single full backup to which files get added, changed or removed. Both solutions are able to go back in time between different backup versions or stages.

Duplicity

A great advantage of Duplicity is that, besides archiving the files, it also uses GnuPG to encrypt the backup. If you have a GnuPG key, you can use it with the --encrypt-key option.

Another great advantage is that Duplicity makes it possible to pass on options to GnuPG.

Creating a backup protected by a passphrase (no GnuPG key), without specifying the encryption algorithm and with foo.bar as my backup server, can be done like this:

$ cd /home/user_name
$ duplicity important_data_dir/ scp://foo.bar//home/user_name/duplicity-backup --ssh-askpass

Password for 'foo.bar':
GnuPG passphrase:
Retype to confirm:
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1225355412.61 (Thu Oct 30 09:30:12 2008)
EndTime 1225355412.92 (Thu Oct 30 09:30:12 2008)
ElapsedTime 0.31 (0.31 seconds)
SourceFiles 4
SourceFileSize 41325 (40.4 KB)
NewFiles 4
NewFileSize 41325 (40.4 KB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 4
RawDeltaSize 37229 (36.4 KB)
TotalDestinationSizeChange 36086 (35.2 KB)
Errors 0
-------------------------------------------------

If the above command is run repeatedly, the first session will be a full backup, and subsequent ones will be incremental. The --full option can be used to force a full backup.

If you want to specify the encryption algorithm you can do it like this:

$ duplicity important_data_dir/ scp://foo.bar//home/user_name/duplicity-backup --ssh-askpass --gpg-options "--cipher-algo=AES256"

If you are using another port for SSH you can specify the port number by passing it along with the --ssh-options option like this:

$ duplicity important_data_dir/ scp://foo.bar//home/user_name/duplicity-backup --ssh-askpass --ssh-options "-oPort=8000" --gpg-options "--cipher-algo=AES256"
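
If the backup is started from a script rather than typed by hand, the nested quoting around --ssh-options and --gpg-options is easy to get wrong. One possible way around that is to assemble the command as a list of arguments. The sketch below only builds the command and does not run it; the function name backup_command and its parameters are hypothetical, and the options are the ones used in this article, which may differ in other Duplicity versions.

```python
import shlex
from typing import List, Optional


def backup_command(
    source_dir: str,
    target_url: str,
    ssh_port: Optional[int] = None,
    cipher: Optional[str] = None,
) -> List[str]:
    """Build a duplicity argument list (hypothetical helper, nothing is executed)."""
    argv = ["duplicity", source_dir, target_url, "--ssh-askpass"]
    if ssh_port is not None:
        # Each option value is one list element, so no shell quoting is needed.
        argv += ["--ssh-options", f"-oPort={ssh_port}"]
    if cipher is not None:
        argv += ["--gpg-options", f"--cipher-algo={cipher}"]
    return argv


cmd = backup_command(
    "important_data_dir/",
    "scp://foo.bar//home/user_name/duplicity-backup",
    ssh_port=8000,
    cipher="AES256",
)

# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

A list like this could be handed to subprocess.run(cmd) without going through a shell, which avoids the quoting problem altogether.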

After SSH'ing to the backup computer, I can see that duplicity has created the directory duplicity-backup. Its contents look like this:

total 44K
duplicity-full.2008-10-30T09:29:58+01:00.manifest.gpg
duplicity-full.2008-10-30T09:29:58+01:00.vol1.difftar.gpg
duplicity-full-signatures.2008-10-30T09:29:58+01:00.sigtar.gpg

After adding a new file and changing one of the original files I use duplicity again:

$ duplicity important_data_dir/ scp://foo.bar//home/user_name/duplicity-backup --ssh-askpass
Password for 'foo.bar':
GnuPG passphrase:
Retype to confirm:
--------------[ Backup Statistics ]--------------
StartTime 1225356052.77 (Thu Oct 30 09:40:52 2008)
EndTime 1225356052.82 (Thu Oct 30 09:40:52 2008)
ElapsedTime 0.05 (0.05 seconds)
SourceFiles 5
SourceFileSize 42153 (41.2 KB)
NewFiles 2
NewFileSize 4120 (4.02 KB)
DeletedFiles 0
ChangedFiles 1
ChangedFileSize 5521 (5.39 KB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 3
RawDeltaSize 5045 (4.93 KB)
TotalDestinationSizeChange 5424 (5.30 KB)
Errors 0
-------------------------------------------------

Duplicity has recorded one changed file and two new files.

Taking a new look at the target, it now contains the following:

total 60K
duplicity-full.2008-10-30T09:29:58+01:00.manifest.gpg
duplicity-full.2008-10-30T09:29:58+01:00.vol1.difftar.gpg
duplicity-full-signatures.2008-10-30T09:29:58+01:00.sigtar.gpg
duplicity-inc.2008-10-30T09:29:58+01:00.to.2008-10-30T09:40:41+01:00.manifest.gpg
duplicity-inc.2008-10-30T09:29:58+01:00.to.2008-10-30T09:40:41+01:00.vol1.difftar.gpg
duplicity-new-signatures.2008-10-30T09:29:58+01:00.to.2008-10-30T09:40:41+01:00.sigtar.gpg

To list the files in the backup, and to see the status of the backup chain, you can do the following:

$ duplicity list-current-files scp://foo.bar//home/user_name/duplicity-backup --ssh-askpass
Password for 'foo.bar':
GnuPG passphrase:
Retype to confirm:
Thu Oct 30 09:40:04 2008 .
Thu Oct 30 09:40:04 2008 text_file.txt
Thu Oct 30 09:39:29 2008 the_first.gif
Wed Oct 29 15:41:55 2008 the_second.jpg
Wed Oct 29 15:42:23 2008 the_third.png
$ duplicity collection-status scp://foo.bar//home/user_name/duplicity-backup --ssh-askpass
Password for 'foo.bar':
Connecting with backend: sshBackend
Archive dir: None
Found 0 backup chains without signatures.
Found a complete backup chain with matching signature chain:
-------------------------
Chain start time: Thu Oct 30 09:29:58 2008
Chain end time: Thu Oct 30 09:40:41 2008
Number of contained backup sets: 2
Total number of contained volumes: 2
Type of backup set:     Time:                       Num volumes:
Full                    Thu Oct 30 09:29:58 2008    1
Incremental             Thu Oct 30 09:40:41 2008    1
-------------------------
No orphaned or incomplete backup sets found.

You are not limited to restoring the latest backup. With the --restore-time option you can decide which backup to restore: as with a revision control system, you can go back in time and pick the state you want.

In the example above I have first made a full backup, next I added a new file and changed one of the other files. I now delete the original directory on the source computer and restore the first backup I made rather than the latest backup.

From the collection-status output above I can see the following backup sets:

Full                    Thu Oct 30 09:29:58 2008    1
Incremental             Thu Oct 30 09:40:41 2008    1

If I just want to restore my files from the latest backup I can do that like this:

$ duplicity scp://foo.bar//home/user_name/duplicity-backup --ssh-askpass restored-backup

Here restored-backup is the directory that will contain my restored files.

If you want to restore the files to another directory, or to the original directory, just use that directory name in the above command. If the directory doesn't exist, duplicity will create it for you:

$ duplicity scp://foo.bar//home/user_name/duplicity-backup --ssh-askpass some-other-dir

Let's restore the first (oldest) backup rather than the latest:

$ duplicity --restore-time 2008-10-30T09:29:58 scp://foo.bar//home/user_name/duplicity-backup --ssh-askpass restored-backup

By using the option --restore-time you can revert to any point in time in which you created a backup.

Duplicity uses time strings in two places. Firstly, many of the files duplicity creates have the time in their filenames, in the w3 datetime format described in a w3 note at http://www.w3.org/TR/NOTE-datetime.

Basically they look like this: "2001-07-15T04:09:38-07:00". The "-07:00" section means the time zone is 7 hours behind UTC.
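
To make the time zone part concrete, here is a small sketch that parses the example string with Python's standard library. It only illustrates the timestamp format and is not how Duplicity itself parses its filenames; the variable names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# The example timestamp from the text, including its "-07:00" offset.
stamp = datetime.fromisoformat("2001-07-15T04:09:38-07:00")

# The offset says local time is 7 hours behind UTC.
offset = stamp.utcoffset()
print(offset == timedelta(hours=-7))  # True

# Converting to UTC therefore adds 7 hours to the local time.
as_utc = stamp.astimezone(timezone.utc)
print(as_utc.isoformat())  # 2001-07-15T11:09:38+00:00
```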

Secondly, the -t and --restore-time options take a time string, which can be given in any of several formats:

From the Duplicity manual:

  • The string "now" (refers to the current time).
  • A sequence of digits, like "123456890" (indicating the time in seconds after the epoch).
  • A string like "2002-01-25T07:00:00+02:00" in datetime format.
  • An interval, which is a number followed by one of the characters s, m, h, D, W, M, or Y (indicating seconds, minutes, hours, days, weeks, months, or years respectively), or a series of such pairs. In this case the string refers to the time that preceded the current time by the length of the interval. For instance, "1h78m" indicates the time that was one hour and 78 minutes ago. The calendar here is unsophisticated: a month is always 30 days, a year is always 365 days, and a day is always 86400 seconds.
  • A date format of the form YYYY/MM/DD, YYYY-MM-DD, MM/DD/YYYY, or MM-DD-YYYY, which indicates midnight on the day in question, relative to the current timezone settings. For instance, "2002/3/5", "03-05-2002", and "2002-3-05" all mean March 5th, 2002.
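
The interval format is the least obvious of these, so here is a rough sketch of how such a string could be turned into a number of seconds, following the rules quoted above (a month is 30 days, a year is 365 days, a day is 86400 seconds). This is not Duplicity's own parser; the names SECONDS_PER_UNIT and interval_to_seconds are hypothetical, and a week is assumed to be seven days.

```python
import re

DAY = 86400  # seconds in a day, as stated in the manual excerpt

# Seconds represented by each unit character of the interval format.
SECONDS_PER_UNIT = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "D": DAY,
    "W": 7 * DAY,    # assumption: a week is seven days
    "M": 30 * DAY,   # "a month is always 30 days"
    "Y": 365 * DAY,  # "a year is always 365 days"
}

_PAIR = re.compile(r"([0-9]+)([smhDWMY])")


def interval_to_seconds(text: str) -> int:
    """Convert an interval such as "1h78m" into a total number of seconds."""
    if not text:
        raise ValueError("empty interval")
    total = 0
    position = 0
    while position < len(text):
        match = _PAIR.match(text, position)
        if match is None:
            raise ValueError(f"not a valid interval: {text!r}")
        number, unit = match.groups()
        total += int(number) * SECONDS_PER_UNIT[unit]
        position = match.end()
    return total


# One hour plus 78 minutes: 3600 + 78 * 60 seconds.
print(interval_to_seconds("1h78m"))  # 8280
```

Subtracting the result from the current time gives the point in time the interval refers to.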

Conclusion

I like the fact that Duplicity encrypts the backup, and this is the main reason why I chose Duplicity over Rsnapshot.

Rsnapshot is also designed to be an automated backup solution, which is great, but it is just as easy to make Duplicity or any other tool do that with simple cron jobs (sometimes coupled with a bit of scripting). I mostly do my backups by hand, and even though automated backups are convenient (I use them as well), I prefer a solution that isn't automated by design.

I have done extensive testing on all the above-mentioned solutions, and Duplicity is my favorite.