Author: Simon Hobson Date: To: Devuan ML Subject: Re: [DNG] Backup methods for Devuan
Steve Litt <slitt@???> wrote:
> It sounded good until the deduplication and compression. I view
> deduplication and compression as negative parity bits, such that even
> in a text file, one flipped bit messes up everything.
(Mostly) agreed.
Both have their place, and that’s a decision for the individual to make based on their own priorities and limitations. On the latter, I have not-so-fond memories (in a smallish business) of chasing tape capacity developments with “limited budget” - without compression I would not have been able to do some of the backups we did, or would have needed more tapes (which is a problem if you don’t have a trained monkey you can leave overnight to swap tapes).
At another place I worked, I had even less budget and worked with hand-me-down stuff the Windows server guys were taking out. Anyway, I did backups in a two-step process (sorry, a bit vague as it’s well over a decade since I set it up):
First, I had each of my machines use rsync to keep up to date a single copy of the entire system. So I had a backup storage server that had a directory for each of my machines, and this would be up to date as of the previous night.
Then I used a piece of software whose name I can’t recall, which kept multiple copies in a tree using hard-linking for file de-dup - like rsync can do with its --link-dest option. Each backup in the tree was complete in its own right, but only used disk space for files that weren’t the same as in the previous backup. It had options for compression, but I didn’t use these.
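To illustrate the hard-link idea only (this is not the tool described above, whose name is not given), here is a minimal sketch using plain coreutils. The function name snapshot and the directory names are made up for the example; it handles regular files only and assumes file names contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper: snapshot SRC into NEW, hard-linking every file
# whose contents are identical to the copy in the previous snapshot PREV.
snapshot() {
    src=$1 prev=$2 new=$3
    mkdir -p "$new"
    # Recreate the directory layout first.
    ( cd "$src" && find . -type d ) | while IFS= read -r d; do
        mkdir -p "$new/$d"
    done
    # Then link or copy each regular file.
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        if [ -f "$prev/$f" ] && cmp -s "$src/$f" "$prev/$f"; then
            ln "$prev/$f" "$new/$f"      # unchanged: share the inode, no extra space
        else
            cp -p "$src/$f" "$new/$f"    # new or changed: store a fresh copy
        fi
    done
}

# Small demonstration in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/src/etc"
echo "a" > "$work/src/etc/hosts"
echo "b" > "$work/src/etc/motd"
snapshot "$work/src" "$work/none" "$work/day1"   # no previous snapshot yet
echo "changed" > "$work/src/etc/motd"
snapshot "$work/src" "$work/day1" "$work/day2"
```

After the second run, day1 and day2 each look like a complete tree, but etc/hosts is stored once and only etc/motd takes new space.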
The neat thing was that it allowed you to specify a retention schedule and it would automatically keep enough backups to meet that. So for example :
13d22h 15d2h would tell it to keep a copy that’s 14 days old, so each of the daily backups would be kept up to then as it worked out that they’d be needed (using 13d22h 15d2h instead of 14d 15d just gets round the variable timing that might mean two backups were (say) 24 1/2 hours apart).
Extending this to 13d22h 15d2h 42d 49d would tell it to keep one that’s between 15 and 42 days old (in this case, a null operator), and one between 42 and 49 days old. For the latter, it would keep every 7th daily as again, it works out what it will need in the future to satisfy the rule. Thus now it’s daily up to 2 weeks, weekly up to 6 weeks.
Then 13d22h 15d2h 42d 49d 365d 395d would extend that again, and it would keep backups needed to satisfy the 30 day interval, so monthly (well 30 days) up to a year.
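As a rough sketch of how such rules can be read (not the tool's actual algorithm, which also worked out which backups it would need in the future), the snippet below converts the durations to hours and checks whether any existing backup falls inside a given window. The function names to_hours and window_satisfied are invented for this example, and backup ages are passed in as whole hours.

```shell
#!/bin/sh
# Convert a duration such as 13d22h, 15d2h or 42d to whole hours.
to_hours() {
    spec=$1 days=0 hours=0
    case $spec in
        *d*) days=${spec%%d*}; spec=${spec#*d} ;;   # split off the day part
    esac
    case $spec in
        *h) hours=${spec%h} ;;                      # remaining hour part, if any
    esac
    echo $(( $days * 24 + $hours ))
}

# window_satisfied MIN MAX AGE...
# Succeeds if at least one backup age (in hours) lies within [MIN, MAX].
window_satisfied() {
    lo=$(to_hours "$1")
    hi=$(to_hours "$2")
    shift 2
    for age in "$@"; do
        if [ "$age" -ge "$lo" ] && [ "$age" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, with dailies at 24, 48, ... 336 hours, "window_satisfied 13d22h 15d2h 312 336" succeeds because the 336 hour (14 day) backup sits between 334 and 362 hours, while "window_satisfied 42d 49d 312 336" fails until a backup has aged into that range.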
There was also an option to duplicate the tree to another host, e.g. to an offsite copy, but I never got around to having the resources for that.
There is a quirk in the above I’ll come to …
This worked great. And because I didn’t use compression, it was easy to look into any of the backup trees for files. Plus you could do things like “ls -l <blah>/*/etc/<some file>” and see when a file got changed.
Now, that quirk.
The backup would run each day, but of course had no knowledge whether any individual machine had failed to do its rsync. As a result, you could look and see a full backup tree - even if nothing had changed.
So on each machine, I had my backup script as something along the lines of :
rsync / servername@backupserver::/servername &&
date > /etc/backup_time &&
rsync …
Thus /etc/backup_time in the backups was an indication of when the actual backup happened.
And on the backup server, the script was along these lines :
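for server in <list>; do
    # only snapshot if the rsync'd copy has a newer marker than the latest backup
    if [ "store/$server/etc/backup_time" -nt "backups/$server/now/etc/backup_time" ]; then
        do_backup "$server"
    fi
done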
Thus if the rsync backup failed, or the server was down, or ..., then there would be gaps in the backup tree to match and you wouldn’t get an untrue idea of what had been backed up and when.
I found I had to actually modify the contents of the file - just touching an empty file was not enough, as identical files got de-duped by hard linking (IIRC it kept a metadata file which it used to restore correct modification times when using the restore function). Modifying the contents meant that the file got copied each time.
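One likely reason, which is my assumption rather than something spelled out above: hard-linked names share a single inode, so they cannot carry different modification times. The sketch below shows this with scratch files; the file names are made up and stand in for the marker file in two successive backups.

```shell
#!/bin/sh
work=$(mktemp -d)

# Empty marker, de-duped: both "backups" are hard links to one inode.
: > "$work/day1"
ln "$work/day1" "$work/day2"
touch -t 202001010000 "$work/day2"     # touching one name changes both
if [ "$work/day2" -nt "$work/day1" ]; then shared=no; else shared=yes; fi

# Marker with changing contents: each backup holds its own copy,
# so each copy keeps its own timestamp.
echo "Mon" > "$work/c1"; touch -t 202001010000 "$work/c1"
echo "Tue" > "$work/c2"; touch -t 202001020000 "$work/c2"
if [ "$work/c2" -nt "$work/c1" ]; then distinct=yes; else distinct=no; fi

echo "shared timestamp: $shared, distinct copies comparable: $distinct"
```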