Author: Hendrik Boom
Date:
To: dng
Subject: [DNG] Long-term archiving versus medium fallibility
These days an increasing amount of my personal information (books,
mementos, family photos, and work data) is kept on digital media.
And those media are vulnerable.
It's well known that to archive files long-term (say, ten years or more)
it is necessary to keep multiple copies, preferably on different media.
So this is what I'd like to do with my critical files.
Yet these files are also working files: they are kept online and
legitimately need to be modified from time to time.
So I keep backups. Currently I use rdiff-backup, which can keep older
as well as newer versions of files on the same backup drive. And I keep
multiple backups.
(This might even help somewhat against ransomware attacks.)
--
Now, storage media deteriorate over time.
It is necessary to read and transfer data from old media to new from
time to time. Yes, I know that. My present method is to keep
everything on my server and make regular backups.
(OK, my backups aren't all *that* regular, but I try.)
Now the master copy is the working file system of my server.
And even in the absence of ransomware, there are occasional disk
failures.
Yes, I use RAID, so a detected disk failure doesn't cause immediate
data loss. (It also has the side effect of letting me keep running,
apparently unaffected, from the time a disk fails until I manage to
replace it.)
I also rely on ext4's journaling to survive unexpected shutdowns. Yes,
I journal everything, not just metadata (ext4's data=journal mode). So
after a crash or an unexpected power outage, the file system is easy to
restore to a consistent state.
For further protection against data corruption, I'd like to introduce
checksumming. Checksums on file data are available with btrfs and ZFS
(XFS checksums only its metadata).
But ... all of this relies on valid RAM.
Copying files to or from backup, updating files: all of it is done by
copying data into RAM and then back out of RAM. In the presence of
faulty RAM, even a backup copy could be seriously damaged.
And this is worse with the newer copy-on-write file systems such as
btrfs and ZFS, which are constantly copying data, even data that hasn't
changed. A single update reads a whole block from disk, makes the change
in RAM, and writes the entire block back out, complete with the change,
any bit errors picked up in RAM, and a fresh checksum that validates
the bad bits.
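The failure mode just described can be shown with a toy simulation. This is a sketch under stated assumptions, not how btrfs or ZFS work internally: SHA-256 stands in for the cheaper checksums real file systems use (CRC32C is the btrfs default, fletcher4 the ZFS default), and `read_modify_write` and `flipped_bit` are hypothetical names. What matters is the ordering: the checksum is computed from the RAM buffer after the bit has flipped, so later verification passes.

```python
import hashlib
from typing import Optional, Tuple


def checksum(block: bytes) -> str:
    # SHA-256 stands in for the file system's own block checksum (assumption).
    return hashlib.sha256(block).hexdigest()


def read_modify_write(block: bytes, offset: int, new_bytes: bytes,
                      flipped_bit: Optional[int] = None) -> Tuple[bytes, str]:
    """Simulate one copy-on-write update of a block (hypothetical helper)."""
    buf = bytearray(block)                            # whole block copied into RAM
    buf[offset:offset + len(new_bytes)] = new_bytes   # apply the small update
    if flipped_bit is not None:
        buf[flipped_bit // 8] ^= 1 << (flipped_bit % 8)  # simulated RAM fault
    new_block = bytes(buf)
    # Checksum is taken from RAM *after* the fault, so it matches the bad data.
    return new_block, checksum(new_block)


original = bytes(4096)  # one 4 KiB block of zeros, as read from disk
intended, _ = read_modify_write(original, 0, b"hello")
written, stored_sum = read_modify_write(original, 0, b"hello", flipped_bit=8 * 2000)

print("data correct:   ", written == intended)
print("checksum passes:", checksum(written) == stored_sum)
```

Running it reports the data as wrong (byte 2000 has a flipped bit) while the checksum check still passes, which is exactly why ECC memory matters to these file systems.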
I'm told the maintainers of these file systems laugh at you if you're not
using ECC memory.
---
Now I'm wondering how to introduce checksumming that protects against this
kind of data loss despite occasionally (but rarely) failing memory.
* I could run memory checks frequently to catch failing memory. But
conditions during ordinary operation differ from those under a
memory-test program, so faulty memory might go undetected.
* I could install ECC memory. But it has become difficult to get, and
some mass-market processors don't handle it properly.
* I could hope the ext4 developers will add data checksums to ext4 (it
already checksums its metadata), possibly renaming it ext5.
* Or I could do my own checksumming: checksum everything in the file
system and catch files whose checksums have changed without a new
modification date. This could be done at backup time, flagging such
discrepancies for manual attention. (Note: need to check the checksum
on the backup, too.)
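The do-it-yourself option could look roughly like this minimal sketch. Its assumptions: SHA-256 as the checksum, a JSON manifest file, and "same size and same mtime" taken to mean "nobody modified this file". The function names and manifest layout are hypothetical, not part of any existing tool. A file whose content hash changed while its size and mtime did not is reported for manual attention.

```python
import hashlib
import json
import os
import sys
from pathlib import Path

CHUNK = 1 << 20  # hash files 1 MiB at a time so large files need little RAM


def file_digest(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each regular file (path relative to root) to its size, mtime and SHA-256."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = Path(dirpath) / name
            if p.is_symlink() or not p.is_file():
                continue  # skip symlinks, sockets, device nodes, etc.
            st = p.stat()
            manifest[str(p.relative_to(root))] = {
                "size": st.st_size,
                "mtime_ns": st.st_mtime_ns,
                "sha256": file_digest(p),
            }
    return manifest


def suspicious(reference: dict, candidate: dict) -> list:
    """Files in both manifests whose hash differs although size and mtime match."""
    return sorted(
        rel for rel in reference.keys() & candidate.keys()
        if reference[rel]["size"] == candidate[rel]["size"]
        and reference[rel]["mtime_ns"] == candidate[rel]["mtime_ns"]
        and reference[rel]["sha256"] != candidate[rel]["sha256"]
    )


def main(root: str, manifest_file: str) -> None:
    old = {}
    if os.path.exists(manifest_file):
        with open(manifest_file) as f:
            old = json.load(f)
    new = build_manifest(Path(root))
    for rel in suspicious(old, new):
        print(f"CHECK MANUALLY: {rel}")
        new[rel] = old[rel]  # keep the previous record so the file stays flagged
    with open(manifest_file, "w") as f:
        json.dump(new, f, indent=1, sort_keys=True)


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Run it at backup time against the live tree (e.g. with a hypothetical `manifest.json` kept on another disk). To check the backup too, build a manifest of the backup copy and call `suspicious(source_manifest, backup_manifest)`. That comparison only makes sense if the backup preserves modification times. Files whose mtime differs are skipped rather than reported. Tools that modify content and then restore the old mtime will cause false alarms.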