On 02/22/2015 07:28 PM, Jim Murphy wrote:
> [...]
> Part of the discussion:
>
>>> btrfs checksumming theoretically allows you to transparently recover
>>> after media corruption if filesystem has redundancy (more than one
>>> copy of data). Journald checksum will probably detect corruption, but
>>> can it repair it?
>>>
>> No it cannot.
>> But btrfs checksumming cannot fix things for you either if you lose
>> non-trivial amounts of data. It might be able to fix a few bits of
>> errors, but not non-trivial amounts. I mean, that's a simple property
>> of error correction codes: the more you want to be able to correct the
>> longer must your checksum be. Neither btrfs' nor journald's are
>> substantial enough to correct even a sector...
>>
>> Lennart
>
This is pure ignorance. btrfs does not need any redundancy in the CRC
itself to recover the data. It uses the checksum only to find out whether
a copy is good, and uses the redundancy provided by RAID to repair it.
(That is exactly what the person Lennart was replying to had already said,
with the context "if filesystem has redundancy" and "more than one copy of
data", which is not the CRC.) The checksum doesn't need to be longer to
allow repair; it only needs to be long enough to make collisions unlikely.
The chance of a collision is something like one in 2^32, about 4.3
billion. (< 1 in 512 :P)
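To make the "checksum detects, mirror repairs" distinction concrete, here is a toy file-level sketch. It is not how btrfs works internally: it assumes POSIX `cksum` (a CRC-32) as a stand-in for btrfs' crc32c and two ordinary files as stand-ins for the two RAID1 copies, and the function names `crc_of` and `repair_from_mirror` are hypothetical. Note that the checksum never contributes any repair data; it only picks which copy to trust.

```shell
#!/bin/bash
# Toy model of checksum-guided repair on a two-copy mirror.
# Assumption: POSIX cksum (a CRC-32) stands in for btrfs' crc32c,
# and two ordinary files stand in for the two RAID1 copies.

# Print only the CRC field of a file's checksum.
crc_of() {
    cksum < "$1" | awk '{ print $1 }'
}

# repair_from_mirror EXPECTED_CRC COPY_A COPY_B
# The checksum only decides which copy is good; the repair itself
# overwrites the bad copy with the redundant good one.
repair_from_mirror() {
    local expected=$1 a=$2 b=$3
    local crc_a crc_b
    crc_a=$(crc_of "$a")
    crc_b=$(crc_of "$b")
    if [ "$crc_a" = "$expected" ] && [ "$crc_b" = "$expected" ]; then
        echo "both copies good"
    elif [ "$crc_a" = "$expected" ]; then
        cp -- "$a" "$b" && echo "repaired $b from $a"
    elif [ "$crc_b" = "$expected" ]; then
        cp -- "$b" "$a" && echo "repaired $a from $b"
    else
        echo "both copies bad: uncorrectable" >&2
        return 1
    fi
}
```

For example, after overwriting one byte of `copy_b`, calling `repair_from_mirror "$(crc_of copy_a)" copy_a copy_b` should print `repaired copy_b from copy_a`. The amount of damage that can be repaired is limited by the second copy, not by the checksum length.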
You can test this yourself: create a RAID1, fill it with data, then run
two things in infinite loops, one that repeats scrubs and one that writes
random data to the disks (not just a few bits).
Here is the output from 30 minutes of running the test script (kernel
3.18.x, btrfs-progs 3.18.2):
WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
    scrub started at Fri Feb 27 15:07:34 2015 and finished after 159 seconds
    total bytes scrubbed: 13.20GiB with 120 errors
    error details: csum=120
    corrected errors: 120, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14152)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
    scrub started at Fri Feb 27 15:10:14 2015 and finished after 144 seconds
    total bytes scrubbed: 13.20GiB with 14 errors
    error details: csum=14
    corrected errors: 14, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14275)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
    scrub started at Fri Feb 27 15:12:44 2015 and finished after 139 seconds
    total bytes scrubbed: 13.20GiB with 80 errors
    error details: csum=80
    corrected errors: 80, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14377)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
    scrub started at Fri Feb 27 15:15:04 2015 and finished after 168 seconds
    total bytes scrubbed: 13.20GiB with 14 errors
    error details: csum=14
    corrected errors: 14, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14505)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
    scrub started at Fri Feb 27 15:17:54 2015 and finished after 163 seconds
    total bytes scrubbed: 13.20GiB with 110 errors
    error details: csum=110
    corrected errors: 110, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14595)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
    scrub started at Fri Feb 27 15:20:44 2015 and finished after 173 seconds
    total bytes scrubbed: 13.20GiB with 53 errors
    error details: csum=53
    corrected errors: 53, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14737)

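If the scrub loop's output is saved to a log, the corrected-error counts can be totalled with a small helper. This is a sketch: the function name `sum_corrected` is hypothetical, and it assumes the "corrected errors: N, uncorrectable errors: M, ..." line format shown in the output above (leading whitespace allowed).

```shell
#!/bin/bash
# Sum the "corrected errors" counts from saved `btrfs scrub status` output.
# Reads the log on stdin; assumes the line format shown above.
sum_corrected() {
    awk -F'[:,]' '/^[ \t]*corrected errors:/ {
        # $2 is the number right after "corrected errors:"
        total += $2
    }
    END { print total + 0 }'
}
```

Fed the six status blocks above (for example `sum_corrected < scrub.log`, where `scrub.log` is a hypothetical saved copy), it should print 391.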
Obviously there is a chance for both copies to be destroyed at the same
time... but that isn't all that likely in 20 minutes, even with such a
high destruction rate. But this clearly disproves Lennart's unfounded
statement that not even a single sector can be repaired. That's 391
corrected blocks so far (120 + 14 + 80 + 14 + 110 + 53), which I assume is
more than 391 sectors. Clearing the cache and then diffing the test files
against the original copy shows that they are undamaged. (This means you
can cp the files away without any loss, but maybe there are bugs that will
make btrfs die later :P It's not exactly fully production-ready.)
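The "clear the cache, then diff" check can be sketched as below. Dropping the page cache makes the diff read the files back from disk (and thus through btrfs' checksum verification) instead of from memory. The function name `verify_copy` is hypothetical; writing to `/proc/sys/vm/drop_caches` requires root, and the sketch only warns if that fails.

```shell
#!/bin/bash
# verify_copy TEST_DIR ORIG_DIR
# Flush dirty data, drop the page cache so files are re-read from disk,
# then compare the two trees. Prints "no differences" on success.
verify_copy() {
    local test_dir=$1 orig_dir=$2
    sync
    # Needs root; a failure here is reported but not fatal.
    if ! (echo 3 > /proc/sys/vm/drop_caches) 2>/dev/null; then
        echo "warning: cannot drop caches (not root?); diff may read cached data" >&2
    fi
    diff -qr -- "$test_dir" "$orig_dir" && echo "no differences"
}
```

With the paths from the test script, that would be `verify_copy /mnt/test/manjaro/ ~peter/archive/software/manjaro/`.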
So change "theoretically" in the above email to "in practice".
And the test script:
################
# variables used in many parts of the script
################
disk1=/dev/data/btrfs1
disk2=/dev/data/btrfs2
testuser=peter
################
# Set up some disks
################
lvcreate -n btrfs1 -L 10g data
lvcreate -n btrfs2 -L 10g data
chown "$testuser" "$disk1" "$disk2"
mkfs.btrfs -d raid1 -m raid1 "$disk1" "$disk2"
mkdir -p /mnt/test
mount "$disk1" /mnt/test
cp -a ~peter/archive/software/manjaro/ /mnt/test
# make sure there is enough data to test, e.g.:
#   Filesystem               Size  Used Avail Use% Mounted on
#   /dev/mapper/data-btrfs1   10G  5.7G  3.3G  64% /mnt/test
df -h /mnt/test
# make sure the files match so we can compare properly later
diff -qr /mnt/test/manjaro/ ~peter/archive/software/manjaro/
################
# The scrub script
################
while true; do
    # start a new scrub whenever the previous one has finished
    if ! btrfs scrub status /mnt/test | grep -q "running for"; then
        btrfs scrub status /mnt/test
        btrfs scrub start /mnt/test
        echo
    fi
    sleep 10
done
################
# The disk mutilation script
################
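# run as a non-root user (the LVs were chown'ed to $testuser above)
mutilate() {
    # refuse to run as root
    if [ "$(id -u)" = "0" ]; then
        echo "ERROR: don't run as root..."
        return 1
    fi
    # Pick a disk
    if [ $((RANDOM % 2)) -eq 1 ]; then
        target=$disk1
    else
        target=$disk2
    fi
    echo "Disk $target selected"
    # just a paranoid safety check: both disks set, target is one of them,
    # and never a /dev/s* device (e.g. a real /dev/sda)
    if [ -z "$disk1" ] || [ -z "$disk2" ] ||
       { [ "$target" != "$disk1" ] && [ "$target" != "$disk2" ]; } ||
       [ "${target:0:6}" = "/dev/s" ]; then
        echo "ERROR: safety check failed..."
        return 1
    fi
    # Pick a start sector (512-byte units) so the 100-sector write fits;
    # two 15-bit $RANDOM values combined avoid the octal pitfall of
    # "$RANDOM$RANDOM" when the first value is 0
    sz=$(blockdev --getsz "$target")
    sector=$(( ((RANDOM << 15) | RANDOM) % (sz - 100) ))
    echo "sector $sector selected"
    # damage the disk
    dd if=/dev/urandom of="$target" bs=512 count=100 seek="$sector"
}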
while true; do
    # destroy 10 random places x 100 blocks x 512 bytes = 512000 bytes (500 KiB)
    for n in {1..10}; do
        mutilate
    done
    sleep 300 # wait 5 min; each scrub above took roughly 2.5-3 min
done
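The sector choice in the mutilation script is easy to get subtly wrong: with `$(($RANDOM$RANDOM % $sz))`, a first `$RANDOM` of 0 produces a leading zero, bash arithmetic then parses the number as octal, and any digit 8 or 9 makes it fail. A standalone helper for the same job could look like this; the name `pick_sector` is hypothetical, and it assumes sizes in the 512-byte sectors that `blockdev --getsz` reports.

```shell
#!/bin/bash
# pick_sector TOTAL_SECTORS COUNT
# Print a random start sector such that COUNT sectors starting there fit
# inside a device of TOTAL_SECTORS 512-byte sectors. Two 15-bit $RANDOM
# values are combined into one 30-bit number (no leading-zero/octal issue).
pick_sector() {
    local total=$1 count=$2
    local span=$(( total - count + 1 ))
    if (( span <= 0 )); then
        echo "device too small" >&2
        return 1
    fi
    echo $(( ((RANDOM << 15) | RANDOM) % span ))
}
```

A 30-bit random number covers devices up to 2^30 sectors (512 GiB), far more than the 10 GiB test LVs; the slight modulo bias is irrelevant for this kind of test.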