Re: [Dng] btrfs repair works fine, Lennart has no idea what he is talking about - was OT - It may be only one file, but it does point to the bigger problem!

Author: Peter Maloney
Date:
To: dng
Old topics: [Dng] OT - It may be only one file, but it does point to the bigger problem!
Subject: Re: [Dng] btrfs repair works fine, Lennart has no idea what he is talking about - was OT - It may be only one file, but it does point to the bigger problem!

On 02/22/2015 07:28 PM, Jim Murphy wrote:
> [...]
> Part of the discussion:
>
>>> btrfs checksumming theoretically allows you to transparently recover
>>> after media corruption if filesystem has redundancy (more than one
>>> copy of data). Journald checksum will probably detect corruption, but
>>> can it repair it?
>>>
>> No it cannot.
>> But btrfs checksumming cannot fix things for you either if you lose
>> non-trivial amounts of data. It might be able to fix a few bits of
>> errors, but not non-trivial amounts. I mean, that's a simple property
>> of error correction codes: the more you want to be able to correct the
>> longer must your checksum be. Neither btrfs' nor journald's are
>> substantial enough to correct even a sector...
>>
>> Lennart
>

This is pure ignorance. btrfs does not rely on any redundancy in the
CRC algorithm to recover the data; it uses the checksum only to find
out whether a copy is good, and uses the redundancy provided by RAID to
repair it (which is simply what Lennart's victim already said by adding
the context "if filesystem has redundancy" and "more than one copy of
data", which is not the CRC). The checksum doesn't need to be longer to
repair more data, only to make collisions less likely. The chance of a
collision is something like one in 2^32, about 4 billion. (< 1 in 512 :P)
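To make the division of labour concrete, here is a toy sketch in bash. It is not how btrfs is implemented; the file names copy1/copy2 and the check helper are made up for illustration, and it uses the CRC-32 from POSIX cksum rather than the crc32c that btrfs uses. The point is only that the checksum tells you which copy is bad, and the second copy does the repairing, even when a whole 512-byte sector is overwritten.

```shell
#!/usr/bin/env bash
# Toy model of checksum-verified mirroring. NOT btrfs internals;
# copy1/copy2 and check() are hypothetical names for this sketch.

work=$(mktemp -d)

# One 4 KiB "block" of data, stored twice (the raid1 part).
head -c 4096 /dev/urandom > "$work/copy1"
cp "$work/copy1" "$work/copy2"

# Checksum recorded at write time (CRC-32 as computed by cksum).
stored=$(cksum < "$work/copy1" | cut -d' ' -f1)

# Succeeds if the given file still matches the stored checksum.
check() { [ "$(cksum < "$1" | cut -d' ' -f1)" = "$stored" ]; }

# Overwrite an entire 512-byte sector of copy1 with random data.
dd if=/dev/urandom of="$work/copy1" bs=512 count=1 seek=3 conv=notrunc 2>/dev/null

# "Scrub": the checksum only detects the damage; the mirror repairs it.
result=clean
if ! check "$work/copy1"; then
    if check "$work/copy2"; then
        cp "$work/copy2" "$work/copy1"
        result=repaired
    else
        result=uncorrectable
    fi
fi
echo "copy1: $result"

# Confirm the repaired copy matches the stored checksum again.
if check "$work/copy1"; then final=ok; else final=bad; fi
rm -rf "$work"
```

The 32-bit checksum never has to reconstruct anything here, so its length limits only how often corrupted data could slip through undetected, not how much data can be repaired.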

Test this out simply by making a raid1, filling it with data, then
running two things in infinite loops: one to repeat scrubs, and one to
write random data to the disks, not just a few bits.

Here's 30 minutes of the test script (kernel 3.18.x, btrfs tools version
3.18.2):

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
       scrub started at Fri Feb 27 15:07:34 2015 and finished after 159 seconds
       total bytes scrubbed: 13.20GiB with 120 errors
       error details: csum=120
       corrected errors: 120, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14152)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
       scrub started at Fri Feb 27 15:10:14 2015 and finished after 144 seconds
       total bytes scrubbed: 13.20GiB with 14 errors
       error details: csum=14
       corrected errors: 14, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14275)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
       scrub started at Fri Feb 27 15:12:44 2015 and finished after 139 seconds
       total bytes scrubbed: 13.20GiB with 80 errors
       error details: csum=80
       corrected errors: 80, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14377)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
       scrub started at Fri Feb 27 15:15:04 2015 and finished after 168 seconds
       total bytes scrubbed: 13.20GiB with 14 errors
       error details: csum=14
       corrected errors: 14, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14505)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
       scrub started at Fri Feb 27 15:17:54 2015 and finished after 163 seconds
       total bytes scrubbed: 13.20GiB with 110 errors
       error details: csum=110
       corrected errors: 110, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14595)

WARNING: errors detected during scrubbing, corrected.
scrub status for af936534-6c3f-4136-809a-740a32a65591
       scrub started at Fri Feb 27 15:20:44 2015 and finished after 173 seconds
       total bytes scrubbed: 13.20GiB with 53 errors
       error details: csum=53
       corrected errors: 53, uncorrectable errors: 0, unverified errors: 0
scrub started on /mnt/test, fsid af936534-6c3f-4136-809a-740a32a65591 (pid=14737)

Obviously both copies could be destroyed at the same time, but that
isn't all that likely in 20 minutes, even at such a high destruction
rate. This clearly disproves Lennart's unfounded statement that not
even a single sector can be repaired: that's 391 blocks repaired so
far, which I assume is more than 391 sectors. Clearing the cache and
then running a diff of the test files against the original copy shows
that they are undamaged. (This means you can cp the files away without
any loss, but maybe there are bugs that will make btrfs die later :P
It's not exactly fully production ready.)
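For anyone who wants to check the arithmetic and the verification step, here is a rough sketch. The awk one-liner just adds up the "corrected errors" counts from the six scrub reports above. The verify_copy function is a hypothetical helper (my name for it, not part of the test script) showing one way to clear the cache before diffing; it assumes root, Linux's /proc/sys/vm/drop_caches, and the same paths as the test script, and it is only defined here, not run.

```shell
#!/usr/bin/env bash
# Sum the "corrected errors" counts copied from the scrub reports.
total=$(awk '/corrected errors:/ { gsub(",", "", $3); sum += $3 } END { print sum }' <<'EOF'
       corrected errors: 120, uncorrectable errors: 0, unverified errors: 0
       corrected errors: 14, uncorrectable errors: 0, unverified errors: 0
       corrected errors: 80, uncorrectable errors: 0, unverified errors: 0
       corrected errors: 14, uncorrectable errors: 0, unverified errors: 0
       corrected errors: 110, uncorrectable errors: 0, unverified errors: 0
       corrected errors: 53, uncorrectable errors: 0, unverified errors: 0
EOF
)
echo "total corrected errors: $total"

# Hypothetical helper: needs root, and assumes the paths from the test
# script. Dropping the page cache makes diff read from the disks again
# instead of from cached (undamaged) pages.
verify_copy() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
    diff -qr /mnt/test/manjaro ~peter/archive/software/manjaro \
        && echo "files match"
}
```

The total printed is 391, matching the figure above.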

So change "theoretically" in the above email to "in practice".

And the test script:

################
# variables used in many parts of the script
################
disk1=/dev/data/btrfs1
disk2=/dev/data/btrfs2
testuser=peter

################
# Set up some disks
################
lvcreate -n btrfs1 -L 10g data
lvcreate -n btrfs2 -L 10g data

chown "$testuser" "$disk1" "$disk2"

mkfs.btrfs -d raid1 -m raid1 /dev/data/btrfs{1,2}

mount /dev/data/btrfs1 /mnt/test

cp -a ~peter/archive/software/manjaro/ /mnt/test

# make sure there is enough data to test
# # df -h /mnt/test
# Filesystem               Size  Used Avail Use% Mounted on
# /dev/mapper/data-btrfs1   10G  5.7G  3.3G  64% /mnt/test

# make sure the files match so we can compare properly later
# diff -qr manjaro/ ~peter/archive/software/manjaro/

################
# The scrub script
################

while true; do
    if ! btrfs scrub status /mnt/test | grep "running for" >/dev/null 2>&1; then
        btrfs scrub status /mnt/test
        btrfs scrub start /mnt/test
        echo
    fi
    sleep 10
done

################
# The disk mutilation script
################

# run as a non-root user
mutilate() {
    # Pick a disk
    if [ $(($RANDOM % 2 )) == 1 ]; then
        target=${disk1}
    else
        target=${disk2}
    fi
    echo "Disk $target selected"

    # Pick a sector
    sz=$(blockdev --getsz "${target}")
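    # combine two 15-bit $RANDOM values arithmetically; pasting them together
    # as a string can produce a leading zero, which bash parses as octal
    sector=$(( (RANDOM * 32768 + RANDOM) % sz ))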
    echo "sector $sector selected"

    # just a paranoid safety check
    if [ -z "$disk1" -o -z "$disk2" -o "$target" != "$disk1" -a \
            "$target" != "$disk2" -o "${target:0:6}" = "/dev/s" ]; then
        echo "ERROR: safety check failed..."
        return 1
    fi
    if [ "$(id -u)" = "0" ]; then
        echo "ERROR: don't run as root..."
        return 1
    fi

    # damage the disk
    dd if=/dev/urandom of=${target} bs=512 count=100 seek=$sector
}

while true; do
    # destroy 10 random places x 100 blocks x 512 bytes per block (512 kB)
    for n in {1..10}; do
        mutilate
    done
    sleep 300 # scrub takes about 5min
done
