:: Re: [DNG] defective RAID
Author: Renaud OLGIATI
Date:  
To: dng
Subject: Re: [DNG] defective RAID
On Sat, 25 Mar 2017 15:17:11 -0400
Hendrik Boom <hendrik@???> wrote:

> If I were to manage to reconnect the absent drive, how would the
> boot-time RAID assembly work? (/boot is on the RAID). Would it be
> able to figure out which of the two drives is up-to-date, and
> therefore which one to consider defective and not use?
>
> Do I need to wipe the missing drive completely before I connect it?
> (I have another machine to do this on, using a USB-to-SATA interface).


Picked up from somewhere, and used a couple of times already. It is long!

Cheers,

Ron.
-- 
           One of the sanest, surest and most generous joys of life
           comes from being happy over the good fortune of others.
                                              -- Robert A. Heinlein


                   -- http://www.olgiati-in-paraguay.org --




How To Replace The HDD in a RAID 1 Array

1 Preliminary Note

In this example I have two hard drives, /dev/sda and /dev/sdb,
with the partitions /dev/sda1 and /dev/sda2 as well as /dev/sdb1 and /dev/sdb2.

/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0.
/dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1.
/dev/sda1 + /dev/sdb1 = /dev/md0
/dev/sda2 + /dev/sdb2 = /dev/md1
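
To see the same mapping on your own system, you can read it out of /proc/mdstat. The following is only a sketch: the function name array_members is made up for this example, and it assumes that member devices appear on the "mdN :" line with a bracketed slot number such as sda1[0].

```shell
#!/bin/sh
# Hypothetical helper: list each md array with its member partitions.
# Reads mdstat-formatted text from the file given as $1 (default: /proc/mdstat).
array_members() {
    awk '
        /^md[0-9]+ :/ {
            line = $1 ":"
            for (i = 3; i <= NF; i++) {
                member = $i
                # Only fields carrying a slot number like "[0]" are members.
                if (member !~ /\[[0-9]+\]/) continue
                sub(/\[.*/, "", member)   # strip "[0]", "[2](F)" and similar
                line = line " " member
            }
            print line
        }
    ' "${1:-/proc/mdstat}"
}
```

For the layout used in this example, calling array_members prints "md0: sda1 sdb1" and "md1: sda2 sdb2".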

/dev/sdb has failed, and we want to replace it.

2 How Do I Tell If A Hard Disk Has Failed?

If a disk has failed, you will probably find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.

You can also run
    cat /proc/mdstat
and instead of the string [UU] you will see [U_] if you have a degraded RAID1 array.
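
If you would rather not read the output by eye, the check can be scripted. This is a minimal sketch, not an official tool: the function name degraded_arrays is invented here, and it assumes the status string ([UU], [U_], ...) is the last field on each array's "blocks" line.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every array whose status string
# contains an underscore, i.e. has a missing or failed member.
# Reads mdstat-formatted text from the file given as $1 (default: /proc/mdstat).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1; next }
        # The "blocks" line that follows ends with a string such as [UU] or [U_].
        name != "" && /blocks/ {
            if (index($NF, "_") > 0) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means no array is reported as degraded.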



3 Removing The Failed Disk

To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb2 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).

First we mark /dev/sdb1 as failed:
    mdadm --manage /dev/md0 --fail /dev/sdb1


The output of
    cat /proc/mdstat
should look like this:


server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[2](F)
      24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]
unused devices: <none>


Then we remove /dev/sdb1 from /dev/md0:
    mdadm --manage /dev/md0 --remove /dev/sdb1
The output should be like this:


server1:~# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1

And
    cat /proc/mdstat
should show this:


server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
      24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]
unused devices: <none>


Now we do the same steps again for /dev/sdb2 (which is part of /dev/md1):
    mdadm --manage /dev/md1 --fail /dev/sdb2
    mdadm --manage /dev/md1 --remove /dev/sdb2
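
Before powering down, it can be worth confirming that no array still lists a partition of the failed disk. The sketch below is one way to check; the function name members_on_disk is hypothetical, and it assumes sdX-style names where a partition is the disk name followed by digits.

```shell
#!/bin/sh
# Hypothetical helper: print "array member" for every array member that is
# a partition of the given disk (for example "sdb").
# $1 = disk name without /dev/, $2 = mdstat-formatted file (default: /proc/mdstat).
members_on_disk() {
    disk=$1
    awk -v disk="$disk" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                member = $i
                if (member !~ /\[[0-9]+\]/) continue   # not a member field
                sub(/\[.*/, "", member)                # drop "[1]", "(F)" etc.
                # Disk name followed only by digits: sdb1, sdb2, ...
                if (member ~ ("^" disk "[0-9]+$")) print $1, member
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

Running members_on_disk sdb should print nothing once both partitions have been removed from their arrays.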

Then power down the system:
    shutdown -h now
and replace the old /dev/sdb hard drive with a new one. The new drive must be at least as large as the old one;
if it is even a few MB smaller, rebuilding the arrays will fail.
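
Once the new drive is connected, the size requirement can be checked before touching the arrays. This is a rough sketch under two assumptions: the surviving disk /dev/sda is the same size as the old /dev/sdb, and comparing whole-disk sizes is a good enough proxy for the partition sizes. The function name check_replacement_size is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: compare two sizes given in bytes.
# $1 = size of the reference disk, $2 = size of the replacement disk.
# Returns 0 when the replacement is large enough, 1 otherwise.
check_replacement_size() {
    ref_bytes=$1
    new_bytes=$2
    if [ "$new_bytes" -ge "$ref_bytes" ]; then
        echo "ok: replacement is large enough"
    else
        echo "too small by $((ref_bytes - new_bytes)) bytes"
        return 1
    fi
}
```

The sizes can be read with blockdev from util-linux, for example: check_replacement_size "$(blockdev --getsize64 /dev/sda)" "$(blockdev --getsize64 /dev/sdb)"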




4 Adding The New Hard Disk

After you have replaced the hard disk /dev/sdb, boot the system.
The first thing we must do now is to create exactly the same partitioning as on /dev/sda.
We can do this with the sgdisk command from the gdisk package (sgdisk works on GPT partition tables, so this assumes both disks use GPT).
If you haven't installed gdisk yet, run this command to install it on Debian and Ubuntu:
    apt-get install gdisk


The next step is optional but recommended. To make sure you have a backup of the partition scheme,
you can use sgdisk to write the partition tables of both disks to files. I will store the backups in the /root folder.
    sgdisk --backup=/root/sda.partitiontable /dev/sda
    sgdisk --backup=/root/sdb.partitiontable /dev/sdb


In case of a failure you can restore the partition tables with the --load-backup option of the sgdisk command.

Now copy the partition scheme from /dev/sda to /dev/sdb. Note the argument order: the target disk follows -R and the source disk comes last. Run:
    sgdisk -R /dev/sdb /dev/sda


Afterwards, you have to randomize the GUIDs on the new hard disk to ensure that they are unique:
    sgdisk -G /dev/sdb


You can run
    sgdisk -p /dev/sda
    sgdisk -p /dev/sdb
to check if both hard drives have the same partitioning now.
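
Comparing the two listings by eye is error-prone, so here is a small sketch that does it for you. It assumes the usual sgdisk -p table, where each partition row starts with the partition number, start sector and end sector; the function names partition_layout and same_layout are invented for this example. The disk GUIDs are deliberately ignored, since they are supposed to differ.

```shell
#!/bin/sh
# Hypothetical helper: reduce saved `sgdisk -p` output to
# "number start end" triples, one line per partition.
partition_layout() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { print $1, $2, $3 }' "$@"
}

# Succeeds when two saved listings describe the same partition layout.
# $1 and $2 are files containing `sgdisk -p` output.
same_layout() {
    [ "$(partition_layout "$1")" = "$(partition_layout "$2")" ]
}
```

Possible usage: save both listings with sgdisk -p /dev/sda > /tmp/sda.txt and sgdisk -p /dev/sdb > /tmp/sdb.txt, then run same_layout /tmp/sda.txt /tmp/sdb.txt && echo "layouts match"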


Next we add /dev/sdb1 to /dev/md0 and /dev/sdb2 to /dev/md1:
    mdadm --manage /dev/md0 --add /dev/sdb1
    mdadm --manage /dev/md1 --add /dev/sdb2
The output should be like this:


server1:~# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: re-added /dev/sdb1
server1:~# mdadm --manage /dev/md1 --add /dev/sdb2
mdadm: re-added /dev/sdb2


Now both arrays (/dev/md0 and /dev/md1) will be synchronized. Run
    cat /proc/mdstat
to see when it's finished.
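
Instead of re-running the command by hand, you can let the shell wait for you. A minimal sketch, assuming a running or queued rebuild shows up in /proc/mdstat as a line containing "recovery =" or "resync" followed by "="; the function name sync_in_progress is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds while the mdstat-formatted input still shows
# a rebuild that is running ("recovery = 9.9%") or queued ("resync=DELAYED").
# Reads the file given as $1 (default: /proc/mdstat).
sync_in_progress() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}
```

Possible usage: while sync_in_progress; do sleep 30; done; cat /proc/mdstat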


During the synchronization the output will look like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      24418688 blocks [2/1] [U_]
      [=>...................]  recovery =  9.9% (2423168/24418688) finish=2.8min speed=127535K/sec
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/1] [U_]
      [=>...................]  recovery =  6.4% (1572096/24418688) finish=1.9min speed=196512K/sec
unused devices: <none>


When the synchronization is finished, the output will look like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      24418688 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]
unused devices: <none>


That's it, you have successfully replaced /dev/sdb!