:: Re: [DNG] defective RAID
Author: Hendrik Boom
Date:  
To: dng
Subject: Re: [DNG] defective RAID
On Sat, Mar 25, 2017 at 09:00:23PM +0000, Simon Hobson wrote:
> Hendrik Boom <hendrik@???> wrote:
>
> > I have two twinned RAIDs which are working just fine although the
> > second drive for both RAIDs is missing. After all, that's what it is
> > supposed to do -- work when things are broken.
> >
> > The RAIDs are mdadm-style Linux software RAIDs. One contains a /boot
> > partition; the other an LVM partition that contains all the other
> > partitions in the system, including the root partition, /usr, /home,
> > and the like.
> >
> > Both drives should contain the mdadm signature information, and the
> > same consistent file systems.
> >
> > Each RAID is spread, in duplicate, across the same two hard drives.
> >
> >
> > EXCEPT, of course, that one of the drives is now missing. It was
> > physically disconnected by accident while the machine was off, and
> > owing to circumstances, has remained disconnected for a significant
> > amount of time.
> >
> > This means that the missing drive has everything needed to boot the
> > system, with valid mdadm signatures, and valid file systems, except,
> > of course, that its file system is obsolete.
> >
> > If I were to manage to reconnect the absent drive, how would the
> > boot-time RAID assembly work? (/boot is on the RAID). Would it be
> > able to figure out which of the two drives is up-to-date, and
> > therefore which one to consider defective and not use?
>
> OK, been there, got the tee shirt :-)
>
> If you boot the system with the second drive connected, (I think)
> you'll find yourself with two sets of raid volumes. The risk is
> that, depending on how the system is setup, it's "a bit arbitrary"
> which one gets mounted. Ideally you want to boot the system and then
> connect the second drive.


They are SATA drives, so in theory they can be hot-plugged -- provided
no one cheated on the connectors. Cheap SATA connectors were once made
without the staggered contact lengths that hot-plugging requires. I'd
have to hope that mine were not among the defective ones.

> At this point, my memory gets a bit vague - lots of googling while
> "slightly stressed" (production system down).
>
> IIRC you can't just add the partitions back into the arrays - it'll
> complain that the update counters are different. There's a counter
> which gets updated when the array is written to, and so when an
> array member is absent - the counters get out of sync and this can
> be used to detect the issue and not assemble an array from
> inconsistent members.


IIRC you can under normal circumstances -- when the RAID is properly
operating with two drives, and then a drive fails while in operation.
The failed drive is marked defective so that it won't be used next
time. This has happened to me in the past, and when the partition is
added back to the RAID it syncs up properly by copying everything from
the good drive.

But my failed drive got disconnected when the machine was off, so it
didn't get marked, so your concerns are real here.
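
For what it's worth, a sketch of that comparison: the Events counter
that mdadm --examine reports can be checked on both members before
re-adding anything. The device names and sample output below are made
up for illustration -- on a real system you would feed this from
`mdadm --examine /dev/sda1 | grep Events` and the same for the other
member.

```shell
# Compare the "Events" counters of two array members.
# The sample lines stand in for real `mdadm --examine` output.
events_a=$(printf '         Events : 15004\n' | awk '{print $3}')
events_b=$(printf '         Events : 14890\n' | awk '{print $3}')

# The member with the higher counter is the one the array last wrote to.
if [ "$events_a" -gt "$events_b" ]; then
    echo "first member is newer ($events_a > $events_b)"
else
    echo "second member is newer or equal"
fi
```

The member with the lower count is the stale one that should be wiped
and rebuilt.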

> So I think you need to use the "delete metadata" option to mdadm to "clear" the partitions. Then you add it in, and it'll be rebuilt.
> You may have to explicitly remove a device before you can re-add it.


I'd have to delete the metadata before I boot with both drives, then.
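
The option Simon is thinking of is presumably mdadm's
--zero-superblock, which clears the RAID metadata from a partition. A
dry-run sketch, with hypothetical device names (/dev/sdb1, /dev/sdb2)
standing in for the stale drive's partitions, so nothing is destroyed
by accident:

```shell
# Print (rather than run) the mdadm commands that would wipe the stale
# metadata. Set DRY_RUN=0 only once the device names are confirmed.
DRY_RUN=1

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical partitions on the long-disconnected drive:
run mdadm --zero-superblock /dev/sdb1
run mdadm --zero-superblock /dev/sdb2
```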
>
> Your /boot may be OK. It's typically not written to so it can just be assembled - the others will need to be rebuilt.


Typically it isn't, no. But my system has received security upgrades,
so the /boot I'm using is different from the /boot on the other drive.
/boot is also on RAID -- yes, using the older metadata format at the
end of the partition so that the usual bootloaders don't get confused.
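
A sketch of checking which metadata format a member uses -- version
0.90 sits at the end of the partition, which is why old bootloaders
see what looks like a plain filesystem. The sample line below is
hypothetical; on a real system it would come from
`mdadm --examine /dev/sda1 | grep Version`.

```shell
# Identify the metadata version of an array member from sample
# `mdadm --examine` output.
sample='        Version : 0.90.00'
version=$(printf '%s\n' "$sample" | awk '{print $3}')

case "$version" in
    0.90*) echo "old-style metadata ($version): safe for legacy bootloaders" ;;
    *)     echo "metadata $version: stored near the start of the partition" ;;
esac
```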

>
> Just checking man mdadm, and adding a bit of vague memory recall ...
> mdadm --detail /dev/sdxn will tell you ... well details ... about an array, specifically what devices it has and hasn't
> mdadm --examine /dev/sdxn will tell you details, including this update counter, it's labelled "Events"
> mdadm /dev/mdnnn --add /dev/sdxn will add a drive. It will automatically go into rebuild mode which will be shown in /proc/mdstat
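
Once the --add is issued, rebuild progress appears in /proc/mdstat. A
small sketch of reading that output -- the sample text here is
hypothetical, not from a real system, where you would use
`cat /proc/mdstat` instead:

```shell
# Detect a degraded or rebuilding array from /proc/mdstat-style text.
mdstat='md0 : active raid1 sdb1[2] sda1[0]
      975296 blocks [2/1] [U_]
      [=>...................]  recovery =  7.4% (72704/975296)'

# "recovery" means a rebuild is in progress; an underscore in the
# [U_] status map means a member is missing or failed.
if printf '%s\n' "$mdstat" | grep -q 'recovery'; then
    echo "array is rebuilding"
elif printf '%s\n' "$mdstat" | grep -q '\[U*_' ; then
    echo "array is degraded (missing member)"
else
    echo "array looks healthy"
fi
```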


I suppose I'll have to do this on another machine. Unless it can
automatically guess where to boot from and which parts of the RAID are
defective, I can't boot with both drives.

A nuisance, but my other machine has no RAID partitions, so it won't
run the risk of confusion. And I can add the drive live because I'll
use a SATA-to-USB interface to connect it.

Still, I'd like to be able to trust the automation to get it all
right.

-- hendrik

>
>
> _______________________________________________
> Dng mailing list
> Dng@???
> https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng