:: Re: [DNG] Some RAID1's inaccessible…
Author: tito
Date:  
To: dng
Subject: Re: [DNG] Some RAID1's inaccessible after upgrade to beowulf from ascii
On Wed, 10 Nov 2021 18:37:24 -0500
Hendrik Boom <hendrik@???> wrote:

> On Tue, Nov 09, 2021 at 02:56:59PM -0500, Hendrik Boom via Dng wrote:
> > I upgraded my server to beowulf.
> >
> > After rebooting, all home directories except root's are no longer
> > accessible.
> >
> > They are all on an LVM on software RAID.
> >
> > The problem seems to be that two of my three RAID1 systems are not
> > starting up properly. What can I do about it?
> >
>
> After following suggestions from the replies I got here, I determined
> that one of my three physical disk drives was not being recognised by the
> operating system.
>
> Took the cover off the machine.
>
> Looked in with a penlight and turned the machine off.
>
> Wiggled and pushed on some SATA cables and power-supply cables.
>
> Rebooted.
>
> All three drives came up properly. The RAIDs assembled properly.
>
> All except /dev/md0. It's a defective RAID because the disk drive
> containing its second copy died a long time ago. I should move its data
> off it onto one of the other RAIDs sometime.
>
> Now there remains the question:
>
> Why didn't /dev/md1 and /dev/md2 assemble properly as defective
> RAID1s when their second copies were gone? Isn't that the whole point
> of a RAID1?
>
> /dev/md0 had no such problem.
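For what it's worth, when an array sits "inactive" like md1 and md2 did, mdadm can usually be told by hand to start it degraded. A sketch (the device name is from your mdstat above; the DRYRUN guard is my own addition so the snippet only prints the commands and is safe to try anywhere):

```shell
# Force-start an array that stayed "inactive" because a member disk was
# missing at boot. With DRYRUN=1 the commands are only printed, not run.
restart_degraded() {
    dev="$1"
    run() { if [ "${DRYRUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi; }
    run mdadm --stop "$dev"            # clear the half-assembled state
    run mdadm --assemble --run "$dev"  # --run: start even with members missing
}

# prints the two mdadm commands instead of running them:
DRYRUN=1 restart_degraded /dev/md1
```

Without --run, mdadm waits for the full member set and leaves the array inactive; with it, a RAID1 should come up degraded on one mirror half.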
>
> -- hendrik


Hi,
I once had similar problems with disks randomly vanishing at reboot
and arrays not coming up correctly. I tried all the obvious fixes:
swapping disks, cables, controllers, motherboards, and HD cages, with
no result. In the end I discovered that the old 500 W PSU could not
power all 10 hard drives when they spun up at the same time at boot,
or when one array was rebuilding while I did heavy I/O on another
array. Swapping in a newer 1000 W PSU magically solved the issue.

Hope this helps.

Ciao,
Tito

> >
> >
> > hendrik@april:/$ cat /proc/mdstat
> > Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5]
> > [raid4] [raid10]
> > md1 : inactive sda2[3](S)
> >       2391296000 blocks super 1.2
> >
> > md2 : inactive sda3[0](S)
> >       1048512 blocks
> >
> > md0 : active raid1 sdf4[1]
> >       706337792 blocks [2/1] [_U]
> >
> > unused devices: <none>
> > hendrik@april:/$
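As a side note, the state of each array can be pulled out of that listing mechanically. A small sketch that parses the exact /proc/mdstat text quoted above (embedded here as a here-document so it runs anywhere); the "(S)" suffix means mdadm classified the member as a spare, which is why those arrays show "inactive" instead of coming up degraded:

```shell
# Extract array name and state from an mdstat excerpt.
parse_mdstat() {
    awk '/^md[0-9]/ { print $1, $3 }'
}

# prints: md1 inactive / md2 inactive / md0 active (one per line)
parse_mdstat <<'EOF'
md1 : inactive sda2[3](S)
      2391296000 blocks super 1.2

md2 : inactive sda3[0](S)
      1048512 blocks

md0 : active raid1 sdf4[1]
      706337792 blocks [2/1] [_U]
EOF
```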
> >
> >
> >
> > hendrik@april:/$ cat /etc/mdadm/mdadm.conf
> > DEVICE partitions
> > ARRAY /dev/md0 level=raid1 num-devices=2
> > UUID=4dc189ba:e7a12d38:e6262cdf:db1beda2
> > ARRAY /dev/md1 metadata=1.2 name=april:1
> > UUID=c328565c:16dce536:f16da6e2:db603645
> > ARRAY /dev/md2 UUID=5d63f486:183fd2ea:c2a3a88f:cb2b61de
> > MAILADDR root
> > hendrik@april:/$
> >
> >
> >
> > The standard recommendation seems to be to replace lines
> > in /etc/mdadm/mdadm.conf by lines produced by mdadm --examine --scan:
> >
> >
> >
> > april:~# mdadm --examine --scan
> > ARRAY /dev/md/1 metadata=1.2 UUID=c328565c:16dce536:f16da6e2:db603645
> > name=april:1
> > ARRAY /dev/md2 UUID=5d63f486:183fd2ea:c2a3a88f:cb2b61de
> > ARRAY /dev/md0 UUID=4dc189ba:e7a12d38:e6262cdf:db1beda2
> > april:~#
> >
> >
> >
> > But this replacement involves changing a line that does work (md0),
> > not changing one that did not (md2),
> > and changing another one that did not work (md1).
> >
> > Since --examine's suggested changes seem uncorrelated
> > with the active/inactive record, I have little faith in
> > this alleged fix without first gaining more understanding.
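One mechanical check worth doing before editing anything: the UUID is the real identity of each array, and the --examine output above carries exactly the same three UUIDs as your existing mdadm.conf; only the ordering, the /dev/md/1 naming style, and the metadata=/name= decorations differ. A sketch that compares the two sets, with both texts embedded verbatim from above (the wrapped ARRAY lines rejoined):

```shell
# Pull the UUID= fields out of each text and compare the sorted sets;
# if they match, the two files describe the same arrays.
uuids() { grep -o 'UUID=[0-9a-f:]*' | sort; }

conf=$(uuids <<'EOF'
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=4dc189ba:e7a12d38:e6262cdf:db1beda2
ARRAY /dev/md1 metadata=1.2 name=april:1 UUID=c328565c:16dce536:f16da6e2:db603645
ARRAY /dev/md2 UUID=5d63f486:183fd2ea:c2a3a88f:cb2b61de
EOF
)

scan=$(uuids <<'EOF'
ARRAY /dev/md/1 metadata=1.2 UUID=c328565c:16dce536:f16da6e2:db603645 name=april:1
ARRAY /dev/md2 UUID=5d63f486:183fd2ea:c2a3a88f:cb2b61de
ARRAY /dev/md0 UUID=4dc189ba:e7a12d38:e6262cdf:db1beda2
EOF
)

# prints "same arrays": the suggested rewrite changes only cosmetics
if [ "$conf" = "$scan" ]; then echo "same arrays"; else echo "differ"; fi
```

So swapping the conf lines would not change which arrays get assembled, which supports your suspicion that the "fix" is unrelated to the active/inactive behaviour you saw.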
> >
> > -- hendrik
>