Re: [DNG] Cannot boot my server.
Author: tito
Date:
To: dng
Subject: Re: [DNG] Cannot boot my server.
On Sat, 16 Dec 2023 16:38:41 -0500
Hendrik Boom via Dng <dng@???> wrote:

> I currently have no email access through my usual email address; thus I am
> resorting to gmail.
> The server was working fine last night, until:
>
> Last night my server became completely nonresponsive. It was inaccessible
> through wifi, and it wouldn't respond to keyboard input. Its screen
> remained black.
> But it seemed to be busy, judging from its blinking hard drive light.


> I rebooted it by the power button. This was a hard reset.
>
> Subsequently it refused to boot. It stalled in the initrd, claiming that I
> needed to manually fsck the root partition.


What distro? Devuan, I suppose. Were you bitten by the kernel ext4 corruption?
Which kernel version: before 6.1.67 or after?

> So I entered the appropriate fsck command from the keyboard, checking it
> twice, and leaving out the -y so that in case I made a mistake it shouldn't
> screw anything worse than it was already. (If successful I planned to redo
> it with the -y)
>
> But fsck reported it could not find the partition it was to check.
>
> It was an LVM partition on a RAID. (fsck can handle that, right? Or did I


This adds one more layer of complexity...

> do it wrong?  Something like
>     fsck /def/dm-1/VG1-long-name
> into the (initramfs) prompt.
> )


Which RAID level: 1, 5, 6, or 10?
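
fsck itself does not care about the layers underneath, but the array has
to be assembled and the volume group activated before the device node for
the logical volume exists at all. A rough sketch of what I would try at
the (initramfs) prompt, assuming mdadm and lvm are included in your
initramfs (I cannot know that from here). The `run` wrapper and the LV
variable are my own additions, and VG1-long-name is only the placeholder
from your mail; use whatever name actually shows up under /dev/mapper.

```shell
# Dry run by default: prints the commands. Set RUN=1 to really execute them.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "would run: $*"; fi
}

# Placeholder name taken from the mail; replace with the real one.
LV=/dev/mapper/VG1-long-name

run mdadm --assemble --scan   # assemble the arrays found on the disks
run cat /proc/mdstat          # is the array up (possibly degraded)?
run lvm vgchange -ay          # activate all volume groups
run lvm lvs                   # list the logical volumes that were found
run ls /dev/mapper            # the device nodes fsck can be pointed at
run fsck -n "$LV"             # read-only check, answers "no" to every question
```

If the array shows up as inactive because it is degraded, it may need to
be started explicitly before the volume group becomes visible.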

> I also did an ls on that partition. That worked, except the top-level
> directory listing was gibberish.
>
> At this point I figured the root partition was thoroughly borked. That was
> probably what the server was busy with when it went unresponsive -- borking
> the root partition.
>
> Any advice at this point?


First do a SMART self-test on the drives involved to see whether there
is a hardware problem, and if you can get at the logs, look
there for anomalies too. Check the cables, reseat them,
and check the PSU.
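
Something along these lines, as a sketch only: it assumes smartmontools
is available on whatever system you manage to boot, and the two helper
names and the device names in the usage comment are made up for
illustration.

```shell
# Hypothetical helpers; run as root, smartctl comes from smartmontools.

# Start a short self-test. It runs inside the drive and takes a few minutes.
smart_start() {
    smartctl -t short "$1"
}

# Read the results once the self-test has had time to finish.
smart_report() {
    smartctl -H "$1"            # overall health verdict
    smartctl -l selftest "$1"   # self-test log
    smartctl -A "$1"            # attribute table
}

# Usage (device names are examples, check yours with lsblk):
#   for d in /dev/sda /dev/sdb; do smart_start "$d"; done
#   ... wait a few minutes ...
#   for d in /dev/sda /dev/sdb; do smart_report "$d"; done
```

In the attribute table I would look at Reallocated_Sector_Ct,
Current_Pending_Sector and Offline_Uncorrectable first; nonzero raw
values there are a bad sign.
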
>
> Next I figure it's time to wipe that partition, create a new one in that
> place or elsewhere, and restore it from backup. Yes, I have a recent
> backup! (unusual for this kind of question)
>
> But without booting, that won't work.
>
> * Approach 1: Get a copy of refracta, and boot from that.


Approach 1 bis:
Take a USB drive, install the very same version of the distro you are using
on the server onto it, install GRUB to the USB drive, and boot from there.
This saves you from version mismatches and lets you copy over
to the system any file that is corrupted.

> So I downloaded
> https://get.refracta.org/files/beowulf/refracta10.6_xfce_amd64-20211226_1733.iso
> and dd'd it to /dev/sdc1 . (Is that the way to do it?)
> Before using it on the server, I decided to try booting from it on my
> laptop, just to rule out one thing that went wrong.
>
> (yes, I used the beowulf version to reduce any incompatibilities that might
> arise between different releases of Devuan, not that I expected any)
>
> I told the laptop to boot from USB, booted, and it complained it could not
> find any boot medium. Evidently not copied correctly, or wrong kind of
> boot record.
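
An image like this is normally written to the whole device (/dev/sdc),
not to a partition (/dev/sdc1). Written to sdc1, the boot record lands
inside the partition, where the BIOS does not look for it, which would
explain the missing boot medium. A sketch, assuming the image is an
isohybrid (I believe the Refracta ones are) and that the stick really is
/dev/sdc on your laptop; the two helper names are made up.

```shell
# Hypothetical helper: true if the name ends in a digit, e.g. /dev/sdc1.
# Only meant for sdX-style names, not nvme0n1 or mmcblk0.
is_partition() {
    case "$1" in
        *[0-9]) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical helper: write an image to a whole disk, refuse a partition.
write_iso() {
    if is_partition "$2"; then
        echo "refusing: $2 looks like a partition, use the whole disk" >&2
        return 1
    fi
    # conv=fsync flushes the data to the stick before dd exits
    dd if="$1" of="$2" bs=4M conv=fsync status=progress
}

# Usage (as root, stick unmounted, device double-checked with lsblk):
#   write_iso refracta10.6_xfce_amd64-20211226_1733.iso /dev/sdc
```

This overwrites everything on the target, so it is worth checking the
device name twice.
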
>
> (my laptop and server are both, as far as I've ever known, BIOS machines).
>
> Any further advice?
>
> I have no website, and no access to my email system until this is resolved.
>
> * Approach 2:
>
> (not tried yet)


This is the last thing I would do. USB-to-SATA adapters have their own
quirks and, in my experience, mostly add problems rather than solve them.

> Remove the relevant hard drive from the server and connect it to my laptop
> with a USB/SATA adapter and mess with it that way.
> I'll have to activate the RAID on my laptop (how?) to process that drive
> properly. Should be OK, since the RAID is currently defective, and there's
> another drive on my table that hasn't been activated yet, and it's a
> defective RAID with only one drive.
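
If you do end up going that way, a single member of a degraded array can
usually be started by hand. A sketch, assuming mdadm and lvm2 are
installed on the laptop; the helper name is made up, and /dev/md0 and
the member partition are placeholders that depend on how the adapter
enumerates the drive.

```shell
# Hypothetical helper; run as root. $1 is the RAID member partition.
assemble_degraded() {
    member="$1"
    mdadm --examine "$member" &&                 # show the RAID superblock first
    mdadm --assemble --run /dev/md0 "$member" && # start the array even though it is degraded
    vgchange -ay                                 # activate the volume group on top of it
}

# Usage (member name is an example, check with lsblk):
#   assemble_degraded /dev/sdX1
```

The logical volumes should then appear under /dev/mapper, provided the
laptop has no volume group with the same name.
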
>
> Any other ideas?


Not yet.

Ciao,
Tito
> -- hendrik