:: Re: [DNG] Cannot boot my server.
Author: Hendrik Boom
Date:  
To: tito
Cc: dng
Subject: Re: [DNG] Cannot boot my server.
Things just got worse. I got refracta set up, but now the box refuses to
do a power-on self-test. With or without refracta. I'm starting to
suspect hardware problems. It's likely time to replace the box and
transfer the data and configuration, if necessary, from backup.
I wonder what to get short-term (I'd like my regular email to start up
again) and/or long-term. Maybe something completely different.

-- hendrik


On Sun, Dec 17, 2023 at 3:58 AM tito via Dng <dng@???> wrote:

> On Sat, 16 Dec 2023 16:38:41 -0500
> Hendrik Boom via Dng <dng@???> wrote:
>
> > I currently have no email access through my usual email address; thus I am
> > resorting to gmail.
> > The server was working fine last night, until:
> >
> > Last night my server became completely nonresponsive. It was inaccessible
> > through wifi, and it wouldn't respond to keyboard input. Its screen
> > remained black.
> > But it seemed to be busy, judging from its blinking hard drive light.
>
> > I rebooted it by the power button. This was a hard reset.
> >
> > Subsequently it refused to boot. It stalled in the initrd, claiming that I
> > needed to manually fsck the root partition.
>
> What distro? Devuan I suppose. Were you bitten by the kernel ext4
> corruption?
> Which kernel version, pre 6.1.67 or post?
>
> > So I entered the appropriate fsck command from the keyboard, checking it
> > twice, and leaving out the -y so that in case I made a mistake it shouldn't
> > screw anything worse than it was already. (If successful I planned to redo
> > it with the -y)
> >
> > But fsck reported it could not find the partition it was to check.
> >
> > It was an lvm partition on a RAID. (fsck can handle that, right? Or did I
>
> This adds one more layer of complexity...
>
> > do it wrong?  Something like
> >     fsck /dev/dm-1/VG1-long-name
> > into the (initramfs) prompt.
> > )
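
A minimal sketch of how the layers might be brought up by hand before running fsck, assuming a Linux md software RAID with LVM on top and an ext-family root filesystem. The names VG1 and long-name are taken from the example above; the helpers mapper_path and show_fsck_steps are hypothetical. One detail worth knowing: device-mapper doubles any hyphen inside a volume group or logical volume name, so the node is normally /dev/mapper/VG1-long--name (or /dev/VG1/long-name), not a path under /dev/dm-1. The script only prints the commands so they can be reviewed and typed in as root; in some initramfs shells the LVM tools are reached as "lvm vgchange" instead.

```shell
#!/bin/sh
# Hypothetical helper: build the /dev/mapper path for a VG/LV pair.
# Device-mapper doubles every hyphen inside a VG or LV name, so
# VG "VG1" + LV "long-name" becomes /dev/mapper/VG1-long--name.
mapper_path() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

# Print (do not run) the steps, in the order the layers stack up.
show_fsck_steps() {
    dev=$(mapper_path "$1" "$2")
    echo "mdadm --assemble --scan"   # assemble the md array(s)
    echo "vgchange -ay $1"           # activate the volume group
    echo "fsck -n $dev"              # read-only check, answers "no" to every fix
}

show_fsck_steps VG1 long-name
```

If the array or the volume group is not active, the device node does not exist at all, which would match an fsck that cannot find the partition.
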

>
> Which RAID level: 1, 5, 6, or 10?
>
> > I also did an ls on that partition. That worked, except the top-level
> > directory listing was gibberish.
> >
> > At this point I figured the root partition was thoroughly borked. That was
> > probably what the server was busy with when it went unresponsive --
> > borking the root partition.
> >
> > Any advice at this point?
>
> Do a SMART self test on the involved drives first to see if there is any
> hardware problem and, if you can access the logs, look there for
> anomalies too. Check the cables, reseat them,
> and check the PSU.
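
A minimal sketch of the SMART check suggested above, assuming smartmontools is available on whatever rescue system gets booted. The drive names /dev/sda and /dev/sdb are placeholders, and smart_steps is a hypothetical helper that only prints the commands for review.

```shell
#!/bin/sh
# Print the smartctl commands for each drive passed as an argument.
# Drive names are placeholders; substitute the real RAID members.
smart_steps() {
    for drive in "$@"; do
        echo "smartctl -H $drive"           # overall health verdict
        echo "smartctl -t short $drive"     # start a short self-test
        echo "smartctl -l selftest $drive"  # read the self-test log afterwards
        echo "smartctl -A $drive"           # vendor attributes (reallocated sectors etc.)
    done
}

smart_steps /dev/sda /dev/sdb
```

The short test runs inside the drive and usually takes a few minutes, so the self-test log is only meaningful once it has finished.
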
> >
> > Next I figure it's time to wipe that partition, create a new one in that
> > place or elsewhere, and restore it from backup. Yes, I have a recent
> > backup! (unusual for this kind of question)
> >
> > But without booting, that won't work.
> >
> > * Approach 1: Get a copy of refracta, and boot from that.
>
> Approach 1 bis:
> Take a USB drive, install the very same version of the distro you are using
> on the server onto it, install GRUB to the USB drive, and boot from there.
> This saves you from version mismatches and allows you to copy over
> to the system any file that is corrupted.
>
> > So I downloaded
> > https://get.refracta.org/files/beowulf/refracta10.6_xfce_amd64-20211226_1733.iso
> > and dd'd it to /dev/sdc1 . (Is that the way to do it?)
> > Before using it on the server, I decided to try booting from it on my
> > laptop, just to rule out one thing that went wrong.
> >
> > (yes, I used the beowulf version to reduce any incompatibilities that might
> > arise between different releases of Devuan, not that I expected any)
> >
> > I told the laptop to boot from USB, booted, and it complained it could not
> > find any boot medium. Evidently not copied correctly, or wrong kind of
> > boot record.
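
A minimal sketch of writing the image, assuming it is an isohybrid image (as most current live ISOs are) and that the stick shows up with an sd-style name. Such an image carries its own boot record and partition table, so it has to go onto the whole device (/dev/sdc), not a partition (/dev/sdc1). Written into a partition, the stick keeps its old boot sector, which would explain a BIOS that finds no boot medium. whole_disk and show_dd_step are hypothetical helpers, and the script only prints the dd command so the target can be double-checked first.

```shell
#!/bin/sh
# Hypothetical helper: strip a trailing partition number from an
# sd-style device name (/dev/sdc1 -> /dev/sdc). It is not meant for
# names like /dev/nvme0n1p1 or /dev/mmcblk0p1.
whole_disk() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Print (do not run) the dd command targeting the whole device.
show_dd_step() {
    iso=$1
    target=$(whole_disk "$2")
    echo "dd if=$iso of=$target bs=4M conv=fsync status=progress"
}

show_dd_step refracta10.6_xfce_amd64-20211226_1733.iso /dev/sdc1
```

Checking with lsblk that the target really is the USB stick before running dd is cheap insurance, since dd overwrites whatever device it is pointed at.
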
> >
> > (my laptop and server are both, as far as I've ever known, BIOS machines).
> >
> > Any further advice?
> >
> > I have no website, and no access to my email system until this is resolved.
> >
> > * Approach 2:
> >
> > (not tried yet)
>
> This is the last thing I would do. USB to SATA adapters have their own
> quirks and in my experience mostly add problems rather than solve them.
>
> > Remove the relevant hard drive from the server and connect it to my laptop
> > with a USB/SATA adapter and mess with it that way.
> > I'll have to activate the RAID on my laptop (how?) to process that drive
> > properly. Should be OK, since the RAID is currently defective, and there's
> > another drive on my table that hasn't been activated yet, and it's a
> > defective RAID with only one drive.
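
On the "(how?)": a minimal sketch of starting a degraded md array from a single member on another machine, assuming Linux software RAID with mdadm installed on the laptop. The member /dev/sdb1 and the array name /dev/md0 are placeholders, and show_degraded_steps is a hypothetical helper that only prints the commands for review.

```shell
#!/bin/sh
# Print (do not run) the steps to start a degraded array read-only.
# $1 = RAID member partition on the attached drive, $2 = md device to create.
show_degraded_steps() {
    member=$1
    array=$2
    echo "mdadm --examine $member"                           # confirm it is an md member
    echo "mdadm --assemble --run --readonly $array $member"  # start despite missing drives
    echo "cat /proc/mdstat"                                  # verify the array state
    echo "vgchange -ay"                                      # activate LVM on top of it
}

show_degraded_steps /dev/sdb1 /dev/md0
```

Assembling read-only keeps mdadm from writing to a drive that may already be damaged, which seems the safer default while the cause is still unknown.
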
> >
> > Any other ideas?
>
> Not yet.
>
> Ciao,
> Tito
> > -- hendrik
>
> _______________________________________________
> Dng mailing list
> Dng@???
> https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng
>