onefang - 08.11.23, 05:43:19 CET: > On 2023-11-07 22:37:47, Martin Steigerwald wrote:
> > Nothing! It just continues to run. If I like broken memory detection
> > and some action on broken memory, totally fine, but then I have that
> > as a separate watchdog kind of service that I install where I need
> > it.
> I have 256 GB of RAM, and 64 cores / 128 threads in this super desktop
> of mine. Every now and then I get a segfault in some random thing. I
> had upgraded to the 6.1 kernel so I can get reports about which core
> just segfaulted, it's random each time. So I suspect it's RAM.
>
> Is this a specific broken memory watchdog thing you are talking about?
> If so, what is it? I'd prefer something that can just map out the
> broken byte/s, I have plenty. Reporting would be good too, to see if
> it's still random and I have to look at some other part of my system.
No. But if I wanted something that basically halts my machine on
suspicion of broken RAM, I'd like it to be something I install myself
and not something that is forced upon me by systemd policy. Luckily I have
Devuan.
I am not sure whether there is some kind of broken memory handling daemon
available already. However, I strongly suggest finding out which RAM
modules are affected and replacing them. I am not sure how best to do
that, but I'd start with memtest86+, which recently became available with
UEFI support, in case you use UEFI.
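
For the reporting side, one thing that might be worth a look is the
kernel's EDAC counters in sysfs. This is only a rough sketch and it rests
on assumptions I cannot verify for your machine: that the RAM is ECC, that
an EDAC driver for your memory controller is loaded, and that the counters
live under /sys/devices/system/edac/mc. With non-ECC RAM it will most
likely find nothing. The function name read_edac_counts is made up for
this example.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} from EDAC sysfs.

    "ce" is the corrected error count, "ue" the uncorrected error count.
    An empty dict means no EDAC memory controller was found (assumption:
    no ECC RAM or no EDAC driver loaded).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / fname).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable; report it as unknown.
                entry[key] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found")
    for name, c in result.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

If the corrected count keeps climbing on one controller only, that would
at least narrow down where to look before pulling modules. It does not
map anything out; it only reports.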