:: Re: [DNG] random sudden stops
Author: Brad Campbell
Date:  
To: dng
Subject: Re: [DNG] random sudden stops
On 26/8/21 8:10 am, Hendrik Boom wrote:
> For the past few months my home server (running an ascii installation
> physically moved from another computer) has been suddenly stopping all
> processing about once a month, apparently at random. It seems to stop
> instantly, leaving power on and becoming completely unresponsive to ping,
> existing ssh connexions and use of the physical keyboard.
>
> The system log, after a reboot, shows nothing unusual except of course
> that there are no log entries for a shut-down.
>
> Can anyone provide ideas about tracking this down?
>
> It could of course be a random rare intermittent hardware error.


Sounds like the perfect application for netconsole.

I have a Raspberry Pi that runs a few services, and on it I installed udplogger: https://lwn.net/Articles/571589/
I run it with: /usr/local/bin/udplogger port=6666 dir=/root/udplogs/

I have a number of machines set up with netconsole, either on the kernel command line or loaded after boot. There are easier ways to do this, but this is what I use (I honestly don't recall why):

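    DEST=192.168.24.218                  # machine running udplogger
    mount none -t configfs /sys/kernel/config
    # Creating a directory here creates a new netconsole target
    mkdir /sys/kernel/config/netconsole/target1
    pushd /sys/kernel/config/netconsole/target1
    echo 192.168.24.1 > local_ip         # this machine's address
    echo "$DEST" > remote_ip
    echo br0 > dev_name                  # interface the packets leave on
    # Resolve the logger's MAC address with a single ARP request
    arping -c1 "$DEST" | grep -o ..:..:..:..:..:.. > remote_mac
    echo 1 > enabled                     # last step: settings can only change while disabled
    popd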
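
To check that the target took effect, it can help to read its attributes back. This is only a sketch: show_target is a hypothetical helper name, and the attribute list is the subset of netconsole configfs attributes used above plus remote_port.

```shell
# Hypothetical helper: print the attributes of a netconsole target directory.
# With no argument it looks at the target1 directory used in this message.
show_target() {
    dir=${1:-/sys/kernel/config/netconsole/target1}
    for attr in enabled dev_name local_ip remote_ip remote_port remote_mac; do
        # Skip attributes that are absent instead of failing.
        [ -r "$dir/$attr" ] && printf '%s=%s\n' "$attr" "$(cat "$dir/$attr")"
    done
    return 0
}
```

Run show_target as root, then write a test line into the kernel log with: echo "netconsole test" > /dev/kmsg. The line should turn up on the logging machine, provided the console log level lets it through.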


Or on the kernel command line :
netconsole=6666@192.168.24.187/eth0,6666@192.168.24.218/ab:cd:ef:12:34:56
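
The fields in that parameter are: source port @ source IP / device, then target port @ target IP / target MAC. A minimal sketch for assembling it follows; netconsole_arg is a hypothetical helper name, and the example call assumes the logging host is the same 192.168.24.218 used in the configfs example.

```shell
# Hypothetical helper: assemble a netconsole= boot parameter.
# Arguments: source port, source IP, device, target port, target IP, target MAC.
netconsole_arg() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example call with assumed addresses; substitute your own.
netconsole_arg 6666 192.168.24.187 eth0 6666 192.168.24.218 ab:cd:ef:12:34:56
```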

That way I pretty much always get the oops that never makes it to disk.

2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113147] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113163] CPU: 4 PID: 4109 Comm: kworker/4:1 Not tainted 5.12.10+ #11
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113170] Hardware name: Apple Inc. iMac12,2/Mac-XXXXXXXXXXXXXX, BIOS 87.0.0.0.0 06/14/2019
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113174] Workqueue: events radeon_dp_work_func [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113229] Call Trace:
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113232] dump_stack+0x64/0x7c
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113237] panic+0xf6/0x280
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113241] ? radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113267] __stack_chk_fail+0x10/0x10
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113271] radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113297] radeon_connector_hotplug+0xa8/0xe0 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113315] radeon_dp_work_func+0x28/0x40 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113335] process_one_work+0x1c4/0x310
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113339] worker_thread+0x240/0x3c0
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113341] ? wq_update_unbound_numa+0x10/0x10
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113344] kthread+0x10a/0x120
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113346] ? kthread_park+0x80/0x80
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113348] ret_from_fork+0x1f/0x30
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113391] Kernel Offset: disabled
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113393] Rebooting in 10 seconds..
2021-07-09 11:19:24 192.168.24.187:6666 [1076334.114131] ACPI MEMORY or I/O RESET_REG.
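
When several machines log to the same place, it can be handy to pull out one sender and strip the prefix so the output reads like dmesg. A rough sketch, assuming the line layout shown above (date, time, ip:port, then the kernel message); kernel_lines_from is a hypothetical helper name.

```shell
# Hypothetical helper: reduce logger lines on stdin to the kernel messages
# from one sender. $1 is the IP address of that sender.
kernel_lines_from() {
    awk -v ip="$1" '
        # Field 3 is "ip:port"; keep lines whose IP part matches exactly.
        { split($3, src, ":") }
        src[1] == ip {
            # Drop date, time and sender, leaving the dmesg-style text.
            sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
            print
        }
    '
}
```

Feed it whatever files udplogger wrote under its dir= directory, for example by piping them in with cat and passing 192.168.24.187 as the argument.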

Regards,
Brad