:: Re: [DNG] Chimaera CPU stuck
Author: Andreas Messer
Date:  
To: Luciano Mannucci
CC: Dng
Subject: Re: [DNG] Chimaera CPU stuck
Hi Luciano,

On Wed, Sep 14, 2022 at 07:24:07AM +0200, Luciano Mannucci wrote:
> hello all!
>
> I have a virtual machine running under kvm who started hanging giving
> this message just before it dies:
>
> kernel:[ 296.013011] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
>
> This happens only on high i/o load.
> The other virtual machines are all running with no problems.
> What should I do?


The message means that CPU#0 spent more than 20 seconds in the kernel
without letting the watchdog run. Note that "swapper/0" is the kernel's
idle task and has nothing to do with swap space. Stalls like this can
happen when another process is using the entire I/O bandwidth to the disk.
I had similar issues with my desktop PC. It turned out this was
somehow related to the 32GB of RAM in my machine. When a process writes
files, the kernel caches the data first and performs the actual disk
writes later, depending on cache fill and time. When a process produces
data very fast, the cache keeps growing even while the kernel is already
writing data out to disk, and at some point an internal threshold in the
kernel (/proc/sys/vm/dirty_ratio) is hit. At that point the kernel blocks
all processes writing to disk and flushes the cache content to disk. If
you have a lot of RAM, this flushing can take a long time (seconds to
minutes). Machines with a lot of RAM are affected because the threshold
is by default set as a percentage of RAM.
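
To make the scaling with RAM concrete, here is a rough sketch that estimates the ratio-based thresholds on a running system. The helper name pct_of_kib is made up for this example, and the calculation assumes the percentage applies to MemTotal; the kernel actually uses a somewhat smaller "available memory" figure, so treat the results as upper bounds. If the *_bytes settings are in use, the ratio files read 0 and so will the estimate.

```shell
#!/bin/sh
# Estimate the ratio-based write-cache thresholds (upper bound, see above).

# pct_of_kib <mem_kib> <percent> -> prints that percentage of the memory, in MiB
pct_of_kib() {
    echo $(( $1 * $2 / 100 / 1024 ))
}

# Only read the live values where /proc provides them (i.e. on Linux).
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/dirty_ratio ] \
        && [ -r /proc/sys/vm/dirty_background_ratio ]; then
    mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    bg_pct=$(cat /proc/sys/vm/dirty_background_ratio)
    fg_pct=$(cat /proc/sys/vm/dirty_ratio)
    echo "background writing starts at about $(pct_of_kib "$mem_kib" "$bg_pct") MiB"
    echo "writers are blocked at about $(pct_of_kib "$mem_kib" "$fg_pct") MiB"
fi
```

For a 32GB machine with a dirty_ratio of 20, "pct_of_kib 33554432 20" prints 6553, i.e. roughly 6.5GB of unwritten data. On a disk that manages about 100MB/s, that is on the order of a minute of flushing.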

I mitigated this by reconfiguring the so-called background write threshold:

cat /etc/sysctl.d/tuning.conf
# The following settings avoid long application stalls when
# writing large files to disk. They lower the amount of write
# cached data held in RAM before actual writing occurs. This prevents
# the system from writing data in large chunks while everything
# else blocks, which improves the latency of the desktop.
# By default the values are computed as a fraction of main memory,
# which results in fairly large amounts of cached unwritten data on
# high-memory systems.

# Start background writing when more than 64MB of data are in the write cache.
# This value is tuned to the write performance of an HDD (~100MB/s).
vm.dirty_background_bytes=67108864
# Block writing processes once 256MB of data are in the write cache.
vm.dirty_bytes=268435456
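
The vm.*_bytes settings take plain byte counts, so a small sketch like the following can help when picking other sizes. The helper name mib_to_bytes is made up for this example; the two calls reproduce the values in the file above (64MiB and 256MiB). The sysctl commands in the trailing comments are the usual way to load and check such a file, but they need root and are left commented out here.

```shell
#!/bin/sh
# Convert a size in MiB into the byte count expected by the vm.*_bytes sysctls.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

echo "vm.dirty_background_bytes=$(mib_to_bytes 64)"
echo "vm.dirty_bytes=$(mib_to_bytes 256)"

# To load the file without rebooting (as root):
#   sysctl -p /etc/sysctl.d/tuning.conf
# To check the values currently in effect:
#   sysctl vm.dirty_background_bytes vm.dirty_bytes
```

Setting a *_bytes value makes the kernel ignore the corresponding *_ratio value, so the two thresholds no longer grow with the amount of RAM.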

Maybe this additional information is helpful:

https://forum.proxmox.com/threads/io-performance-tuning.15893/
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

Hope that helps,

cheers,
Andreas

--
gnuPG keyid: 8C2BAF51
fingerprint: 28EE 8438 E688 D992 3661 C753 90B3 BAAA 8C2B AF51