:: Re: [DNG] Max Load Average
Author: Martin Steigerwald
Date:  
To: dng
Subject: Re: [DNG] Max Load Average
Hi Bruce, hi.

Bruce Perens - 18.07.24, 18:34:20 CEST:
> Newer Intel systems will indeed shut down during heavy use if you don't
> run thermald. I don't know about AMD.


Really?

Interesting. Thanks for letting us know. It is important!

Do you speak from practical experience or have any sources on that?

I do not really agree with this requirement. Crucial thermal regulation
should never ever depend on user space components by default. That is at
least my take.

So far my understanding was that all the necessary frequency and idle
state regulation takes place in the kernel and/or firmware by default.

Any user space component would be a more or less risky addition to, or
(partial) replacement of, that.

Especially as something like thinkfan or zcfan requires you to explicitly
override the kernel restriction that user space components are not even
allowed to touch the fan.

Now thermald may not touch the fan control and instead only use frequency
reduction, idle states and other power saving features of the CPU. It
seems so from the description of what it does [1].

Anyway thanks for the correction: If on Intel you need thermald, then
better have it running!!

I do not have it running on any of my AMD based laptops. I also do not
have it running on older Intel based ThinkPad laptops or tablets, including
that ThinkPad X1 Gen 1 tablet with an Intel ULV processor that does not
even have a fan. thermald appears to be maintained by Intel. What that
means for AMD machines I do not know. But I never had any hardware
shutdowns from just CPU usage on any of the AMD based ThinkPads I use: T14
AMD Gen 1, 2 and 5. In fact I never had any hardware shutdowns from CPU
*and* GPU usage while letting the kernel and firmware (or, especially for
the older models, zcfan) do their regulation work freely. Nor from BTRFS
scrubbing data on NVMe SSDs, which is CPU and NVMe usage, and those NVMe
SSDs can get quite hot. Not a single hardware shutdown. Not even one.

On this ThinkPad T14 AMD Gen 5 with stress -c 100 in 26 degrees Celsius
with lots of humidity:

It is totally unconcerned. Absolutely unconcerned. So totally absolutely
unconcerned I don't even really have words for it.

It took the machine several minutes to even raise the firmware (!)
controlled fan above 2000 RPM; the CPU was allowed to reach about 65
degrees before the fan kicked in. After that the firmware keeps the
temperature at an easy 60 to 61 degrees Celsius with just about 3200-3300
RPM:

thinkpad-isa-0000
Adapter: ISA adapter
fan1:        3282 RPM
fan2:        3282 RPM
CPU:          +60.0°C 
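
If you want to log readings like the ones above instead of copying them by
hand, a minimal Python sketch could parse them. It assumes the plain
"label: value unit" layout shown here. The regular expression and the name
parse_sensors are illustrative only and not part of lm-sensors; real
sensors output with extra fields, such as limits in parentheses, would
need a looser pattern.

```python
import re

# Matches lines such as "fan1:        3282 RPM" or "CPU:          +60.0°C".
# Assumption: only RPM and °C readings in this simple layout are of interest.
READING = re.compile(
    r"^(?P<label>[^:]+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)\s*(?P<unit>RPM|°C)\s*$"
)


def parse_sensors(text):
    """Return {label: (value, unit)} for fan and temperature lines."""
    readings = {}
    for line in text.splitlines():
        match = READING.match(line)
        if match:
            label = match["label"].strip()
            readings[label] = (float(match["value"]), match["unit"])
    return readings


# Sample copied from the output above (firmware controlled fan).
SAMPLE = """thinkpad-isa-0000
Adapter: ISA adapter
fan1:        3282 RPM
fan2:        3282 RPM
CPU:          +60.0°C
"""

if __name__ == "__main__":
    for label, (value, unit) in parse_sensors(SAMPLE).items():
        print(f"{label}: {value:g} {unit}")
```

For live data one could pass the output of the sensors command to
parse_sensors instead of SAMPLE.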


The 16 logical cores of this 8 core processor, an AMD Ryzen 7 PRO 8840U
with AMD 780M graphics, are running at about 2.25 GHz. Sure, they do not
reach their maximum automatic overclocking frequency of about 5.1 GHz, but
that is to be expected. It can go that high only if one or at most a few
cores are utilized, and maybe even then only for a limited amount of time.
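
One possible way to check the per-core clocks under load is sketched below
in Python. It assumes the kernel exposes the cpufreq files
scaling_cur_freq (values in kHz) under /sys/devices/system/cpu, which may
not be the case on every system. read_core_khz and mean_ghz are
hypothetical helper names made up for this example.

```python
from pathlib import Path
from statistics import mean


def read_core_khz(root="/sys/devices/system/cpu"):
    """Collect the current frequency of each logical core in kHz.

    Assumption: cpufreq is active and scaling_cur_freq is readable.
    Cores without a readable file are skipped.
    """
    values = []
    for path in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        try:
            values.append(int(path.read_text().strip()))
        except (OSError, ValueError):
            continue
    return values


def mean_ghz(khz_values):
    """Average a list of kHz values and convert to GHz (None if empty)."""
    if not khz_values:
        return None
    return mean(khz_values) / 1_000_000


if __name__ == "__main__":
    average = mean_ghz(read_core_khz())
    if average is None:
        print("no cpufreq data found")
    else:
        print(f"average core frequency: {average:.2f} GHz")
```

Run it a few times while stress is active to see whether the cores settle
at a common frequency.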

Even after about 20-30 minutes (how long have I been typing on this mail
so far?) of calculating silly square roots with stress this is so totally
unspectacular that I will do myself a favor and let the laptop go back to
below 2000 RPM or even fan off again so that it is (almost) completely
silent again. But I am 100% sure that it would easily do this for hours,
days and even weeks. I know the fan can go up as high as a whopping 6000
RPM. The machine is not even trying hard to keep the CPU at a cool 60 to
61 degrees Celsius at the moment.

But don't be fooled: Once I put the GPU to lots of work, that picture may
differ. The fan would easily go to 3700 to 4000 RPM then. With the fan
being firmware controlled I so far never saw more. The user space fan
control zcfan (zero configuration fan) ups the fan later, and I have seen
it going to the maximum fan speed of a bit above 6000 RPM, which can be a
bit annoying sound-wise. I currently do not use it, because I think the
firmware does the job well enough, even better than zcfan. That is new to
me, because with all the earlier ThinkPads I had better results, i.e. very
consistently more silent laptops, with thinkfan or zcfan.

But even with CPU *and* GPU at work, with a firmware controlled fan and
with letting the kernel do its work, there should not be an emergency
hardware shutdown. On this newer laptop I am not completely sure about a
zcfan controlled fan. With combined CPU *and* GPU *and* maybe NVMe usage,
zcfan may not put the fan into maximum RPM quickly enough. If you switch
off automatic regulation and switch CPU and *especially* GPU to maximum
frequency, probably even adding some NVMe usage to the mix: Well again,
you get the result you asked for. The integrated GPU in air cooled AMD
laptops usually is not meant to be driven at maximum frequency all of the
time. So do not force it! That is at least my take from practical
experience so far.

Okay, what gives… let it do a few minutes of stress -c 100 with zcfan
enabled:

fan1:        1590 RPM
fan2:        1590 RPM
CPU:          +65.0°C 


fan1:           0 RPM
fan2:           0 RPM
CPU:          +68.0°C


fan1:           0 RPM
fan2:           0 RPM
CPU:          +70.0°C


fan1:        1972 RPM
fan2:        1972 RPM
CPU:          +70.0°C


fan1:        1972 RPM
fan2:        1972 RPM
CPU:          +69.0°C


That appears to be it. Yeah, that appears to be stable after about 5
minutes. So for CPU based workload zcfan is totally unconcerned as well. It
certainly does not need to freak out and put the fan to maximum RPM as can
happen with CPU *and* GPU based workload or while scrubbing 1.5 TiB of
data on NVMe with BTRFS.
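
What "stable" means here can be made explicit with a small Python sketch.
The criterion is my own assumption, not something zcfan or the firmware
uses: the last few readings must lie within one degree of each other.
is_stable, window and tolerance_c are names made up for this example; the
temperatures are the CPU readings from the zcfan run above.

```python
def is_stable(temps_c, window=3, tolerance_c=1.0):
    """True if the last `window` readings differ by at most `tolerance_c`."""
    if len(temps_c) < window:
        return False
    recent = temps_c[-window:]
    return max(recent) - min(recent) <= tolerance_c


# CPU temperatures in degrees Celsius from the zcfan run above.
zcfan_run = [65.0, 68.0, 70.0, 70.0, 69.0]

if __name__ == "__main__":
    print("early in the run:", is_stable(zcfan_run[:3]))
    print("at the end:", is_stable(zcfan_run))
```

With only five hand-copied readings this is of course a rough check. A
longer log taken at fixed intervals would be more convincing.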

Not letting it run for another half an hour. I think it suffices as a
demonstration.

Totally unspectacular as well. CPU load is so totally a non issue on this
system…

Of course you can argue: that is only calculating square roots. Right. But
I do not expect it to become a lot more of an issue with other purely CPU
based loads.

Disabling zcfan again. The fan goes up and the machine manages to bring
the temperature down to about 60 degrees again within about 5 minutes:

After about one minute:

fan1:        3841 RPM
fan2:        3841 RPM
CPU:          +64.0°C


After about 2 minutes or so:

fan1:        3843 RPM
fan2:        3843 RPM
CPU:          +63.0°C


After about 3-4 minutes:

fan1:        3841 RPM
fan2:        3841 RPM
CPU:          +61.0°C


Stress processes terminated. After about a minute:

fan1:        3264 RPM
fan2:        3264 RPM
CPU:          +51.0°C


And now:

fan1:        1984 RPM
fan2:        1984 RPM
CPU:          +48.0°C


I do not really hear the fan.

In comparison one could argue that the hardware can last longer without
zcfan because the firmware keeps it cooler. But I have yet to see any of
the ThinkPads that have zcfan enabled break. None of them did. Am I
recommending zcfan? No. If you want to play it safe, let the firmware do
its job. Only use zcfan if you are willing to accept the risk.

And again: All of this without any kinds of warranties and guarantees.

[1] https://github.com/intel/thermal_daemon

Thanks,
--
Martin