:: Re: [DNG] Max Load Average
Author: Simon Hobson
Date:  
To: Devuan ML
Subject: Re: [DNG] Max Load Average
nisp1953 via Dng <dng@???> wrote:

> Another question here. What is the max load average I can run on my
> laptop? I am using Devuan 5.0 on a Lenovo T480S Thinkpad.


Coming late to this …

As others have said, there is no “correct” answer.

I went for a look, and via StackExchange and ServerFault posts came to this blog https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html which I think is a really interesting discussion of what it means - and its history.

As already said, it’s an attempt at a general measure of how heavily loaded the machine is. Where it is most useful is to monitor it over time and see what YOUR workloads tend to do. The actual value is relatively meaningless; you are looking for changes - e.g. if it rarely reaches 1 and suddenly jumps to “some number significantly larger than 1” then you know there is something different. It could just be that something is running that you don’t normally run, or it could be that there’s a problem and users are getting frustrated with a poorly responding system*.
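As a concrete illustration of the “monitor it over time” idea, here is a minimal sketch (assuming a Linux /proc/loadavg; the baseline and threshold numbers are purely made-up examples, not recommendations) that samples the load average and flags readings well above whatever you have decided is normal for your machine:

#!/usr/bin/env python3
# Minimal sketch: sample the 1-minute load average from /proc/loadavg and
# flag readings well above a baseline established for YOUR workload.
# BASELINE and FACTOR are illustrative assumptions, not rules.

import time

BASELINE = 1.0   # hypothetical "normal" 1-minute load for this machine
FACTOR = 3.0     # flag anything 3x normal - purely an example value
INTERVAL = 60    # seconds between samples

def read_load():
    # /proc/loadavg looks like: "0.42 0.35 0.30 1/123 4567"
    with open("/proc/loadavg") as f:
        one, five, fifteen = f.read().split()[:3]
    return float(one), float(five), float(fifteen)

while True:
    one, five, fifteen = read_load()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    marker = "  <-- unusually high for this box" if one > BASELINE * FACTOR else ""
    print(f"{stamp}  load: {one:.2f} {five:.2f} {fifteen:.2f}{marker}")
    time.sleep(INTERVAL)

(Python also exposes the same three numbers via os.getloadavg(), if you’d rather not read /proc directly.)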

But everything is a case of “relative to what is normal FOR YOU”. And what is acceptable in terms of response times is also a matter of “what’s OK for you” - e.g. adding a few seconds of response delay on a mail server probably won’t register, but on an interactive terminal it will drive the users mad.


As to what is safe, in general you should be able to load up a processor fully and not suffer problems. Of course, if the design is ... a bit marginal, then there may be issues. Or there may be a fault. I had a laptop (Apple MacBook) some time ago which was dual-core (I think 8 threads) but couldn’t run two cores at high load without shutting down for thermal protection. There was a known problem with that model where there could be inadequate contact between processor and heatsink - the cure being to take it apart, clean the faces, and reassemble with the right thermal compound/pad. Apple actually did that for me at one of their genius bars even though it was out of warranty :-) Before that, if I was going to do a CPU-intensive task, I’d use a utility that was part of the developer tools to disable one core - running one core flat out was safe, but running two at a high load was guaranteed to cause a shutdown, which was “annoying”.


Simon


* I once admined a SCO OpenServer system where the application software we ran had an “interesting” built-in reporting tool. As is common, you’d tend to build up a report in steps, adding the various tables you needed - it could run just fine, then a seemingly insignificant change could make it really slow.
Some will already know this, but OpenServer had memory limitations. You had to manually configure how much memory to allocate as disk cache (unless the rather low auto-configured value was sufficient), and this was hard-limited to some low value (450000k, see http://osr507doc.xinuos.com/en/PERFORM/kernel_configure.html#tunedisks) - I did once try setting it 1k higher, and the system then wouldn’t boot. Now our application kept two main files for sales orders - one for the orders, one for the items within each order - and the latter would grow to around 1G during a year. Some will be ahead of me on this now ;-)
Remember this was in the days of spinning disks, SCSI, and even a RAID array with disks spread across multiple U320 SCSI busses was slow by today’s standards.
So said reporting tool would run reports pulling in data from this line items table just fine - as long as it used the indexes. Make one tiny change and it would drop using indexes altogether (at least, that was my diagnosis). So suddenly you are doing a full join between an order header file of around 100M, the order detail file of around 1G, and other tables, with a little over 1/2G of disk cache. That’s slow.

Of course, for good measure, when this happened, all other data got flushed from the cache, meaning that whatever users did, they got added to an already long disk access queue and from the user PoV the system was frozen - literally press a key and nothing happens for perhaps minutes! Needless to say, the phones got fairly busy very quickly.

In this case, looking at the various stats would show the CPU utilisation as 0 or 1%, WIO as 99 or 100%, and TBH I forget how the load average worked on OpenServer - if it behaved as Linux does (as described in the above blog, counting tasks stuck in uninterruptible disk wait as well as runnable ones), then it would rocket; otherwise it would probably go towards 0.
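(For anyone wanting to see which kind of load a Linux box is under, here’s a rough sketch - assuming a standard /proc layout and Python being available - that tallies runnable (R) versus uninterruptible-sleep (D) tasks; a high load average dominated by D-state processes points at disk/IO rather than CPU:)

#!/usr/bin/env python3
# Rough sketch: on Linux the load average counts both runnable (R) and
# uninterruptible-sleep (D, typically disk wait) tasks, so a high load with
# mostly D-state processes suggests I/O rather than CPU. This walks /proc
# and tallies the state field from each process's stat file.

import os

counts = {"R": 0, "D": 0}
for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/stat") as f:
            # the field after the parenthesised command name is the state letter
            state = f.read().rsplit(")", 1)[1].split()[0]
    except OSError:
        continue  # process exited while we were looking
    if state in counts:
        counts[state] += 1

print(f"runnable (CPU-bound): {counts['R']}, uninterruptible (likely I/O): {counts['D']}")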

That report became one that was only allowed to be set off last thing on a Friday, and it took around 40 hours to run! At some point, I re-wrote it in Informix (the underlying database was C-ISAM files), taking care over the use of indexes, and got it down to ... 90 seconds, without impacting any users :-)

Of course, had this been Linux, we could have just stuffed a few G of RAM in the system and it would have held the entire database in RAM!