On 18/07/2017 at 16:35, KatolaZ wrote:
> On Tue, Jul 18, 2017 at 04:08:37PM +0200, Evilham wrote:
>
> [cut]
>
>>
>> So, I just added the delay. That means: if a service is down but back up
>> again in less than 2 minutes, there won't be a notification.
>>
>
> You might also consider setting a number of failures to be accumulated
> before an actual alarm is raised. I mean, you might want a failure to
> become an alarm only if you have tried three times and you didn't have
> any success.
>
> But again, this way you will be re-inventing nagios ;)

AFAICK, this already happens:
Monit has a concept of "cycles" (mine are 30 seconds), and each check
runs once per cycle (this can also be customised). If a check succeeds,
it resets a counter; if it fails, the counter is incremented, and once
it reaches the alarm limit (4 cycles --> 2 minutes), Monit raises an
alarm.
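
To make the counter logic concrete, here is a minimal Python sketch of
it. This is only an illustration of the behaviour described above, not
Monit's actual implementation; the function name and the list-of-booleans
input are made up for the example.

```python
def consecutive_alarm(checks, limit=4):
    """Return the cycle index at which an alarm is raised, or None.

    checks: iterable of booleans, True meaning the check succeeded
    in that cycle. (Hypothetical helper, for illustration only.)
    """
    failures = 0
    for cycle, ok in enumerate(checks):
        if ok:
            failures = 0          # a success resets the counter
        else:
            failures += 1         # a failure raises the counter
            if failures == limit:
                return cycle      # limit reached: raise the alarm
    return None


# Down for 3 cycles (90 seconds), then back up: no notification.
print(consecutive_alarm([True, False, False, False, True]))   # None
# Down for 4 cycles in a row (2 minutes): alarm on the 4th failure.
print(consecutive_alarm([True, False, False, False, False]))  # 4
```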

I can also trivially change the criterion to "if failed X times within
Y cycles"; actually, I just did, since it makes sense (otherwise a
pattern like 0 0 0 1 0 0 0 1 0 0 0 1, where 0 is a failed check and 1 a
successful one, would never raise an alarm, and it is obviously an
issue). Now if a site is down for 3 out of 5 cycles (90 seconds out of
150 seconds), an alarm is raised (an email is sent), and there will only
be another email when the problem is gone (site up for 3+ out of 5
cycles).
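
The "X times within Y cycles" criterion can be sketched the same way in
Python. Again, this is an assumption-laden illustration rather than how
Monit does it internally: the function name, the event labels, and the
exact recovery rule (3+ successes in the last 5 cycles) are my reading
of the behaviour described above.

```python
from collections import deque


def window_events(checks, times=3, cycles=5):
    """Return a list of (cycle, event) notifications.

    checks: iterable of booleans, True meaning the check succeeded.
    An "alarm" is sent once `times` of the last `cycles` checks failed;
    a "recovered" is sent once `times` of the last `cycles` succeeded.
    (Hypothetical helper, for illustration only.)
    """
    window = deque(maxlen=cycles)   # only the last `cycles` results
    alarmed = False
    events = []
    for cycle, ok in enumerate(checks):
        window.append(ok)
        if not alarmed and window.count(False) >= times:
            alarmed = True
            events.append((cycle, "alarm"))
        elif alarmed and window.count(True) >= times:
            alarmed = False
            events.append((cycle, "recovered"))
    return events


# The flapping pattern from above (False = failed check):
flapping = [False, False, False, True] * 3
print(window_events(flapping))  # [(2, 'alarm')]
```

With the consecutive counter and a limit of 4, the same flapping pattern
never triggers anything; with the window it raises one alarm and stays
quiet until the site has really recovered.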

Monit is related to Nagios (both are powerful monitoring tools), but
they are quite different :). For single-host or simple notify-only
checks, I like Monit better because setup and configuration are stupid
quick. For multi-host and custom script checks, I understand that Nagios
is better; it's worth looking into, since Devuan's infrastructure is not
trivial.
--
Evilham