Author: Evilham
Date:  
To: dng
Subject: Re: [DNG] opinions and experience with monit
Hello :),

On 22/08/2017 at 17:54, Jaromil wrote:
>
> dear DNG'ers
>
> on my quest to study more supervision programs for my own use, I've
> found out (just now!) about monit:
>
> https://mmonit.com/monit/
>
> I'm wondering if you have experiences using it and what are your
> opinions, it seems to me that it is a well-written, minimal enough
> addition to sysvlinux when more features are desirable but no
> entanglement is allowed.
>
> is there someone here who knows it / has already experience with it?


Remember that I set up Monit alerts for most of Devuan's infra a few
weeks ago; they've been quite helpful :-).

IMO: Monit is *very good* at a few things, so it depends on what your
use case is.
Hopefully I can explain properly what those things are:

If you are only managing one server, or only want alerts (i.e. not act,
only notify) about multiple servers, Monit is definitely a good way to
go; it's very easy to set up and does its job quite well.

If, on the other hand, you have multiple machines you would like to act
upon, you should really go with something like Nagios.

A couple of tips that may make things easier:
- Monit's default message format is awful; do customise it :).
- It is possible to set up custom alerts for specific events. That's
what I do for the Devuan-related tests; that Monit instance does
plenty of checks besides those for Devuan.
- Networks are unreliable. If you are implementing network checks
against other hosts, you should make use of "for X times within Y
cycles"; that ensures a network hiccup won't trigger an alert for
something that is not going to be an issue.
A downside is that the notification doesn't tell you when the failures
started. E.g. with a cycle of 1 minute and a check "for 3 times within
5 cycles", your alert will say "failure at 10.11 am", but the service
has probably been down since 10.09 or earlier (3 failures within the
last 5 minutes).
- You *can* run arbitrary scripts when something happens (or check
against a script's result / output). This gives you tons of
flexibility; if you do that for too many things though, you may be
better off looking at Nagios :).
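
To make the tips above a bit more concrete, here is a rough sketch of
what they might look like in a monitrc. It is only an illustration and
not the configuration I actually run: the addresses, host names, check
names and the script path are all made up, and I'm assuming a 1-minute
cycle and a local mail server.

```
# Illustrative monitrc fragment; all names, addresses and paths are hypothetical.
set daemon 60                    # one cycle = 60 seconds
set mailserver localhost         # assumes an MTA listening locally

# Global recipient, plus a custom format instead of the default message
set alert admin@example.org
set mail-format {
  subject: $HOST - $SERVICE $EVENT
  message: $EVENT for $SERVICE on $HOST at $DATE
           Action: $ACTION
           $DESCRIPTION
}

# Network check that tolerates hiccups: only alert after
# 3 failed cycles within the last 5
check host example-web with address www.example.org
  if failed port 443 protocol https for 3 times within 5 cycles then alert
  # extra recipient for this check only, limited to connection events
  alert web-team@example.org only on { connection }

# Arbitrary script used as a check: alert on a non-zero exit status
check program example-script with path "/usr/local/bin/example-check.sh"
  if status != 0 then alert
```

Running "monit -t" checks the control file syntax before you reload, so
it is worth doing after adapting something like this.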

I hope that helps,
--
Evilham