Author: Arnt Gulbrandsen
Date:
To: dng
Subject: Re: [DNG] Supervision scripts (was Re: OpenRC and Devuan)
Stephanie Daugherty writes:
> Service failures should be extraordinary events, and we should
> strive to keep treating them as such, so that we continue to
> pursue stability. Restarting a service automatically doesn't
> improve stability of that software, it works around an
> instability rather than addressing the root cause - it's a
> band-aid over a festering wound.


Unix has a few design choices that tend to produce problems like these,
such as malloc() and its C++ cousin "operator new".

malloc() is very simple: You ask for memory and get it. The negative side
of that simplicity is that if you're out of memory (and that happens
occasionally if a server is run close to capacity) then processes die
and/or become unresponsive. Such is the tyranny of the Poisson
distribution.
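
To make the cost concrete, here is a minimal sketch of what failure handling
looks like at a single call site. copy_string() is a hypothetical helper, not
taken from any real service, and the sketch assumes malloc() reports failure by
returning NULL. On systems that overcommit memory the process may instead be
killed later, when it touches the pages.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a string, reporting allocation failure
 * to the caller instead of dereferencing a NULL pointer. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;      /* include the terminating NUL */
    char *p = malloc(n);
    if (p == NULL)
        return NULL;               /* caller must check and decide what to do */
    memcpy(p, s, n);
    return p;
}
```

Every caller then has to check for NULL as well and pass the failure further
up, and that is the part that gets tedious across a whole code base.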

> The failure of a service is analogous in my eyes to the
> tripping of a circuit breaker - it happened for a reason, and
> that underlying reason is probably serious.


Pick your poison: Restart services or add failure handling around all
malloc() calls. I quite like the former in many cases, even though it
papers over various unintentional problems as well as providing the
intentional simplification. But then I like TCP better than NCP, etc.
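
For comparison, the restart option can be sketched in a few lines. This is
only an illustration under my own assumptions, not how any particular
supervisor does it: run_once() and supervise() are hypothetical names, the
start limit and the delay are arbitrary parameters, and a real supervisor
would also handle signals, logging and EINTR.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] as a child process and return its
 * exit status, or -1 if it could not be started or was killed by a signal. */
int run_once(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Start the service, restarting it after every failure, until it exits
 * cleanly or has been started max_starts times. Returns the number of starts. */
int supervise(char *const argv[], int max_starts, unsigned delay_seconds)
{
    int starts = 0;
    while (starts < max_starts) {
        starts++;
        if (run_once(argv) == 0)
            break;                 /* clean exit: stop supervising */
        if (starts < max_starts)
            sleep(delay_seconds);  /* back off before the next attempt */
    }
    return starts;
}
```

The start limit is there so that a service which fails every time is
eventually left down rather than restarted forever.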

Arnt