Author: Laurent Bercot
Date:
To: dng
Subject: Re: [DNG] OpenRC: was s6-rc, a s6-based service manager for Unix systems
On 29/09/2015 17:34, Timo Buhrmester wrote:
>> It can't respawn
> Probably because people don't want this behavior. Auto-respawn only
> makes sense when you're "relying" on buggy software you already expect
> to blow up, *and* are unwilling to debug it. "Try turning it off
> and on again", "A restart will fix it" is the Windows-way...


That's a common mistake, but a mistake nonetheless. In an ideal world,
process supervision may not be necessary, but we don't live in an
ideal world. Software crashes happen. Even software without bugs can
hit a temporary failure (out of memory, for instance) and exit; the
conditions can then change, but without supervision, your process is
dead until manual intervention.
Process supervision also provides the admin with better tools to
manage processes, for instance the ability to reliably send signals
to them without .pid files.
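
To make the two points above concrete, here is a minimal sketch of a
supervisor loop in Python. It is an illustration only, not how s6 or any
other real supervisor is implemented; the names supervise and signal_child,
and the max_restarts and delay parameters, are hypothetical. A real
supervisor would restart forever; the bound exists only so the sketch
terminates.

```python
import signal
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=1.0):
    """Run argv and restart it each time it exits; return the exit codes seen.

    max_restarts and delay are assumptions made for this sketch.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        # The supervisor is the direct parent of the service, so it always
        # knows the child's pid and is told by the kernel when it dies.
        child = subprocess.Popen(argv)
        code = child.wait()
        exit_codes.append(code)
        print(f"{argv[0]} exited with status {code}", file=sys.stderr)
        time.sleep(delay)  # short pause so a failing service does not spin
    return exit_codes


def signal_child(child, sig=signal.SIGTERM):
    """Send a signal through the process handle; no .pid file is read."""
    child.send_signal(sig)
```

Because the signal goes through the handle the parent already holds, there
is no stale or recycled pid to worry about, which is the reliability
problem .pid files have.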

Process supervision is *not*, and should not be, a crutch to help
buggy software run. Claiming that this is its purpose is a straw man
argument.


> In all other cases (I can think of), respawning a crashed service
> is exactly *not* what I want to happen (it could have crashed because
> it was exploited, providing the attacker with unlimited attempts).


A service being respawned does not preclude the system from sending
an alert when it crashes. Critical services *should* be monitored by
an alert system.
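
As a rough illustration that respawning and alerting can coexist, the
sketch below calls an alert hook on every abnormal exit before starting
the service again. The function name respawn_with_alerts, the alert
callback and the max_restarts bound are all hypothetical; in practice the
hook would feed whatever monitoring system is already in place.

```python
import subprocess
import sys


def stderr_alert(message):
    """Default alert hook for this sketch: just write to stderr."""
    print("ALERT: " + message, file=sys.stderr)


def respawn_with_alerts(argv, alert=stderr_alert, max_restarts=3):
    """Run argv, alert on every non-zero exit, then respawn.

    Returns the number of alerts raised. The restart bound is an
    assumption so that the sketch terminates.
    """
    crashes = 0
    for attempt in range(max_restarts + 1):
        code = subprocess.call(argv)
        if code != 0:
            crashes += 1
            # The crash is reported first; the restart does not hide it.
            alert(f"{argv[0]} exited with status {code} (run {attempt + 1})")
    return crashes
```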
Also, as Simon says (pun unintended): if an attacker can crash the
service, what is better: that the attacker can trivially DoS your
service with one attack, or that he has to try again and again in
order to DoS you?


> Or it could have crashed because there's an environmental problem
> that isn't directly under the program's control, in which case
> restarting it would just be pointless, because it likely can't start
> at all.


You don't know that in advance. Some failures will be permanent, in
which case you'll most likely notice them as soon as you start the
service for the first time and can address the problem; other failures
are temporary, and that's where process supervision is a good thing to
have.


> Bonus points if the logs of the initial problem get rotated away due to
> excessive retrying, or the core dump of the initial crash gets
> overwritten...


If your admins did not prepare for this and write correct scripts
to save the core dumps to a safe place, or save crash logs to a place
where they won't be rotated away, this is a problem with your admins,
not with process supervision.
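
One possible shape for such a script, sketched in Python: before the
service is restarted, move the core dump or crash log into an archive
directory under a unique name, so later crashes cannot overwrite it. The
function name preserve_crash_file and the directory layout are
assumptions made for illustration.

```python
import os
import shutil
import tempfile
import time


def preserve_crash_file(path, archive_dir):
    """Move a crash artifact into archive_dir under a unique name.

    Returns the new path, or None if there was nothing to save.
    """
    if not os.path.exists(path):
        return None
    os.makedirs(archive_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    base = os.path.basename(path)
    # mkstemp reserves a name that does not exist yet, so two crashes in
    # the same second still end up in two different files.
    fd, dest = tempfile.mkstemp(prefix=f"{base}.{stamp}.", dir=archive_dir)
    os.close(fd)
    shutil.move(path, dest)
    return dest
```

A supervisor's restart hook could call this with the service's core file
path before each respawn; the archive directory should live outside any
log rotation scheme.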

Ultimately, process supervision is a tool, and a good tool. Whether to
use it should be the sysadmin's decision; the rc system should not make
that decision for them. As Steve says, it is an oversight of OpenRC not
to provide the *possibility* of process supervision.

--
Laurent