Author: Steve Litt
Date: 2017-04-14 16:07 -000
To: dng
Old Topics: Re: [DNG] tiny service state api [WAS: Fwd: init system agnosticism]
Subject: [DNG] Keeping services running: was tiny service state api [WAS: Fwd: init system agnosticism]
On Fri, 14 Apr 2017 13:56:32 +0000
Daniel Abrecht <dng@???> wrote:

> Hi
> From my point of view, systemd always tries to keep services running,
> no matter how hard they fail, and to mask possible problems when
> starting a service, so the service maintainers don't have to fix
> their service, which is really unfortunate.

If you don't like that aspect of systemd, you're REALLY going to hate
runit, which always restarts crashed/ended daemons. I think sysvinit or
OpenRC would be more to your taste.

That being said, runit has the option of using a ./finish script, which
could report the malfunction and set a filesystem flag to prevent the
service from being run again. But that's kinda kludgy.
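A minimal sketch of that kludge, assuming runsv's documented behavior of passing ./finish two arguments: the exit code of ./run (or -1 if it didn't exit normally) and the low byte of the waitpid(2) status. The flag name ./failed, the FLAG variable, the record_failure function, and the choice to treat a TERM kill as a deliberate stop are all hypothetical, not a runit convention.

```sh
#!/bin/sh
# Sketch of a runit ./finish script. runsv runs it after ./run ends, with
#   $1 = exit code of ./run, or -1 if ./run did not exit normally
#   $2 = low byte of the waitpid(2) status (the signal number if killed)
# FLAG is a hypothetical file name; ./run must check for it itself.
FLAG=${FLAG:-./failed}

# record_failure EXIT_CODE STATUS_BYTE (hypothetical helper)
record_failure() {
    case "$1:$2" in
        0:*)   return 0 ;;   # clean exit: nothing to report
        -1:15) return 0 ;;   # killed by TERM, e.g. an admin's "sv down"
    esac
    # Report the malfunction on stderr (runsv does not log it for us).
    echo "my_kewl_daemon: ./run ended with code $1 (status byte $2)" >&2
    # Create the flag that ./run refuses to start under.
    : > "$FLAG"
}

record_failure "${1:-0}" "${2:-0}"
```

For the flag to mean anything, ./run has to test it before the exec line, for example with `if [ -e ./failed ]; then sleep 60; exit 1; fi`, so runsv keeps cycling without starting the daemon until someone deletes the flag. A daemon that catches TERM and exits nonzero would still trip the flag, which is part of why this is kludgy.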

By the way, I think systemd has the option of not restarting a failed
service (the Restart= setting in the unit file).

> In case of those service state notifications with sd_notify, I think
> they are usually used to signal when a service is starting, but not
> ready yet. This may seem reasonable at the beginning, but I think it
> fixes the problem in the wrong place: when a service needs another
> service, but it's temporarily unavailable, it should cause an error or
> warning to be returned and logged, but it should never be a fatal
> error which causes the service to stop.

When process dependencies rear their heads, I write my runit run
scripts something like the following, which tests for Internet
connectivity before running the my_kewl_daemon service:

#!/bin/sh
if ping -c1 google.com > /dev/null; then
    exec my_kewl_daemon arg1 arg2
fi
# Not reachable yet: pause, then exit so runsv reruns this script.
sleep 1

If the ping fails, instead of starting the daemon, the script waits a
second and exits; runit's supervisor then reruns it (after whatever time
it takes to cycle around), and the test happens again. Looking at the
preceding, you might expect that with every service waiting a second at
a time for every other service, startups could gridlock for 5 minutes.
But in fact, for whatever reason, that doesn't happen.
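One way to see why is a toy model, not a measurement, under these assumptions: every run script starts at time zero; a service comes up the moment its check sees its dependency up; otherwise it retries one second later. Ping time and runsv's own cycle time are ignored. Under this model, startup time grows by one second per link in the longest dependency chain, and only a dependency cycle (or a dependency that never comes up) makes anything wait forever. The function simulate_startup and its argument convention are hypothetical.

```sh
#!/bin/sh
# Toy model (an assumption, not a measurement) of runit run scripts that
# poll their dependency once per second. Time is counted in whole seconds.
#
# simulate_startup DEP1 DEP2 ... : argument i names the service that
# service i depends on (0 = none). Prints the second at which the last
# service came up, or "stuck" if some service never can.
simulate_startup() {
    n=$#
    i=1
    for dep in "$@"; do
        eval "dep_$i=$dep; up_$i=no"
        i=$((i + 1))
    done
    t=0
    while [ "$t" -le "$n" ]; do
        # Snapshot: every check during second t sees the state at its start.
        i=1
        while [ "$i" -le "$n" ]; do
            eval "was_$i=\$up_$i"
            i=$((i + 1))
        done
        pending=0
        i=1
        while [ "$i" -le "$n" ]; do
            eval "dep=\$dep_$i; up=\$up_$i"
            if [ "$up" = no ]; then
                if [ "$dep" -eq 0 ]; then
                    ready=yes                  # no dependency: start at once
                else
                    eval "ready=\$was_$dep"    # the run script's check
                fi
                if [ "$ready" = yes ]; then
                    eval "up_$i=yes"           # exec the daemon
                else
                    pending=$((pending + 1))   # sleep 1, exit, get rerun
                fi
            fi
            i=$((i + 1))
        done
        if [ "$pending" -eq 0 ]; then
            echo "$t"
            return 0
        fi
        t=$((t + 1))
    done
    echo stuck
    return 1
}
```

For example, `simulate_startup 0 1 2 3 4` (a five-service chain) prints 4, while `simulate_startup 2 1` (two services waiting on each other) prints stuck.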

The beauty of the way I do it in runit is that I test the actual
behavior of what the daemon depends on, rather than having either the
init system or the depended-on daemon declare that it's functional.


Steve Litt 
April 2017 featured book: Troubleshooting Techniques
     of the Successful Technologist