Author: Steve Litt
Date:  
To: dng
Old-Topics: Re: [DNG] tiny service state api [WAS: Fwd: init system agnosticism]
Subject: [DNG] Keeping services running: was tiny service state api [WAS: Fwd: init system agnosticism]
On Fri, 14 Apr 2017 13:56:32 +0000
Daniel Abrecht <dng@???> wrote:

> Hi
>
> From my point of view, systemd always tries to keep services running,
> no matter how hard they fail, and to mask possible problems when
> starting a service, so the service maintainers don't have to fix
> their service, which is really unfortunate.


If you don't like that aspect of systemd, you're REALLY going to hate
runit, which always restarts crashed/ended daemons. I think sysvinit or
OpenRC would be more to your taste.

That being said, runit has the option of using a ./finish script, which
could report the malfunction and set a filesystem flag to prevent the
service being run again. But that's kinda kludgy.
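A minimal sketch of that flag idea, under stated assumptions: the flag-file name "disabled" and the helper names record_failure and may_start are hypothetical, and the only runit behavior relied on is that ./finish receives ./run's exit code as its first argument (-1 if ./run did not exit normally).

```sh
#!/bin/sh
# Hypothetical helpers for a runit service directory. The flag-file name
# is an assumption for illustration; runit itself knows nothing about it.
FLAG=${FLAG:-./disabled}

# Intended for ./finish, called as: record_failure "$1"
# $1 is ./run's exit code as passed in by the supervisor.
record_failure() {
    code=$1
    if [ "$code" != "0" ]; then
        # Report the malfunction (stderr goes wherever the service logs).
        echo "service failed with exit code $code" >&2
        # Set the filesystem flag.
        : > "$FLAG"
    fi
}

# Intended for the top of ./run: succeeds only if no flag is set.
may_start() {
    [ ! -e "$FLAG" ]
}
```

When may_start fails, ./run still has to avoid being respawned in a tight loop, for example by asking the supervisor to take the service down. That step is left out of the sketch, and it's part of what makes the approach kludgy. Removing the flag file by hand re-enables the service.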

By the way, I think systemd has the option of not rerunning.

>
> In case of those service state notifications with sd_notify, I think
> they are usually used to signal when a service is starting, but not
> ready yet. This may seem reasonable at the beginning, but I think it
> fixes the problem at the wrong place; When a service needs another
> service, but it's temporarily unavailable, it should cause an error or
> warning to be returned and logged, but it should never be a fatal
> error which causes the service to stop.


When process dependencies rear their heads, I write my runit run
scripts something like the following, which tests for Internet
connectivity before running the my_kewl_daemon service:

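#!/bin/sh
# Start the daemon only if the Internet is reachable.
if ping -c1 google.com > /dev/null; then
    exec my_kewl_daemon arg1 arg2
fi
# Ping failed: pause, then exit so runit's supervisor reruns this script.
sleep 1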

If the ping fails, instead of starting the daemon, the script waits a
second and exits; after whatever time runit's supervisor takes to cycle
around, it tries again. Looking at the preceding, it might seem that
with every service sleeping a second while it waits on every other, you
might experience five-minute gridlock startups. But in fact, for
whatever reason, that doesn't happen.

The beauty of the way I do it in runit is that I test the actual
performance of the daemon being depended on, rather than having either
the init system or the daemon declare that the daemon is functional.
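The same pattern can be pulled into a small helper so that any run script can name its own test. This is only a sketch: run_when_ready and RETRY_DELAY are hypothetical names, not part of runit, and the test can be any command that exercises the thing actually being depended on.

```sh
#!/bin/sh
# Hypothetical helper: run_when_ready "test command" daemon [args...]
# If the test command succeeds, replace this shell with the daemon.
# Otherwise pause and return 1, so the run script can exit and let
# runit's supervisor start it again.
run_when_ready() {
    check=$1
    shift
    if sh -c "$check" > /dev/null 2>&1; then
        exec "$@"
    fi
    # RETRY_DELAY is an assumed knob; it defaults to the 1 second used above.
    sleep "${RETRY_DELAY:-1}"
    return 1
}

# As the last line of a run script, the earlier example would read:
#   run_when_ready "ping -c1 google.com" my_kewl_daemon arg1 arg2
```

Because the daemon is started with exec, the supervisor ends up watching the daemon itself rather than a wrapper shell, the same as in the hand-written version.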

SteveT

Steve Litt 
April 2017 featured book: Troubleshooting Techniques
     of the Successful Technologist
http://www.troubleshooters.com/techniques