Author: Steve Litt
Date:  
To: dng
Subject: Re: [Dng] Why daemontools is so cool
On Mon, 30 Mar 2015 11:44:29 +0200
Didier Kryn <kryn@???> wrote:

>
> Le 29/03/2015 11:15, marc a écrit :
> >> No need to mix doubleforking and PID tracking on your
> >> program. That should be the duty of whatever daemonizes and manages
> >> your program. You know, like Daemontools or s6.
> > So there is a very good reason for a daemon to handle its
> > own backgrounding: The sensible convention is that it
> > should only background at the instant where it is ready to
> > service requests: If there is a long initialisation phase
> > it should stay in the foreground - so that things that
> > depend on it in turn do not get started too soon. A more
> > detailed description of this problem I wrote up a while ago
> > at welz.org.za/notes/on-starting-daemons.html.
> >
> > More fundamentally: If an application has problems calling a
> > daemonize() or fork_parent() function or the handful of system
> > calls that make up this, then maybe this is a limitation of the
> > development environment or language - if calling these is
> > regarded to be hard then one wonders how reliable the rest of the
> > program is.
>      Dear Marc,
>
>      This makes sense if we consider the very old way a Linux system
> was run:
>          boot; then mount filesystems, then start syslog; then
> initialize network; then start nfs and ssh ...
>      then stop daemons in the reverse order, unmount filesystems and
> shut down.
>
>      A more modern way of starting daemons is to have fine-grained 
> dependencies. Instead of waiting until syslogd is started, why not
> wait until the socket /dev/log is created, or create it even before
> starting syslogd?
>
>      And furthermore, is it really necessary to wait for anything
> before starting the daemons; why wait until the network is configured
> to start ssh and nfs?
>
>      Things are not going the linear way anymore. The network cable,
> as an example, can be disconnected and reconnected, and the network
> interface de-configured and re-configured, and the ssh daemon will
> survive, and NFS as well. You can even reboot an nfs server and
> clients having their rootfs nfs-mounted come back to life seamlessly.
>
>      Daemons should be prepared to wait until the needed resource is
> available; they should even be prepared to see the resource they
> need disappear and to wait until it shows up again.


Hi Didier,

If your post says what I think it says, you're saying that modern init
systems should always start services concurrently, not consecutively.

Certainly that's a good thing, and we're working toward it, but it's
important to keep some perspective on the matter and do a cost/benefit
analysis on the alternatives.

On my experimental Manjaro machine, systemd, which most would agree is
very concurrent, booted in 4 seconds. Epoch, which has absolutely no
concurrency at all and boots completely consecutively, booted in 8
seconds. How much complexity, how much indeterminacy, are we willing to
put up with to A) get 4 more seconds in our life every time we reboot,
and B) do it the more "modern" way?

The preceding question has no one "right" answer. As has been pointed
out in many pro-systemd assertions, if you're contracted to provide
0.999999 uptime, that four seconds means everything. If there's some
reason you need to reboot six times a day, the four second difference
could become meaningful. If you're starting 40 services, it could be a
difference between 1 minute and 2 minutes, which might be somewhat
meaningful if you shut down every night and boot up every morning.
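
To put rough numbers on those cases, here is a back-of-envelope sketch
in Python. It assumes "0.999999 uptime" means an availability fraction
measured over a 365-day year, and it takes the 4 second difference from
the measurement above. The function names are just illustrative.

```python
# Back-of-envelope figures only; the 365-day year is an assumption.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def downtime_budget_seconds(availability):
    """Seconds of downtime per year allowed at a given availability fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def yearly_boot_cost_seconds(extra_seconds_per_boot, boots_per_day):
    """Extra seconds per year spent booting with the slower init."""
    return extra_seconds_per_boot * boots_per_day * 365


if __name__ == "__main__":
    print(f"budget at 0.999999: {downtime_budget_seconds(0.999999):.1f} s/year")
    print(f"4 s x 6 boots/day:  {yearly_boot_cost_seconds(4, 6)} s/year")
```

Under those assumptions the yearly downtime budget is about 31.5
seconds, so one extra 4 second boot eats roughly an eighth of it. Six
reboots a day at 4 extra seconds each comes to 8760 seconds, or a bit
over 2.4 hours a year.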

And most of all, if you or your distro is careless with order on a
completely consecutive boot, it could make all the difference in the
world. I've had 5 minute boots, of which 3 minutes was, IIRC, NFS
timing out instead of running instantly, because of no reverse DNS.
Even today, if you put wicd-cli in your bootup, it takes 20 seconds or
so to do the wifi negotiations. But note that all wifi-equipped systemd
systems I've seen simply delay wifi-negotiation out of init and into X
startup.

Looking at the use cases in the preceding two paragraphs, I'd say that
in all other cases I can think of, the 4 second plus modern-man
feelgood benefit you get from concurrent startup during init doesn't
begin to pay for the increased complexity and decreased determinacy of
concurrent service startup.

By the way, one excellent thing about the Epoch init system is that,
because it's completely consecutive, you can get a close look at which
services are taking too long to start, troubleshoot them to find the
bottleneck, and fix them, so that they'll start efficiently in your
concurrent init. The quicker everything starts in a concurrent init, the
less chance for race conditions.
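
The same idea can be tried outside any particular init system. The
following is only a sketch of consecutive timing in Python, not how
Epoch does it: it runs each start command one after another, waits for
it to return, and reports the slowest first. The command list is a
hypothetical stand-in for your real service start commands.

```python
import subprocess
import time


def time_consecutively(commands):
    """Run (name, argv) pairs one at a time and return (name, seconds)
    pairs sorted slowest first."""
    timings = []
    for name, argv in commands:
        start = time.monotonic()
        # Wait for the command to return before starting the next one.
        subprocess.run(argv, check=False)
        timings.append((name, time.monotonic() - start))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical placeholders; substitute real start commands.
    demo = [
        ("quick", ["true"]),
        ("slow", ["sleep", "1"]),
    ]
    for name, seconds in time_consecutively(demo):
        print(f"{name}: {seconds:.2f} s")
```

Because nothing overlaps, each number is attributable to exactly one
command. Note this only measures how long the start command takes to
return, which is not the same as the service being ready.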


SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance