Author: devuan.kn
Date:
To: dng
Subject: Re: [Dng] Why daemontools is so cool
Hi Steve,
On Mon, Mar 30, 2015 at 7:24 PM, Steve Litt - slitt@???
<devuan.kn.3c178e07a2.slitt#troubleshooters.com@???> wrote:
> Hi Didier,
>
> If your post says what I think it says, you're saying that modern init
> systems should always start services concurrently, not consecutively.
>
> Certainly that's a good thing, and we're working toward it, but it's
> important to keep some perspective on the matter and do a cost/benefit
> analysis on the alternatives.
>
> On my experimental Manjaro machine, systemd, which most would agree is
> very concurrent, booted in 4 seconds. Epoch, which has absolutely no
> concurrency at all and boots completely consecutively, booted in 8
> seconds. How much complexity, how much indeterminacy, are we willing to
> put up with to get A) 4 more seconds in our life every time we reboot,
> and B) do it the more "modern" way?
How fast is Epoch when you throw it at a generic piece of hardware
that you did not hand-tune it for?
The problem there is that a consecutive boot system needs to probe for
hardware and give that hardware time to show up, blocking the boot
process in the meantime. If you have lots of hardware you need to
probe for (as e.g. Devuan would need to out-of-the-box), then those
wait-times can sum up really fast. And some hardware can take a very
long time to register, so you need to be generous with those
time-outs. On the other hand, anybody without the hardware will be stuck
for the entire timeout, so you need to keep them as short as possible.
A parallel boot system can just start all hardware probes at the same
time, be very generous with the timeouts and just continue as soon as
the necessary hardware has shown up.
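To put rough numbers on that, here is a minimal Python sketch. Everything in
it is made up for illustration: the probe names, the (scaled-down) delays and
the timeout are assumptions, and probing is simulated with sleep() rather than
by touching real hardware. It only shows how the waits add up in one case and
overlap in the other.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical probes: seconds until the device registers, or None if the
# device is absent on this machine. Delays are scaled down for the demo.
PROBES = {"disk": 0.05, "raid": 0.20, "wifi": None}
TIMEOUT = 0.30  # generous per-probe timeout


def probe(name):
    """Return True once the device appears, False after the timeout."""
    delay = PROBES[name]
    if delay is None or delay > TIMEOUT:
        time.sleep(TIMEOUT)  # absent hardware costs the full timeout
        return False
    time.sleep(delay)
    return True


def boot_consecutive(required):
    """Probe one device after another; the wait times add up."""
    start = time.monotonic()
    found = {name for name in PROBES if probe(name)}
    return required <= found, time.monotonic() - start


def boot_parallel(required):
    """Start every probe at once; continue when the required ones are in."""
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=len(PROBES))
    futures = {pool.submit(probe, name): name for name in PROBES}
    missing = set(required)
    for future in as_completed(futures):
        if future.result():
            missing.discard(futures[future])
        if not missing:
            break  # do not wait for probes nobody depends on
    elapsed = time.monotonic() - start
    pool.shutdown(wait=False)  # leftover probes finish in the background
    return not missing, elapsed


if __name__ == "__main__":
    for boot in (boot_consecutive, boot_parallel):
        ok, elapsed = boot({"disk", "raid"})
        print(f"{boot.__name__}: ready={ok} after {elapsed:.2f}s")
```

With these made-up numbers the consecutive variant waits for the sum of all
delays, including the full timeout for the absent wifi device, while the
parallel variant is done as soon as the slowest required device is there.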
It is also impossible to *generically* support all possible
combinations of real hardware, devicemapper, LVM, software raid, fsck
and cryptsetup in a consecutive boot setup. Debian had long-standing
bugs in that regard.
Note that you can of course configure anything with any system
manually, but any Linux distribution should strive to support as wide
a range of setups as possible out-of-the-box. Well, Arch and Gentoo
obviously do not need to, but they already fill their niches quite
nicely :-)
> And most of all, if you or your distro is careless with order on a
> completely consecutive boot, it could make all the difference in the
> world. I've had 5 minute boots, of which 3 minutes was, IIRC, NFS
> timing out instead of running instantly, because of no reverse DNS.
> Even today, if you put wicd-cli in your bootup, it takes 20 seconds or
> so to do the wifi negotiations. But note that all wifi-equipped systemd
> systems I've seen simply delay wifi-negotiation out of init and into X
> startup.
If your distro screws up, then you are screwed (until you fix it;-).
That is pretty much true independent of the topic.
I seriously doubt that systemd was pushing wifi-negotiation into X
startup though. With any concurrent system, X may be up and waiting
for your login long before init is done with the initial setup of your
system. So it may appear like wifi negotiation happens only after X
login.
Or you might have ended up using network-manager, which has a
tendency to only start the network after somebody has logged in :-/
> Looking at the use cases in the preceding two paragraphs, I'd say that
> in all other cases I can think of, the 4 second plus modern-man
> feelgood benefit you get from concurrent startup during init doesn't
> begin to pay for the increased complexity and decreased determinacy of
> concurrent service startup.
The killer argument for parallel startup with dependency handling is
robustness, not speed.
Maybe it is my tendency to mess around with cryptsetup and co. that
gets me into trouble, but I did have unbootable systems with sysv-init
due to "unexpected setup" problems. Nothing I could not fix, but still
an annoyance that I would be happy to get rid of.
This is a statement about the concept of parallel init systems with
dependencies, not about any specific implementation.
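As a rough illustration of the concept (and explicitly not of how any real
init system is implemented), here is a small Python sketch. The unit names and
the stacking order raid -> cryptsetup -> lvm -> fsck -> mount-root are
hypothetical, and start() is a stand-in that only records what ran. The point
is that the order is derived from declared dependencies, so a different
stacking only needs a different table, not a rewritten boot sequence.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical units: each name maps to the set of units it depends on.
UNITS = {
    "raid": set(),
    "cryptsetup": {"raid"},
    "lvm": {"cryptsetup"},
    "fsck": {"lvm"},
    "mount-root": {"fsck"},
    "network": set(),
}


def start(unit, log):
    """Stand-in for really starting a unit; it only records the order."""
    log.append(unit)
    return unit


def boot(units):
    """Start each unit as soon as all of its dependencies have finished."""
    sorter = TopologicalSorter(units)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    log = []
    with ThreadPoolExecutor() as pool:
        running = set()
        while sorter.is_active():
            # Everything whose dependencies are done may start right away.
            for unit in sorter.get_ready():
                running.add(pool.submit(start, unit, log))
            done, running = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                sorter.done(future.result())
    return log


if __name__ == "__main__":
    print(boot(UNITS))
```

Here "network" is free to start alongside the storage chain, while each
storage unit still waits for the one below it.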
> By the way, one excellent thing about the Epoch init system is that,
> because it's completely consecutive, you can get a close look at which
> services are taking too long to start, troubleshoot them to find the
> bottleneck, and fix them, so that they'll start efficiently in your
> concurrent init. The quicker everything starts in a concurrent init, the
> less chance for race conditions.
Yes and no.
What happens if your filesystem has a slow day (e.g. due to some f*ing
RAID controller deciding that it needs to do some extra sanity
checks)? That will lead to the consecutive boot system panicking since
its root device is not there (after some timeout), which in turn will
lead to some poor admin having to investigate and nudge the server to
try once more.
The whole consecutive boot thing hinges on timeouts and that is
neither generic nor robust. I admit that it is simple. And I also
think that Epoch can make a great init system for a specific system,
but there are better choices for a distribution as a whole.