Author: Jude Nelson
Date:  
To: devuan.kn
CC: dng@lists.dyne.org
Subject: Re: [Dng] Why daemontools is so cool
> The problem there is that a consecutive boot system needs to probe for
> hardware and give that hardware time to show up, blocking the boot
> process in the meantime.


The only hardware detection during boot that *needs* to block is mounting
the root device. Boot cannot proceed without that. To do so, the kernel
needs to have a driver for the device--either loaded into it by the
initramfs (the loading and execution of which takes time), or compiled in
directly.
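
As an aside, it's easy to check which case you're in; a quick sketch,
assuming an ext4 root and a Debian-style kernel config file in /boot:

  # "=y" means the driver is compiled into the kernel; "=m" means it is a
  # module and has to come from the initramfs.
  grep CONFIG_EXT4_FS /boot/config-$(uname -r)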

The remaining driver-loading does not run as part of the boot sequence.
Only the modules needed to bring the system up get loaded (i.e. the modules
in /etc/modprobe.d). Once root and the other filesystems are mounted, you
should be able to start programs in whatever order, regardless of the
state of the hardware. Individual programs like ntpd should be smart
enough to wait until the hardware they need, such as network interfaces,
is available and configured (either on their own, or with the help of a
supervisor).
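
For instance, a daemontools-style ./run script can do the waiting on
ntpd's behalf. A minimal sketch, assuming the interface name eth0 and the
reference ntpd (adjust both for your setup):

  #!/bin/sh
  # Hypothetical /etc/service/ntpd/run: block until the interface is up,
  # then hand the process over to ntpd.
  exec 2>&1
  until ip link show eth0 2>/dev/null | grep -q 'state UP'; do
      sleep 1
  done
  # -n keeps ntpd in the foreground so the supervisor can watch it.
  exec ntpd -n

If ntpd later dies, the supervisor just reruns the script, so there is no
global timeout to tune.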

If your boot sequence is taking too long because it's loading unnecessary
drivers, then your boot sequence is misconfigured.

> And some hardware can take a very
> long time to register, so you need to be generous with those
> time-outs. On the other hand, anybody without the hardware will be stuck
> for the entire timeout, so you need to keep them as short as possible.


If your hardware is taking a long time to spin up, it's due to a bug in
either the hardware or its driver, not the boot sequence.

Again, if you're talking about waiting for the device with your root
filesystem, the boot process is *supposed* to block until it's ready. Boot
cannot proceed until the root device is found and mounted; otherwise you
obviously cannot load programs. This process cannot be sped up through
parallelization (Amdahl's Law and all that). Same goes for programs that
cannot be started until /usr is mounted (if you have a separate /usr).
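
To put a hypothetical number on it: Amdahl's Law puts the best possible
speedup at 1/(s + (1-s)/N), where s is the serial fraction of the work and
N the degree of parallelism. If finding and mounting root takes 2 seconds
of an 8-second boot (s = 0.25), then no matter how large N gets the boot
can never finish in less than those 2 seconds, a speedup of at most 4x.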

> A parallel boot system can just start all hardware probes at the same
> time, be very generous with the timeouts, and just continue as soon as
> the necessary hardware shows up.


Hardware detection happens in the kernel, and the kernel does detect
hardware in parallel. That's one of the reasons why you get to deal with
disk and interface names changing between boots.

"Just continuing as soon as the necessary hardware shows up" is already
what happens, like I said above. Also, it's useless to run more instances
of insmod(8) than you have CPU cores. It can be dangerous to run even two
insmod(8)'s in parallel, since last I checked [1], setting up a kernel
module is not an atomic operation. Run multiple insmod(8)'s at your own
risk--you may expose race conditions that crash your system or render your
hardware unusable without a reboot.

> The killer argument for parallel startup with dependency handling is
> robustness, not speed.


No, the opposite is true. Programs with multiple instances of execution
(processes, threads, coroutines) tend in practice to be much more
error-prone, because they are much harder to reason about: the number of
states such a program can be in grows with the *factorial* of the number
of instances of execution it has. This is such a problem that determinism
is often a design requirement for mission-critical software whose failure
would incur huge costs and/or loss of life.
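
To make that concrete: n units started concurrently can come up in any of
n! orders, so five units already give 120 possible orderings and ten give
3,628,800, each of which is a distinct interleaving your testing may never
have seen.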

> Maybe it is my tendency to mess around with cryptsetup and co. that
> gets me into trouble, but I did have unbootable systems with sysv-init
> due to "unexpected setup" problems. Nothing I could not fix, but still
> an annoyance that I would be happy to get rid of.


Parallel boot won't fix misconfigurations you introduced by messing around
with it.

> The whole consecutive boot thing hinges on timeouts and that is
> neither generic nor robust.


The boot sequence does *not* hinge on timeouts. If anything, timeouts are
a fallback mechanism for working around other programs that fail to make
forward progress (e.g. due to bugs, a down network, or faulty hardware).
If your boot sequence is hitting timeouts, then something is wrong with
your boot sequence.

-Jude

[1] http://lkml.iu.edu/hypermail/linux/kernel/1502.2/01852.html

On Mon, Mar 30, 2015 at 2:54 PM, <devuan.kn@???> wrote:

> Hi Steve,
>
> On Mon, Mar 30, 2015 at 7:24 PM, Steve Litt -
> slitt@???
> <devuan.kn.3c178e07a2.slitt#troubleshooters.com@???> wrote:
> > Hi Didier,
> >
> > If your post says what I think it says, you're saying that modern init
> > systems should always start services concurrently, not consecutively.
> >
> > Certainly that's a good thing, and we're working toward it, but it's
> > important to keep some perspective on the matter and do a cost/benefit
> > analysis on the alternatives.
> >
> > On my experimental Manjaro machine, systemd, which most would agree is
> > very concurrent, booted in 4 seconds. Epoch, which has absolutely no
> > concurrency at all and boots completely consecutively, booted in 8
> > seconds. How much complexity, how much indeterminacy, are we willing to
> > put up with to get A) 4 more seconds in our life every time we reboot,
> > and B) do it the more "modern" way?
>
> How fast is epoch when you throw it at a generic piece of hardware
> that you did not hand-tune it for?
>
> The problem there is that a consecutive boot system needs to probe for
> hardware and give that hardware time to show up, blocking the boot
> process in the meantime. If you have lots of hardware you need to
> probe for (as e.g. Devuan would need to out-of-the-box), then those
> wait-times can add up really fast. And some hardware can take a very
> long time to register, so you need to be generous with those
> time-outs. On the other hand, anybody without the hardware will be stuck
> for the entire timeout, so you need to keep them as short as possible.
>
> A parallel boot system can just start all hardware probes at the same
> time, be very generous with the timeouts, and just continue as soon as
> the necessary hardware shows up.
>
> It is also impossible to *generically* support all possible
> combinations of real hardware, devicemapper, LVM, software raid, fsck
> and cryptsetup in a consecutive boot setup. Debian had long-standing
> bugs to that regard.
>
> Note that you can of course configure anything with any system
> manually, but any Linux distribution should strive to support as wide
> a range of setups as possible out-of-the-box. Well, arch and gentoo
> obviously do not need to, but they already fill their niches quite
> nicely:-)
>
> > And most of all, if you or your distro is careless with order on a
> > completely consecutive boot, it could make all the difference in the
> > world. I've had 5 minute boots, of which 3 minutes was, IIRC, NFS
> > timing out instead of running instantly, because of no reverse DNS.
> > Even today, if you put wicd-cli in your bootup, it takes 20 seconds or
> > so to do the wifi negotiations. But note that all wifi-equipped systemd
> > systems I've seen simply delay wifi-negotiation out of init and into X
> > startup.
>
> If your distro screws up, then you are screwed (until you fix it;-).
> That is pretty much true independent of the topic.
>
> I seriously doubt that systemd was pushing wifi-negotiation into X
> startup though. With any non-concurrent system X may be up and waiting
> for your login long before init is done with the initial setup of your
> system. So it may appear like wifi negotiation happens only after X
> login.
>
> Or you might have ended up using network-manager, which has a
> tendency to only start the network after somebody logged in:-/
>
> > Looking at the use cases in the preceding two paragraphs, I'd say that
> > in all other cases I can think of, the 4 second plus modern-man
> > feelgood benefit you get from concurrent startup during init doesn't
> > begin to pay for the increased complexity and decreased determinacy of
> > concurrent service startup.
>
> The killer argument for parallel startup with dependency handling is
> robustness, not speed.
>
> Maybe it is my tendency to mess around with cryptsetup and co. that
> gets me into trouble, but I did have unbootable systems with sysv-init
> due to "unexpected setup" problems. Nothing I could not fix, but still
> an annoyance that I would be happy to get rid of.
>
> This is a statement about the concept of parallel init systems with
> dependencies, not about any specific implementation.
>
> > By the way, one excellent thing about the Epoch init system is that,
> > because it's completely consecutive, you can get a close look at which
> > services are taking too long to start, troubleshoot them to find the
> > bottleneck, and fix them, so that they'll start efficiently in your
> > concurrent init. The quicker everything starts in a concurrent init, the
> > less chance for race conditions.
>
> Yes and no.
>
> What happens if your filesystem has a slow day (e.g. due to some f*ing
> RAID controller deciding that it needs to do some extra sanity
> checks)? That will lead to the consecutive boot system panicking since
> its root device is not there (after some timeout), which in turn will
> lead to some poor admin having to investigate and nudge the server to
> try once more.
>
> The whole consecutive boot thing hinges on timeouts and that is
> neither generic nor robust. I admit that it is simple. And I also
> think that epoch can make a great init system for a specific system,
> but there are better choices for a distribution as a whole.
>
> BR
> Karl
>
> _______________________________________________
> Dng mailing list
> Dng@???
> https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng
>