Author: Arnt Gulbrandsen
Date:
To: dng
Subject: Re: [DNG] Detailed technical treatise of systemd
Steve Litt writes:
> I'd like to discuss this. Now, after a year of thought, I still see no
> benefit to "starting servers in parallel" except for boot time.
Because you're thinking of the happy path.
Suppose you have a few dozen servers on three continents, providing a
user-facing service, using something like zk or etcd to coordinate the
servers.
Suppose further that something on the servers does five DNS lookups at
startup. On the happy path that takes 5*0.008=0.04 seconds and who cares,
but the worst case is in minutes. Say five 90-second timeouts. If things
start up serially, zk or etcd will begin to initialise about eight minutes
after the server started booting. The cluster can be without a quorum for
eight minutes, and if you're lucky that's just a horrible backlog of failed
or blocking transactions. If you're unlucky the node has been declared
unhealthy and the cluster has started copying terabytes of data in order to
restore redundancy.
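
To make that arithmetic concrete, here is a rough sketch. The 8 ms, 90 s and five-lookup figures are the ones above; the constant and function names are mine, purely for illustration.

```python
# Figures from the example above; names are illustrative only.
HAPPY_LOOKUP_S = 0.008  # one DNS lookup on the happy path
TIMEOUT_S = 90.0        # one DNS lookup that runs into its timeout
LOOKUPS = 5             # lookups done by the slow service at startup


def serial_delay(per_lookup_s, lookups=LOOKUPS):
    """Seconds that the next service in a serial boot waits for the lookups."""
    return per_lookup_s * lookups


if __name__ == "__main__":
    happy = serial_delay(HAPPY_LOOKUP_S)
    worst = serial_delay(TIMEOUT_S)
    print(f"happy path: {happy:.2f} s")
    print(f"worst case: {worst:.0f} s ({worst / 60:.1f} min)")
```

With a parallel start, zk or etcd does not sit behind those lookups at all, so its start time is the same in both cases.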
For want of an X, Y. In real life ;)
BTW, systemd's approach to parallelism isn't particularly good for this
sort of service. Parallelism is good, but not just any kind. Systemd thinks
it can start services according to a DAG, but in reality that DAG is not
knowable on any single host. For example: Service X on nodes 1-8 needs
service Y, which runs on nodes 3-5 and 12-15 today. The only sensible
approach is to start everything and require that all services behave
robustly when a dependency isn't ready.
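
As a minimal sketch of what "behave robustly" could look like on the client side: retry the connection with a capped exponential backoff instead of dying when the dependency isn't up yet. The function name, the defaults and the injectable connect/sleep parameters are my own assumptions, not anything systemd, zk or etcd provides.

```python
import socket
import time


def wait_for_dependency(host, port, attempts=10, first_delay=0.5,
                        max_delay=30.0,
                        connect=socket.create_connection,
                        sleep=time.sleep):
    """Connect to host:port, retrying with capped exponential backoff.

    Returns the connection on success. Re-raises the last OSError if
    every attempt fails. connect and sleep are parameters only so the
    function can be exercised without a network.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect((host, port), timeout=5)
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay)
            delay = min(delay * 2, max_delay)  # back off, but not forever
```

A real service would probably keep doing useful work, or at least report itself as degraded, while it waits. The point is only that a missing dependency at startup is treated as a normal state rather than a fatal error.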