Author: Jesse Smith
Date:  
To: Mark Hindley, David Kuehling, 227
Subject: [devuan-dev] bug#227: nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted
On 2023-02-16 11:22 a.m., Mark Hindley wrote:
> Control: tags -1 upstream
>
> David,
>
> I know it has been a long time, but thanks for this.
>
> On Mon, Jul 16, 2018 at 01:52:51AM +0200, David Kuehling wrote:
>> Short summary of the problem: during shutdown /etc/init.d/sendsigs calls
>> killall5 binary from sysvinit-utils, killing almost all running
>> processes.
>>
>> Of course it never should kill nbd-client, so the /etc/init.d/nbd-client
>> script is smart enough to register its PID to be exempt from sendsigs'
>> action: by recording it in the /run/sendsigs.omit.d/nbd-client file.
>>
>> These PIDs are then collected by /etc/init.d/sendsigs and passed as "-o
>> NNN" options to killall5, which spares those processes from premature
>> termination.
>>
>> However, before killall5 goes on to kill all the other processes, it
>> does a:
>>
>>     /* Now stop all processes. */
>>     kill(-1, SIGSTOP);
>>
>> And when it's done, it does:
>>
>>     /* And let them continue. */
>>     kill(-1, SIGCONT);
>>
>> These SIGSTOP, SIGCONT signals are passed to all processes, including
>> nbd-client. Unfortunately nbd-client is written in a way that makes it
>> unable to handle any signals delivered while it is inside an ioctl call,
>> and it loses its server connection on SIGSTOP, totally breaking the
>> block devices it provides.
>>
>> What would be the right way to prevent this problem? Fix sysvinit?
> My inclination here is that killall5 shouldn't send any signals (including a
> STOP CONT pair) to processes that have registered to be omitted.
>
> Copying Jesse Smith, the sysvinit maintainer.
>
> Jesse, what do you think? Is it possible to avoid that?
>

Thanks for the detailed bug report and explanation.

The way I see it, this is probably a bug in the way killall5 operates. If
we've been explicitly told not to send signals to a process, then we
shouldn't send any signals to that process. It shouldn't matter whether the
signal is STOP, CONT, or TERM.

I see two possible ways we could fix this:

1. Create a command line flag which disables sending the SIGSTOP and
SIGCONT signals. This is an easy fix, quick and dirty. The potential
downside is that, with the STOP signal disabled, processes might
terminate, move groups, or be replaced before we get around to sending
them the KILL signal. This probably won't happen, but it means killall5
is working with a "moving target".

2. Send SIGSTOP to all processes _except_ those in the omit list, one
process at a time. This will be a lot slower than the existing
"kill(-1, SIGSTOP)" call we currently make, but I think it's more correct.

Basically the new workflow would look like this:

1. Send the SIGSTOP signal to all processes except those omitted.
2. Send the SIGKILL signal to all processes except those omitted.
3. Send the SIGCONT signal to all processes except those omitted.
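
As a rough illustration of that workflow, here is a minimal sketch in C. It
is not the actual killall5 code or a proposed patch: the function names
(pid_is_omitted, collect_targets, signal_targets, signal_all_except) and the
fixed 4096-entry buffer are assumptions made for the example. It also
simplifies by skipping only PID 1 and the calling process, whereas the real
killall5 additionally spares kernel threads and its own session.

```c
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if pid appears in the omit list, 0 otherwise. */
int pid_is_omitted(pid_t pid, const pid_t *omit, size_t n_omit)
{
    for (size_t i = 0; i < n_omit; i++)
        if (omit[i] == pid)
            return 1;
    return 0;
}

/*
 * Scan /proc and store up to max PIDs eligible for signalling in out.
 * Skips PID 1, the calling process, and every PID in the omit list.
 * Returns the number of PIDs stored, or -1 if /proc cannot be opened.
 */
long collect_targets(const pid_t *omit, size_t n_omit, pid_t *out, size_t max)
{
    DIR *proc = opendir("/proc");
    struct dirent *ent;
    size_t n = 0;

    if (proc == NULL)
        return -1;
    while (n < max && (ent = readdir(proc)) != NULL) {
        /* Only numeric directory names are processes. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;
        pid_t pid = (pid_t)strtol(ent->d_name, NULL, 10);
        if (pid <= 1 || pid == getpid())
            continue;
        if (pid_is_omitted(pid, omit, n_omit))
            continue;
        out[n++] = pid;
    }
    closedir(proc);
    return (long)n;
}

/* Send sig to each PID in targets; return how many kill() calls succeeded. */
size_t signal_targets(const pid_t *targets, size_t n, int sig)
{
    size_t ok = 0;
    for (size_t i = 0; i < n; i++)
        if (kill(targets[i], sig) == 0)
            ok++;
    return ok;
}

/* STOP, deliver sig, then CONT, never touching an omitted PID. */
long signal_all_except(int sig, const pid_t *omit, size_t n_omit)
{
    pid_t targets[4096]; /* arbitrary cap chosen for this sketch */
    long n = collect_targets(omit, n_omit, targets, 4096);

    if (n < 0)
        return -1;
    signal_targets(targets, (size_t)n, SIGSTOP);
    signal_targets(targets, (size_t)n, sig);
    signal_targets(targets, (size_t)n, SIGCONT);
    return n;
}
```

Unlike a single kill(-1, SIGSTOP), a process forked between the /proc scan
and the STOP pass would be missed by this loop, so a real implementation
would probably need to rescan or otherwise account for that.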

Option #2 is slow and ugly, but seems "correct" from a behaviour point
of view.

I'm open to comments before I patch this.

- Jesse