[Dng] vdev status update (May 25 2015)

Author: Jude Nelson
Date:
To: dng@lists.dyne.org
Subject: [Dng] vdev status update (May 25 2015)

Hey everyone,

I have the latest news for vdev:

* Added support for /dev/snd/by-id symlinks (thanks Jack L. Frost aka
openfbt!)

* Merged support for static linking vdev, building a libvdev.a archive
file, and building vdev with non-GNU libc (thanks Didier Kryn!)

* Added support for optionally running device helper scripts, even if the
device file already exists. This is because sometimes other programs can create
device nodes independently of vdev, but not set them up appropriately. By
default, vdev will assume that a device is already set up if its device
file(s) already exist, and will not run any of its helper scripts. While
this means that scripts do not need to be idempotent by design (i.e.
they're run at most once), it also means that any program that creates a
device file is expected to set it up as well. This is inappropriate, since
managing device policies (i.e. permissions and persistent symlinks) is the
job of the device manager, not tools that happen to create device files.

To address this, vdev action files now support an "if_exists" directive
that can be set to:
-- "run", as in, run the action's commands even if the device exists;
-- "mask", as in, don't run the action's commands, but don't report an
error that the device exists;
-- "error", as in, stop running actions for this device (even if there are
still some to be processed), and log an error.

The absence of an "if_exists" directive causes vdev to try to run all
action commands for a device if the device does not exist, but report an
error if it does exist.

Thanks Scooby for the suggestion!
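
A minimal Python sketch of the "if_exists" decision described above, for
illustration only. The action dictionaries and the process_actions() helper
are hypothetical and are not vdev's real data structures or API. One point is
an assumption: when the directive is absent and the device exists, the sketch
logs an error and moves on to the next action, since the text only says an
error is reported.

```python
import logging

log = logging.getLogger("vdev-sketch")

VALID_POLICIES = {None, "run", "mask", "error"}


def process_actions(actions, device_exists):
    """Return the commands that would be run for one device.

    `actions` is a list of dicts with a "command" key and an optional
    "if_exists" key. This layout is hypothetical, chosen for the example.
    """
    to_run = []
    for action in actions:
        policy = action.get("if_exists")  # None when the directive is absent
        if policy not in VALID_POLICIES:
            raise ValueError("unknown if_exists value: %r" % (policy,))

        if not device_exists or policy == "run":
            # No existing device file, or the action asked to run anyway.
            to_run.append(action["command"])
        elif policy == "mask":
            # Skip the commands silently.
            continue
        elif policy == "error":
            # Log and stop processing the remaining actions for this device.
            log.error("device exists; stopping at %s", action["command"])
            break
        else:
            # Directive absent: report an error. Continuing with the next
            # action is an assumption of this sketch.
            log.error("device exists; skipping %s", action["command"])
    return to_run
```

With a device that already exists, an action marked "run" still executes, a
"mask" action is skipped quietly, and the first "error" action ends processing
for that device.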

* [WIP] I'm making progress on libudev-compat, an ABI-compatible libudev
replacement that does not depend on udevd (or netlink or systemd or kdbus).

A bit of background: libudev defines an opaque "struct udev_monitor" that
contains a client-accessible pollable file descriptor, an API for defining
device event filters, and a method ("udev_monitor_receive_device") for
receiving "struct udev_device" instances generated from hotplug events.
Clients get the file descriptor and poll() on it, and then receive "struct
udev_device" instances from the associated monitor. The way it works under
the hood today is the pollable file descriptor is a non-blocking raw
netlink socket that listens to udevd's multicast group (btw, this is one of
the reasons why udev can't run in a container); the filtering API
implements code to build and attach a BPF filter to the socket that
implements the client's filter preferences; and
"udev_monitor_receive_device" receives the next serialized "struct
udev_device" message from udevd's multicast group and returns it to the
client.

What I'm working on is the following:
* fork runfs to create eventfs, a RAM-backed userspace filesystem that
looks and smells like tmpfs, but is designed such that (1) files and
directories share fate with the process that creates them, and (2) the
filesystem remembers the order in which files in a directory are created.
We'd use it to implement reliable one-writer many-reader uevent packet
multicast. Specifically, the eventfs would work like a tmpfs, but with the
following different behaviors:
-- a directory and its children only exist if the process that created it
is still running. Once the process dies, the directory and its children
are automatically removed.
-- each directory contains an eventfs-managed "head" symlink that points to
the newest-created regular file child
-- each directory contains an eventfs-managed "tail" symlink that points to
the oldest-created regular file child
-- unlink()-ing "head" really unlinks the file that "head" points to, and
causes "head" to point to the next-newest regular file child
-- unlink()-ing "tail" really unlinks the file that "tail" points to, and
causes "tail" to point to the next-oldest regular file
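
The head/tail behavior listed above can be modeled in a few lines of Python.
This is an in-memory toy, not eventfs code: the EventDir class and its method
names are made up for the example, and it does not model the fate-sharing
between a directory and the process that created it.

```python
from collections import OrderedDict


class EventDir:
    """Toy model of one eventfs directory (hypothetical, illustration only)."""

    def __init__(self):
        # Maps file name -> contents, kept in creation order.
        self._files = OrderedDict()

    def create(self, name, data):
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = data

    @property
    def head(self):
        """Name of the newest-created file, or None if the directory is empty."""
        return next(reversed(self._files), None)

    @property
    def tail(self):
        """Name of the oldest-created file, or None if the directory is empty."""
        return next(iter(self._files), None)

    def unlink_head(self):
        """Remove the file "head" points to; "head" moves to the next-newest."""
        return self._files.popitem(last=True)

    def unlink_tail(self):
        """Remove the file "tail" points to; "tail" moves to the next-oldest."""
        return self._files.popitem(last=False)
```

Both unlink operations return the (name, data) pair that was removed, and
raise KeyError on an empty directory.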

* We will mount eventfs on /dev/events/.

* libudev-compat creates the directory /dev/events/libudev-$PID/ when the
client starts, effectively giving it an in-order queue of device events.
Because directories share fate with the processes that create them in
/dev/events, the client doesn't need to remove /dev/events/libudev-$PID/
when it exits (i.e. you can kill -9 it without polluting your filesystem
tree).

* create a tool called "event-put" that writes a device uevent packet to
/dev/events/$PID_OF_DEVICE_MANAGER/$SEQNUM, hard-links it to each
/dev/events/libudev-*/$SEQNUM, and then unlinks
/dev/events/$PID_OF_DEVICE_MANAGER/$SEQNUM. This implements a zero-copy
uevent multicast--the packet is written once (as a file), but each
libudev-compat client receives a reference to it (as a hard-link).
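
A rough Python sketch of the write-once, hard-link-many step, assuming an
ordinary directory stands in for /dev/events/. The event_put() function below
and its parameters are illustrative names only; the real tool does not exist
yet, and the packet format is not defined here.

```python
import os


def event_put(events_root, manager_pid, seqnum, packet):
    """Write `packet` once, hard-link it into every libudev-* directory under
    `events_root`, then remove the original name.

    Returns the list of hard links created.
    """
    staging_dir = os.path.join(events_root, str(manager_pid))
    os.makedirs(staging_dir, exist_ok=True)
    staging_path = os.path.join(staging_dir, str(seqnum))

    # The packet contents are written exactly once.
    with open(staging_path, "wb") as f:
        f.write(packet)

    links = []
    try:
        for entry in sorted(os.listdir(events_root)):
            client_dir = os.path.join(events_root, entry)
            if entry.startswith("libudev-") and os.path.isdir(client_dir):
                dest = os.path.join(client_dir, str(seqnum))
                os.link(staging_path, dest)  # new name, same inode, no copy
                links.append(dest)
    finally:
        # Drop the device manager's own name for the file. The data lives on
        # until every client has unlinked its link.
        os.unlink(staging_path)
    return links
```

After the call, each client directory holds a name for the same inode, so the
link count equals the number of clients that have not yet consumed the event.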

* I'll implement "struct udev_monitor" as follows:
-- the pollable file descriptor will be an inotify file descriptor with a
watch on /dev/events/libudev-$PID/ for IN_CREATE. As a consequence,
the file descriptor will have data available whenever event-put adds a new
uevent to be processed. This means clients can continue to poll on this
file descriptor as before, and call udev_monitor_receive_device() whenever
poll() indicates that there's new data to be read.

* I'll implement "udev_monitor_receive_device" as follows:
-- read the packet at /dev/events/libudev-$PID/head
-- unlink /dev/events/libudev-$PID/head (this causes head to automatically
point to the next-newest file in /dev/events/libudev-$PID/, which happens
to be the next device event written by event-put. It also causes the event
packet's storage to be reclaimed automatically, since eventually its link
count will reach zero).
-- parse it into a "struct udev_device"
-- [UGLY, BUT NECESSARY FOR COMPATIBILITY] send the "struct udev_device"
through a socketpair to ourselves, to which we attach the device BPF. This
will let us use the filtering logic from libudev as-is--we will be
bug-for-bug compatible with it.
-- if the socketpair gives the device back, return it.
-- consume one inotify packet from the monitor's inotify handle per device
sent. If the inotify queue in the kernel overflows and drops events (which can happen if
the libudev-compat client gets back-logged and can't process device packets
as fast as event-put adds them), AND if there are more device packets in
/dev/events/libudev-$PID/, then synthesize sufficiently many inotify
packets so that the client program will be able to poll on the handle and
be informed that there are more devices. In effect, we use the inotify handle
in "struct udev_monitor" not to get the order of files added or even their
names, but only as a pollable level-indicator that lets us know whether or
not there are more devices to receive (I'm also assuming that there aren't
any libudev clients out there that try to read the packet from the handle
and parse it directly--I'm pretty sure this isn't supported by the udev
maintainers).

Of course, eventfs would be a separate project, since it's a generic tool
that is useful in its own right. You wouldn't have to use it if you don't
want to, but it's highly recommended, since it ensures that
/dev/events/libudev-$PID/ disappears automatically when process $PID dies
and it reduces the time complexity of consuming n queued events from O(n^2)
to O(n) (it's O(n^2) without eventfs or some other service to maintain the
"head" symlink, since libudev-compat would have to scan the directory each
time to find the newest event).
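
To make the cost argument concrete, here is a small Python sketch of the
fallback path without eventfs, where every receive has to scan the whole
queue directory. The function names are hypothetical, and it assumes event
files are named by their decimal sequence number. It picks the newest event
by default, as in the description above; pass select=min to take the oldest
instead.

```python
import os


def receive_by_scan(queue_dir, select=max):
    """Pop one event from `queue_dir` by looking at every entry.

    Returns (seqnum, packet, entries_examined), or None if nothing is queued.
    """
    names = [n for n in os.listdir(queue_dir) if n.isdigit()]
    if not names:
        return None
    name = select(names, key=int)  # linear scan over all queued events
    path = os.path.join(queue_dir, name)
    with open(path, "rb") as f:
        packet = f.read()
    os.unlink(path)  # consume the event
    return int(name), packet, len(names)


def drain(queue_dir, select=max):
    """Pop every queued event; return (delivery order, total entries examined)."""
    order, examined = [], 0
    while True:
        item = receive_by_scan(queue_dir, select)
        if item is None:
            return order, examined
        order.append(item[0])
        examined += item[2]
```

Draining n queued events this way examines n + (n-1) + ... + 1 = n(n+1)/2
directory entries in total, which is the quadratic cost that a maintained
"head" symlink avoids.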

These changes would break the private libudev API, since the private API
expects a netlink socket to handle transport, ordering, and filtering. But
AFAIK, the only consumer of this API is udevd itself, so breaking this is
fine (you'd install libudev with udevd, and libudev-compat without udevd).

Thoughts and feedback on the above welcome :)
-Jude
