From: Jude Nelson
Date:
To: dng@lists.dyne.org
Subject: [Dng] [dng] vdev status update
Hey everyone,

Scooby, John Carline, lepoitr, and others (who wish to remain anonymous)
have been sending me logs and filesystem listings from running vdev locally.
I very much appreciate it; it helped me discover and fix bugs relating to
persistent paths for disk devices, seeding /dev with initial device files,
and adding support for VirtualBox USB devices. Thank you to everyone who
helped me with this!

We are getting close to feature-parity with udev as far as generating the
proper device paths goes. However, there are still a few kinds of device
paths that vdev does not yet handle (that I know of). They are:

* /dev/input/by-id/*
* /dev/disk/by-partuuid/*
* /dev/md/*
* LVM volume group and logical volume paths (i.e.
/dev/$VOLUME_GROUP/$LOGICAL_VOLUME)

Lack of LVM VG/LV paths is probably a deal-breaker for many people on this
list; I'll make it my next priority.

I've also been modifying libudev 219 so clients do not need udevd to be
running to receive device events. I mentioned last week that my strategy
would be to watch /dev for device files being added or removed, and to
synthesize the appropriate driver core uevent from each added/removed file
(i.e. by looking up the associated uevent in /sys).
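A minimal Python sketch of that lookup, assuming the standard sysfs layout in which /sys/dev/block/MAJOR:MINOR and /sys/dev/char/MAJOR:MINOR are symlinks to the device's directory, which holds a `uevent` file of KEY=VALUE lines and a `subsystem` link. The helper names are hypothetical and this is not libudev-compat's actual code. The sysfs `uevent` file does not carry ACTION, DEVPATH, or SUBSYSTEM (the kernel adds those to the netlink message), so the sketch fills them in itself.

```python
import os
import stat


def sysfs_uevent_path(st_mode: int, st_rdev: int, sysfs: str = "/sys") -> str:
    """Map a device file's type and device number to its sysfs uevent file."""
    if stat.S_ISBLK(st_mode):
        kind = "block"
    elif stat.S_ISCHR(st_mode):
        kind = "char"
    else:
        raise ValueError("not a block or character device")
    dev = f"{os.major(st_rdev)}:{os.minor(st_rdev)}"
    return os.path.join(sysfs, "dev", kind, dev, "uevent")


def parse_uevent(text: str) -> dict:
    """Parse KEY=VALUE lines; lines without '=' are ignored."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env


def synthesize_add_event(devnode: str, sysfs: str = "/sys") -> dict:
    """Build an "add" uevent for a device file that just appeared in /dev.

    A "remove" event cannot be built this way: by then the device file and
    its sysfs directory may be gone, so its data would have to be cached at
    "add" time.
    """
    st = os.stat(devnode)
    uevent = sysfs_uevent_path(st.st_mode, st.st_rdev, sysfs)
    devdir = os.path.realpath(os.path.dirname(uevent))  # resolve the symlink
    with open(uevent) as f:
        env = parse_uevent(f.read())
    env["ACTION"] = "add"
    env["DEVPATH"] = devdir[len(os.path.realpath(sysfs)):]  # "/devices/..."
    subsystem = os.path.realpath(os.path.join(devdir, "subsystem"))
    env["SUBSYSTEM"] = os.path.basename(subsystem)
    return env
```

On a live system, `synthesize_add_event("/dev/sda1")` would return the partition's MAJOR, MINOR, DEVNAME and DEVTYPE plus the three synthesized keys.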

However, this has proven to be a somewhat challenging problem, for a couple
of reasons. First, inotify(2) does not work on pseudo-filesystems like sysfs
or devtmpfs, so we can't just watch /sys for changes, and we can't use
devtmpfs for /dev. This means that systems using libudev-compat will
require a "vanilla" tmpfs for /dev (or nothing at all), so that
libudev-compat can detect when devices are added or removed. Second, only
block and character devices show up in /dev, whereas udevd reports every
kind of device, since it listens to the kernel's driver core (so libudev
learns about network interfaces, buses, power supplies, and so on, none of
which have device files). Clients will expect this behavior, so it's not
enough to simply look up a new block/character device's sysfs data.
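The /dev requirement from the first point could be checked when a client starts up. A hedged sketch, assuming the usual /proc/mounts field order (source, mount point, type, options, dump, pass) and treating the last entry for a mount point as the one in effect; the function names are hypothetical.

```python
def fstype_of(mountpoint: str, mounts_text: str):
    """Return the filesystem type mounted at `mountpoint`, or None.

    If several mounts are stacked on the same point, the last one wins.
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mountpoint:
            fstype = fields[2]
    return fstype


def dev_usable_for_libudev_compat(mounts_text: str) -> bool:
    """True if /dev is a plain tmpfs, or not a separate mount at all."""
    return fstype_of("/dev", mounts_text) in (None, "tmpfs")
```

On a running system the argument would be the contents of /proc/mounts; a devtmpfs-backed /dev makes the check fail.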

My tentative solution is to require the device manager (whatever it happens
to be) to take one extra step in addition to adding/removing device files:
record driver core uevents in a well-defined location in /dev (let's say
/dev/uevents/), so libudev clients can discover them (with inotify(2)),
read them, and send them off to their applications. This can be done
without loss of generality in udev, vdev, and mdev, and I can write a
script that wraps mknod and takes the appropriate action (so those with a
static /dev can alias "mknod" to the script, if desired).

The device manager would treat /dev/uevents/ as an "IPC" area. Each libudev
client would create a directory /dev/uevents/$PID/ upon initialization, and
for every uevent, the device manager would write a copy into each directory
in /dev/uevents/, as a file named by the hash of the uevent's contents. Once
the libudev client consumed the file, it would unlink it.
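The device manager's side of this scheme could be sketched as below, assuming SHA256 hex digests as file names (as in the example that follows) and a root of /dev/uevents. The write-to-a-hidden-temporary-name-then-rename step is my own addition, not part of the proposal: it keeps a watching client from opening a half-written file. Function names are hypothetical.

```python
import hashlib
import os
import tempfile


def uevent_name(packet: bytes) -> str:
    """Name a uevent file by the SHA256 hex digest of its contents."""
    return hashlib.sha256(packet).hexdigest()


def publish_uevent(packet: bytes, root: str = "/dev/uevents") -> list:
    """Write one copy of `packet` into every client directory under `root`.

    Returns the paths written.
    """
    name = uevent_name(packet)
    written = []
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            try:
                # Hidden temporary file first, then an atomic rename, so the
                # final name only ever refers to a complete packet.
                fd, tmp = tempfile.mkstemp(dir=entry.path, prefix=".tmp-")
                with os.fdopen(fd, "wb") as f:
                    f.write(packet)
                final = os.path.join(entry.path, name)
                os.rename(tmp, final)
            except FileNotFoundError:
                continue  # the client's directory disappeared mid-write
            written.append(final)
    return written
```

Because files are named by content hash, two identical uevents that arrive before a client consumes the first would map to the same name and collapse into one file.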

For example, the PCI slot 0000:ff:02.2 on my laptop generates the uevent
packet containing:

"""
PCI_CLASS=60000
PCI_ID=8086:2D13
PCI_SUBSYS_ID=17AA:2196
PCI_SLOT_NAME=0000:ff:02.3
"""

The device manager would take the SHA256 of this string
(57d39e74f7638f6c78c1fb86d81d2f203852b609f501df216abe4b45339d636f), and
write the contents of this packet to the file
/dev/uevents/*/57d39e74f7638f6c78c1fb86d81d2f203852b609f501df216abe4b45339d636f
(i.e. one copy in each libudev client's /dev/uevents/$PID/ directory).
Then, each libudev client would be woken up through inotify(2), read the
new file, forward the device event packet to its application, and unlink
the file.
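The client side could look roughly like this. Python's standard library has no inotify binding, so this Linux-only sketch calls libc's inotify_init1(2) and inotify_add_watch(2) through ctypes and decodes `struct inotify_event` by hand. It assumes writers either rename finished files into place (IN_MOVED_TO) or write them directly (IN_CLOSE_WRITE), and that dot-prefixed names are temporary files to skip. A real client would also need to scan the directory once after setting up the watch, for files that arrived earlier, and to handle queue overflow. All names are hypothetical.

```python
import ctypes
import ctypes.util
import os
import select
import struct

IN_CLOSE_WRITE = 0x00000008  # values from <sys/inotify.h>
IN_MOVED_TO = 0x00000080
_EVENT = struct.Struct("iIII")  # wd, mask, cookie, len; then the name

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.inotify_init1.argtypes = [ctypes.c_int]
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]


def _check(ret: int) -> int:
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def register_client(root: str = "/dev/uevents"):
    """Create root/$PID/ and an inotify watch on it; return (path, fd)."""
    path = os.path.join(root, str(os.getpid()))
    os.makedirs(path, exist_ok=True)
    fd = _check(_libc.inotify_init1(os.O_CLOEXEC))  # IN_CLOEXEC == O_CLOEXEC
    _check(_libc.inotify_add_watch(fd, os.fsencode(path),
                                   IN_MOVED_TO | IN_CLOSE_WRITE))
    return path, fd


def read_uevents(path: str, fd: int, timeout=None) -> list:
    """Wait for new files, then read and unlink each one; return packets."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return []
    buf = os.read(fd, 4096)
    names, off = [], 0
    while off < len(buf):
        _wd, _mask, _cookie, length = _EVENT.unpack_from(buf, off)
        start = off + _EVENT.size
        names.append(os.fsdecode(buf[start:start + length].rstrip(b"\0")))
        off = start + length
    packets = []
    for name in names:
        if not name or name.startswith("."):
            continue  # watch-level event, or a temporary file
        full = os.path.join(path, name)
        try:
            with open(full, "rb") as f:
                packets.append(f.read())
            os.unlink(full)  # consumed: remove our copy
        except FileNotFoundError:
            pass  # already handled earlier in this batch
    return packets
```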

To avoid the troublesome corner case where a libudev client crashes and
potentially leaves behind a directory in /dev/uevents/, I would recommend
mounting runfs [1] on /dev/uevents. Runfs is a special FUSE filesystem I
wrote a while back that ensures that the files created in it by a
particular process get automatically unlinked when that process dies (it
was originally meant for holding PID files).
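For comparison, where runfs is not mounted a device manager could sweep stale client directories itself. This is a swapped-in alternative, not what the post proposes: a hedged sketch assuming directory names are PIDs and probing liveness with kill(pid, 0). It is exposed to PID reuse, which is one reason a runfs-style mount that unlinks on process exit is more robust. Function names are hypothetical.

```python
import os
import shutil


def pid_alive(pid: int) -> bool:
    """Probe a PID with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def sweep_stale_clients(root: str = "/dev/uevents") -> list:
    """Remove client directories whose PID no longer exists; return names."""
    removed = []
    with os.scandir(root) as entries:
        for entry in entries:
            if not (entry.is_dir(follow_symlinks=False) and entry.name.isdigit()):
                continue
            pid = int(entry.name)
            if pid > 0 and not pid_alive(pid):
                shutil.rmtree(entry.path, ignore_errors=True)
                removed.append(entry.name)
    return removed
```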

Any feedback on the above development plan is welcome, especially if a
simpler, more robust approach can be found.

Thanks,
Jude

[1] https://github.com/jcnelson/runfs