:: Re: [Dng] [dng] vdev status update
Author: Jude Nelson
Date:  
To: Isaac Dunham
CC: dng@lists.dyne.org
Subject: Re: [Dng] [dng] vdev status update
Hi Isaac,

On Wed, Apr 8, 2015 at 7:04 PM, Isaac Dunham <ibid.ag@???> wrote:

> On Tue, Apr 07, 2015 at 05:22:55PM -0400, Jude Nelson wrote:
> > > > > "report every kind of device, since it listens to the kernel's
> > > > > driver core (i.e. libudev learns about network interfaces, buses,
> > > > > power supplies, etc.--stuff for which there are no device files"
> > >
> > > Currently, it doesn't *report* devices; that takes something longer
> > > term, like inotify, polling a netlink socket, or listening to a daemon.
> > >
> > > It also has no clue about events or hardware that could not have a
> > > corresponding device, since it uses block/char and major:minor to find
> > > the hardware.
> > >
> > > I have a general idea of how to get information like this, by recursing
> > > through /sys or /dev, and I know of some code I could use as a starting
> > > point, but I don't know what the ideal format is.
> > > If someone points me at a program they'd like to use without libudev
> > > (preferably C with minimal dependencies) that doesn't cover a lot of
> > > ground (ie, it's clear what functionality udev provides, and I wouldn't
> > > need to duplicate much of libudev to get it working), that would be a
> > > good starting point for expanding libsysdev.
> > >
> >
> > You might find something useful in the vdev_linux_sysfs_register_devices()
> > and vdev_linux_sysfs_find_devices() functions in vdevd/os/linux.c. They're
> > both involved in generating the initial coldplug device listing. They only
> > need libc to work, and libvdev/sglib.h for basic data structures.
>
> I know how to get the devices that show up in /dev;
> I'm not sure about getting the sysfs entries that *don't* show up there.
> I'm also not sure how anything beyond this is used.
>


Ah, my bad for misinterpreting your request.

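I found the sysfs rules overview from kernel.org helpful when I was working
on this [1]. Basically, all devices (even ones without a major/minor
number) are enumerated under /sys/devices/; the device-related directories
elsewhere in /sys/ (such as /sys/class/, /sys/bus/*/devices/, and
/sys/block/) only contain symlinks to device directories in /sys/devices/.
Each device in /sys/devices/ has a "uevent" file that, when read, produces
the device-specific KEY=VALUE lines of the netlink packet that the driver
core would have sent when the device was added (note that some of them
will be empty). ACTION, DEVPATH, and SUBSYSTEM are not in that file:
DEVPATH and SUBSYSTEM come from the directory's path and its "subsystem"
symlink, and for a coldplug listing ACTION is simply "add".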
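To make the coldplug walk concrete, here is a minimal C sketch that uses POSIX nftw() to find every directory under /sys/devices/ containing a "uevent" file and prints its DEVPATH. The name sysfs_find_devices() is hypothetical (it is not vdev's vdev_linux_sysfs_find_devices()); the sketch assumes Linux with sysfs mounted on /sys, and it does not read the uevent contents.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* nftw() has no user-data argument, so the walk keeps its state in
 * file-scope variables (this makes the sketch non-reentrant). */
static long device_count;
static int print_devpaths;

/* Called for every entry under the root.  A directory is treated as a
 * device if it contains a regular file named "uevent". */
static int visit(const char *fpath, const struct stat *sb, int typeflag,
                 struct FTW *ftwbuf)
{
    (void)sb;
    if (typeflag == FTW_F && strcmp(fpath + ftwbuf->base, "uevent") == 0) {
        int dirlen = ftwbuf->base - 1;   /* length without "/uevent" */
        device_count++;
        /* DEVPATH is the device directory minus the leading "/sys". */
        if (print_devpaths && dirlen > 4 && strncmp(fpath, "/sys/", 5) == 0)
            printf("DEVPATH=%.*s\n", dirlen - 4, fpath + 4);
    }
    return 0;                            /* keep walking */
}

/* Count (and optionally print) the devices under `root`, normally
 * "/sys/devices".  Returns the count, or -1 if the walk fails. */
long sysfs_find_devices(const char *root, int print)
{
    device_count = 0;
    print_devpaths = print;
    /* FTW_PHYS: report symlinks such as "subsystem" and "driver" as
     * FTW_SL instead of following them, so the walk cannot loop. */
    if (nftw(root, visit, 32, FTW_PHYS) != 0)
        return -1;
    return device_count;
}
```

Calling sysfs_find_devices("/sys/devices", 1) should print one DEVPATH=... line per device. FTW_PHYS matters here: the "subsystem" and "driver" symlinks point back into /sys, so following them would revisit devices.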

The kernel organizes devices into a tree internally, which gets exported
via sysfs. Each device has a globally unique DEVPATH, which is the path to
the device's directory under /sys/devices minus the /sys prefix (so every
DEVPATH begins with /devices/). A device's parent in that tree is the
nearest device directory above it. Separately, most devices have a
SUBSYSTEM, which names the bus or class the device belongs to (e.g. pci,
usb, block, net). For example, PCI device 0000:ff:02.2 on my laptop has a
DEVPATH of /devices/pci0000:ff/0000:ff:02.2. If you look in
/sys/devices/pci0000:ff/0000:ff:02.2/, you'll see a symlink called
"subsystem" that points to the subsystem's directory--in this case,
"../../../bus/pci" (the basename of the symlink's target is the device's
subsystem name--in this case, "pci").
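A small sketch of the two lookups just described: DEVPATH from the directory path, and the subsystem name from the basename of the "subsystem" symlink's target. Both helper names are hypothetical; the code only assumes standard POSIX lstat() and readlink().

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* DEVPATH is the sysfs directory minus the "/sys" prefix.  Returns NULL
 * if `sysdir` is not under /sys/devices/. */
const char *sysfs_devpath(const char *sysdir)
{
    if (strncmp(sysdir, "/sys/devices/", 13) != 0)
        return NULL;
    return sysdir + 4;          /* "/sys/devices/..." -> "/devices/..." */
}

/* Store the basename of the device's "subsystem" symlink target (e.g.
 * "pci" for "../../../bus/pci") in `name`.  Returns 1 on success, 0 if
 * the directory has no "subsystem" entry, -1 on error. */
int sysfs_subsystem(const char *sysdir, char *name, size_t namelen)
{
    char linkpath[PATH_MAX], target[PATH_MAX];
    struct stat st;
    ssize_t n;
    const char *base;

    if (snprintf(linkpath, sizeof linkpath, "%s/subsystem", sysdir)
            >= (int)sizeof linkpath)
        return -1;
    if (lstat(linkpath, &st) != 0)
        return errno == ENOENT ? 0 : -1;
    if (!S_ISLNK(st.st_mode))
        return -1;
    n = readlink(linkpath, target, sizeof target - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';           /* readlink() does not NUL-terminate */
    base = strrchr(target, '/');
    base = base ? base + 1 : target;
    if (strlen(base) >= namelen)
        return -1;
    memcpy(name, base, strlen(base) + 1);
    return 1;
}
```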

The regular files in the device's sysfs directory are device-specific
attributes, conventionally one value per file--usually stuff like serial
numbers, vendor strings, power-related data, etc.
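Reading one of those attributes amounts to reading a small text file. A hedged sketch (the helper name is made up for illustration) that strips the trailing newline sysfs usually appends:

```c
#include <stdio.h>
#include <string.h>

/* Read one sysfs attribute file (by convention a single value per file)
 * into `buf`, dropping any trailing newline.  Returns the value's
 * length, or -1 if the file cannot be opened or read. */
long sysfs_read_attr(const char *sysdir, const char *attr,
                     char *buf, size_t len)
{
    char path[4096];
    FILE *f;
    size_t n;

    if (len == 0)
        return -1;
    if (snprintf(path, sizeof path, "%s/%s", sysdir, attr) >= (int)sizeof path)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;             /* missing, or not readable by us */
    n = fread(buf, 1, len - 1, f);
    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[n] = '\0';
    while (n > 0 && buf[n - 1] == '\n')
        buf[--n] = '\0';
    return (long)n;
}
```

For example, sysfs_read_attr("/sys/devices/pci0000:ff/0000:ff:02.2", "vendor", buf, sizeof buf) should yield the PCI vendor ID as a hex string.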

Is this the information you were looking for?

[1] https://www.kernel.org/doc/Documentation/sysfs-rules.txt


> > > > > > To avoid the troublesome corner case where a libudev client crashes
> > > > > > and potentially leaves behind a directory in /dev/uevents/, I would
> > > > > > recommend mounting runfs [1] on /dev/uevents. Runfs is a special
> > > > > > FUSE filesystem I wrote a while back that ensures that the files
> > > > > > created in it by a particular process get automatically unlinked
> > > > > > when that process dies (it was originally meant for holding PID
> > > > > > files).
> > > Hmm...
> > > Do we need to have a subdirectory of the mountpoint?
> > > Could you just use ACLs if you need to make a limited subset available?
> > > I get the impression that we can do this for mdev via a script along
> > > these lines:
> >
> >
> > > FILENAME=`env | sha512sum | cut -d' ' -f1`
> > > for f in /dev/uevents/*
> > >         do env >"$f"/$FILENAME
> > > done
> > >
> > >
> > > but it would be *nicer* if we only needed to write one file.
> > >
> >
> > I agree that one file per event is ideal (or even a circular logfile of
> > events, if we could guarantee only one writer). However, I'm not sure yet
> > what a fine-grained ACL for device events would look like. My motivation
> > for per-client directories is that unprivileged clients can be made to see
> > only their own events and no one else's by default (i.e. by chmod'ing the
> > directory to 0700), and that they make it easy to reason about sending
> > post-processed events only to the clients you want--just change the list
> > of directories to iterate over in that for-loop :)
>
> Which is not trivial in shell, unless you have a special command to do
> the work of figuring out which directories get what.
> ...which seems to make doing this in shell pointless, since the
> corresponding C is nearly as trivial.
>
>
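To illustrate Isaac's point that the C version is nearly as trivial as the shell loop, here is a rough sketch of fanning one event out to every per-client directory. fanout_event() is a hypothetical name. Unlike the shell version, it takes the event file name from the caller (a SEQNUM, say) instead of hashing the environment, and it writes each copy under a temporary dot-name and then rename()s it into place, so a client polling its directory never sees a half-written event. It assumes the client directories sit directly under the given root.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Write `payload` (the event's KEY=VALUE lines) as file `name` into every
 * subdirectory of `root`, e.g. /dev/uevents/<client>/.  Each copy is
 * written under a dot-prefixed temporary name and then rename()d into
 * place.  Returns the number of directories written to, or -1 if `root`
 * cannot be opened. */
int fanout_event(const char *root, const char *name, const char *payload)
{
    DIR *d = opendir(root);
    struct dirent *de;
    int written = 0;

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        char dir[4096], tmp[4096], dst[4096];
        struct stat st;
        FILE *f;
        int ok;

        if (de->d_name[0] == '.')      /* ".", "..", and hidden entries */
            continue;
        snprintf(dir, sizeof dir, "%s/%s", root, de->d_name);
        if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode))
            continue;                  /* not a client directory */
        snprintf(tmp, sizeof tmp, "%s/.%s.tmp", dir, name);
        snprintf(dst, sizeof dst, "%s/%s", dir, name);
        f = fopen(tmp, "w");
        if (f == NULL)
            continue;
        ok = fputs(payload, f) != EOF;
        if (fclose(f) != 0)
            ok = 0;
        if (ok && rename(tmp, dst) == 0)
            written++;
        else
            remove(tmp);
    }
    closedir(d);
    return written;
}
```

Choosing which directories receive the event is then a matter of filtering inside the loop, which is where a policy check would go.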

I don't yet know what the policy declaration and interpretation logic for
delivering device events to clients should look like, but to a first
approximation, what about something like this?

$ cat /etc/libudev-compat/chromium.conf
# only allow Chromium to see webcams and microphones, e.g. for Hangouts
binary=/usr/lib/chromium/chromium
order=deny,allow
deny=all
allow=SUBSYSTEM=video4linux SUBSYSTEM=sound
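One possible way to evaluate policy files in that format, sketched in C. The format is only a proposal, so everything here is an assumption: struct policy, policy_parse(), and policy_allows() are hypothetical; only SUBSYSTEM=<name> rules and the keyword "all" are understood; and order= is borrowed from Apache's old Order directive (with deny,allow the default is allow and a matching allow rule overrides a matching deny; with allow,deny the default is deny and a matching deny overrides a matching allow).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one policy file. */
struct policy {
    char binary[256];
    int deny_first;       /* 1 for order=deny,allow, 0 for order=allow,deny */
    char deny[512];       /* space-separated rules, or "all" */
    char allow[512];
};

/* Copy the rest of the current line (bounded by n) into dst. */
static void copy_value(char *dst, size_t n, const char *src)
{
    size_t len = strcspn(src, "\n");
    if (len >= n)
        len = n - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Parse key=value lines; comments and unknown keys are ignored. */
void policy_parse(struct policy *p, const char *text)
{
    const char *line = text;

    memset(p, 0, sizeof *p);
    p->deny_first = 1;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, "binary=", 7) == 0)
            copy_value(p->binary, sizeof p->binary, line + 7);
        else if (strncmp(line, "order=", 6) == 0)
            p->deny_first = strncmp(line + 6, "deny,allow", 10) == 0;
        else if (strncmp(line, "deny=", 5) == 0)
            copy_value(p->deny, sizeof p->deny, line + 5);
        else if (strncmp(line, "allow=", 6) == 0)
            copy_value(p->allow, sizeof p->allow, line + 6);
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
}

/* Does a space-separated rule list match this subsystem? */
static int rules_match(const char *rules, const char *subsystem)
{
    char want[128];
    size_t len;

    if (strcmp(rules, "all") == 0)
        return 1;
    snprintf(want, sizeof want, "SUBSYSTEM=%s", subsystem);
    len = strlen(want);
    while (*rules != '\0') {
        size_t tok = strcspn(rules, " ");
        if (tok == len && strncmp(rules, want, len) == 0)
            return 1;
        rules += tok;
        rules += strspn(rules, " ");
    }
    return 0;
}

/* Apache-style Order semantics (an assumption, see above). */
int policy_allows(const struct policy *p, const char *subsystem)
{
    int denied = rules_match(p->deny, subsystem);
    int allowed = rules_match(p->allow, subsystem);

    if (p->deny_first)
        return allowed || !denied;   /* default allow, allow wins */
    return allowed && !denied;       /* default deny, deny wins */
}
```

Webcams typically report SUBSYSTEM=video4linux, so with deny=all plus the two allow rules, only video4linux and sound events would reach the named binary.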

I also have a library (libpstat) and tool (pstat) [2] that lets you get
helpful information about a process, given its PID:

$ pstat 27360
PID: 27360
  running: true
  binary:  /usr/lib/chromium/chromium
  deleted: false
  inode:   3417747
  size:    103832208
  modtime: 1425639789


(I can make the format easier to parse--e.g. as environment variables).
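The /proc/$PID/exe lookup can be sketched with a single readlink() call. This is not libpstat's API: proc_exe() is a hypothetical helper, and it assumes Linux's convention of appending " (deleted)" to the link target once the binary has been unlinked.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve the binary that process `pid` is running via /proc/<pid>/exe.
 * On success returns 0, fills `path` (truncated if it does not fit in
 * `len` bytes) and sets *deleted if the kernel marks the binary as
 * unlinked (ambiguous if a real file name ends in " (deleted)").
 * Returns -1 if readlink() fails, e.g. ENOENT when the process is gone
 * or EACCES when we may not inspect it. */
int proc_exe(pid_t pid, char *path, size_t len, int *deleted)
{
    static const char suffix[] = " (deleted)";
    const size_t slen = sizeof suffix - 1;
    char exelink[64];
    ssize_t n;

    if (len < 2)
        return -1;
    snprintf(exelink, sizeof exelink, "/proc/%ld/exe", (long)pid);
    n = readlink(exelink, path, len - 1);
    if (n < 0)
        return -1;
    path[n] = '\0';            /* readlink() does not NUL-terminate */
    *deleted = (size_t)n > slen && strcmp(path + n - slen, suffix) == 0;
    if (*deleted)
        path[n - slen] = '\0';
    return 0;
}
```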

Then, when given a device uevent (which will contain the SUBSYSTEM), a
shell script can parse the policy files in /etc/libudev-compat/, iterate
through the per-process directories /dev/uevents/$PID, and use pstat to
find out whether a policy file applies to each process's binary (and if
so, decide whether or not to write the uevent packet into that directory).
Moreover, pstat uses
the contents of the symlink /proc/$PID/exe to figure out the binary's path,
which IIRC (but would need confirmation) cannot be altered by an
unprivileged program, meaning that /proc/$PID/exe is an unforgeable process
attribute. This, combined with the fact that we'd use runfs on
/dev/uevents to ensure that only $PID directories for running processes are
visible on each readdir() and stat(), should ensure that the only way a
process can get a uevent packet this way is if it is (1) running, and (2)
allowed to receive it according to at least one policy file in
/etc/libudev-compat.
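Putting the steps of that paragraph together, here is a hedged C sketch of the delivery loop: for each numeric directory under the given root (standing in for /dev/uevents), it resolves /proc/$PID/exe and writes the event only when the binary matches. deliver_to_binary() is hypothetical and assumes the policy evaluation has already been reduced to a single allowed binary path for this event's SUBSYSTEM; it does not parse /etc/libudev-compat/ itself, and it does not guard against PID reuse, relying on runfs to remove a $PID directory when its process dies.

```c
#define _XOPEN_SOURCE 700
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Non-empty and all decimal digits, i.e. plausibly a PID. */
static int is_pid_name(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Deliver `payload` as file `name` into each <root>/<PID>/ directory
 * whose process is running `allowed_binary` (per /proc/<PID>/exe).
 * Returns the number of deliveries, or -1 if `root` cannot be opened. */
int deliver_to_binary(const char *root, const char *allowed_binary,
                      const char *name, const char *payload)
{
    DIR *d = opendir(root);
    struct dirent *de;
    int delivered = 0;

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        char dir[4096], exelink[64], exe[4096], dst[4096];
        struct stat st;
        ssize_t n;
        FILE *f;
        int ok;

        if (!is_pid_name(de->d_name))
            continue;
        snprintf(dir, sizeof dir, "%s/%s", root, de->d_name);
        if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode))
            continue;
        snprintf(exelink, sizeof exelink, "/proc/%s/exe", de->d_name);
        n = readlink(exelink, exe, sizeof exe - 1);
        if (n < 0)
            continue;          /* process gone, or not ours to inspect */
        exe[n] = '\0';
        if (strcmp(exe, allowed_binary) != 0)
            continue;          /* policy does not cover this binary */
        snprintf(dst, sizeof dst, "%s/%s", dir, name);
        f = fopen(dst, "w");
        if (f == NULL)
            continue;
        ok = fputs(payload, f) != EOF;
        if (fclose(f) != 0)
            ok = 0;
        delivered += ok;
    }
    closedir(d);
    return delivered;
}
```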

Again, I'd love to hear your thoughts, especially if there is a simpler
approach.

[2] https://github.com/jcnelson/libpstat


> > > Also, wouldn't mounting that with runfs result in records of uevents
> > > getting erased if they're written by a helper rather than a daemon?
> > >
> >
> > Yes; good catch. There are a couple of straightforward ways to address
> > this: (1) have a separate, unprivileged device-event-log daemon curate
> > /dev/uevents/ and have the helper scripts forward device events to it, or
> > (2) fork and/or patch runfs to allow files to persist if they're generated
> > by a certain whitelist of programs (e.g. all programs in a particular set
> > of directories, like /lib/vdev/), but disappear otherwise once the
> > creating process dies.
>
> What about (3) having an option for runfs that lets it erase directories
> (with their subentries) on process termination, but lets regular files
> persist until then?
>


I like this idea best :) I'll add that to runfs's issue tracker as an
enhancement. I'm thinking of having runfs interpret a
"user.runfs.persistent_files=1" extended attribute to enable this on a
per-directory basis.
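A sketch of how a runfs-side check for the proposed attribute, and the resulting unlink rule, might look. The attribute name comes from the proposal above, but runfs does not implement it yet, so both functions are hypothetical; the code only relies on Linux's getxattr(2).

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Attribute name proposed above; hypothetical, not something runfs
 * currently implements. */
#define RUNFS_PERSIST_XATTR "user.runfs.persistent_files"

/* 1 if `dir` carries user.runfs.persistent_files=1, 0 if the attribute
 * is absent or the filesystem has no user xattrs, -1 on other errors. */
int runfs_dir_persists_files(const char *dir)
{
    char val[8];
    ssize_t n = getxattr(dir, RUNFS_PERSIST_XATTR, val, sizeof val - 1);

    if (n < 0)
        return (errno == ENODATA || errno == ENOTSUP) ? 0 : -1;
    val[n] = '\0';
    return strcmp(val, "1") == 0;
}

/* Proposed rule (option 3) when the process that created an entry
 * exits: directories always go (with their contents); a regular file in
 * a flagged directory stays until that directory is itself removed,
 * i.e. until the directory's creator exits.  Returns 1 to unlink now. */
int runfs_unlink_on_creator_exit(int is_dir, int parent_flagged)
{
    if (is_dir)
        return 1;
    return !parent_flagged;
}
```

Once runfs handles setxattr, a client directory could be flagged with setfattr -n user.runfs.persistent_files -v 1 <dir>.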

Thanks again for your feedback,
Jude