:: Re: [Dng] [dng] vdev status update
Startseite
Nachricht löschen
Nachricht beantworten
Autor: Jude Nelson
Datum:  
To: Isaac Dunham
CC: dng@lists.dyne.org
Betreff: Re: [Dng] [dng] vdev status update
Hi Isaac,

[Snip]

> I should clarify one detail here: when you run a hotplugger, rather
> than reading a uevent file or a netlink message, the kernel runs the
> hotplugger with an environment equivalent to the uevent. Thus,
> "variable from the environment" may be treated as interchangeable with
> "uevent key".
>
>

Yes, this is important for me to keep in mind. I had been assuming that
vdevd would either be running as a daemon or be manually invoked now and
then to (re)generate device nodes in a batch (i.e. with the --once option).

Assuming that I have to implement it in shell (which is the point of the
> example), yes, there's a *very* significant difference between filtering
> on a fixed key and filtering on an arbitrary key.



> And that's part of the point: I can see why someone might filter by an
> arbitrary key, but it's high-overhead for shell.
>


Gotcha.


> > > * libudev-compat only needs a unique filename, and uses inotify to
> check
> > > if there's a new one. For the hotplugger, SEQNUM is a guaranteed
> unique
> > > value.
> > > * I can check /proc/self/exe directly to determine filename.
> > > * The configuration for an executeable is in
> > > /etc/libudev-compat/<`basename $exe`>.conf
> > > If there is no way to resolve an exe pathname to a single filename, it
> > > becomes nearly impossible.
> >
> >
> > This makes it trivial to circumvent vdev_allowpid--if all you're
> filtering
> > on is the basename, then an untrusted user can access any
> /dev/uevent/$PID
> > with any binary by putting it in their $HOME and naming it after a
> trusted
> > binary.
>
> Again, this is a limitation of an efficient shell implementation
> .
> Also, the default behavior of udev is "allow all processes to access
> anything", so there's no way to make a dropin replacement that's
> completely interchangeable yet reliably restricts permissions.
>


Hmmmm, I wonder if it might be worth our while to just make a program
called "uevent_log" that takes the uevent packet from stdin, reads
/etc/libudev-compat/*.conf, and writes the packet selectively to each of
the /dev/uevent/$PID directories (using libpstat or the like to figure out
which clients may receive the event). While we're at it, maybe it makes
even more sense from a space and I/O bandwidth perspective to log the
uevent to /dev/uevent/log/ (owned by root.root with mode 0700) if at least
one client needs it, hard-link it into each /dev/uevent/$PID/ directory for
each client that is allowed to read it, and then unlink it from
/dev/uevent/log (so we only ever write the packet once, but it shows up and
stays around as long as at least one running client hasn't consumed it).

I'd also recommend passing SUBSYSTEM as a command-line option to the helper
program to make it easier to test and easier to port to other device
managers (i.e. vdevd prefixes all of its helpers' environment variables
with "VDEV_" to avoid variable collisions, but this is a vdevd-specific
behavior that I'd not want to incorporate into the helper).


> > >
> > > * /etc/mdev.conf includes this line:
> > > $SEQNUM=.* root:root 660 @/lib/mdev/uevent-writer
> > > # or the equivalent:
> > > $ACTION=add root:root 660 @/lib/mdev/uevent-writer
> > >
> > > === head of /lib/mdev/uevent-writer ===
> > > #!/bin/sh
> > > allow_pid() {
> > >         EXE=`readlink /proc/$1/exe`
> > >         unset RETURN
> > >         [ -r /etc/libudev-compat/${EXE##*/}.conf ] || return 0

> > >
> > >         { while read LINE; do
> > >                 case $LINE in
> > >                         (deny=all) RETURN=1 ;;
> > >                         (allow=all) RETURN=0 ;;
> > >                         (allow=*SUBSYSTEM=$SUBSYSTEM*) RETURN=0 ;;
> > >                         (deny=*SUBSYSTEM=$SUBSYSTEM*) RETURN=1 ;;
> > >                 esac
> > >         done
> > >         return $RETURN
> > >         } </etc/libudev-compat/${EXE##*/}.conf
> > >         return $?
> > > }

> > >
> > > FILENAME=$SEQNUM
> > >
> > > for pid in /dev/uevent/*
> > > do
> > >         allow_pid ${pid##*/} && env >${pid}/$FILENAME
> > > done
> > > === tail of /lib/mdev/uevent-writer ===

>
> > I like the idea of using the SEQNUM better than the SHA256. It makes it
> > easy for a user to see the order in which events are reported.
>
> I think SEQNUM is only available for hotpluggers.
>


I think it's also sent via netlink, so vdevd would see it on hotplug (but
not coldplug).


>
> [snip]
> > > > attribute. This, combined with the fact that we'd use runfs on
> > > > /dev/uevents to ensure that only $PID directories for running
> processes
> > > are
> > > > visible on each readdir() and stat(), should ensure that the only
> way a
> > > > process can get a uevent packet this way is if it is (1) running,
> and (2)
> > > > allowed to receive it according to at least one policy file in
> > > > /etc/libudev-compat.
>
> ...I see a problem with this approach:
> libudev-compat should be a dropin replacement by default.
> You can't achieve that and require a new configuration file, unless you
> provide a default file containing something like this:
>
> exe=*
> allow=all
>
> and then search for narrower matches.
>


Yes, that would have to be the default behavior to avoid out-of-the-box
breakage, regardless of what it means for config files. I'm okay with
that--device event filtering is a somewhat specialized need, but it's
definitely one I'd use at $DAYJOB or in any scenario where I'm running
libudev programs in containers that shouldn't see all of the hardware
events (it would go along with e.g. blocking netlink socket access and
filtering /dev in the container context--such as with vdevfs).


>
> > > > Again, I'd love to hear your thoughts, especially if there is a
> simpler
> > > > approach.
> > >
> > > for pid in /dev/uevents/*; do
> > >         EXE=$(readlink /proc/${pid##*/}/exe)
> > > done

> > >
> > > is simpler.
> > >
> >
> > Agreed, but you need to be careful when parsing /proc/$$/exe, since it
> can
> > be suffixed with " (deleted)" if the running binary was unlinked (e.g.
> > through a package upgrade).
>
> Then add
> EXE=${EXE% (deleted)}
> after setting EXE.
>
> It's two lines of shell, or 4-10 lines of C depending how fancy you get;
> I don't think a library or tool is called for just to get the name of the
> executeable.
> Thanks for the warning, though.
>


Right :) Sorry if I was coming off as overly-pedantic in my earlier email;
I was just pointing out a /proc-ism I had discovered the hard way :)

Your input is helpful as always :)
-Jude