Re: [Dng] [dng] vdev status update
Author: Isaac Dunham
Date:  
To: Jude Nelson
CC: dng@lists.dyne.org
Subject: Re: [Dng] [dng] vdev status update
On Thu, Apr 09, 2015 at 11:55:46AM -0400, Jude Nelson wrote:
> Hi Isaac,
>
> On Wed, Apr 8, 2015 at 7:04 PM, Isaac Dunham <ibid.ag@???> wrote:
>
> > On Tue, Apr 07, 2015 at 05:22:55PM -0400, Jude Nelson wrote:
> > > > > > > To avoid the troublesome corner case where a libudev client crashes
> > > > > > > and potentially leaves behind a directory in /dev/uevents/, I would
> > > > > > > recommend mounting runfs [1] on /dev/uevents. Runfs is a special FUSE
> > > > > > > filesystem I wrote a while back that ensures that the files created in
> > > > > > > it by a particular process get automatically unlinked when that process
> > > > > > > dies (it was originally meant for holding PID files).
> > > > Hmm...
> > > > Do we need to have a subdirectory of the mountpoint?
> > > > Could you just use ACLs if you need to make a limited subset available?
> > > > I get the impression that we can do this for mdev via a script along
> > > > these lines:
> > >
> > >
> > > > FILENAME=`env | sha512sum | cut -d' ' -f1`
> > > > for f in /dev/uevents/*
> > > >         do env >"$f"/$FILENAME
> > > > done

> > > >
> > > > but it would be *nicer* if we only needed to write one file.
> > > >
> > >
> > > I agree that one file per event is ideal (or even a circular logfile of
> > > events, if we could guarantee only one writer). However, I'm not sure yet
> > > what a fine-grained ACL for device events would look like. My motivation
> > > for per-client directories is that unprivileged clients can be made to see
> > > only their own events and no one else's by default (i.e. by chmod'ing the
> > > directory to 0700), and that they make it easy to reason about sending
> > > post-processed events only to the clients you want--just change the list of
> > > directories to iterate over in that for-loop :)
> >
> > Which is not trivial in shell, unless you have a special command to do
> > the work of figuring out which directories get what.
> > ...which seems to make doing this in shell pointless, since the
> > corresponding C is nearly as trivial.
> >
> >
> I don't yet know what the policy declaration and interpretation logic for
> putting device events into clients should look like, but to a first
> approximation, what about something like this?
>
> $ cat /etc/libudev-compat/chromium.conf
> # only allow Chromium to see webcams and microphones, i.e. for hangouts
> binary=/usr/lib/chromium/chromium
> order=deny,allow
> deny=all
> allow=SUBSYSTEM=v4l SUBSYSTEM=sound


Hmmm.... I *don't* like handling this from shell.

Supporting order=... deny=all means you're going to need a very long
shell script. It would be much nicer to have a C tool that can be used thus:
vdev_allowpid <pid number>
echo $?
(returns 1 if it should be denied, 0 if it should succeed).

Remember, in shell you will be doing something with this general structure:

for pid in listeners; do
    is_permitted pid && write uevent >$PIDPATH/$FILENAME
done


and if you invoke this for every hotplug event, this means that having
20 listeners, 20 program invocations per loop, and 15 hotplug events
(a real possibility for plugging in a phone) results in 6000
processes starting.
That means you might trivially end up fork-bombing a system via hotplug.
grep, awk, and sed are off limits, given the potential performance issues.
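
To spell out the arithmetic behind that figure, here is a tiny sketch. The variable names are only illustrative; the numbers are the ones assumed above.

```shell
#!/bin/sh
# Numbers taken from the estimate above; the names are illustrative only.
listeners=20            # directories under /dev/uevents
forks_per_listener=20   # external programs started per loop iteration
events=15               # hotplug events caused by one physical plug-in

# Every event runs the whole loop, and every iteration forks that many times.
total=$((listeners * forks_per_listener * events))
echo "$total processes"
```

This prints "6000 processes". Shell builtins (case, read, parameter expansion) add nothing to that count, which is why the script further down sticks to them.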

If the design is pursued, here's what I might do for mdev, based on the
following assumptions:
* I assume that only SUBSYSTEM=... can be used as a filter; allowing any
variable from the environment to be used requires some rather hideous
indirection and calling external tools that quickly cause the system to
grind to a halt.
* libudev-compat only needs a unique filename, and uses inotify to check
if there's a new one. For the hotplugger, SEQNUM is a guaranteed unique
value.
* I can check /proc/<pid>/exe directly to determine the config filename.
* The configuration for an executable is in
/etc/libudev-compat/<`basename $exe`>.conf
If there is no way to resolve an exe pathname to a single filename, it
becomes nearly impossible.
* rather than using order=deny,allow, I assume that the order in the file
is reverse order of precedence (so putting deny= before allow= gives the
same behavior as order=deny,allow would).
* POSIX sh is required.

* /etc/mdev.conf includes this line:
$SEQNUM=.* root:root 660 @/lib/mdev/uevent-writer
# or the equivalent:
$ACTION=add root:root 660 @/lib/mdev/uevent-writer
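
The config-file assumption above maps an executable path to a policy file by basename. A minimal sketch of that mapping, using only parameter expansion so that no basename process is started; conf_for and CONF are hypothetical names, and the second path is made up to show the collision case:

```shell
#!/bin/sh
# conf_for is a hypothetical helper: it sets CONF to the policy file
# that would apply to the executable path given as $1.
conf_for() {
    # ${1##*/} strips everything up to the last slash, like basename(1),
    # but is expanded by the shell itself without forking.
    CONF="/etc/libudev-compat/${1##*/}.conf"
}

conf_for /usr/lib/chromium/chromium
echo "$CONF"

# A different binary with the same basename (made-up path) maps to the same file.
conf_for /opt/chromium/chromium
echo "$CONF"
```

Both calls print /etc/libudev-compat/chromium.conf, which is the ambiguity mentioned above: two binaries with the same basename end up sharing one policy.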

=== head of /lib/mdev/uevent-writer ===
#!/bin/sh
allow_pid() {
    # Resolve the binary behind the PID; its basename selects the policy file.
    EXE=`readlink "/proc/$1/exe"`
    CONF="/etc/libudev-compat/${EXE##*/}.conf"
    # No policy file for this binary: allow.
    [ -r "$CONF" ] || return 0

    # Later lines override earlier ones; allow if no line matches.
    RETURN=0
    while read -r LINE; do
        case $LINE in
            (deny=all) RETURN=1 ;;
            (allow=all) RETURN=0 ;;
            (allow=*SUBSYSTEM="$SUBSYSTEM"*) RETURN=0 ;;
            (deny=*SUBSYSTEM="$SUBSYSTEM"*) RETURN=1 ;;
        esac
    done <"$CONF"
    return $RETURN
}


FILENAME=$SEQNUM

for pid in /dev/uevents/*
do
    # Skip the unexpanded glob when there are no listeners.
    [ -d "$pid" ] || continue
    allow_pid "${pid##*/}" && env >"${pid}/$FILENAME"
done
=== tail of /lib/mdev/uevent-writer ===
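
To try the matching rules without touching /etc or /proc, something along these lines might do. It is only a sketch: policy_allows and the temporary file are hypothetical, the case patterns are the ones from allow_pid, and the sample policy is modelled on the chromium.conf example with order= dropped.

```shell
#!/bin/sh
# Hypothetical helper: policy_allows FILE SUBSYSTEM
# Returns 0 (allow) or 1 (deny). Later lines in FILE override earlier ones,
# and a missing file or a file with no matching line means "allow".
policy_allows() {
    pa_file=$1
    pa_subsystem=$2
    pa_verdict=0
    [ -r "$pa_file" ] || return 0
    while read -r pa_line; do
        case $pa_line in
            (deny=all) pa_verdict=1 ;;
            (allow=all) pa_verdict=0 ;;
            (allow=*SUBSYSTEM="$pa_subsystem"*) pa_verdict=0 ;;
            (deny=*SUBSYSTEM="$pa_subsystem"*) pa_verdict=1 ;;
        esac
    done <"$pa_file"
    return $pa_verdict
}

# Sample policy written to a temporary file instead of /etc/libudev-compat.
sample="${TMPDIR:-/tmp}/policy-sketch.$$"
cat >"$sample" <<'EOF'
# only webcams and microphones
deny=all
allow=SUBSYSTEM=v4l SUBSYSTEM=sound
EOF

for s in v4l sound block; do
    if policy_allows "$sample" "$s"; then
        echo "$s: allow"
    else
        echo "$s: deny"
    fi
done
rm -f "$sample"
```

With that sample file, v4l and sound are allowed and block is denied. One caveat of the substring match: a subsystem whose name is a prefix of a listed one (say "so" against SUBSYSTEM=sound) matches as well, so a stricter tool would have to compare whole words.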




> I also have a library (libpstat) and tool (pstat) [2] that lets you get
> helpful information about a process, given its PID:
>
> $ pstat 27360
> PID: 27360
>   running: true
>   binary:  /usr/lib/chromium/chromium
>   deleted: false
>   inode:   3417747
>   size:    103832208
>   modtime: 1425639789

>
> (I can make the format easier to parse--e.g. as environment variables).
>
> Then, when given a device uevent (which will contain the SUBSYSTEM), a
> shell script can parse the policy files in /etc/libudev-compat/, iterate
> through the PIDs named by /dev/uevents/$PID, and use pstat to find out
> whether or not the policy file applies to the process's binary (and if so,
> choose whether or not to write the uevent packet). Moreover, pstat uses
> the contents of the symlink /proc/$PID/exe to figure out the binary's path,
> which IIRC (but would need confirmation) cannot be altered by an
> unprivileged program, meaning that /proc/$PID/exe is an unforgeable process
> attribute. This, combined with the fact that we'd use runfs on
> /dev/uevents to ensure that only $PID directories for running processes are
> visible on each readdir() and stat(), should ensure that the only way a
> process can get a uevent packet this way is if it is (1) running, and (2)
> allowed to receive it according to at least one policy file in
> /etc/libudev-compat.


> Again, I'd love to hear your thoughts, especially if there is a simpler
> approach.


for pid in /dev/uevents/*; do
    EXE=$(readlink /proc/${pid##*/}/exe)
done


is simpler.

Also, s/at least one policy file/its policy file/, unless you plan to
write a helper in some compiled language (preferably plain C, or C++
without libstdc++/STL/..., since you need to keep size to a minimum).

> > What about (3) having an option for runfs that lets it erase directories
> > (with their subentries) on process termination, but lets regular files
> > persist until then?


> I like this idea best :) I'll add that to runfs's issue tracker as an
> enhancement. I'm thinking of having runfs interpret a
> "user.runfs.persistent_files=1" extended attribute to enable this on a
> per-directory basis.


I was thinking of a mount/commandline option so that it could be set for
the whole mountpoint; remember, ACL support is optional on Linux.

> Thanks again for your feedback,
> Jude



Thanks,
Isaac Dunham