:: Re: [Dng] [dng] vdev status update
Author: Jude Nelson
Date:  
To: Isaac Dunham
CC: dng@lists.dyne.org
Subject: Re: [Dng] [dng] vdev status update
[Snip]

> > I don't yet know what the policy declaration and interpretation logic for
> > putting device events into clients should look like, but to a first
> > approximation, what about something like this?
> >
> > $ cat /etc/libudev-compat/chromium.conf:
> > # only allow Chromium to see webcams and microphones, i.e. for hangouts
> > binary=/usr/lib/chromium/chromium
> > order=deny,allow
> > deny=all
> > allow=SUBSYSTEM=v4l SUBSYSTEM=sound
>
> Hmmm.... I *don't* like handling this from shell.
>
> Supporting order=... deny=all means you're going to need a very long
> shell script. It would be much nicer to have a C tool that can be used
> thus:
> vdev_allowpid <pid number>
> echo $?
> (returns 1 if it should be denied, 0 if it should succeed).
>


Agreed.


>
> Remember, in shell you will be doing something with this general structure:
>
> for pid in listeners; do
>         is_permitted pid && write uevent >$PIDPATH/$FILENAME
> done

>
> and if you invoke this for every hotplug event, this means that having
> 20 listeners, 20 program invocations per loop, and 15 hotplug events
> (a real possibility for plugging in a phone) results in 6000
> processes starting.
> That means you might trivially end up fork-bombing a system via hotplug.
> grep, awk, and sed are off limits, given the potential performance issues.
>


Understood.


>
> If the design is pursued, here's what I might do for mdev, based on the
> following assumptions:
> * I assume that only SUBSYSTEM=... can be used as a filter; allowing any
> variable from the environment to be used requires some rather hideous
> indirection and calling external tools that quickly cause the system to
> grind to a halt.
>


Is there a significant difference in complexity between filtering on
SUBSYSTEM only, versus filtering on an arbitrary uevent key? I can see why
someone might want to filter by PCI bus, USB bus, or USB device, for
example (e.g. give one container a subset of a computer's hardware devices;
give another container another disjoint subset).
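
To make the question concrete, here is a rough, untested sketch of matching an arbitrary KEY=VALUE filter against the hotplug environment without leaving the shell. match_key is a name I made up for illustration, and I'm assuming the uevent variables are already in the script's environment, as they are for an mdev helper. The indirection goes through eval, which is a shell builtin, so it does not cost a fork per check.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the uevent variable named in "KEY=VALUE"
# currently holds exactly VALUE in this script's environment.
match_key() {
        # Reject arguments that are not of the form KEY=VALUE.
        case $1 in
                (*=*) ;;
                (*) return 1 ;;
        esac
        key=${1%%=*}
        want=${1#*=}
        # Only pass valid shell identifiers to eval.
        case $key in
                (''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 1 ;;
        esac
        # Indirect lookup: copies the value of the variable named by $key.
        eval "have=\${$key-}"
        [ "$have" = "$want" ]
}
```

With something like this, an allow= line could carry tokens such as SUBSYSTEM=usb or BUSNUM=001, and the script would call match_key once per token.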


> * libudev-compat only needs a unique filename, and uses inotify to check
> if there's a new one. For the hotplugger, SEQNUM is a guaranteed unique
> value.
> * I can check /proc/self/exe directly to determine filename.
> * The configuration for an executable is in
> /etc/libudev-compat/<`basename $exe`>.conf
> If there is no way to resolve an exe pathname to a single filename, it
> becomes nearly impossible.



This makes it trivial to circumvent vdev_allowpid: if all you're filtering
on is the basename, then an untrusted user can access any /dev/uevent/$PID
with any binary by putting it in their $HOME and naming it after a trusted
binary.
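
One way around this, sketched below and untested, is to compare the full path from /proc/$PID/exe against the binary= line in the policy file instead of trusting the file name. conf_applies is a hypothetical name, and I'm assuming the policy file keeps the binary=/full/path line from my earlier example.

```shell
#!/bin/sh
# Hypothetical check: does the policy file's binary= line name exactly
# the path that /proc/$PID/exe resolved to?
conf_applies() {
        conf=$1   # path to the policy file
        exe=$2    # full path obtained from readlink /proc/$PID/exe
        # The "|| [ -n ... ]" part also handles a last line with no newline.
        while read -r line || [ -n "$line" ]; do
                case $line in
                        (binary=*) [ "${line#binary=}" = "$exe" ] && return 0 ;;
                esac
        done <"$conf"
        return 1
}
```

A copy of the binary under $HOME would then resolve to a different full path and fail the comparison, even though its basename matches.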


> * rather than using order=deny,allow, I assume that the order in the file
> is reverse order of precedence (so putting deny= before allow= gives the
> same behavior as order=deny,allow would).
>


Agreed.


> * POSIX sh is required.
>
> * /etc/mdev.conf includes this line:
> $SEQNUM=.* root:root 660 @/lib/mdev/uevent-writer
> # or the equivalent:
> $ACTION=add root:root 660 @/lib/mdev/uevent-writer
>
> === head of /lib/mdev/uevent-writer ===
> #!/bin/sh
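> allow_pid() {
>         # Resolve the listener's binary; the policy file is keyed on its basename.
>         EXE=`readlink /proc/$1/exe`
>         # Default to allow when no line matches, as for a missing policy file.
>         RETURN=0
>         [ -r /etc/libudev-compat/${EXE##*/}.conf ] || return 0
>
>         # Later lines override earlier ones (reverse order of precedence).
>         while read LINE; do
>                 case $LINE in
>                         (deny=all) RETURN=1 ;;
>                         (allow=all) RETURN=0 ;;
>                         (allow=*SUBSYSTEM=$SUBSYSTEM*) RETURN=0 ;;
>                         (deny=*SUBSYSTEM=$SUBSYSTEM*) RETURN=1 ;;
>                 esac
>         done </etc/libudev-compat/${EXE##*/}.conf
>         return $RETURN
> }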

>
> FILENAME=$SEQNUM
>
> for pid in /dev/uevent/*
> do
>         allow_pid ${pid##*/} && env >${pid}/$FILENAME
> done
> === tail of /lib/mdev/uevent-writer ===

>
>
>

I like the idea of using the SEQNUM better than the SHA256. It makes it
easy for a user to see the order in which events are reported.


> > I also have a library (libpstat) and tool (pstat) [2] that lets you get
> > helpful information about a process, given its PID:
> >
> > $ pstat 27360
> > PID: 27360
> >   running: true
> >   binary:  /usr/lib/chromium/chromium
> >   deleted: false
> >   inode:   3417747
> >   size:    103832208
> >   modtime: 1425639789

> >
> > (I can make the format easier to parse--e.g. as environment variables).
> >
> > Then, when given a device uevent (which will contain the SUBSYSTEM), a
> > shell script can parse the policy files in /etc/libudev-compat/, iterate
> > through the PIDs named by /dev/uevents/$PID, and use pstat to find out
> > whether or not the policy file applies to the process's binary (and if
> > so, choose whether or not to write the uevent packet). Moreover, pstat
> > uses the contents of the symlink /proc/$PID/exe to figure out the
> > binary's path, which IIRC (but would need confirmation) cannot be altered
> > by an unprivileged program, meaning that /proc/$PID/exe is an unforgeable
> > process attribute. This, combined with the fact that we'd use runfs on
> > /dev/uevents to ensure that only $PID directories for running processes
> > are visible on each readdir() and stat(), should ensure that the only way
> > a process can get a uevent packet this way is if it is (1) running, and
> > (2) allowed to receive it according to at least one policy file in
> > /etc/libudev-compat.
>
> > Again, I'd love to hear your thoughts, especially if there is a simpler
> > approach.
>
> for pid in /dev/uevents/*; do
>         EXE=$(readlink /proc/${pid##*/}/exe)
> done

>
> is simpler.
>


Agreed, but you need to be careful when parsing /proc/$PID/exe, since the
link target is suffixed with " (deleted)" if the running binary was unlinked
(e.g. through a package upgrade).
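
As a minimal sketch (untested on a real hotplug path), the suffix can be stripped with plain parameter expansion, so it costs no extra process. parse_exe and EXE_DELETED are names I'm inventing here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a /proc/$PID/exe link target into the path
# (left in EXE) and a flag (EXE_DELETED) saying whether it carried the
# " (deleted)" suffix.
parse_exe() {
        EXE=$1
        EXE_DELETED=0
        case $EXE in
                (*" (deleted)")
                        EXE=${EXE%" (deleted)"}
                        EXE_DELETED=1
                        ;;
        esac
}
```

It would be called as parse_exe "$(readlink /proc/$PID/exe)". Note that a binary whose real name ends in " (deleted)" cannot be told apart this way, so a policy might prefer to deny whenever EXE_DELETED is 1.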


>
> Also, s/at least one policy file/its policy file/, unless you plan to
> write a helper in some compiled language (preferably plain C, or C++
> without libstdc++/STL/..., since you need to keep size to a minimum).
>
> > > What about (3) having an option for runfs that lets it erase
> > > directories (with their subentries) on process termination, but lets
> > > regular files persist until then?
>
> > I like this idea best :) I'll add that to runfs's issue tracker as an
> > enhancement. I'm thinking of having runfs interpret a
> > "user.runfs.persistent_files=1" extended attribute to enable this on a
> > per-directory basis.
>
> I was thinking of a mount/commandline option so that it could be set for
> the whole mountpoint; remember, ACL support is optional on Linux.
>


That works too!

Thanks,
Jude