Author: Enrico Weigelt, metux IT consult
Date:
To: Jude Nelson, dng@lists.dyne.org
Subject: Re: [Dng] vdev update and design document
On 03.01.2015 07:27, Jude Nelson wrote:
Hi,
> I don't disagree with you, especially since namespacing will be
> necessary when the same device node in each session must refer to a
> different device. However, as I mentioned in an earlier email, solving
> the problem of per-process access control by giving each session its own
> namespace isn't always viable, particularly on OpenBSD (which has no
> containerization support beyond chroot, and chroot isn't particularly
> useful for containing processes).
IMHO, chroot should be sufficient for that case, at least with proper
mounting. IIRC, running services in a chroot is pretty standard
on *BSD. Anyway, are you sure OpenBSD really has no mount namespaces?
OTOH, we could let vdev do the namespacing magic (e.g. based on the
session ID) while still following the suggested approach.
> It's also not clear to me that the
> maintenance burden would be reduced versus using ACLs, since a strategy
> for populating a given session's /dev and keeping it up-to-date with
> hotplug events would probably be comparably complex to vdev's ACL system
> (and this is on top of the container lifecycle management code you'd
> have to write).
Hmm, we would need a two-layer approach here:
* layer 1: global context - all available system devices
* layer 2: session context - only the per-session (virtual) devices
Between these layers, we'd have a mapping (probably defined by the
session manager) that defines which real (system) devices are mapped
into a given session context.
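To make the two layers a bit more concrete, here is a minimal Python sketch. All names (GlobalContext, SessionContext, map_device, resolve, on_hotplug_remove) and the device paths are hypothetical and not part of vdev; the sketch only assumes that the session manager owns the mapping and that hotplug removals in the global layer must be propagated into every session view.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlobalContext:
    """Layer 1 (hypothetical): every device node the system currently has."""
    devices: set[str] = field(default_factory=set)


@dataclass
class SessionContext:
    """Layer 2 (hypothetical): per-session virtual names -> system devices."""
    global_ctx: GlobalContext
    mapping: dict[str, str] = field(default_factory=dict)

    def map_device(self, virtual_name: str, system_path: str) -> None:
        # Meant to be called by the session manager only. A session may
        # only be given devices that exist in the global context.
        if system_path not in self.global_ctx.devices:
            raise KeyError(f"unknown system device: {system_path}")
        self.mapping[virtual_name] = system_path

    def resolve(self, virtual_name: str) -> str:
        # Session processes only ever see virtual names; anything the
        # session manager did not map does not exist for them (KeyError).
        return self.mapping[virtual_name]

    def on_hotplug_remove(self, system_path: str) -> None:
        # Drop every virtual name that pointed at the removed system device,
        # so the session view stays in sync with layer 1.
        self.mapping = {v: s for v, s in self.mapping.items() if s != system_path}
```

With this split, a user's configuration only ever says "audio", while the session manager decides which physical sound card that name refers to.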
Note: my primary goal here is not just access control (for that alone,
groups and permissions would be sufficient, IMHO), but per-session
device name virtualization, to ease userland configuration (e.g. so
that ordinary users never have to care about proper audio device
names, etc.).
In that context, we'd have separate types of sessions (or perhaps call
them 'scopes'). For example, X servers would run under their own UIDs (one
per display), and their vdev mappings would be defined by the
display manager. Ordinary users would never get direct access to the
underlying kernel devices.
>> I'd rather raise the question whether that's useful at all.
>
> There was an LWN article on this a while back [2]. The examples
> provided there are as follows:
> * If the login program could revoke() the tty device node before
> prompting the password, this attack vector would be removed (assuming
> the revoke() implementation didn't affect file descriptors in the
> calling process).
Sure. But shouldn't that potentially attacking process be killed in the
first place?
Anyway, if we're talking about a local tty, a user sitting in front of
the console can't even be sure that he's talking to the real login
program when he sees a login prompt. Trojan attacks here are
pretty trivial (in fact, that was one of my first easy hacks back in
school, which I used to take over our admin's account - and which
eventually led to /me becoming the official admin ;-)). To prevent that
kind of attack, we would need a separate output channel (e.g. a special
screen region) which is exclusive to the real login program and
can't be touched by ordinary user processes.
> This also applies to X11, which could revoke() the
> video device file prior to setting it up.
Same case here. Of course, we have to consider the ugly side effects of
simply cutting processes off from a device (even today, crashing X servers
can leave the display/tty in a broken state :().
> * Suppose a process has open files in a filesystem you're trying to
> unmount. You could revoke all files in the filesystem prior to trying
> to umount() it.
Whoooh, that's _dangerous_. Yes, forcibly closing the fds from the kernel
side would keep the filesystem metadata consistent, but the application
might end up in a really weird state if some of its fds suddenly vanish.
Unless the application is _known_ to handle that gracefully, it should be
properly shut down (at least with SIGTERM and a proper shutdown timeout).
So, yet another argument for _not_ simply revoking.
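A "proper shutdown" in the sense above could look roughly like the following Python sketch: ask the process to exit with SIGTERM, wait up to a timeout, and only escalate to SIGKILL if it does not comply. The function name shutdown_gracefully, its return strings and the 5-second default are assumptions for illustration; subprocess.Popen.terminate() and kill() send SIGTERM and SIGKILL respectively on POSIX systems.

```python
import subprocess


def shutdown_gracefully(proc: subprocess.Popen, timeout: float = 5.0) -> str:
    """Stop a child process politely first, forcefully only as a last resort.

    Returns "already-exited", "terminated" (exited after SIGTERM within the
    timeout) or "killed" (had to be sent SIGKILL).
    """
    if proc.poll() is not None:
        # Nothing to do: the process has already exited and been reaped.
        return "already-exited"
    proc.terminate()  # SIGTERM: give the application a chance to clean up
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL: the application ignored the polite request
        proc.wait()   # reap the child so it does not linger as a zombie
        return "killed"
```

Only after such a shutdown has run (or timed out) would unmounting, or as a last resort revoking, be considered.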
cu
--
Enrico Weigelt,
metux IT consulting
+49-151-27565287