Author: Jude Nelson
Date:
To: Enrico Weigelt, metux IT consult
CC: dng@lists.dyne.org
Subject: Re: [Dng] Device management [WAS: system scriptinng language.]
Hi Enrico,
I agree that vdev should represent as much of its state as possible through
its filesystem. This will include things like:
* the usual POSIX stuff for each device node (user, group, mode)
* OS-specific device parameters for each device node, as extended
attributes (e.g. which subsystem the device belongs to, the sequence number
of the netlink packet, etc.)
* the command used to generate a device node's path, as an extended
attribute (this is vdev-specific)
* the commands used to create and update each node, as extended attributes
(these are vdev-specific)
* the action(s) taken as a result of the node being created or updated, as
extended attributes (these are vdev-specific)
* maybe some usage statistics?
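For illustration, a client could read those attributes back with the ordinary xattr calls. The following is a minimal, Linux-only sketch (it uses <sys/xattr.h>; FreeBSD exposes extattr_list_file(2) and extattr_get_file(2) instead). It just dumps every attribute on a node. The actual attribute names vdev would use (something like a hypothetical user.vdev.subsystem) aren't settled here, so the code makes no assumption about them.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* Linux: listxattr(2), getxattr(2) */

/* Print every extended attribute name=value pair attached to `path`.
 * Returns the number of attributes printed, or -1 on error (errno set). */
int dump_xattrs(const char *path)
{
    ssize_t len = listxattr(path, NULL, 0);   /* ask for the list size */
    if (len < 0)
        return -1;
    if (len == 0)
        return 0;                             /* no attributes at all */

    char *names = malloc((size_t)len);
    if (names == NULL)
        return -1;
    len = listxattr(path, names, (size_t)len);
    if (len < 0) {
        free(names);
        return -1;
    }

    int count = 0;
    /* The list is a packed sequence of NUL-terminated names. */
    for (char *name = names; name < names + len; name += strlen(name) + 1) {
        char value[256];
        ssize_t vlen = getxattr(path, name, value, sizeof(value) - 1);
        if (vlen < 0)
            continue;   /* skip values we can't read (e.g. ERANGE: too long) */
        value[vlen] = '\0';
        printf("%s=%s\n", name, value);
        count++;
    }
    free(names);
    return count;
}
```

Called as dump_xattrs("/dev/sda"), it would print whatever OS-specific and vdev-specific attributes the node carries, one per line.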
The reason for storing extra information as external config files (ACLs and
actions) is that sometimes vdev must know how to handle device events
*before* it can start processing requests from the OS (i.e. for correctly
setting ownership and mode bits, and for avoiding name collisions).
However, once the initial ACLs and actions are loaded, vdev should expose
them under, say, /dev/vdev/acls/ and /dev/vdev/actions/ as regular files.
Adding, editing, and removing these files would change vdev's runtime
behavior accordingly, as would directly editing any of the above
information already exposed via the filesystem. Also, I think that as long
as there's a simple policy in place that ensures that each device node is
invisible by default (until the admin changes it otherwise), and that
device name collisions get handled in a sane manner, you could get away
with not having any initial ACLs at all, and simply treat vdev as a typical
/dev filesystem that the admin sets up manually (or from a script) before
letting users access it.
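A minimal sketch of the "invisible by default" policy, assuming a hypothetical in-memory rule format (struct acl_rule with a glob pattern, a uid, and a mode; none of this is vdev's real ACL syntax): a node is shown to a user only if some rule matches it, and otherwise it stays hidden. fnmatch(3) does the glob match.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical in-memory form of one ACL rule (not vdev's real format). */
struct acl_rule {
    const char *pattern;  /* glob over device names, e.g. "sd[a-z]*" */
    uid_t uid;            /* user the rule applies to */
    mode_t mode;          /* permission bits to expose to that user */
};

/* Decide whether `devname` is visible to `uid`.
 * Returns 1 and stores the mode in *mode_out if a rule grants visibility.
 * Returns 0 if no rule matches: the default policy keeps the node hidden. */
int acl_lookup(const struct acl_rule *rules, size_t n,
               const char *devname, uid_t uid, mode_t *mode_out)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].uid == uid &&
            fnmatch(rules[i].pattern, devname, 0) == 0) {
            *mode_out = rules[i].mode;
            return 1;   /* first matching rule wins */
        }
    }
    return 0;           /* invisible by default */
}
```

With a single rule { "sd[a-z]*", 1000, 0600 }, sda1 is visible to uid 1000 with mode 0600 and hidden from every other user; with an empty rule table, every node is hidden, which is the "no initial ACLs" case.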
> One interesting question here is whether we should do our own
> namespacing (within vdev itself), or just use the kernel infrastructure
> for that. (by the way: does anybody here know how other kernels,
> like *bsd handle namespaces ?)
I think I could offer the admin a continuous trade-off between per-session
device namespaces and doing all device namespacing in a single global vdev,
according to his/her preference. In the former case, the admin (or session
manager) would mount a new vdev on the user's /dev mountpoint, and define
for that user the set of ACLs that ensure that the user only sees the
devices (s)he can access. In the latter case, the admin would carefully
craft a set of ACLs (or a script that programs vdev during boot) that
ensures that each user sees only the devices (s)he can access. Everyone
wins.
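For the "script that programs vdev during boot" variant, and assuming vdev picks up files dropped into /dev/vdev/acls/ as described above, a helper could install each rule atomically so vdev never reads a half-written file. A minimal sketch: write to a temporary file in the same directory, fsync(2) it, then rename(2) it into place. The file name and rule text are hypothetical, and whether vdev would ignore the dot-prefixed temporary file is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write `contents` to dir/name atomically: fill a temporary file in the
 * same directory, flush it to disk, then rename(2) it over the target.
 * A reader sees either the old file or the new one, never a partial one.
 * Returns 0 on success, -1 on error. */
int install_config(const char *dir, const char *name, const char *contents)
{
    char tmp[PATH_MAX], dst[PATH_MAX];
    int n1 = snprintf(tmp, sizeof tmp, "%s/.%s.XXXXXX", dir, name);
    int n2 = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof tmp || (size_t)n2 >= sizeof dst)
        return -1;                      /* path too long */

    int fd = mkstemp(tmp);              /* creates the temp file, mode 0600 */
    if (fd < 0)
        return -1;

    const char *p = contents;
    size_t len = strlen(contents);
    while (len > 0) {                   /* write(2) may write less than asked */
        ssize_t w = write(fd, p, len);
        if (w < 0)
            goto fail;
        p += w;
        len -= (size_t)w;
    }
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }
    if (rename(tmp, dst) < 0) {         /* atomic within one filesystem */
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```

A boot script's helper might call install_config("/dev/vdev/acls", "alice-disks", rule_text) once per user. The temporary file lives in the same directory because rename(2) is only atomic within a single filesystem.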
> Maybe we could go through some scenarios, where you'd currently use
> ACLs and check whether they could be done better w/ namespaces.
> (in fact, I prefer not to use ACLs, due to additional complexity)
I don't doubt that giving each user (or each session) its own /dev will
offer the most flexibility in Linux. However, it is hard to do this
consistently across operating systems. With Linux, you can give each
session its own set of namespaces via unshare(2). With FreeBSD, you could
conceivably give each session its own jail, but the jail will offer limited
networking options (i.e. no raw packets, so no ping or tcpdump or the
like). OpenBSD only offers chroot, which a root process can easily
escape. Since I'm looking to port vdev to !Linux, vdev shouldn't rely on
the OS's namespacing
capabilities to provide different users different views of /dev.
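On Linux, the per-session approach could look roughly like the following minimal sketch: the session manager, after forking but before exec'ing the user's shell, unshares its mount namespace and bind-mounts a per-session vdev mountpoint over /dev. The session_dev path is hypothetical, the code needs CAP_SYS_ADMIN, and nothing equivalent exists on FreeBSD or OpenBSD, which is the portability problem described above.

```c
#define _GNU_SOURCE
#include <sched.h>       /* unshare(2), CLONE_NEWNS */
#include <stddef.h>
#include <sys/mount.h>   /* mount(2), MS_* flags */

/* Give the calling process its own mount namespace and bind-mount
 * `session_dev` (e.g. a per-session vdev mountpoint) over /dev.
 * Linux-only; requires CAP_SYS_ADMIN. Returns 0 on success, -1 on error. */
int enter_private_dev(const char *session_dev)
{
    if (unshare(CLONE_NEWNS) < 0)
        return -1;
    /* Many systems mount / as shared; make our copy private so the
     * mounts below don't propagate back to the parent namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
        return -1;
    /* Replace this namespace's view of /dev with the session's vdev. */
    if (mount(session_dev, "/dev", NULL, MS_BIND | MS_REC, NULL) < 0)
        return -1;
    return 0;
}
```

Every process later forked from the session inherits this namespace, so the whole login session sees the same private /dev while the rest of the system is unaffected.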
> One example is session isolation: here I'm pretty sure that, on login
> or session start, a proper namespace should be constructed, before
> calling the login shell is started. Do you see any reason for not
> going that way ?
I must emphasize that containers alone shouldn't be relied upon as a
security solution, since local privilege escalation attacks that could be
used to circumvent them get discovered pretty regularly. If your
motivation for doing per-session containers is only namespace isolation
(i.e. give each user a different view of the system, so programs can
continue to work as if they had the whole system to themselves), then this
approach looks sound. There will be a bit of legwork involved in giving
each container a proper network interface, however, since you'll have many
options (for example, do you want to put your containers behind a NAT, or do
you want them to be able to bind to the root context's IP address?).
> By the way: does vdev's ACL handling also allow revoking permissions
> to some device even on already opened fd's ?
Not possible; you need the kernel to help you there. FreeBSD offers it
via revoke(2),
but Linux does not (AFAIK; I know that there's been interest in adding it).
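For reference, on the BSDs that facility is revoke(2), which invalidates every open descriptor for a file; FreeBSD only supports it on character device files, and only for the file's owner or root. A minimal portable wrapper might look like this (the function name is hypothetical); on Linux it just reports ENOSYS, since there is no equivalent call to make.

```c
#include <errno.h>
#include <unistd.h>   /* revoke(2) on the BSDs */

/* Try to cut off every process that already has `devpath` open.
 * Returns 0 on success, -1 with errno set otherwise. */
int revoke_device_access(const char *devpath)
{
#if defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__)
    return revoke(devpath);
#else
    (void)devpath;
    errno = ENOSYS;   /* no revoke(2) equivalent on this kernel (e.g. Linux) */
    return -1;
#endif
}
```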
-Jude
On Wed, Dec 31, 2014 at 6:20 AM, Enrico Weigelt, metux IT consult <
enrico.weigelt@???> wrote:
> On 31.12.2014 01:56, Jude Nelson wrote:
>
> Hi,
>
> > A much more elegant solution would be to give each session its own
> > /dev like you were originally saying--it would allow users to
> > interact with different devices under the same name, while also
> > preserving POSIX filesystem semantics.
>
> Yes, I really think, separate namespaces are the correct way to do.
>
> Actually, I didn't even think about ACLs (which introduce extra
> dimensions orthogonal to the filesystem tree), but doing everything
> via separate /dev namespaces.
>
> One interesting question here is whether we should do our own
> namespacing (within vdev itself), or just use the kernel infrastructure
> for that. (by the way: does anybody here know how other kernels,
> like *bsd handle namespaces ?)
>
> Maybe we could go through some scenarios, where you'd currently use
> ACLs and check whether they could be done better w/ namespaces.
> (in fact, I prefer not to use ACLs, due to additional complexity)
>
> One example is session isolation: here I'm pretty sure that, on login
> or session start, a proper namespace should be constructed, before
> calling the login shell is started. Do you see any reason for not
> going that way ?
>
> By the way: does vdev's ACL handling also allow revoking permissions
> to some device even on already opened fd's ?
>
>
> cu
> --
> Enrico Weigelt,
> metux IT consulting
> +49-151-27565287
>