NetBSD-Users archive


Re: NetBSD Jails



Hi,
Didn't think my topic would make so many waves!
I won't debate which is better between VMs and containers. Both have their strengths and weaknesses, and in my opinion your needs and skills usually make one or the other the right choice for you. Nevertheless, I wanted (and still want) a jail-like feature in NetBSD. As I said before, VMs are great, but they mostly target a few architectures (amd64 and maybe aarch64?). I chose NetBSD because it runs on the most "exotic" platforms (isn't its motto "Of course it runs NetBSD"?). So it's my choice for my sparc32 and PowerPC platforms, but not yet for my sparc64. Why? Because I really like and use Solaris zones, and switching to an OS that does not support this feature would be a downgrade.

As someone said, I guess I could build something custom around chroots, or use something already lying around (like sailor). But first, from a user's perspective, it would be a serious time-saver to have a standard instead of each of us rolling our own way of dealing with chroots (ask lxc/jails/zones users). Second, NetBSD is known for its great portability; its virtualization (or containers) should respect that goal, and I think a hardware-agnostic jail-like feature would fit it perfectly.
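Just to make that concrete, here is a minimal sketch of the kind of one-off chroot wrapper each of us ends up writing on our own (the path, uid/gid and command are invented for illustration). It confines the filesystem view only; none of the process or resource isolation a jail or zone gives you:

#include <sys/types.h>
#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	const char *root = "/var/chroot/www";	/* made-up chroot directory */

	/* confine the filesystem view, then leave the old root behind */
	if (chroot(root) == -1 || chdir("/") == -1)
		err(EXIT_FAILURE, "chroot %s", root);

	/* drop privileges before starting the service (made-up uid/gid) */
	if (setgid(1000) == -1 || setuid(1000) == -1)
		err(EXIT_FAILURE, "drop privileges");

	/* exec the service inside the chroot (path is just an example) */
	execl("/usr/sbin/nginx", "nginx", "-g", "daemon off;", (char *)NULL);
	err(EXIT_FAILURE, "execl");
}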

I think most of the NetBSD community would want/like this feature. The questions are: does the NetBSD Project/kernel team want it? And if yes, what do they need to make it happen?

On Sun, May 17, 2020 at 12:01 AM Greg A. Woods <woods%planix.com@localhost> wrote:
At Sat, 16 May 2020 22:52:24 -0400, "Aaron B." <aaron%zadzmo.org@localhost> wrote:
Subject: Re: NetBSD Jails
>
> It also doesn't solve the ultimate issue here, which is isolation: a
> user (in the kernel sense of user, not necessarily a human logged in via
> SSH) in one chroot could run 'ls' or equivalent syscalls and see
> activity inside a different chroot.

Hmmm... is this a real threat model?  Or just a "nice to have"?

(and maybe not "ls" -- else your chroot is leaking, but even if I can
run "ps" and see all the processes on the system, is that a problem?)

I know some people do allow human users to log in to FreeBSD "jails", but
I really have to wonder why.  I think if you want to give human users
the idea that they have their own machine then you really do need to
give them a whole VM (at least with Unix/POSIX systems -- modernized
multics-like systems might be a better way).

However with just some process(es) running a service in a chroot, isn't
this only a threat when you assume the process(es) _will_ _always_ be
compromised with shell-equivalent or remote-code exploits?  Otherwise
they're running vetted code that can only do as it is told!  Yes there's
always an exploit risk, but what's the threat model where the highest
risk of such an exploit is that one exploited instance can see what
processes another un-exploited instance is running?  Maybe they can find
other like jails which can also be exploited, but still, what's the
threat model?  More web pages to deface?  Really?


> To solve the problem this way, I have to rebuild the chroot with a
> custom nginx configuration for every place I wish to deploy it - or
> manually manipulate its configuration after deployment.
>
> This defeats the entire point of a container system: I can build the
> desired nginx environment once, and then deploy it wherever it is
> needed with no modifications. Being forced to customize it for 30+
> locations, however that customization is done, doesn't scale very well
> with the human as the bottleneck.

I guess this is one thing I just don't understand about the "modern" way
of bundling unrelated but like services onto a single OS instance.

I really would expect to have to configure each individual instance in
at least some minor way.

After all you have to do the per-instance configuration _somewhere_.

Does it really matter whether the configuration is done in the
"container"/"jail" layer, or directly to the in the per-instance config
files?

Personally I'd do it in the per-VM /etc filesystems, which all sit
adjacent to each other on the file server, rather than in each
chroot/jail directory.  You can then edit them all in parallel with one
database-driven configuration management tool.


> The way I see it: containers shouldn't be thought of like virtual
> machines, but more like installed packages.

If only they were that simple (i.e. that "containers" are just boxed-up,
replicable add-on packages).  Typically they end up being many orders of
magnitude more complex, especially in the hidden underlying layers.

Actual real VMs (when well designed) will have some (virtual) layers,
but they're all transparent w.r.t. whether there's bare hardware
underneath or any number of VMs running together; and they come with
fewer differences, fewer complexities, and fewer parts to break or hide
bugs.
It's POSIX all the way down (and across), not a different way of doing
things in each different kind of "container" environment.


> It's fine grained in ways I don't care about, and also doesn't control
> things I do care about.
>
> I don't care how much core memory is locked, that you have a valid home
> directory, how big your stack size is - I'd like to say "you get 1
> gigabyte, use it as you want." It makes perfect sense when you have 100
> humans sharing a computer. It makes a lot less sense when you have 100
> applications sharing a computer that only one human has a shell account
> for.

Regarding existing kernel resource controls (rlimits) being unsuitable:

So, OK, what resource controls do you _really_ need?
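(For reference, the per-process knob that exists today is setrlimit(2).
Something like the quoted "you get 1 gigabyte" can be sketched per process,
as below, but it is not an aggregate per-container budget; the wrapper and
command are invented for illustration.)

#include <sys/resource.h>
#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	struct rlimit rl;

	/* per-process address-space cap: roughly "you get 1 gigabyte" */
	rl.rlim_cur = rl.rlim_max = 1024UL * 1024 * 1024;
	if (setrlimit(RLIMIT_AS, &rl) == -1)
		err(EXIT_FAILURE, "setrlimit");

	if (argc < 2)
		errx(EXIT_FAILURE, "usage: %s command [args]", argv[0]);

	/* run the given service under the cap; children inherit it */
	execvp(argv[1], argv + 1);
	err(EXIT_FAILURE, "execvp %s", argv[1]);
}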

I can imagine that bandwidth limits (and accounting) would be
interesting.  I think, though I have never done it, that limits can be
achieved, albeit with some complexity, using altq and bridges.  I
studied ideas for implementing per-user/route/address/etc. bandwidth
accounting some time ago but didn't come up with any great solutions.

I/O rate limiting on the filesystem, well that's a whole different
kettle of fish.  I don't even know if it makes sense once the main
bottleneck of network bandwidth is under control.


> Since CPU cycles are cheap, there's also this: the full VM route brings
> quite a bit of human overhead as well. Now I (usually) have to maintain
> a copy of the entire operating system for every application I wish to
> run. I need to centralize authentication services and logging. Tools
> like Ansible come into play to ensure VMs don't drift away from a
> common specification. Different OS versions cause random
> incompatibilities - once you're past ten or twenty VMs, it's
> very unlikely you can upgrade all of them at once.

I would argue full VMs are actually easier to manage, especially when
they all share the same installed software and only have minor
configuration differences (and all those config files also live adjacent
to each other on a big file server).

I only ever upgrade one root filesystem.  I only ever install packages
once, in one place.  All like-minded VMs share these things -- one copy,
read-only.

I only ever run one program that edits all the similar configurations at
once (with specific details taken into account automatically by a data
driven config tool for each instance of course).


> Anyway, none of computing on this scale is simple.

It should be.  It can be.  It probably is the only sane/safe/efficient
way.  Complexity breeds disaster.

> The complexity will
> exist somewhere, it's a question of where.

I don't agree -- I think it's just a matter of seeing requirements in a
different light so that the complexity can be cut away for good.

--
                                        Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>

