NetBSD-Users archive


Re: NetBSD Jails



On Sat, 16 May 2020 10:57:55 -0700
"Greg A. Woods" <woods%planix.com@localhost> wrote:

> Perhaps all that's required is a tool which extracts the minimum
> required entries from the real /etc/master.passwd for each chroot?
> (and some way to maintain chroot copies?)
> 
> (Another way would be a new service behind nsdispatch(3) which would
> provide access through the chroot, e.g. via a socket, to the shared
> /etc/master.passwd, though that would assume all chrooted programs use
> only the "standard" interfaces supported by nsdispatch(3).)
> 

I've thought about keeping a SQLite database outside the chroots that
tracks which deployed instance is using which UIDs, and rewriting
master.passwd within each at deployment.

...or, I could simply not: keep tracking the UIDs in a spreadsheet
like I do now, and not have to deal with maintaining that code should
Jails ever be implemented. I have too many other things asking for my
time to worry about implementing a feature that could be temporary.
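
Not that I plan to write it, but the core of such a tool would be
pretty small. A rough sketch - the database path, table, and column
names are all invented here, and the real thing would also need to run
pwd_mkdb(8) against the generated file inside the chroot:

/*
 * Sketch: emit a minimal master.passwd(5) for one chroot instance,
 * using a UID-allocation database kept outside the chroots.
 * Assumed schema (purely illustrative): uids(instance TEXT, uid INTEGER).
 * Build (assuming the sqlite3 shipped in base):
 *     cc -o chrootpw chrootpw.c -lsqlite3
 * Run as root so getpwuid(3) returns the real password hashes.
 */
#include <sys/types.h>
#include <pwd.h>
#include <sqlite3.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
	sqlite3 *db;
	sqlite3_stmt *stmt;

	if (argc != 2) {
		fprintf(stderr, "usage: %s instance-name\n", argv[0]);
		return 1;
	}
	if (sqlite3_open("/var/db/chroot-uids.sqlite", &db) != SQLITE_OK) {
		fprintf(stderr, "open: %s\n", sqlite3_errmsg(db));
		return 1;
	}
	if (sqlite3_prepare_v2(db,
	    "SELECT uid FROM uids WHERE instance = ?1 ORDER BY uid",
	    -1, &stmt, NULL) != SQLITE_OK) {
		fprintf(stderr, "prepare: %s\n", sqlite3_errmsg(db));
		return 1;
	}
	sqlite3_bind_text(stmt, 1, argv[1], -1, SQLITE_STATIC);

	while (sqlite3_step(stmt) == SQLITE_ROW) {
		uid_t uid = (uid_t)sqlite3_column_int(stmt, 0);
		struct passwd *pw = getpwuid(uid);   /* host's passwd db */

		if (pw == NULL) {
			fprintf(stderr, "uid %lu not found on host\n",
			    (unsigned long)uid);
			continue;
		}
		/* name:password:uid:gid:class:change:expire:gecos:home:shell */
		printf("%s:%s:%lu:%lu:%s:%lld:%lld:%s:%s:%s\n",
		    pw->pw_name, pw->pw_passwd,
		    (unsigned long)pw->pw_uid, (unsigned long)pw->pw_gid,
		    pw->pw_class, (long long)pw->pw_change,
		    (long long)pw->pw_expire, pw->pw_gecos,
		    pw->pw_dir, pw->pw_shell);
	}
	sqlite3_finalize(stmt);
	sqlite3_close(db);
	return 0;
}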

Either way, it also doesn't solve the ultimate issue here, which is
isolation: a user (in the kernel sense of user, not necessarily a human
logged in via SSH) in one chroot could run 'ls' or the equivalent
syscalls and see activity inside a different chroot.
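
To make that concrete: chroot(2) hides the filesystem, but not the
process table. Even with no ps(1) and no procfs mounted inside, anything
running in a chroot can enumerate every PID on the machine. A crude
sketch using nothing but kill(2) (the PID range scanned is arbitrary):

/*
 * kill(pid, 0) answers "does this PID exist?" for the whole system:
 * 0 or EPERM means it exists, ESRCH means it doesn't - regardless of
 * which chroot the caller is sitting in.
 */
#include <sys/types.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>

int
main(void)
{
	pid_t pid;
	int visible = 0;

	for (pid = 1; pid < 30000; pid++)
		if (kill(pid, 0) == 0 || errno == EPERM)
			visible++;	/* exists, possibly in another chroot */

	printf("%d processes visible from inside this chroot\n", visible);
	return 0;
}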


> > - All chroots share the same network stack. If I tell nginx to bind to
> > '0.0.0.0' or '::', the first instance will startup fine, the others
> > will fail with "address already in use."
> 
> Well if you're chrooting multiple instances of the same service, isn't
> it obvious that each has to listen on one and only one specific address?
> If I understand correctly one could also route a subnet via a bridge
> interface to each chrooted service.  Maybe a chrooted process should
> also be prevented from listening to a wildcard address?
> 

To solve the problem this way, I have to rebuild the chroot with a
custom nginx configuration for every place I wish to deploy it - or
manually manipulate its configuration after deployment.

This defeats the entire point of a container system: I can build the
desired nginx environment once, and then deploy it wherever it is
needed with no modifications. Being forced to customize it for 30+
locations, however that customization is done, doesn't scale very well
with the human as the bottleneck.
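
To be concrete about what's happening underneath: the chroots don't
partition the port space at all, so the second wildcard bind simply
collides with the first. A tiny demonstration - run a copy in each of
two chroots and the second one fails exactly the way the second nginx
does (the port number is arbitrary):

/*
 * Minimal illustration that all chroots share one network stack.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		err(1, "socket");

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_len = sizeof(sin);
	sin.sin_addr.s_addr = htonl(INADDR_ANY);   /* the "0.0.0.0" case */
	sin.sin_port = htons(8080);

	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "bind");      /* second chroot dies here: EADDRINUSE */
	if (listen(s, 5) == -1)
		err(1, "listen");

	printf("listening on *:8080; ^C to release the port\n");
	pause();
	return 0;
}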


> I've heard FreeBSD folks go on for days about how FreeBSD's "jails" make
> network management simpler, but I still don't have any real
> understanding of exactly what this means. 

It's a completely different mindset that takes some long-held
assumptions and turns them upside down.

The way I see it: containers shouldn't be thought of like virtual
machines, but more like installed packages - specifically, packages
whose state is completely disconnected from the state of the base
operating system. Long story short, I came to see large VMware
deployments running nothing but hundreds of near-identical Linux VMs
as papering over the fact that package managers don't keep track of
application state.

But this is all a digression. No one has to do it this way. I've found
jails are usable as virtual machines if you understand and accept the
(big) limitations.


> > The wiki's projects list has a
> > clean solution to this particular point, which may or may not be within
> > scope of jails:
> >
> > https://wiki.netbsd.org/projects/project/virtual_network_stacks/
> 
> Virtual network stacks seem to be a rather complex solution looking for
> a problem -- i.e. in actual fact more of a problem looking for trouble.
> 

I understood the task as a research project - one that may or may not
pay dividends in unexpected ways. It would definitely be helpful to me,
but I don't expect to see it implemented anytime soon. You asked what
I wanted to see, and I answered :)


> > - Some way to set per-chroot resource limits would be helpful. I can
> > manipulate ulimits, but that is basically driving screws with a hammer.
> > It's simply the wrong tool.
> 
> Well here's where /etc/login.conf solves most problems for normal chroot
> environments, since only ordinary users should be running inside the
> chroot.
> 
> (Or it could, if there were resource controls related to networking. :-))
> 

It's fine-grained in ways I don't care about, and also doesn't control
things I do care about.

I don't care how much core memory is locked, whether you have a valid
home directory, or how big your stack size is - I'd like to say "you
get 1 gigabyte, use it as you want." That granularity makes perfect
sense when you have 100 humans sharing a computer. It makes a lot less
sense when you have 100 applications sharing a computer that only one
human has a shell account for.
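
The closest I can get today is capping each process before it's exec'd
into the chroot, roughly like this (the path is just an example) -
which is exactly the "wrong tool" problem: fifty such processes in one
chroot can still use fifty gigabytes between them, because there is no
aggregate, per-chroot knob:

/*
 * Per-process cap via setrlimit(2); nothing here limits the chroot
 * as a whole.
 */
#include <sys/resource.h>
#include <err.h>
#include <unistd.h>

int
main(void)
{
	struct rlimit rl;

	rl.rlim_cur = rl.rlim_max = 1024UL * 1024 * 1024;   /* 1 GB */
	if (setrlimit(RLIMIT_AS, &rl) == -1)
		err(1, "setrlimit");

	if (chroot("/chroots/nginx0") == -1 || chdir("/") == -1)
		err(1, "chroot");

	execl("/usr/pkg/sbin/nginx", "nginx", "-g", "daemon off;",
	    (char *)NULL);
	err(1, "execl");
}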


> For anything beyond that I'm pretty certain that a full virtual machine
> is the real solution.  Personally I think full VMs are the K.I.S.S. of
> death for "containers" (especially once you see the nightmare of
> complexity that underlies them in the kernel, i.e. CGroups).  I would
> much rather have the clearly structured overhead of the VM instead of
> the hidden overhead and excessive complexity of "containers", or even
> just some of the bits of them like virtual network stacks.
> 

I see virtualization as kind of wasteful, as the whole point of a
kernel is to "virtualize" the CPU's resources so multiple processes can
share the same computer. Now we throw in a hypervisor so that multiple
kernels - which themselves exist to let multiple programs share the
same computer - can share the same computer. Kernels, all the way
down :)

CPU cycles are cheap, though, and there's also this: the full-VM route
brings quite a bit of human overhead as well. Now I (usually) have to
maintain a copy of the entire operating system for every application I
wish to run. I need to centralize authentication services and logging.
Tools like Ansible come into play to ensure the VMs don't drift away
from a common specification. Different OS versions cause random
incompatibilities - once you're past ten or twenty VMs, it's very
unlikely you can upgrade all of them at once.

With a container system, I can drop from 100+ operating systems on
five hypervisors to five operating systems. (And that count excludes
the hypervisor/Dom0, which is yet another OS.) Same amount of hardware,
but an entire layer has been ripped out of the stack. Manageability
actually improves.


Anyway, nothing about computing at this scale is simple. The
complexity will exist somewhere; it's only a question of where.

We're seeing different visions of how things should be, and that's
okay. I've spent a decade managing large networks of full VMs, and I've
arrived at a different vision. My favorite OS lacks a major feature
that many other OSes have.

And so I use chroot. I get most of the benefits of a container, but
ultimately, it's just a toy.


> All this said though I would note that perhaps re-engineering the whole
> network stack in the netgraph way (perhaps directly using netgraph[1]),
> provides some form of "virtualization" for network things in a clean and
> structured way.
> 
> [1] https://en.wikipedia.org/wiki/Netgraph
> 

I haven't run into Netgraph before; I'd have to read up on it. Thanks
for the link.


-- 
Aaron B. <aaron%zadzmo.org@localhost>

