Subject: Re: Jail For NetBSD
To: NetBSD Kernel <tech-kern@netbsd.org>
From: The Black Hacker <blackye@break.net>
List: tech-kern
Date: 12/06/2004 17:46:15
On Dec 6, 2004, at 3:57 PM, Gordon Waidhofer wrote:
>     1. Hardware emulation......
>     2. User Mode Linux..........
>     3. Xen..........
>     4. Separation..........

As a long-term fan of NetBSD and a heavy FreeBSD user/sysadmin, I think 
I can contribute here, at least from a user's point of view:

For '1)' there are a dozen examples (e.g. VirtualPC). That's good for 
legacy/compatibility stuff or for development and testing, but I would 
not use anything like that in production.
For '2)' there are also other solutions, like running a Mach kernel as 
a process on any Mach-based OS. It's the perfect solution when you want 
"total" separation; I used it a lot when I was playing with 
custom kernels.
'3)' is simply a variant which runs the "inside kernels" on a 
dedicated system instead of on a normal OS. I can see the speed 
advantage, but then you have to use both a modified OS in the child 
machine and a special kernel as the parent machine.
'4)' is where FreeBSD jails fall. The fact that there is only one 
kernel running (the "container box" one) can be seen as an advantage or 
a disadvantage.

Jails probably don't reach the same level of separation as running a 
separate kernel for each virtual machine. For example, they must all be 
FreeBSD with the same kernel version, there are still some limits (SysV 
shared memory areas are either forbidden or shared across the jails), 
and you won't use a jail to test or debug a kernel. But they have a 
number of advantages for "daily production" use:

The outside box is there, is a full system, and has access to all the 
filesystems and all the processes of the jails (again, this can be a 
pro or a con), thus:
- The outside box root can recover the root password of a jail if the 
user forgets it
- The outside box root can start a process inside the jail or kill it
- It is possible to mount portions of the directory space of one 
virtual machine in another, optionally read-only (for example, it's 
nice to have only one copy of /usr/ports, accessible read-only by all 
the jails)
- It is possible to decide which devices to expose in a jail's devfs, 
and they ARE the raw devices
- Everything is done with zero overhead: a process inside the jail is 
simply a process of the outside machine, a device accessed by the jail 
is simply a device accessed through *one* kernel, a file is... usually 
visible as such in the parent machine
- The implementation is very simple. I remember a paper describing the 
first FreeBSD jail implementation, and it was all only a few hundred 
lines of code (exact wording: "The change of the suser(9) API 
modified approx 350 source lines distributed over approx. 100 source 
files. The vast majority of these changes were generated automatically 
with a script. The implementation of the jail facility added approx 200 
lines of code in total, distributed over approx. 50 files. and about 
200 lines in two new kernel files."). It is not much different from a 
chroot environment, plus devfs policies, plus IP access policies and a 
few other "barriers" around what is normally available to Mr. root 
as a "special privilege". Basically: check all the system calls and 
see whether they do an "if (uid == 0)" somewhere, then, where 
appropriate, change it to "if (!jailed && (uid == 0))".
- Everything is done with an "out of the box" kernel: no patches, no 
changes, nothing. You can have your production FreeBSD box running and 
then start up a jail in it without even rebooting.
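
The "if (!jailed && (uid == 0))" idea from the list above can be 
sketched as a toy userland model in C. This is only an illustration 
under stated assumptions, not the FreeBSD code: struct cred, the 
priv_op names, and the choice of which operations stay available to a 
jailed root are all invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified credential: NOT the FreeBSD struct ucred. */
struct cred {
    unsigned uid;   /* 0 means root */
    bool jailed;    /* true if the process runs inside a jail */
};

/* Illustrative privileged operations (names invented for this sketch). */
enum priv_op {
    OP_BIND_RESERVED_PORT,  /* assumed to stay allowed for a jailed root */
    OP_CHOWN_FILE,          /* assumed to stay allowed for a jailed root */
    OP_SET_IF_ADDRESS,      /* denied inside a jail */
    OP_CHANGE_FIREWALL      /* denied inside a jail */
};

/* The traditional check: uid 0 may do anything. */
bool suser_plain(const struct cred *c)
{
    return c->uid == 0;
}

/* Which operations a jailed root keeps (an assumption of this model). */
bool allowed_in_jail(enum priv_op op)
{
    return op == OP_BIND_RESERVED_PORT || op == OP_CHOWN_FILE;
}

/* The jail-aware check: root, and either unjailed or a harmless op. */
bool suser_jail_aware(const struct cred *c, enum priv_op op)
{
    if (c->uid != 0)
        return false;
    if (c->jailed && !allowed_in_jail(op))
        return false;
    return true;
}

/* Small usage example; returns 0 if every expectation holds. */
int run_examples(void)
{
    struct cred host_root = { 0, false };
    struct cred jail_root = { 0, true };
    struct cred jail_user = { 1001, true };

    /* The old check cannot tell host root from jailed root. */
    assert(suser_plain(&host_root) && suser_plain(&jail_root));

    /* The new check separates them for "dangerous" operations only. */
    assert(suser_jail_aware(&host_root, OP_CHANGE_FIREWALL));
    assert(!suser_jail_aware(&jail_root, OP_CHANGE_FIREWALL));
    assert(suser_jail_aware(&jail_root, OP_BIND_RESERVED_PORT));

    /* A non-root user gains nothing from being in a jail. */
    assert(!suser_jail_aware(&jail_user, OP_CHOWN_FILE));
    return 0;
}
```

The point of passing the operation in is the "if appropriate" part: 
only some of the root-only checks need the extra jailed test, the rest 
can stay as they are.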

Currently, where I work, we have a couple of FreeBSD systems running 
the "core" network services in what I call "nanojails" (a directory 
containing the minimum that daemon needs to run, bound to its 
IP) and another couple on which we run several "full jails" (complete 
FreeBSD installations accessible as independent machines).  Nanojails 
are often implemented by changing a single system call in the daemon's 
source (jail() instead of chroot()), so that the daemon starts, loads 
the libraries, reads its configuration and then jails itself into a 
restricted space with the minimum needed to work (not only in terms of 
filespace, but also in terms of IP connectivity).

In the first case there is added security compared with the traditional 
chroot solution (the IP traffic is limited from/to the IP of the jailed 
daemon), and in the second the users have "their machine" (for which 
they have the root password, can install/remove whatever they want, 
etc.) but we bought the hardware only once. On top of that, they cannot 
do funky network scans/spoofs/sniffing, change the IP address or change 
the ipfw policies, and when they forget the root password or mess up 
the system badly, someone "from the outside" can fix things quite 
easily, without even rebooting the child machine. As a matter of fact, 
an "average user" can hardly tell whether his "machine" is physical or 
virtual.
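
To make the "IP traffic is limited from/to the IP of the jailed daemon" 
part concrete, here is a toy C model of the bind-time check. It is a 
sketch under assumptions, not kernel code: struct jail_model, ADDR_ANY 
and jail_check_bind() are invented names, and the model assumes one 
IPv4 address per jail, with a wildcard bind rewritten to that address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_ANY 0u   /* stands in for INADDR_ANY in this model */

/* Hypothetical jail description: a single IPv4 address, host order. */
struct jail_model {
    uint32_t ip;
};

/*
 * Decide whether a bind to *addr is allowed.
 * j == NULL means the caller is not jailed and nothing is restricted.
 * Inside a jail, a wildcard bind is rewritten to the jail's address,
 * and any other address must match the jail's address exactly.
 */
bool jail_check_bind(const struct jail_model *j, uint32_t *addr)
{
    if (j == NULL)
        return true;
    if (*addr == ADDR_ANY) {
        *addr = j->ip;
        return true;
    }
    return *addr == j->ip;
}

/* Small usage example; returns 0 if every expectation holds. */
int run_bind_examples(void)
{
    struct jail_model dns_jail = { 0xC0000205u };  /* 192.0.2.5 */
    uint32_t wildcard = ADDR_ANY;
    uint32_t own = 0xC0000205u;
    uint32_t other = 0xC0000206u;                  /* 192.0.2.6 */

    /* Wildcard binds are narrowed to the jail's own address. */
    assert(jail_check_bind(&dns_jail, &wildcard));
    assert(wildcard == dns_jail.ip);

    /* The jail's own address is fine, a neighbour's is refused. */
    assert(jail_check_bind(&dns_jail, &own));
    assert(!jail_check_bind(&dns_jail, &other));

    /* Outside any jail the address is left alone. */
    assert(jail_check_bind(NULL, &other));
    assert(other == 0xC0000206u);
    return 0;
}
```

This is the extra bit a plain chroot does not give you: the daemon is 
confined not only to a directory but also to one address.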

So yes, definitely, Jails would be a Good Thing (tm) for NetBSD too.

Of course, it can all be done with chroot + systrace or with 
well-designed mandatory access control policies too, but a jail is a 
nice setup that an average sysadmin can have in production in half an 
hour.

Ciao,

A.