tech-kern: Re: representation of persistent device status, was Re: devfs, was Re: ptyfs...

Subject: Re: representation of persistent device status, was Re: devfs, was Re: ptyfs...
To: Matthew Orgass <darkstar@city-net.com>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 11/30/2004 13:52:28

On Tue, Nov 30, 2004 at 10:13:12AM -0500, Matthew Orgass wrote:
> On 2004-11-30 tls@rek.tjls.com wrote:
> 
> > 2) Typically, my hardened systems run with all writable filesystems mounted
> >    nodev.  Let me just venture to guess that if I weren't pointing it out
> >    right now, nobody would bother to think that devfs must refuse to mount
> >    if its configuration file were on a nodev filesystem.
> >
> > 3) Enforcing the restriction necessary due to #2 means that the file
> >    parser *must* be in the kernel (think about it: you *cannot* allow
> >    a userland program to feed you a devfs config structure from RAM,
> >    or there is no point to ever trying to mount anything nodev; the
> >    kernel *must* read the config file itself so it can know where it
> >    is stored and check for nodev).  That means quite a bit of complicated
> >    code in the kernel (including a parser, and code to read files from
> >    the filesystem, which AFAIK only LFS does right now, and that only
> >    for the ifile)
> 
>   I don't understand point #3.  If the file that specifies that the file
> systems are supposed to be mounted nodev is not parsed by the kernel, why
> would devfs config be different?  Continued use of the file from RAM might
> be bad, but this could easily be avoided.

Nice try, but no prize.  Your analogy doesn't hold, which can easily be
demonstrated by working through a simple example.

Consider a system with the following filesystems mounted:

/dev/wd0a 	/		ffs rw
/dev/wd0b	/var		ffs rw,nodev,noexec
/dev/wd0e	/tmp		ffs rw,nodev,noexec

The system runs at security level 2.  There are no device nodes present
except for /dev/tty00, /dev/tty, /dev/null, /dev/zero, and /dev/wd0a,
which is mode 00500.  wd0 has only an 'a' partition of type FFS.  The
kernel does not have any other filesystems in it.

Since no writable filesystem is mounted such that new device nodes can
be created and used, I can know that I _cannot_ have a problem with,
for example, a new device node for /dev/mem or /dev/wd0d popping up;
thus I can safely exclude all such access to the kernel (and through it,
the machine's hardware) from consideration when thinking about potential
persistent compromise of the machine.

Unfortunately, I cannot be so sure that malicious code will not cons up
an appropriate mount structure and feed it to the kernel (which is why
it is not sufficient to simply remove the "mount" executable, or
replace it with one that will refuse to mount other partitions).  Even
if the writable filesystems are mounted 'noexec', an attacker could jump
into code in memory allocated with malloc(); and we cannot prevent this,
on some architectures.  So, the attacker could create a new devfs mount
structure in memory, and feed it to the mount system call; thereby
bypassing the "nodev" attribute on all writable filesystems and making
new device nodes available as avenues of attack.  Without devfs, I
simply do not have to worry about this.

Now, security level 2 forbids *all* new mounts; I did this long ago as
a very crude hack to allow me to not worry about new mounts of MFS
filesystems without nodev and noexec.  However, that _is not and should
not be_ necessary just to actually have nodev semantics enforced, and
in fact one project I worked on simply added a small number of lines to
the kernel to enforce the "writable means nodev" policy.
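
For illustration only, a "writable means nodev" check can be very small.
The following is a minimal sketch in C, not the code from that project:
the flag names and values (SK_MNT_RDONLY, SK_MNT_NODEV) and the function
name are hypothetical stand-ins for whatever the real kernel uses.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch; a real kernel defines its own. */
#define SK_MNT_RDONLY 0x1u
#define SK_MNT_NODEV  0x2u

/*
 * Policy: a filesystem may be mounted writable only if it is also nodev.
 * Read-only mounts are accepted with or without nodev.
 */
bool mount_flags_allowed(unsigned int flags)
{
    bool writable = (flags & SK_MNT_RDONLY) == 0;
    bool nodev = (flags & SK_MNT_NODEV) != 0;

    return !writable || nodev;
}
```

A mount path would call such a check before accepting a request and
refuse the mount when it returns false.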

With devfs, with the nodes-and-permissions structure parsed by userland
and fed to the kernel in-memory so it cannot know its provenance, it is
essentially the case that nodev is meaningless against even a moderately
sophisticated attacker.  And that means that the kernel has to parse the
file, so that it can know that it did not come from a nodev filesystem;
or we have to just punt on nodev semantics entirely.
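
The provenance check argued for above might look roughly like the
following sketch in C.  Everything here is an assumption made for
illustration: the structures sk_mount and sk_file, the flag SK_MNT_NODEV
and the function name do not correspond to any existing kernel
interface.  The point is only that the kernel must hold a reference to
the file it read, so that it can inspect the flags of the filesystem the
file lives on; a configuration handed over as a bare memory buffer has
no such reference and is refused.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag and structures for this sketch only. */
#define SK_MNT_NODEV 0x2u

struct sk_mount {
    unsigned int flags;          /* mount flags of the filesystem */
};

struct sk_file {
    const struct sk_mount *mnt;  /* filesystem holding the file, or NULL
                                    if the data has no backing file */
};

/*
 * Accept a devfs configuration only if the kernel itself can tell where
 * it is stored and that location is not a nodev filesystem.
 */
bool devfs_config_trusted(const struct sk_file *cfg)
{
    if (cfg == NULL || cfg->mnt == NULL)
        return false;            /* unknown provenance: refuse */

    return (cfg->mnt->flags & SK_MNT_NODEV) == 0;
}
```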

I'm well aware that some of the people advocating devfs-uber-alles could
not care a bit, in their particular applications, whether nodev loses its
utility in applications like mine.  If they thus insist on putting the
parser in userland (which means that nodev cannot work as intended, as
outlined above) all *I* ask is that they leave it possible to build
kernels with traditional old device nodes so I can be *sure* no new
devfs nodes will pop up and break the assumptions of my security model.

The other alternative is to put the configuration file parser in the
kernel; which has its own drawbacks, but which is at least less odious
than eliminating any chance whatsoever of nodev working as Jonathan and
I, and presumably others building hardened devices, need it to.  In
that case, particular attention needs to be paid to the demonstrable
correctness of the code, and also to small size (our kernel is *already*
huge when compared to its size just a few years ago, and clearly devfs
will be at least several tens of kilobytes larger than the old code for
static device nodes).  If we can manage that, fine.  But that does not
mean that it will be easy, and we should weigh the cost of doing devfs
right in this sense against the cost of leaving it an *option* with
static device nodes for those of us who don't want it.

Mandatory devfs with the configuration parser in userland, however, is
basically as bad as it gets.  It is a mandatory increase in complexity
and size and a *regression* from the existing situation with regard to
security against persistent system compromise.

Yuck.

Thor