current-users: Re: easy ways to crash your NetBSD system

Subject: Re: easy ways to crash your NetBSD system
To: Greg A. Woods <woods@planix.com>
From: Erik E. Fair <fair@clock.org>
List: current-users
Date: 04/09/1996 13:49:45

At 13:06 4/8/96, Greg A. Woods wrote:

>BTW, the original "bug" was described using a shell example in a paper
>I've since lost track of.  If I'm not mistaken this same paper also
>included the following example which crashed V7 as well:
>
>	while : ; do
>		mkdir a
>		cd a
>	done
>
>This latter problem does seem to have been fixed....

The paper is a two- or three-page piece (undated) entitled "On the Security
of UNIX" by Dennis M. Ritchie; the first copy I read came with the 7th
Edition UNIX distribution. I read it on the manual rack in Cory Hall at
U.C. Berkeley in late 1980. The nroff source was in /usr/doc in the
distributions for a long time, and the paper is still found in the 4.3 BSD
manual set that USENIX published, as SMM:17.

That script was known to kill a V6 UNIX from inode table exhaustion; V7 had
this bug fixed.


There is an overall point here: real operating systems do not crash unless:

1. there is a bug in the OS. (Such bugs are not to be tolerated, and must
be fixed)

2. the OS catches the hardware doing something it ought not do, and the OS
cannot recover.

There's a secondary statement to be made about the level of heroics an OS
ought to try in order to recover from treacherous hardware, versus the cost
in time to write and space to hold code that is likely never to be executed
on the typical system, but I'll not make it right now.

The point is that it should be *impossible* for a normal user to crash the
system due to any resource exhaustion that s/he might cause, deliberately
or unintentionally. The rabbit-forker is an example of a thing that happens
in university environments all the time ("Gee, I wonder what would happen
if I tried this..."), and the system should sensibly deal with it (i.e.
when resources are exhausted, UNIX should refuse the requested operation
and return an error). If you want another roaring piece of nonsense, try
this sometime:

sh
$ << `ls`

Your standard Bourne shell will go into infinite stack growth; some UNIX
systems deal with this gracefully, and others crash. (I learned this one
from Jim McKie at the USENIX in SLC in 1984 - someone at CWI in Amsterdam
had been trying to write a grammar for sh, and discovered all kinds of
strange things about its parser in the process. Perforce, most of us who
had lunch with Jim that day went into the vendor show hall and tried this
on as many of the systems there as we could. The one that impressed me most
was the Pyramid 90x, which recognized the infinite stack growth in
sub-second time and killed the process with an error code that identified
exactly what had happened.)
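
For anyone who wants to see the "graceful" outcome without risking the
whole machine, here is a minimal sketch. It assumes a shell whose ulimit
builtin accepts -s (stack size in KiB), which is common but is not required
by POSIX. The names with_stack_cap and runaway are made up for
illustration, and the unbounded recursion is only a stand-in for the
runaway parser, not a reproduction of the original here-document bug.

```shell
# Hypothetical helper: run "$@" in a subshell whose stack is capped at
# $1 KiB. The cap applies only to the subshell, not to the calling shell.
with_stack_cap() {
    cap_kib=$1
    shift
    # If the limit cannot be set, the command is not run at all.
    ( ulimit -s "$cap_kib" && "$@" )
}

# Deliberately unbounded recursion, standing in for a runaway process.
runaway() { runaway; }

# The capped subshell should be stopped by the system (or by the shell's
# own nesting limit) and report a non-zero status; nothing else is harmed.
if with_stack_cap 256 runaway; then
    echo "runaway completed (unexpected)"
else
    echo "runaway was stopped with status $?"
fi
```

On many systems the reported status is 128 plus the number of the signal
that killed the subshell, though some shells stop the recursion themselves
and report an ordinary error instead.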

No one anywhere has a computer with infinite RAM or disk. Dynamic
allocation of kernel data structures does offer the potential to use the
available resources more efficiently (no more big fixed-length arrays for
structures that are little-used), but we should never fool ourselves into
thinking that means that kernel malloc will never refuse a request for
memory. It also opens up more interesting deadly-embrace cases.

The superuser is a special case, because UNIX allows that user vastly more
latitude to do things. Clearly:

% cp /dev/zero /dev/mem

will eventually crash the system. It's a waste of time to enumerate all the
cases where the superuser can do something stupid like this, and protect
against the consequences, especially since such protections will get in the
way of doing some useful things. However, doing something reasonable, like
large reads from a device (one of the examples that started this
discussion), should *never* panic the system - that's a kernel bug, in
category #1 above.

This is not just a question of quality or bugs - it's also a security
issue; please note the name of the paper in which Ritchie discussed kernel
resource exhaustion handling, cited above. To the extent that we'd like to
add new and sexy security technology to the system (e.g. IPsec), we gotta
make sure the mundane stuff is done too, or the sexy stuff won't really
improve security. Think about the fingerd bug that the 1988 Morris worm
exploited - there's an example of "software engineering as a security
issue" if I ever saw one.

Don't ignore the list of panic calls that was grep'd from the sources
earlier in this discussion - the question is, given that list, are all of
them reasonable responses to the condition that the code discovered at that
point? We should go through each one and re-evaluate from time to time, to
make NetBSD more robust and reliable.
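
A list like that is easy to regenerate whenever someone wants to do the
review. The following is a rough sketch, not the command that was used
earlier in this discussion: the helper names list_panics and count_panics
are hypothetical, and the pattern is a plain text match, so it will also
pick up panic( inside comments or longer identifiers.

```shell
# Hypothetical helper: print every line containing "panic(" in the .c
# files under directory $1, as file:line:text, for review by hand.
list_panics() {
    # /dev/null is passed as an extra file so that grep always prints the
    # file name, even when find hands it a single file.
    find "$1" -type f -name '*.c' -exec grep -n 'panic(' /dev/null {} +
}

# Hypothetical helper: count matching lines per file, busiest files first.
count_panics() {
    list_panics "$1" | cut -d: -f1 | sort | uniq -c | sort -rn
}

# Example use (the path is an assumption about where the sources live):
#   count_panics /usr/src/sys
```

The per-file counts give a rough idea of where to start reading; whether
each panic is a reasonable response still has to be judged case by case.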

Just because the Mac & PC weenies get away with crashing their systems on
the first wild pointer doesn't mean NetBSD should stoop to their level.
Personally, I bet we see a big backlash against that kind of rampant
instability in those systems in the next year or two.

just call me "Cassandra",

Erik Fair